date:20230525

Re: [PATCH] RISC-V: Optimize TARGET_XTHEADCONDMOV

2023-05-25 Thread Jeff Law via Gcc-patches





On 5/25/23 20:43, Kito Cheng wrote:

> I would defer this to vrull or t-head folks :)

Given the overlap between where this is going and how I think we should
be handling Zicondops, I'll take it.  It overlaps with work I've had
Raphael doing recently.


jeff

Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-25 Thread Prathamesh Kulkarni via Gcc-patches

On Thu, 25 May 2023 at 15:26, Prathamesh Kulkarni
 wrote:
>
> On Thu, 25 May 2023 at 13:04, Richard Sandiford
>  wrote:
> >
> > LGTM, just a couple of comment tweaks:
> >
> > Prathamesh Kulkarni  writes:
> > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > > index d6fc94015fa..db7ca4c28c3 100644
> > > --- a/gcc/config/aarch64/aarch64.cc
> > > +++ b/gcc/config/aarch64/aarch64.cc
> > > @@ -22332,6 +22332,46 @@ aarch64_unzip_vector_init (machine_mode mode, 
> > > rtx vals, bool even_p)
> > >return gen_rtx_PARALLEL (new_mode, vec);
> > >  }
> > >
> > > +/* Return true if INSN is a scalar move.  */
> >
> > s/INSN/SET/
> >
> > > +
> > > +static bool
> > > +scalar_move_insn_p (rtx set)
> > > +{
> > > +  rtx src = SET_SRC (set);
> > > +  rtx dest = SET_DEST (set);
> > > +  return (is_a <scalar_int_mode> (GET_MODE (dest))
> > > +   && aarch64_mov_operand (src, GET_MODE (dest)));
> > > +}
> > > +
> > > +/* Similar to seq_cost, but ignore cost for scalar moves.  This function
> > > +   is called from aarch64_expand_vector_init.  */
> >
> > Probably best to drop the second sentence.
> >
> > OK with those changes, thanks (no need to retest).
> Thanks, committed as ea9154dbc8fc86d4c617503ca5e6f02fed3a6a56.
Hi Richard,
With the above commit, the s32 case no longer regresses with the
single-constant patch.
Bootstrapped+tested on aarch64-linux-gnu, and verified that the new
tests pass for aarch64_be-linux-gnu.
Is it OK to commit?

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Richard
[aarch64] Improve code-gen for vector initialization with single constant 
element.

gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_expand_vector_init): Tweak condition
if (n_var == n_elts && n_elts <= 16) to allow a single constant,
and if maxv == 1, use constant element for duplicating into register.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vec-init-single-const.c: New test.
* gcc.target/aarch64/vec-init-single-const-be.c: Likewise.
* gcc.target/aarch64/vec-init-single-const-2.c: Likewise.
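
To make the ChangeLog entry above easier to follow, here is a minimal sketch
in plain C. It is not GCC code: `struct elt` and `pick_dup_source` are
hypothetical names, and the model only covers the relaxed condition (at most
one constant element) and the preference for the constant as the duplication
source when no element repeats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one element of a vector initializer.  */
struct elt
{
  bool is_const; /* true for a compile-time constant */
  int id;        /* elements with the same id hold the same value */
};

/* Return the index of the element to duplicate across the vector first,
   or -1 if the relaxed condition does not hold.  */
int
pick_dup_source (const struct elt *v, size_t n)
{
  assert (v != NULL);
  if (n == 0 || n > 16)
    return -1;

  size_t n_var = 0;
  int const_pos = -1;
  for (size_t i = 0; i < n; i++)
    {
      if (v[i].is_const)
        const_pos = (int) i;
      else
        n_var++;
    }

  /* Relaxed condition: n_var >= n_elts - 1, i.e. at most one constant.  */
  if (n_var + 1 < n)
    return -1;

  /* Find the earliest element with the most duplicates.  */
  int best = 0;
  int best_count = 0;
  for (size_t i = 0; i < n; i++)
    {
      int count = 0;
      for (size_t j = i; j < n; j++)
        if (v[j].id == v[i].id)
          count++;
      if (count > best_count)
        {
          best = (int) i;
          best_count = count;
        }
    }

  /* If nothing repeats, prefer the single constant for the duplicate.  */
  if (best_count == 1 && const_pos >= 0)
    return const_pos;
  return best;
}
```

With `{a0, a1, 1, a3}` the sketch picks the constant at index 2; with two or
more constants it returns -1, leaving such inputs outside what the model covers.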

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 5b046d32b37..30d6e3e8d83 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22192,7 +22192,7 @@ aarch64_expand_vector_init_fallback (rtx target, rtx 
vals)
  and matches[X][1] with the count of duplicate elements (if X is the
  earliest element which has duplicates).  */
 
-  if (n_var == n_elts && n_elts <= 16)
+  if (n_var >= n_elts - 1 && n_elts <= 16)
 {
   int matches[16][2] = {0};
   for (int i = 0; i < n_elts; i++)
@@ -22209,12 +22209,23 @@ aarch64_expand_vector_init_fallback (rtx target, rtx 
vals)
}
   int maxelement = 0;
   int maxv = 0;
+  rtx const_elem = NULL_RTX;
+  int const_elem_pos = 0;
+
   for (int i = 0; i < n_elts; i++)
-   if (matches[i][1] > maxv)
- {
-   maxelement = i;
-   maxv = matches[i][1];
- }
+   {
+ if (matches[i][1] > maxv)
+   {
+ maxelement = i;
+ maxv = matches[i][1];
+   }
+ if (CONST_INT_P (XVECEXP (vals, 0, i))
+ || CONST_DOUBLE_P (XVECEXP (vals, 0, i)))
+   {
+ const_elem_pos = i;
+ const_elem = XVECEXP (vals, 0, i);
+   }
+   }
 
   /* Create a duplicate of the most common element, unless all elements
 are equally useless to us, in which case just immediately set the
@@ -22252,8 +22263,19 @@ aarch64_expand_vector_init_fallback (rtx target, rtx 
vals)
 vector register.  For big-endian we want that position to hold
 the last element of VALS.  */
  maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
- rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
- aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
+
+ /* If we have a single constant element, use that for duplicating
+instead.  */
+ if (const_elem)
+   {
+ maxelement = const_elem_pos;
+ aarch64_emit_move (target, gen_vec_duplicate (mode, const_elem));
+   }
+ else
+   {
+ rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+ aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
+   }
}
   else
{
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-single-const-2.c 
b/gcc/testsuite/gcc.target/aarch64/vec-init-single-const-2.c
new file mode 100644
index 000..f4dcab429c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-single-const-2.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include <arm_neon.h>
+
+/* In case where there are no duplicate elements in vector initializer,
+   check that the constant is used for duplication.  */
+
+int8x16_t f_s8(int8_t a0, int8_t a1, int8_t a2, int8_t a3, int8_t a4,

[PATCH] RISC-V: Fix VSETVL PASS ICE on SLP auto-vectorization

2023-05-25 Thread juzhe . zhong

From: Juzhe-Zhong 

Fix bug reported here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109974

PR target/109974

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (source_equal_p): Fix ICE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr109974.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 30 ++-
 .../gcc.target/riscv/rvv/vsetvl/pr109974.c| 17 +++
 2 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109974.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 9847d649d1d..fe55f4ccd30 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1138,7 +1138,35 @@ source_equal_p (insn_info *insn1, insn_info *insn2)
 return false;
   if (!rtx_equal_p (SET_SRC (single_set1), SET_SRC (single_set2)))
 return false;
-  gcc_assert (insn1->uses ().size () == insn2->uses ().size ());
+  /* RTL_SSA uses include REG_NOTE. Consider this following case:
+
+ insn1 RTL:
+   (insn 41 39 42 4 (set (reg:DI 26 s10 [orig:159 loop_len_46 ] [159])
+ (umin:DI (reg:DI 15 a5 [orig:201 _149 ] [201])
+   (reg:DI 14 a4 [276]))) 408 {*umindi3}
+   (expr_list:REG_EQUAL (umin:DI (reg:DI 15 a5 [orig:201 _149 ] [201])
+   (const_int 2 [0x2]))
+   (nil)))
+ The RTL_SSA uses of this instruction has 2 uses:
+   1. (reg:DI 15 a5 [orig:201 _149 ] [201]) - twice.
+   2. (reg:DI 14 a4 [276]) - once.
+
+ insn2 RTL:
+   (insn 38 353 351 4 (set (reg:DI 27 s11 [orig:160 loop_len_47 ] [160])
+ (umin:DI (reg:DI 15 a5 [orig:199 _146 ] [199])
+   (reg:DI 14 a4 [276]))) 408 {*umindi3}
+   (expr_list:REG_EQUAL (umin:DI (reg:DI 28 t3 [orig:200 ivtmp_147 ] [200])
+   (const_int 2 [0x2]))
+   (nil)))
+  The RTL_SSA uses of this instruction has 3 uses:
+   1. (reg:DI 15 a5 [orig:199 _146 ] [199]) - once
+   2. (reg:DI 14 a4 [276]) - once
+   3. (reg:DI 28 t3 [orig:200 ivtmp_147 ] [200]) - once
+
+  Return false when insn1->uses ().size () != insn2->uses ().size ()
+  */
+  if (insn1->uses ().size () != insn2->uses ().size ())
+return false;
   for (size_t i = 0; i < insn1->uses ().size (); i++)
 if (insn1->uses ()[i] != insn2->uses ()[i])
   return false;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109974.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109974.c
new file mode 100644
index 000..06a8562ebab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109974.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv_zbb -mabi=ilp32d --param 
riscv-autovec-preference=fixed-vlmax -O3" } */
+
+#include 
+
+void
+func (int8_t *__restrict x, int64_t *__restrict y, int n)
+{
+  for (int i = 0, j = 0; i < n; i++, j +=2 )
+  {
+x[i + 0] += 1;
+y[j + 0] += 1;
+y[j + 1] += 2;
+  }
+}
+
+/* { dg-final { scan-assembler {vsetvli} { target { no-opts "-O0" no-opts 
"-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } 
*/
-- 
2.36.3
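
As a rough illustration of the source_equal_p change above, the following
plain-C sketch models each RTL_SSA use as a bare register number. The function
name `uses_equal_p` and the array representation are assumptions made for the
example, not the real riscv-vsetvl.cc data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Compare two use lists.  A size mismatch is a normal "not equal" answer
   rather than an invariant violation, which is the point of the fix.  */
bool
uses_equal_p (const int *uses1, size_t n1, const int *uses2, size_t n2)
{
  assert (n1 == 0 || uses1 != NULL);
  assert (n2 == 0 || uses2 != NULL);

  /* Previously this was asserted; REG_EQUAL notes can add extra uses.  */
  if (n1 != n2)
    return false;

  for (size_t i = 0; i < n1; i++)
    if (uses1[i] != uses2[i])
      return false;
  return true;
}
```

Using the register numbers from the comment in the patch, insn1 has uses
{a5, a4} and insn2 has uses {a5, a4, t3}, so the sketch returns false instead
of aborting.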

[PATCH] genmatch: Emit debug message right before "return x" instead of earlier

2023-05-25 Thread Andrew Pinski via Gcc-patches

This is based on the review of
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619342.html .
Instead of emitting the debug message even when the pattern is not applied,
this fixes the issue by emitting it only once the pattern has finally
succeeded.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
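
The idea can be sketched outside genmatch with a toy matcher in plain C. The
pattern, the threshold and the names `try_simplify` and `debug_lines` are all
hypothetical; the only point is that the message is produced immediately
before the successful return, so a late rejection leaves no trace in the dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Number of debug messages produced so far (stands in for dump output).  */
int debug_lines;

/* Toy pattern: rewrite x * 2 as x + x, unless a late check rejects it.
   Returns true only when the rewrite was applied.  */
bool
try_simplify (int x, int *result, FILE *dump)
{
  assert (result != NULL);

  /* A late condition can still reject the pattern; nothing is logged.  */
  if (x > 1000)
    return false;

  *result = x + x;

  /* Emit the message right before "return true", not earlier.  */
  if (dump)
    fprintf (dump, "Applying pattern for x=%d\n", x);
  debug_lines++;
  return true;
}
```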

gcc/ChangeLog:

* genmatch.cc (emit_debug_printf): New function.
(dt_simplify::gen_1): Emit printf into the code
before the `return true` or returning the folded result
instead of emitting it always.
---
 gcc/genmatch.cc | 33 ++---
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 177c13d87cb..bd6ce3a28f8 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -3359,6 +3359,21 @@ dt_operand::gen (FILE *f, int indent, bool gimple, int 
depth)
 }
 }
 
+/* Emit a fprintf to the debug file to the file F, with the INDENT from
+   either the RESULT location or the S's match location if RESULT is null. */
+static void
+emit_debug_printf (FILE *f, int indent, class simplify *s, operand *result)
+{
+  fprintf_indent (f, indent, "if (UNLIKELY (debug_dump)) "
+  "fprintf (dump_file, \"%s ",
+  s->kind == simplify::SIMPLIFY
+  ? "Applying pattern" : "Matching expression");
+  fprintf (f, "%%s:%%d, %%s:%%d\\n\", ");
+  output_line_directive (f,
+result ? result->location : s->match->location, true,
+true);
+  fprintf (f, ", __FILE__, __LINE__);\n");
+}
 
 /* Generate code for the '(if ...)', '(with ..)' and actual transform
step of a '(simplify ...)' or '(match ...)'.  This handles everything
@@ -3488,21 +3503,12 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
   needs_label = true;
 }
 
-  fprintf_indent (f, indent, "if (UNLIKELY (debug_dump)) "
-  "fprintf (dump_file, \"%s ",
-  s->kind == simplify::SIMPLIFY
-  ? "Applying pattern" : "Matching expression");
-  fprintf (f, "%%s:%%d, %%s:%%d\\n\", ");
-  output_line_directive (f,
-result ? result->location : s->match->location, true,
-true);
-  fprintf (f, ", __FILE__, __LINE__);\n");
-
   fprintf_indent (f, indent, "{\n");
   indent += 2;
   if (!result)
 {
   /* If there is no result then this is a predicate implementation.  */
+  emit_debug_printf (f, indent, s, result);
   fprintf_indent (f, indent, "return true;\n");
 }
   else if (gimple)
@@ -3593,6 +3599,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
}
   else
gcc_unreachable ();
+  emit_debug_printf (f, indent, s, result);
   fprintf_indent (f, indent, "return true;\n");
 }
   else /* GENERIC */
@@ -3646,7 +3653,10 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
, indexes);
}
  if (is_predicate)
-   fprintf_indent (f, indent, "return true;\n");
+   {
+ emit_debug_printf (f, indent, s, result);
+ fprintf_indent (f, indent, "return true;\n");
+   }
  else
{
  fprintf_indent (f, indent, "tree _r;\n");
@@ -3712,6 +3722,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
  i);
}
}
+ emit_debug_printf (f, indent, s, result);
  fprintf_indent (f, indent, "return _r;\n");
}
 }
-- 
2.31.1

[PATCHv3, rs6000] Splat vector small V2DI constants with ISA 2.07 instructions [PR104124]

2023-05-25 Thread HAO CHEN GUI via Gcc-patches

Hi,
  This patch adds a new insn for vector splat with small V2DI constants on P8.
If the value of the constant is in the range [-16, 15] and is not 0 or -1, it
can be loaded with vspltisw and vupkhsw on P8. This should be more efficient
than loading the vector from memory.
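
A small model in plain C may help show what the two-instruction sequence
computes. The helper names `small_v2di_splat_p` and `splat_v2di` are
hypothetical and the instructions are only simulated with ordinary integers;
the reason 0 and -1 are excluded is presumably that cheaper forms already
exist for all-zeros and all-ones, which is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if VAL fits the description: in [-16, 15], but not 0 or -1.  */
bool
small_v2di_splat_p (int64_t val)
{
  return val >= -16 && val <= 15 && val != 0 && val != -1;
}

/* Simulate vspltisw followed by vupkhsw for a 5-bit signed immediate.  */
void
splat_v2di (int imm, int64_t out[2])
{
  assert (imm >= -16 && imm <= 15);

  /* vspltisw: splat the sign-extended immediate into four 32-bit words.  */
  int32_t words[4];
  for (int i = 0; i < 4; i++)
    words[i] = (int32_t) imm;

  /* vupkhsw: sign-extend two words to doublewords.  All four words are
     equal, so it does not matter which half is unpacked in this model.  */
  out[0] = (int64_t) words[0];
  out[1] = (int64_t) words[1];
}
```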

  Compared to the last version, the main change is to give the third
parameter of vspltisw_vupkhsw_constant_p a default value and to call the
function with two arguments when the third one doesn't matter.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen

ChangeLog
2023-05-26  Haochen Gui 

gcc/
PR target/104124
* config/rs6000/altivec.md (*altivec_vupkhs_direct): Rename
to...
(altivec_vupkhs_direct): ...this.
* config/rs6000/constraints.md (wT constraint): New constant for a
vector constraint that can be loaded with vspltisw and vupkhsw.
* config/rs6000/predicates.md (vspltisw_vupkhsw_constant_split): New
predicate for wT constraint.
(easy_vector_constant): Call vspltisw_vupkhsw_constant_p to Check if
a vector constant can be synthesized with a vspltisw and a vupkhsw.
* config/rs6000/rs6000-protos.h (vspltisw_vupkhsw_constant_p): Declare.
* config/rs6000/rs6000.cc (vspltisw_vupkhsw_constant_p): Call
* (vspltisw_vupkhsw_constant_p): New function to return true if OP
mode is V2DI and can be synthesized with vupkhsw and vspltisw.
* config/rs6000/vsx.md (*vspltisw_v2di_split): New insn to load up
constants with vspltisw and vupkhsw.

gcc/testsuite/
PR target/104124
* gcc.target/powerpc/pr104124.c: New.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 49b0c964f4d..2c932854c33 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2542,7 +2542,7 @@ (define_insn "altivec_vupkhs"
 }
   [(set_attr "type" "vecperm")])

-(define_insn "*altivec_vupkhs_direct"
+(define_insn "altivec_vupkhs_direct"
   [(set (match_operand:VP 0 "register_operand" "=v")
(unspec:VP [(match_operand: 1 "register_operand" "v")]
 UNSPEC_VUNPACK_HI_SIGN_DIRECT))]
diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c4a6ccf4efb..e7f185660c0 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -144,6 +144,10 @@ (define_constraint "wS"
   "@internal Vector constant that can be loaded with XXSPLTIB & sign 
extension."
   (match_test "xxspltib_constant_split (op, mode)"))

+(define_constraint "wT"
+  "@internal Vector constant that can be loaded with vspltisw & vupkhsw."
+  (match_test "vspltisw_vupkhsw_constant_split (op, mode)"))
+
 ;; ISA 3.0 DS-form instruction that has the bottom 2 bits 0 and no update form.
 ;; Used by LXSD/STXSD/LXSSP/STXSSP.  In contrast to "Y", the multiple-of-four
 ;; offset is enforced for 32-bit too.
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 52c65534e51..1ed770bffa6 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -694,6 +694,14 @@ (define_predicate "xxspltib_constant_split"
   return num_insns > 1;
 })

+;; Return true if the operand is a constant that can be loaded with a vspltisw
+;; instruction and then a vupkhsw instruction.
+
+(define_predicate "vspltisw_vupkhsw_constant_split"
+  (match_code "const_vector")
+{
+  return vspltisw_vupkhsw_constant_p (op, mode);
+})

 ;; Return 1 if the operand is constant that can loaded directly with a XXSPLTIB
 ;; instruction.
@@ -742,6 +750,11 @@ (define_predicate "easy_vector_constant"
   && xxspltib_constant_p (op, mode, &num_insns, &value))
return true;

+  /* V2DI constant within RANGE (-16, 15) can be synthesized with a
+vspltisw and a vupkhsw.  */
+  if (vspltisw_vupkhsw_constant_p (op, mode, ))
+   return true;
+
   return easy_altivec_constant (op, mode);
 }

diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index 1a4fc1df668..00cb2d82953 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, 
rtx, int, int, int,

 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern bool vspltisw_vupkhsw_constant_p (rtx, machine_mode, int * = nullptr);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 3be5860dd9b..ae34a02b282 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -6638,6 +6638,36 @@ xxspltib_constant_p (rtx op,
   return true;
 }

+/* Return true if OP mode is V2DI and can be synthesized with ISA 2.07
+   instructions vupkhsw and vspltisw.
+
+

[PATCH] Disable avoid_false_dep_for_bmi for atom and icelake(and later) core processors.

2023-05-25 Thread liuhongt via Gcc-patches

lzcnt/tzcnt has been fixed since Skylake, and popcnt has been fixed since
Icelake. At least for Icelake and later Intel Core processors, the
errata tune is not needed, and it isn't needed for ATOM either.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.


gcc/ChangeLog:

* config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI):
Remove ATOM and ICELAKE (and later) core processors.
---
 gcc/config/i386/x86-tune.def | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 9d603cc84e4..e1c72cddf1f 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -335,7 +335,8 @@ DEF_TUNE (X86_TUNE_USE_BT, "use_bt",
 /* X86_TUNE_AVOID_FALSE_DEP_FOR_BMI: Avoid false dependency
for bit-manipulation instructions.  */
 DEF_TUNE (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI, "avoid_false_dep_for_bmi",
- m_SANDYBRIDGE | m_CORE_AVX2 | m_TREMONT | m_ALDERLAKE | m_CORE_ATOM
+ m_SANDYBRIDGE | m_HASWELL | m_SKYLAKE | m_SKYLAKE_AVX512
+ | m_CANNONLAKE | m_CASCADELAKE | m_COOPERLAKE
  | m_LUJIAZUI | m_GENERIC)
 
 /* X86_TUNE_ADJUST_UNROLL: This enables adjusting the unroll factor based
-- 
2.39.1.388.g2fc9e9ca3c

Re: [PATCH] RISC-V: Optimize TARGET_XTHEADCONDMOV

2023-05-25 Thread Kito Cheng via Gcc-patches

I would defer this to vrull or t-head folks :)

On Fri, 26 May 2023 at 08:53, Die Li wrote:

> This patch allows fewer instructions to be used when TARGET_XTHEADCONDMOV
> is enabled.
>
> Provide an example from the existing testcases.
>
> Testcase:
> int ConEmv_imm_imm_reg(int x, int y){
>   if (x == 1000) return 10;
>   return y;
> }
>
> Cflags:
> -O2 -march=rv64gc_xtheadcondmov -mabi=lp64d
>
> before patch:
> ConEmv_imm_imm_reg:
>         addi    a5,a0,-1000
>         li      a0,10
>         th.mvnez        a0,zero,a5
>         th.mveqz        a1,zero,a5
>         or      a0,a0,a1
>         ret
>
> after patch:
> ConEmv_imm_imm_reg:
>         addi    a5,a0,-1000
>         li      a0,10
>         th.mvnez        a0,a1,a5
>         ret
>
> Signed-off-by: Die Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_expand_conditional_move_onesided):
> Delete.
> (riscv_expand_conditional_move):  Reuse the TARGET_SFB_ALU expand
> process for TARGET_XTHEADCONDMOV
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Update the
> output.
> * gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Likewise.
> ---
>  gcc/config/riscv/riscv.cc | 44 +++--
>  .../riscv/xtheadcondmov-indirect-rv32.c   | 48 +++
>  .../riscv/xtheadcondmov-indirect-rv64.c   | 48 +++
>  3 files changed, 42 insertions(+), 98 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 09fc9e5d95e..8b8ac9181ba 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -3442,37 +3442,6 @@ riscv_expand_conditional_branch (rtx label,
> rtx_code code, rtx op0, rtx op1)
>emit_jump_insn (gen_condjump (condition, label));
>  }
>
> -/* Helper to emit two one-sided conditional moves for the movecc.  */
> -
> -static void
> -riscv_expand_conditional_move_onesided (rtx dest, rtx cons, rtx alt,
> -   rtx_code code, rtx op0, rtx op1)
> -{
> -  machine_mode mode = GET_MODE (dest);
> -
> -  gcc_assert (GET_MODE_CLASS (mode) == MODE_INT);
> -  gcc_assert (reg_or_0_operand (cons, mode));
> -  gcc_assert (reg_or_0_operand (alt, mode));
> -
> -  riscv_emit_int_compare (&code, &op0, &op1, true);
> -  rtx cond = gen_rtx_fmt_ee (code, mode, op0, op1);
> -
> -  rtx tmp1 = gen_reg_rtx (mode);
> -  rtx tmp2 = gen_reg_rtx (mode);
> -
> -  emit_insn (gen_rtx_SET (tmp1, gen_rtx_IF_THEN_ELSE (mode, cond,
> - cons, const0_rtx)));
> -
> -  /* We need to expand a sequence for both blocks and we do that such,
> - that the second conditional move will use the inverted condition.
> - We use temporaries that are or'd to the dest register.  */
> -  cond = gen_rtx_fmt_ee ((code == EQ) ? NE : EQ, mode, op0, op1);
> -  emit_insn (gen_rtx_SET (tmp2, gen_rtx_IF_THEN_ELSE (mode, cond,
> - alt, const0_rtx)));
> -
> -  emit_insn (gen_rtx_SET (dest, gen_rtx_IOR (mode, tmp1, tmp2)));
> - }
> -
>  /* Emit a cond move: If OP holds, move CONS to DEST; else move ALT to
> DEST.
> Return 0 if expansion failed.  */
>
> @@ -3483,6 +3452,7 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx
> cons, rtx alt)
>rtx_code code = GET_CODE (op);
>rtx op0 = XEXP (op, 0);
>rtx op1 = XEXP (op, 1);
> +  bool need_eq_ne_p = false;
>
>if (TARGET_XTHEADCONDMOV
>&& GET_MODE_CLASS (mode) == MODE_INT
> @@ -3492,14 +3462,12 @@ riscv_expand_conditional_move (rtx dest, rtx op,
> rtx cons, rtx alt)
>&& GET_MODE (op0) == mode
>&& GET_MODE (op1) == mode
>&& (code == EQ || code == NE))
> +need_eq_ne_p = true;
> +
> +  if (need_eq_ne_p || (TARGET_SFB_ALU
> +  && GET_MODE (op0) == word_mode))
>  {
> -  riscv_expand_conditional_move_onesided (dest, cons, alt, code, op0,
> op1);
> -  return true;
> -}
> -  else if (TARGET_SFB_ALU
> -  && GET_MODE (op0) == word_mode)
> -{
> -  riscv_emit_int_compare (&code, &op0, &op1);
> +  riscv_emit_int_compare (&code, &op0, &op1, need_eq_ne_p);
>rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
>
>/* The expander allows (const_int 0) for CONS for the benefit of
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
> b/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
> index 9afdc2eabfd..e2b135f3d00 100644
> --- a/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
> @@ -1,15 +1,13 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -march=rv32gc_xtheadcondmov -mabi=ilp32
> -mriscv-attribute" } */
> -/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" } } */
> +/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-Os" "-Og" "-O3" "-Oz"
> "-flto"} } */
>  /* { dg-final { check-function-bodies "**" ""  } } */
>
>  /*
>  **ConEmv_imm_imm_reg:
>  **     addi    a5,a0,-1000
>  **
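
To check that the "before patch" and "after patch" sequences quoted above
compute the same value, here is a small simulation in plain C. It assumes the
usual XTheadCondMov semantics (th.mvnez writes rs1 to rd when rs2 is nonzero,
th.mveqz writes rs1 to rd when rs2 is zero, and rd is otherwise unchanged);
the function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* th.mvnez rd, rs1, rs2: rd = rs1 if rs2 != 0, else rd is unchanged.  */
static int64_t
th_mvnez (int64_t rd, int64_t rs1, int64_t rs2)
{
  return rs2 != 0 ? rs1 : rd;
}

/* th.mveqz rd, rs1, rs2: rd = rs1 if rs2 == 0, else rd is unchanged.  */
static int64_t
th_mveqz (int64_t rd, int64_t rs1, int64_t rs2)
{
  return rs2 == 0 ? rs1 : rd;
}

/* Sequence before the patch: two one-sided moves combined with "or".  */
int64_t
seq_before (int64_t a0, int64_t a1)
{
  int64_t a5 = a0 - 1000;   /* addi a5,a0,-1000 */
  a0 = 10;                  /* li a0,10 */
  a0 = th_mvnez (a0, 0, a5);
  a1 = th_mveqz (a1, 0, a5);
  return a0 | a1;           /* or a0,a0,a1 */
}

/* Sequence after the patch: a single conditional move.  */
int64_t
seq_after (int64_t a0, int64_t a1)
{
  int64_t a5 = a0 - 1000;   /* addi a5,a0,-1000 */
  a0 = 10;                  /* li a0,10 */
  return th_mvnez (a0, a1, a5);
}

/* Abort if the two sequences disagree for the given inputs.  */
void
check_sequences_agree (int64_t x, int64_t y)
{
  assert (seq_before (x, y) == seq_after (x, y));
}
```

Both sequences return 10 when x is 1000 and y otherwise, matching the
testcase source.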

RE: [COMMITTED] i386: Use 2x-wider modes when emulating QImode vector instructions

2023-05-25 Thread Jiang, Haochen via Gcc-patches

> gcc/ChangeLog:
> 
> * config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
> Rewrite to expand to 2x-wider (e.g. V16QI -> V16HImode)
> instructions when available.  Emulate truncation via
> ix86_expand_vec_perm_const_1 when native truncate insn
> is not available.
> (ix86_expand_vecop_qihi_partial) : Use pmovzx
> when available.  Trivially rename some variables.
> (ix86_expand_vecop_qihi): Unconditionally call ix86_expand_vecop_qihi2.

Hi Uros,

I suppose you pushed the wrong patch to trunk.

On trunk, we see this:

@@ -23409,9 +23457,7 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, 
rtx op1, rtx op2)
   && ix86_expand_vec_shift_qihi_constant (code, dest, op1, op2))
 return;

-  if (TARGET_AVX512BW
-  && VECTOR_MODE_P (GET_MODE (op2))
-  && ix86_expand_vecop_qihi2 (code, dest, op1, op2))
+  if (0 && ix86_expand_vecop_qihi2 (code, dest, op1, op2))
 return;

   switch (qimode)

It should not be if (0 && ix86_expand_vecop_qihi2 (code, dest, op1, op2))

The patch in this thread is correct, which is:

@@ -23409,9 +23457,7 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, 
rtx op1, rtx op2)
   && ix86_expand_vec_shift_qihi_constant (code, dest, op1, op2))
 return;
 
-  if (TARGET_AVX512BW
-  && VECTOR_MODE_P (GET_MODE (op2))
-  && ix86_expand_vecop_qihi2 (code, dest, op1, op2))
+  if (ix86_expand_vecop_qihi2 (code, dest, op1, op2))
 return;
 
   switch (qimode)

Thx,
Haochen

> * config/i386/i386.cc (ix86_multiplication_cost): Rewrite cost
> calculation of V*QImode emulations to account for generation of
> 2x-wider mode instructions.
> (ix86_shift_rotate_cost): Update cost calculation of V*QImode
> emulations to account for generation of 2x-wider mode instructions.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/i386/avx512vl-pr95488-1.c: Revert 2023-05-18 change.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> Uros.

[PATCH] [x86] Split notl + pbroadcast + pand to pbroadcast + pandn for more modes.

2023-05-25 Thread liuhongt via Gcc-patches

r12-5595-gc39d77f252e895306ef88c1efb3eff04e4232554 added two splitters to
transform notl + pbroadcast + pand into pbroadcast + pandn for
VI124_AVX2, which leaves out all modes with DI-sized elements as
well as all 512-bit modes.
This patch extends the splitter to VI_AVX2, which handles DImode elements
for AVX2 and V64QImode, V32HImode, V16SImode and V8DImode for AVX512.
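
The transformation relies on a simple identity, sketched here in plain C with
arrays standing in for vector registers. The function names and the fixed
element count are assumptions for the example; pandn is modelled as
`~x & y`, and no intrinsics are used.

```c
#include <assert.h>
#include <stdint.h>

#define NELTS 4

/* Original form: scalar not, then broadcast, then and.  */
void
not_bcast_and (int64_t a, const int64_t b[NELTS], int64_t out[NELTS])
{
  int64_t na = ~a;                /* notl */
  for (int i = 0; i < NELTS; i++)
    out[i] = na & b[i];           /* pbroadcast + pand */
}

/* Split form: broadcast first, then and-not.  */
void
bcast_andn (int64_t a, const int64_t b[NELTS], int64_t out[NELTS])
{
  int64_t splat[NELTS];
  for (int i = 0; i < NELTS; i++)
    splat[i] = a;                 /* pbroadcast */
  for (int i = 0; i < NELTS; i++)
    out[i] = ~splat[i] & b[i];    /* pandn */
}

/* Abort if the two forms differ for the given inputs.  */
void
check_split_equiv (int64_t a, const int64_t b[NELTS])
{
  int64_t r1[NELTS], r2[NELTS];
  not_bcast_and (a, b, r1);
  bcast_andn (a, b, r2);
  for (int i = 0; i < NELTS; i++)
    assert (r1[i] == r2[i]);
}
```

The split saves the separate scalar not, since the negation is folded into
the and-not.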

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

PR target/100711
* config/i386/sse.md (*andnot3): Extend below splitter
to VI_AVX2 to cover more modes.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr100711-2.c: Add v4di/v2di testcases.
* gcc.target/i386/pr100711-3.c: New test.
---
 gcc/config/i386/sse.md | 12 +++
 gcc/testsuite/gcc.target/i386/pr100711-2.c | 14 +++-
 gcc/testsuite/gcc.target/i386/pr100711-3.c | 40 ++
 3 files changed, 59 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100711-3.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 26dd0b1aa10..97f883d8083 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17116,17 +17116,17 @@ (define_split
 
 ;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn
 (define_split
-  [(set (match_operand:VI124_AVX2 0 "register_operand")
-   (and:VI124_AVX2
- (vec_duplicate:VI124_AVX2
+  [(set (match_operand:VI_AVX2 0 "register_operand")
+   (and:VI_AVX2
+ (vec_duplicate:VI_AVX2
(not:
  (match_operand: 1 "register_operand")))
- (match_operand:VI124_AVX2 2 "vector_operand")))]
+ (match_operand:VI_AVX2 2 "vector_operand")))]
   "TARGET_AVX2"
   [(set (match_dup 3)
-   (vec_duplicate:VI124_AVX2 (match_dup 1)))
+   (vec_duplicate:VI_AVX2 (match_dup 1)))
(set (match_dup 0)
-   (and:VI124_AVX2 (not:VI124_AVX2 (match_dup 3))
+   (and:VI_AVX2 (not:VI_AVX2 (match_dup 3))
(match_dup 2)))]
   "operands[3] = gen_reg_rtx (mode);")
 
diff --git a/gcc/testsuite/gcc.target/i386/pr100711-2.c 
b/gcc/testsuite/gcc.target/i386/pr100711-2.c
index ccaf1688e19..f75914fb7fc 100644
--- a/gcc/testsuite/gcc.target/i386/pr100711-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr100711-2.c
@@ -4,10 +4,12 @@
 typedef char v16qi __attribute__ ((vector_size (16)));
 typedef short v8hi __attribute__ ((vector_size (16)));
 typedef int v4si __attribute__ ((vector_size (16)));
+typedef long long v2di __attribute__((vector_size (16)));
 
 typedef char v32qi __attribute__ ((vector_size (32)));
 typedef short v16hi __attribute__ ((vector_size (32)));
 typedef int v8si __attribute__ ((vector_size (32)));
+typedef long long v4di __attribute__((vector_size (32)));
 
 v16qi foo_v16qi (char a, v16qi b)
 {
@@ -25,6 +27,11 @@ v4si foo_v4si (int a, v4si b)
 return (__extension__ (v4si) {~a, ~a, ~a, ~a}) & b;
 }
 
+v2di foo_v2di (long long a, v2di b)
+{
+return (__extension__ (v2di) {~a, ~a}) & b;
+}
+
 v32qi foo_v32qi (char a, v32qi b)
 {
 return (__extension__ (v32qi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
@@ -44,4 +51,9 @@ v8si foo_v8si (int a, v8si b)
 return (__extension__ (v8si) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,}) & b;
 }
 
-/* { dg-final { scan-assembler-times "vpandn" 6 } } */
+v4di foo_v4di (long long a, v4di b)
+{
+return (__extension__ (v4di) {~a, ~a, ~a, ~a}) & b;
+}
+
+/* { dg-final { scan-assembler-times "vpandn" 8 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100711-3.c 
b/gcc/testsuite/gcc.target/i386/pr100711-3.c
new file mode 100644
index 000..e90f2a48d8d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100711-3.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512bw" } */
+
+typedef char v64qi __attribute__ ((vector_size (64)));
+typedef short v32hi __attribute__ ((vector_size (64)));
+typedef int v16si __attribute__ ((vector_size (64)));
+typedef long long v8di __attribute__((vector_size (64)));
+
+v64qi foo_v64qi (char a, v64qi b)
+{
+return (__extension__ (v64qi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+  ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+  ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+  ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+  ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) & b;
+}
+
+v32hi foo_v32hi (short a, v32hi b)
+{
+return (__extension__ (v32hi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+  ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) & b;
+}
+
+v16si foo_v16si (int a, v16si b)
+{
+return (__extension__ (v16si) {~a, ~a, ~a,

RE: [PATCH] RISC-V: Fix zero-scratch-regs-3.c fail

2023-05-25 Thread Li, Pan2 via Gcc-patches

Committed the PATCH v2, thanks Kito.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Kito Cheng via Gcc-patches
Sent: Friday, May 26, 2023 7:48 AM
To: 钟居哲 
Cc: GCC Patches ; Jeff Law ; 
Jeff Law ; Kito Cheng ; Palmer 
Dabbelt 
Subject: Re: [PATCH] RISC-V: Fix zero-scratch-regs-3.c fail

Lgtm with a minor comment

On Fri, 26 May 2023 at 07:18, 钟居哲 wrote:

> From: Juzhe-Zhong 
>
> Fix ICE of zero-scratch-regs-3.c:
> bug.c:7:1: internal compiler error: Segmentation fault
> 7 | }
>   | ^
> 0x1647b23 crash_signal
> ../../../riscv-gcc/gcc/toplev.cc:314
> 0x147053f maybe_legitimize_operand
> ../../../riscv-gcc/gcc/optabs.cc:7947
> 0x1470dc2 maybe_legitimize_operands(insn_code, unsigned int, unsigned 
> int,
> expand_operand*)
> ../../../riscv-gcc/gcc/optabs.cc:8084
> 0x1470e66 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
> ../../../riscv-gcc/gcc/optabs.cc:8103
> 0x147146a maybe_expand_insn(insn_code, unsigned int, expand_operand*)
> ../../../riscv-gcc/gcc/optabs.cc:8158
> 0x14714fe expand_insn(insn_code, unsigned int, expand_operand*)
> ../../../riscv-gcc/gcc/optabs.cc:8189
> 0x1c20634 riscv_vector::insn_expander<11>::expand(insn_code, bool)
> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:210
> 0x1c20075 riscv_vector::insn_expander<11>::emit_insn(insn_code, rtx_def**)
> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:199
> 0x1c16bd1 riscv_vector::emit_vlmax_insn(unsigned int, int, rtx_def**,
> rtx_def*)
> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:362
> 0x1ad5bb9 vector_zero_call_used_regs
> ../../../riscv-gcc/gcc/config/riscv/riscv.cc:7400
> 0x1ad5c25 riscv_zero_call_used_regs(HARD_REG_SET)
> ../../../riscv-gcc/gcc/config/riscv/riscv.cc:7420
> 0x115c910 gen_call_used_regs_seq
> ../../../riscv-gcc/gcc/function.cc:5924
> 0x115df81 execute
> ../../../riscv-gcc/gcc/function.cc:6718
>
> The ICE happens because we did not pass an explicit VL operand in a
> context where gen_reg_rtx cannot be used to generate one, which causes
> an operand count mismatch.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (vector_zero_call_used_regs): Add 
> explicit VL operand.
>
> ---
>  gcc/config/riscv/riscv.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc 
> index 09fc9e5d95e..9e41200371d 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -7398,7 +7398,7 @@ vector_zero_call_used_regs (HARD_REG_SET
> need_zeroed_hardregs)
>
>   rtx ops[] = {target, CONST0_RTX (mode), vl};
>

Drop vl from here.

  riscv_vector::emit_vlmax_insn (code_for_pred_mov (mode),
> -riscv_vector::RVV_UNOP, ops);
> +riscv_vector::RVV_UNOP, ops, 
> + vl);
>
>   SET_HARD_REG_BIT (zeroed_hardregs, regno);
> }
> --
> 2.36.3
>
>

Re: [PATCH] RISC-V: Add autovec sign/zero extension and truncation.

2023-05-25 Thread juzhe.zh...@rivai.ai

I realize that both TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES and
TARGET_VECTORIZE_RELATED_MODE will partially enable auto-vectorization
when --param=riscv-autovec-preference is not specified, even though
preferred_simd_mode does not enable it in that case.

So please add an autovec_use_vlmax_p check to the implementations of both
target hooks.

+opt_machine_mode
+vectorize_related_mode (machine_mode vector_mode, scalar_mode element_mode,
+ poly_uint64 nunits)
+{
+  /* TODO: We will support RVV VLS auto-vectorization mode in the future. */
+  poly_uint64 min_units;
+  if (riscv_v_ext_mode_p (vector_mode)
+  && multiple_p (BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul),
+  GET_MODE_SIZE (element_mode), &min_units))

Change it into:

+opt_machine_mode
+vectorize_related_mode (machine_mode vector_mode, scalar_mode element_mode,
+ poly_uint64 nunits)
+{
+  /* TODO: We will support RVV VLS auto-vectorization mode in the future. */
+  poly_uint64 min_units;
+  if (riscv_v_ext_vector_mode_p (vector_mode) &&  autovec_use_vlmax_p ()
+  && multiple_p (BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul),
+  GET_MODE_SIZE (element_mode), &min_units))


And

+unsigned int
+autovectorize_vector_modes (vector_modes *modes, bool)
+{
+  if (TARGET_VECTOR)
+{

You don't need TARGET_VECTOR here since you already gate on it in:

+/* Implement TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES.  */
+unsigned int
+riscv_autovectorize_vector_modes (vector_modes *modes, bool all)
+{
+  if (TARGET_VECTOR)
+return riscv_vector::autovectorize_vector_modes (modes, all);
+
+  return default_autovectorize_vector_modes (modes, all);
+}

So please change it into:

+unsigned int
+autovectorize_vector_modes (vector_modes *modes, bool)
+{
+  if (autovec_use_vlmax_p ())
+{

Do this just like in riscv_vector::preferred_simd_modes.

For the rest, let Kito chime in with more comments.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-25 17:03
To: gcc-patches; Kito Cheng; palmer; juzhe.zh...@rivai.ai; jeffreyalaw
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Add autovec sign/zero extension and truncation.
Hi,
 
this patch implements the autovec expanders for sign and zero extension
patterns as well as the accompanying truncations.  In order to use them
additional mode_attr iterators as well as vectorizer hooks are required.
Using these hooks we can e.g. vectorize with VNx4QImode as base mode
and extend VNx4SI to VNx4DI.  They are still going to be expanded in the
future.
 
vf4 and vf8 truncations are emulated by truncating two and three times
respectively.
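
To illustrate the step-wise truncation described above, here is a minimal
scalar sketch. It is an illustration only, not code from the patch: the
function names are made up for this example, and it assumes that each
narrowing step halves the element width, so vf4 takes two steps and vf8
takes three.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model: a vf8 truncation (64 -> 8 bits) emulated by
// three narrowing steps, each one halving the element width.
inline std::uint8_t trunc_vf8_stepwise (std::uint64_t x)
{
  std::uint32_t s = static_cast<std::uint32_t> (x); // 64 -> 32
  std::uint16_t h = static_cast<std::uint16_t> (s); // 32 -> 16
  return static_cast<std::uint8_t> (h);             // 16 -> 8
}

// Hypothetical scalar model: a vf4 truncation (32 -> 8 bits) emulated by
// two narrowing steps.
inline std::uint8_t trunc_vf4_stepwise (std::uint32_t x)
{
  std::uint16_t h = static_cast<std::uint16_t> (x); // 32 -> 16
  return static_cast<std::uint8_t> (h);             // 16 -> 8
}

// The step-wise result must agree with a single direct truncation.
inline void check_stepwise_truncation ()
{
  const std::uint64_t x = 0x1122334455667788ULL;
  assert (trunc_vf8_stepwise (x) == static_cast<std::uint8_t> (x));
  assert (trunc_vf4_stepwise (0xdeadbeefu)
	  == static_cast<std::uint8_t> (0xdeadbeefu));
}
```

Since unsigned truncation just drops high bits, the order and number of
intermediate steps do not change the final value.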
 
The patch also adds tests and changes some expectations for already
existing ones.
 
Combine does not yet handle binary operations of two widened operands
as we are missing the necessary split/rewrite patterns.  These will be
added at a later time.
 
Co-authored-by: Juzhe Zhong 
 
riscv.exp testsuite is unchanged.  zero-scratch-regs-3.c seems
to FAIL in vcondu but that already happens on trunk.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (2): New
expander.
(2): Dito.
(2): Dito.
(trunc2): Dito.
(trunc2): Dito.
(trunc2): Dito.
* config/riscv/riscv-protos.h (riscv_v_ext_mode_p): Declare.
(vectorize_related_mode): Define.
(autovectorize_vector_modes): Define.
* config/riscv/riscv-v.cc (vectorize_related_mode): Implement
hook.
(autovectorize_vector_modes): Implement hook.
* config/riscv/riscv.cc (riscv_v_ext_tuple_mode_p): Export.
(riscv_autovectorize_vector_modes): Implement target hook.
(riscv_vectorize_related_mode): Implement target hook.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
(TARGET_VECTORIZE_RELATED_MODE): Define.
* config/riscv/vector-iterators.md: Add lowercase versions of
mode_attr iterators.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c: Adjust
expectation.
* gcc.target/riscv/rvv/autovec/binop/shift-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: Dito.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: Dito.
* gcc.target/riscv/rvv/autovec/zve64d-2.c: Dito.
* gcc.target/riscv/rvv/autovec/zve64f-2.c: Dito.
* gcc.target/riscv/rvv/autovec/zve64x-2.c: Dito.
* gcc.target/riscv/rvv/rvv.exp: Add new conversion tests.
* gcc.target/riscv/rvv/vsetvl/avl_single-38.c: Do not vectorize.
* gcc.target/riscv/rvv/vsetvl/avl_single-47.c: Dito.
* gcc.target/riscv/rvv/vsetvl/avl_single-48.c: Dito.
* gcc.target/riscv/rvv/vsetvl/avl_single-49.c: Dito.
* gcc.target/riscv/rvv/vsetvl/imm_switch-8.c: Dito.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c: New test.
*

[PATCH] RISC-V: Optimize TARGET_XTHEADCONDMOV

2023-05-25 Thread Die Li

This patch allows fewer instructions to be used when TARGET_XTHEADCONDMOV is 
enabled.

Here is an example from the existing testcases.

Testcase:
int ConEmv_imm_imm_reg(int x, int y){
  if (x == 1000) return 10;
  return y;
}

Cflags:
-O2 -march=rv64gc_xtheadcondmov -mabi=lp64d

before patch:
ConEmv_imm_imm_reg:
addi  a5,a0,-1000
li  a0,10
th.mvnez  a0,zero,a5
th.mveqz  a1,zero,a5
or  a0,a0,a1
ret

after patch:
ConEmv_imm_imm_reg:
addi  a5,a0,-1000
li  a0,10
th.mvnez  a0,a1,a5
ret
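
As a sanity check of the two listings above, the following is a small
register-level model in C++. It is only a sketch: th_mveqz, th_mvnez,
before_patch and after_patch are names invented for this example, and the
instruction semantics are an assumption inferred from the listings
(th.mveqz writes rd when rs2 is zero, th.mvnez writes rd when rs2 is
non-zero, otherwise rd is left unchanged).

```cpp
#include <cassert>
#include <cstdint>

// Assumed semantics: th.mveqz rd, rs1, rs2  ->  if (rs2 == 0) rd = rs1
inline void th_mveqz (std::int64_t &rd, std::int64_t rs1, std::int64_t rs2)
{
  if (rs2 == 0)
    rd = rs1;
}

// Assumed semantics: th.mvnez rd, rs1, rs2  ->  if (rs2 != 0) rd = rs1
inline void th_mvnez (std::int64_t &rd, std::int64_t rs1, std::int64_t rs2)
{
  if (rs2 != 0)
    rd = rs1;
}

// Model of the sequence emitted before the patch (a0 = x, a1 = y).
inline std::int64_t before_patch (std::int64_t a0, std::int64_t a1)
{
  std::int64_t a5 = a0 - 1000; // addi a5,a0,-1000
  a0 = 10;                     // li a0,10
  th_mvnez (a0, 0, a5);        // th.mvnez a0,zero,a5
  th_mveqz (a1, 0, a5);        // th.mveqz a1,zero,a5
  return a0 | a1;              // or a0,a0,a1
}

// Model of the sequence emitted after the patch (a0 = x, a1 = y).
inline std::int64_t after_patch (std::int64_t a0, std::int64_t a1)
{
  std::int64_t a5 = a0 - 1000; // addi a5,a0,-1000
  a0 = 10;                     // li a0,10
  th_mvnez (a0, a1, a5);       // th.mvnez a0,a1,a5
  return a0;
}

// Both sequences should compute "x == 1000 ? 10 : y".
inline void check_equivalence ()
{
  const std::int64_t xs[] = {-5, 0, 999, 1000, 1001};
  const std::int64_t ys[] = {-1, 0, 7, 10, 123456};
  for (std::int64_t x : xs)
    for (std::int64_t y : ys)
      {
	std::int64_t expected = (x == 1000) ? 10 : y;
	assert (before_patch (x, y) == expected);
	assert (after_patch (x, y) == expected);
      }
}
```

Under these assumptions the single th.mvnez is enough because a0 already
holds the "then" value when the move is skipped.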

Signed-off-by: Die Li 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_conditional_move_onesided): 
Delete.
(riscv_expand_conditional_move):  Reuse the TARGET_SFB_ALU expand 
process for TARGET_XTHEADCONDMOV

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Update the output.
* gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Likewise.
---
 gcc/config/riscv/riscv.cc | 44 +++--
 .../riscv/xtheadcondmov-indirect-rv32.c   | 48 +++
 .../riscv/xtheadcondmov-indirect-rv64.c   | 48 +++
 3 files changed, 42 insertions(+), 98 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 09fc9e5d95e..8b8ac9181ba 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3442,37 +3442,6 @@ riscv_expand_conditional_branch (rtx label, rtx_code 
code, rtx op0, rtx op1)
   emit_jump_insn (gen_condjump (condition, label));
 }
 
-/* Helper to emit two one-sided conditional moves for the movecc.  */
-
-static void
-riscv_expand_conditional_move_onesided (rtx dest, rtx cons, rtx alt,
-   rtx_code code, rtx op0, rtx op1)
-{
-  machine_mode mode = GET_MODE (dest);
-
-  gcc_assert (GET_MODE_CLASS (mode) == MODE_INT);
-  gcc_assert (reg_or_0_operand (cons, mode));
-  gcc_assert (reg_or_0_operand (alt, mode));
-
-  riscv_emit_int_compare (&code, &op0, &op1, true);
-  rtx cond = gen_rtx_fmt_ee (code, mode, op0, op1);
-
-  rtx tmp1 = gen_reg_rtx (mode);
-  rtx tmp2 = gen_reg_rtx (mode);
-
-  emit_insn (gen_rtx_SET (tmp1, gen_rtx_IF_THEN_ELSE (mode, cond,
- cons, const0_rtx)));
-
-  /* We need to expand a sequence for both blocks and we do that such,
- that the second conditional move will use the inverted condition.
- We use temporaries that are or'd to the dest register.  */
-  cond = gen_rtx_fmt_ee ((code == EQ) ? NE : EQ, mode, op0, op1);
-  emit_insn (gen_rtx_SET (tmp2, gen_rtx_IF_THEN_ELSE (mode, cond,
- alt, const0_rtx)));
-
-  emit_insn (gen_rtx_SET (dest, gen_rtx_IOR (mode, tmp1, tmp2)));
- }
-
 /* Emit a cond move: If OP holds, move CONS to DEST; else move ALT to DEST.
Return 0 if expansion failed.  */
 
@@ -3483,6 +3452,7 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
cons, rtx alt)
   rtx_code code = GET_CODE (op);
   rtx op0 = XEXP (op, 0);
   rtx op1 = XEXP (op, 1);
+  bool need_eq_ne_p = false;
 
   if (TARGET_XTHEADCONDMOV
   && GET_MODE_CLASS (mode) == MODE_INT
@@ -3492,14 +3462,12 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
cons, rtx alt)
   && GET_MODE (op0) == mode
   && GET_MODE (op1) == mode
   && (code == EQ || code == NE))
+need_eq_ne_p = true;
+
+  if (need_eq_ne_p || (TARGET_SFB_ALU
+  && GET_MODE (op0) == word_mode))
 {
-  riscv_expand_conditional_move_onesided (dest, cons, alt, code, op0, op1);
-  return true;
-}
-  else if (TARGET_SFB_ALU
-  && GET_MODE (op0) == word_mode)
-{
-  riscv_emit_int_compare (&code, &op0, &op1);
+  riscv_emit_int_compare (&code, &op0, &op1, need_eq_ne_p);
   rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
 
   /* The expander allows (const_int 0) for CONS for the benefit of
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c 
b/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
index 9afdc2eabfd..e2b135f3d00 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
@@ -1,15 +1,13 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -march=rv32gc_xtheadcondmov -mabi=ilp32 
-mriscv-attribute" } */
-/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-Os" "-Og" "-O3" "-Oz" "-flto"} } */
 /* { dg-final { check-function-bodies "**" ""  } } */
 
 /*
 **ConEmv_imm_imm_reg:
 ** addi  a5,a0,-1000
 ** li  a0,10
-** th.mvnez  a0,zero,a5
-** th.mveqz  a1,zero,a5
-** or  a0,a0,a1
+** th.mvnez  a0,a1,a5
 ** ret
 */
 int ConEmv_imm_imm_reg(int x, int y){
@@ -20,9 +18,8 @@ int ConEmv_imm_imm_reg(int x, int y){
 /*
 **ConEmv_imm_reg_reg:
 ** addi  a5,a0,-1000
-** th.mvnez

[PATCH] RISC-V: Eliminate the magic number in riscv-v.cc

2023-05-25 Thread Pan Li via Gcc-patches

From: Pan Li 

This patch removes the magic number in riscv-v.cc and uses a single
macro for that value everywhere.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_vlmax_insn): Eliminate the
magic number.
(emit_nonvlmax_insn): Ditto.
(emit_vlmax_merge_insn): Ditto.
(emit_vlmax_cmp_insn): Ditto.
(emit_vlmax_cmp_mu_insn): Ditto.
(expand_vec_series): Ditto.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-v.cc | 77 ++---
 1 file changed, 46 insertions(+), 31 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 458020ce0a1..20b589bf51b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -351,13 +351,15 @@ emit_vlmax_insn (unsigned icode, int op_num, rtx *ops, 
rtx vl)
 {
   machine_mode data_mode = GET_MODE (ops[0]);
   machine_mode mask_mode = get_mask_mode (data_mode).require ();
-  /* We have a maximum of 11 operands for RVV instruction patterns according to
-   * vector.md.  */
-  insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
-  /*FULLY_UNMASKED_P*/ true,
-  /*USE_REAL_MERGE_P*/ false, /*HAS_AVL_P*/ true,
-  /*VLMAX_P*/ true,
-  /*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode);
+  insn_expander e (/*OP_NUM*/ op_num,
+ /*HAS_DEST_P*/ true,
+ /*FULLY_UNMASKED_P*/ true,
+ /*USE_REAL_MERGE_P*/ false,
+ /*HAS_AVL_P*/ true,
+ /*VLMAX_P*/ true,
+ /*DEST_MODE*/ data_mode,
+ /*MASK_MODE*/ mask_mode);
+
   e.set_policy (TAIL_ANY);
   e.set_policy (MASK_ANY);
   /* According to LRA mov pattern in vector.md, we have a clobber operand
@@ -373,13 +375,15 @@ emit_nonvlmax_insn (unsigned icode, int op_num, rtx *ops, 
rtx avl)
 {
   machine_mode data_mode = GET_MODE (ops[0]);
   machine_mode mask_mode = get_mask_mode (data_mode).require ();
-  /* We have a maximum of 11 operands for RVV instruction patterns according to
-   * vector.md.  */
-  insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
-  /*FULLY_UNMASKED_P*/ true,
-  /*USE_REAL_MERGE_P*/ false, /*HAS_AVL_P*/ true,
-  /*VLMAX_P*/ false,
-  /*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode);
+  insn_expander e (/*OP_NUM*/ op_num,
+ /*HAS_DEST_P*/ true,
+ /*FULLY_UNMASKED_P*/ true,
+ /*USE_REAL_MERGE_P*/ false,
+ /*HAS_AVL_P*/ true,
+ /*VLMAX_P*/ false,
+ /*DEST_MODE*/ data_mode,
+ /*MASK_MODE*/ mask_mode);
+
   e.set_policy (TAIL_ANY);
   e.set_policy (MASK_ANY);
   e.set_vl (avl);
@@ -392,10 +396,15 @@ emit_vlmax_merge_insn (unsigned icode, int op_num, rtx 
*ops)
 {
   machine_mode dest_mode = GET_MODE (ops[0]);
   machine_mode mask_mode = get_mask_mode (dest_mode).require ();
-  insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
-  /*FULLY_UNMASKED_P*/ false,
-  /*USE_REAL_MERGE_P*/ false, /*HAS_AVL_P*/ true,
-  /*VLMAX_P*/ true, dest_mode, mask_mode);
+  insn_expander e (/*OP_NUM*/ op_num,
+ /*HAS_DEST_P*/ true,
+ /*FULLY_UNMASKED_P*/ false,
+ /*USE_REAL_MERGE_P*/ false,
+ /*HAS_AVL_P*/ true,
+ /*VLMAX_P*/ true,
+ /*DEST_MODE*/ dest_mode,
+ /*MASK_MODE*/ mask_mode);
+
   e.set_policy (TAIL_ANY);
   e.emit_insn ((enum insn_code) icode, ops);
 }
@@ -405,12 +414,15 @@ void
 emit_vlmax_cmp_insn (unsigned icode, rtx *ops)
 {
   machine_mode mode = GET_MODE (ops[0]);
-  insn_expander<11> e (/*OP_NUM*/ RVV_CMP_OP, /*HAS_DEST_P*/ true,
-  /*FULLY_UNMASKED_P*/ true,
-  /*USE_REAL_MERGE_P*/ false,
-  /*HAS_AVL_P*/ true,
-  /*VLMAX_P*/ true,
-  /*DEST_MODE*/ mode, /*MASK_MODE*/ mode);
+  insn_expander e (/*OP_NUM*/ RVV_CMP_OP,
+ /*HAS_DEST_P*/ true,
+ /*FULLY_UNMASKED_P*/ true,
+ /*USE_REAL_MERGE_P*/ false,
+ /*HAS_AVL_P*/ true,
+

[PATCH V2] RISC-V: Fix zero-scratch-regs-3.c fail

2023-05-25 Thread juzhe . zhong

From: Juzhe-Zhong 

gcc/ChangeLog:

* config/riscv/riscv.cc (vector_zero_call_used_regs): Add explicit VL 
and drop VL in ops.

---
 gcc/config/riscv/riscv.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 09fc9e5d95e..b16c60df6a7 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7396,9 +7396,9 @@ vector_zero_call_used_regs (HARD_REG_SET 
need_zeroed_hardregs)
  emitted_vlmax_vsetvl = true;
}
 
- rtx ops[] = {target, CONST0_RTX (mode), vl};
+ rtx ops[] = {target, CONST0_RTX (mode)};
  riscv_vector::emit_vlmax_insn (code_for_pred_mov (mode),
-riscv_vector::RVV_UNOP, ops);
+riscv_vector::RVV_UNOP, ops, vl);
 
  SET_HARD_REG_BIT (zeroed_hardregs, regno);
}
-- 
2.36.3

RE: [PATCH] RISC-V: Add autovec sign/zero extension and truncation.

2023-05-25 Thread Li, Pan2 via Gcc-patches

Looks like Juzhe has fixed this issue in the patch below, thanks Juzhe.

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619733.html

Pan

-Original Message-
From: Li, Pan2 
Sent: Thursday, May 25, 2023 8:22 PM
To: gcc-patches ; Wang, Yanzhang 

Cc: Robin Dapp ; Kito Cheng ; palmer 
; juzhe.zh...@rivai.ai; jeffreyalaw 
Subject: RE: [PATCH] RISC-V: Add autovec sign/zero extension and truncation.

zero-scratch-regs-3.c comes from the patch below.

https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615494.html

Hi Yanzhang,

Could you please help double-check the issue reported by Robin? Namely: 
"zero-scratch-regs-3.c seems to FAIL in vcondu but that already happens on 
trunk."

Thanks a lot.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Robin Dapp via Gcc-patches
Sent: Thursday, May 25, 2023 5:03 PM
To: gcc-patches ; Kito Cheng ; 
palmer ; juzhe.zh...@rivai.ai; jeffreyalaw 

Cc: rdapp@gmail.com
Subject: [PATCH] RISC-V: Add autovec sign/zero extension and truncation.

Hi,

this patch implements the autovec expanders for sign and zero extension 
patterns as well as the accompanying truncations.  In order to use them 
additional mode_attr iterators as well as vectorizer hooks are required.
Using these hooks we can e.g. vectorize with VNx4QImode as base mode and extend 
VNx4SI to VNx4DI.  They are still going to be expanded in the future.

vf4 and vf8 truncations are emulated by truncating two and three times 
respectively.

The patch also adds tests and changes some expectations for already existing 
ones.

Combine does not yet handle binary operations of two widened operands as we are 
missing the necessary split/rewrite patterns.  These will be added at a later 
time.

Co-authored-by: Juzhe Zhong 

riscv.exp testsuite is unchanged.  zero-scratch-regs-3.c seems to FAIL in 
vcondu but that already happens on trunk.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md (2): New
expander.
(2): Dito.
(2): Dito.
(trunc2): Dito.
(trunc2): Dito.
(trunc2): Dito.
* config/riscv/riscv-protos.h (riscv_v_ext_mode_p): Declare.
(vectorize_related_mode): Define.
(autovectorize_vector_modes): Define.
* config/riscv/riscv-v.cc (vectorize_related_mode): Implement
hook.
(autovectorize_vector_modes): Implement hook.
* config/riscv/riscv.cc (riscv_v_ext_tuple_mode_p): Export.
(riscv_autovectorize_vector_modes): Implement target hook.
(riscv_vectorize_related_mode): Implement target hook.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
(TARGET_VECTORIZE_RELATED_MODE): Define.
* config/riscv/vector-iterators.md: Add lowercase versions of
mode_attr iterators.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c: Adjust
expectation.
* gcc.target/riscv/rvv/autovec/binop/shift-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: Dito.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: Dito.
* gcc.target/riscv/rvv/autovec/zve64d-2.c: Dito.
* gcc.target/riscv/rvv/autovec/zve64f-2.c: Dito.
* gcc.target/riscv/rvv/autovec/zve64x-2.c: Dito.
* gcc.target/riscv/rvv/rvv.exp: Add new conversion tests.
* gcc.target/riscv/rvv/vsetvl/avl_single-38.c: Do not vectorize.
* gcc.target/riscv/rvv/vsetvl/avl_single-47.c: Dito.
* gcc.target/riscv/rvv/vsetvl/avl_single-48.c: Dito.
* gcc.target/riscv/rvv/vsetvl/avl_single-49.c: Dito.
* gcc.target/riscv/rvv/vsetvl/imm_switch-8.c: Dito.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vsext-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vsext-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vsext-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vsext-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vzext-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vzext-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vzext-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vzext-template.h: New test.
---

Re: [V8][PATCH 2/2] Update documentation to clarify a GCC extension [PR77650]

2023-05-25 Thread Qing Zhao via Gcc-patches




> On May 25, 2023, at 4:51 PM, Joseph Myers  wrote:
> 
> The documentation in this case is OK, though claims about how a future 
> version will behave have a poor track record (we tend to end up with such 
> claims persisting in the documentation even though the change in question 
> didn't get made and might sometimes no longer be considered desirable).
Then, do you have any suggestions on this claim? Shall we delete it from the 
doc? Or keep it?

Qing
> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com

Re: [PATCH] RISC-V/testsuite: Run target testing over all the usual optimization levels

2023-05-25 Thread Vineet Gupta





On 5/25/23 14:17, Vineet Gupta wrote:
Thanks for taking a look at this. Please don't get me wrong, I never 
meant to vilify the patches above - and I should have verified first 
(by reverting those) whether they caused the issue at all. It just 
seemed that we started seeing these relatively recently and the timing 
of your changes seemed to coincide. As you say above, RV likely has 
existing less-than-ideal constructs which somehow used to work before, 
but not in the new regime.


Anyhow the goal is to get this fixed for RV and any more help will be 
appreciated since I'm really a TCL noob.


It seems my claim yesterday that adding torture-{init,finish} fixed the RV 
issues was premature. I was trying a mix of running the full 
suite vs. just RUNTESTFLAGS="riscv.exp", and in some cases the latter can 
give a false positive (I was making sure dejagnu got rebuilt and 
rekicked etc., but anyhow that is a different issue).


I'm currently removing the ADDITIONAL_TORTURE_OPTIONS to see if this 
helps cure it 


Another data point: commenting this out altogether doesn't help either.

-Vineet

Re: RISC-V Test Errors and Failures

2023-05-25 Thread Vineet Gupta





On 5/25/23 13:29, Thomas Schwinge wrote:

Hi!

On 2023-05-17T09:52:13+0200, Andreas Schwab via Gcc-patches 
 wrote:

On May 16 2023, Vineet Gupta wrote:


Yes I was seeing similar tcl errors and such - and in my case an even
higher count.

They are coming from commit d6654a4be3b.

I call FUD.  Until you prove otherwise, of course.


Regards
  Thomas


Just as a data point, with those patches reverted, I don't see the tcl 
errors.


2023-05-25 7a3c9f8e8362 Revert "Let each 'lto_init' determine the 
default 'LTO_OPTIONS', and 'torture-init' the 'LTO_TORTURE_OPTIONS'"
2023-05-25 22206cb760ee Revert "Testsuite: Add missing 
'torture-init'/'torture-finish' around 'LTO_TORTURE_OPTIONS' usage"
2023-05-25 db46b946dd6d Revert "Testsuite: Add 'torture-init-done', and 
use it to conditionalize implicit 'torture-init'"

2023-05-25 bd412162fd0d Revert "xxx vineet fixup"
2023-05-22 97a5e2241d33 xxx vineet fixup
2023-05-24 ec2e86274427 Fortran: reject bad DIM argument of SIZE 
intrinsic in simplification [PR104350]

...

               = Summary of gcc testsuite =
                            | # of unexpected case / # of unique unexpected case
                            |      gcc |      g++ |     gfortran |
 rv64imafdc/  lp64d/ medlow |   25 / 4 |    1 / 1 |   72 /    12 |
 rv32imafdc/ ilp32d/ medlow |   26 / 5 |    3 / 2 |   72 /    12 |
   rv32imac/  ilp32/ medlow |   25 / 4 |    3 / 2 |  109 /    19 |
   rv64imac/   lp64/ medlow |   26 / 5 |    1 / 1 |  109 /    19 |

Re: [PATCH] RISC-V: Fix zero-scratch-regs-3.c fail

2023-05-25 Thread Kito Cheng via Gcc-patches

Lgtm with a minor comment

Written on Fri, 26 May 2023 at 07:18:

> From: Juzhe-Zhong 
>
> Fix ICE of zero-scratch-regs-3.c:
> bug.c:7:1: internal compiler error: Segmentation fault
> 7 | }
>   | ^
> 0x1647b23 crash_signal
> ../../../riscv-gcc/gcc/toplev.cc:314
> 0x147053f maybe_legitimize_operand
> ../../../riscv-gcc/gcc/optabs.cc:7947
> 0x1470dc2 maybe_legitimize_operands(insn_code, unsigned int, unsigned int,
> expand_operand*)
> ../../../riscv-gcc/gcc/optabs.cc:8084
> 0x1470e66 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
> ../../../riscv-gcc/gcc/optabs.cc:8103
> 0x147146a maybe_expand_insn(insn_code, unsigned int, expand_operand*)
> ../../../riscv-gcc/gcc/optabs.cc:8158
> 0x14714fe expand_insn(insn_code, unsigned int, expand_operand*)
> ../../../riscv-gcc/gcc/optabs.cc:8189
> 0x1c20634 riscv_vector::insn_expander<11>::expand(insn_code, bool)
> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:210
> 0x1c20075 riscv_vector::insn_expander<11>::emit_insn(insn_code, rtx_def**)
> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:199
> 0x1c16bd1 riscv_vector::emit_vlmax_insn(unsigned int, int, rtx_def**,
> rtx_def*)
> ../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:362
> 0x1ad5bb9 vector_zero_call_used_regs
> ../../../riscv-gcc/gcc/config/riscv/riscv.cc:7400
> 0x1ad5c25 riscv_zero_call_used_regs(HARD_REG_SET)
> ../../../riscv-gcc/gcc/config/riscv/riscv.cc:7420
> 0x115c910 gen_call_used_regs_seq
> ../../../riscv-gcc/gcc/function.cc:5924
> 0x115df81 execute
> ../../../riscv-gcc/gcc/function.cc:6718
>
> ICE happens since we didn't pass explicit VL operand when we can't use
> gen_reg_rtx
> to generate VL operand. This will make operands num mismatch.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (vector_zero_call_used_regs): Add explicit
> VL operand.
>
> ---
>  gcc/config/riscv/riscv.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 09fc9e5d95e..9e41200371d 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -7398,7 +7398,7 @@ vector_zero_call_used_regs (HARD_REG_SET
> need_zeroed_hardregs)
>
>   rtx ops[] = {target, CONST0_RTX (mode), vl};
>

Drop vl from here.

  riscv_vector::emit_vlmax_insn (code_for_pred_mov (mode),
> -riscv_vector::RVV_UNOP, ops);
> +riscv_vector::RVV_UNOP, ops, vl);
>
>   SET_HARD_REG_BIT (zeroed_hardregs, regno);
> }
> --
> 2.36.3
>
>

[PATCH] RISC-V: Fix zero-scratch-regs-3.c fail

2023-05-25 Thread juzhe . zhong

From: Juzhe-Zhong 

Fix ICE of zero-scratch-regs-3.c:
bug.c:7:1: internal compiler error: Segmentation fault
7 | }
  | ^
0x1647b23 crash_signal
../../../riscv-gcc/gcc/toplev.cc:314
0x147053f maybe_legitimize_operand
../../../riscv-gcc/gcc/optabs.cc:7947
0x1470dc2 maybe_legitimize_operands(insn_code, unsigned int, unsigned int, 
expand_operand*)
../../../riscv-gcc/gcc/optabs.cc:8084
0x1470e66 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
../../../riscv-gcc/gcc/optabs.cc:8103
0x147146a maybe_expand_insn(insn_code, unsigned int, expand_operand*)
../../../riscv-gcc/gcc/optabs.cc:8158
0x14714fe expand_insn(insn_code, unsigned int, expand_operand*)
../../../riscv-gcc/gcc/optabs.cc:8189
0x1c20634 riscv_vector::insn_expander<11>::expand(insn_code, bool)
../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:210
0x1c20075 riscv_vector::insn_expander<11>::emit_insn(insn_code, rtx_def**)
../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:199
0x1c16bd1 riscv_vector::emit_vlmax_insn(unsigned int, int, rtx_def**, rtx_def*)
../../../riscv-gcc/gcc/config/riscv/riscv-v.cc:362
0x1ad5bb9 vector_zero_call_used_regs
../../../riscv-gcc/gcc/config/riscv/riscv.cc:7400
0x1ad5c25 riscv_zero_call_used_regs(HARD_REG_SET)
../../../riscv-gcc/gcc/config/riscv/riscv.cc:7420
0x115c910 gen_call_used_regs_seq
../../../riscv-gcc/gcc/function.cc:5924
0x115df81 execute
../../../riscv-gcc/gcc/function.cc:6718

The ICE happens because we didn't pass an explicit VL operand in a place 
where we can't use gen_reg_rtx
to generate one. This makes the operand count mismatch.

gcc/ChangeLog:

* config/riscv/riscv.cc (vector_zero_call_used_regs): Add explicit VL 
operand.

---
 gcc/config/riscv/riscv.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 09fc9e5d95e..9e41200371d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7398,7 +7398,7 @@ vector_zero_call_used_regs (HARD_REG_SET 
need_zeroed_hardregs)
 
  rtx ops[] = {target, CONST0_RTX (mode), vl};
  riscv_vector::emit_vlmax_insn (code_for_pred_mov (mode),
-riscv_vector::RVV_UNOP, ops);
+riscv_vector::RVV_UNOP, ops, vl);
 
  SET_HARD_REG_BIT (zeroed_hardregs, regno);
}
-- 
2.36.3

[committed] libstdc++: Add relational operators to __gnu_test::PointerBase

2023-05-25 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Pushed to trunk.

-- >8 --

The Cpp17Allocator requirements say that an allocator's pointer and
const_pointer types must meet the Cpp17RandomAccessIterator
requirements. That means our PointerBase helper for defining fancy
pointer types should support the full set of relational operators.

libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_allocator.h (PointerBase): Add
relational operators.
---
 libstdc++-v3/testsuite/util/testsuite_allocator.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/libstdc++-v3/testsuite/util/testsuite_allocator.h 
b/libstdc++-v3/testsuite/util/testsuite_allocator.h
index 9108ee40821..70dacb3fdf2 100644
--- a/libstdc++-v3/testsuite/util/testsuite_allocator.h
+++ b/libstdc++-v3/testsuite/util/testsuite_allocator.h
@@ -719,6 +719,15 @@ namespace __gnu_test
   friend std::ptrdiff_t operator-(PointerBase l, PointerBase r)
   { return l.value - r.value; }
 
+  friend bool operator<(PointerBase l, PointerBase r)
+  { return l.value < r.value; }
+  friend bool operator>(PointerBase l, PointerBase r)
+  { return l.value > r.value; }
+  friend bool operator<=(PointerBase l, PointerBase r)
+  { return l.value <= r.value; }
+  friend bool operator>=(PointerBase l, PointerBase r)
+  { return l.value >= r.value; }
+
   Derived&
   derived() { return static_cast<Derived&>(*this); }
 
-- 
2.40.1
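
For readers without the testsuite header at hand, here is a rough,
self-contained sketch of the idea. FancyPtrBase and IntPtr are hypothetical
stand-ins written for this example, not the real __gnu_test::PointerBase;
the sketch only shows how friend relational operators in a CRTP base make
the full set of comparisons available on the derived fancy pointer type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for a fancy-pointer base class.
// It wraps a raw pointer and defines the operators as hidden friends,
// so they are found by ADL for any type derived from it.
template<typename Derived, typename T>
struct FancyPtrBase
{
  T* value = nullptr;

  FancyPtrBase () = default;
  explicit FancyPtrBase (T* p) : value (p) { }

  friend std::ptrdiff_t operator- (FancyPtrBase l, FancyPtrBase r)
  { return l.value - r.value; }

  friend bool operator== (FancyPtrBase l, FancyPtrBase r)
  { return l.value == r.value; }
  friend bool operator!= (FancyPtrBase l, FancyPtrBase r)
  { return l.value != r.value; }

  // The relational operators, mirroring the ones added by the patch.
  friend bool operator< (FancyPtrBase l, FancyPtrBase r)
  { return l.value < r.value; }
  friend bool operator> (FancyPtrBase l, FancyPtrBase r)
  { return l.value > r.value; }
  friend bool operator<= (FancyPtrBase l, FancyPtrBase r)
  { return l.value <= r.value; }
  friend bool operator>= (FancyPtrBase l, FancyPtrBase r)
  { return l.value >= r.value; }
};

// Hypothetical derived fancy pointer to int.
struct IntPtr : FancyPtrBase<IntPtr, int>
{
  IntPtr () = default;
  explicit IntPtr (int* p) : FancyPtrBase<IntPtr, int> (p) { }
};

// Exercise the comparisons on two pointers into the same array.
inline void check_relational_operators ()
{
  int a[4] = {0, 1, 2, 3};
  IntPtr first (&a[0]), last (&a[3]);
  assert (first < last);
  assert (last > first);
  assert (first <= first);
  assert (last >= first);
  assert (last - first == 3);
}
```

The operators take the base type by value, so derived objects are sliced
to the base for the comparison, which is harmless here because only the
wrapped pointer is compared.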

Re: Re: [PATCH v2] RISC-V: Implement autovec abs, vneg, vnot.

2023-05-25 Thread 钟居哲

This patch LGTM. Let's wait for Kito's final approval.
Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-25 22:43
To: 钟居哲; gcc-patches; kito.cheng; palmer; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH v2] RISC-V: Implement autovec abs, vneg, vnot.
> Beside, V2 patch should change this:
> emit_vlmax_masked_insn (unsigned icode, int op_num, rtx *ops)
> 
> change it into emit_vlmax_masked_mu_insn .
 
V3 is inline with these changes.
 
This patch implements abs2, vneg2 and vnot2 expanders
for integer vector registers and adds tests for them.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (2): Add vneg/vnot.
(abs2): Add.
* config/riscv/riscv-protos.h (emit_vlmax_masked_mu_insn):
Declare.
* config/riscv/riscv-v.cc (emit_vlmax_masked_mu_insn): New
function.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/rvv.exp: Add unop tests.
* gcc.target/riscv/rvv/autovec/unop/abs-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/abs-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-template.h: New test.
---
gcc/config/riscv/autovec.md   | 43 ++-
gcc/config/riscv/riscv-protos.h   |  2 +
gcc/config/riscv/riscv-v.cc   | 16 +++
.../riscv/rvv/autovec/unop/abs-run.c  | 39 +
.../riscv/rvv/autovec/unop/abs-rv32gcv.c  |  8 
.../riscv/rvv/autovec/unop/abs-rv64gcv.c  |  8 
.../riscv/rvv/autovec/unop/abs-template.h | 26 +++
.../riscv/rvv/autovec/unop/vneg-run.c | 29 +
.../riscv/rvv/autovec/unop/vneg-rv32gcv.c |  6 +++
.../riscv/rvv/autovec/unop/vneg-rv64gcv.c |  6 +++
.../riscv/rvv/autovec/unop/vneg-template.h| 18 
.../riscv/rvv/autovec/unop/vnot-run.c | 43 +++
.../riscv/rvv/autovec/unop/vnot-rv32gcv.c |  6 +++
.../riscv/rvv/autovec/unop/vnot-rv64gcv.c |  6 +++
.../riscv/rvv/autovec/unop/vnot-template.h| 22 ++
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  2 +
16 files changed, 279 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-template.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-template.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vnot-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vnot-rv32gcv.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vnot-rv64gcv.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vnot-template.h
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7fe4d94de39..38216d9812f 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -145,7 +145,7 @@ (define_expand "3"
})
;; -
-;;  [INT] Binary shifts by scalar.
+;;  [INT] Binary shifts by vector.
;; -
;; Includes:
;; - vsll.vv/vsra.vv/vsrl.vv
@@ -373,3 +373,44 @@ (define_expand "vcondu"
 DONE;
   }
)
+
+;; =
+;; == Unary arithmetic
+;; =
+
+;; ---
+;;  [INT] Unary operations
+;; ---
+;; Includes:
+;; - vneg.v/vnot.v
+;; ---
+(define_expand "<optab><mode>2"
+  [(set (match_operand:VI 0 "register_operand")
+(any_int_unop:VI
+ (match_operand:VI 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (<CODE>, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
+;;

decremnt IV patch create fails on PowerPC

2023-05-25 Thread 钟居哲

Yesterday's patch (decrement IV support) has been approved:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619663.html

However, it causes failures on PowerPC:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971

I am really sorry for causing the inconvenience.

I am wondering about the condition we discussed:
+  /* If we're vectorizing a loop that uses length "controls" and
+ can iterate more than once, we apply decrementing IV approach
+ in loop control.  */
+  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
+  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
+  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
+  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+  && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
+   LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
+LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;

This condition cannot disable the decrementing IV on PowerPC.
Should I add a target hook for it?
I could only bootstrap and run the regression tests on x86.
I don't have an environment to test PowerPC. I am really sorry.
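
For readers unfamiliar with the term, the decrementing IV loop control that the condition above enables can be sketched in plain C. This is only an illustrative model under stated assumptions, not the code the vectorizer generates: VF and add_decrementing_iv are hypothetical names, and the inner scalar loop stands in for one length-controlled vector operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the vectorization factor.  */
static const size_t VF = 4;

/* Add b[i] into a[i] for i in [0, n), handling at most VF elements per
   outer iteration.  `remaining` plays the role of the decrementing IV:
   it starts at n and shrinks by the number of elements just processed.
   Returns the number of outer iterations, for illustration.  */
size_t
add_decrementing_iv (int *a, const int *b, size_t n)
{
  size_t iterations = 0;
  size_t remaining = n;

  while (remaining > 0)
    {
      /* The "length" for this iteration: min (remaining, VF).  */
      size_t len = remaining < VF ? remaining : VF;
      assert (len > 0);   /* Each iteration must make progress.  */

      /* Stand-in for one length-controlled vector load/add/store.  */
      for (size_t i = 0; i < len; i++)
        a[i] += b[i];

      a += len;
      b += len;
      remaining -= len;   /* The IV counts down towards zero.  */
      iterations++;
    }
  return iterations;
}
```

In this model the loop exit test depends only on the IV reaching zero, so the final, partial iteration needs no separate epilogue.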

Thanks.


juzhe.zh...@rivai.ai

Re: [PATCH] RISC-V/testsuite: Run target testing over all the usual optimization levels

2023-05-25 Thread Vineet Gupta




On 5/25/23 14:17, Vineet Gupta wrote:

FWIW if you want to test this out at your end, it is super easy.

|git clone https://github.com/riscv-collab/riscv-gnu-toolchain 
toolchain-upstream cd toolchain-upstream git submodule init git 
submodule update ||./configure --with-arch=rv64imafdc --with-abi=lp64d 
--enable-multilib --enable-linux --prefix= make -j 
make report-linux SIM=qemu Thx, -Vineet ||| 


Sorry the mailer clobbered the instructions.

git clone https://github.com/riscv-collab/riscv-gnu-toolchain toolchain-upstream

cd toolchain-upstream
git submodule init
git submodule update
./configure --with-arch=rv64imafdc --with-abi=lp64d --enable-multilib --enable-linux --prefix=

make -j
make report-linux SIM=qemu

Thx

Re: [PATCH] RISC-V/testsuite: Run target testing over all the usual optimization levels

2023-05-25 Thread Vineet Gupta


Hi Thomas,

On 5/25/23 13:56, Thomas Schwinge wrote:

> Hi!
>
> On 2022-02-08T00:22:37+0800, Kito Cheng via Gcc-patches wrote:
>> Hi Maciej:
>>
>> Thanks for doing this, OK to trunk.
>>
>> On Tue, Feb 1, 2022 at 7:04 AM Maciej W. Rozycki  wrote:
>>>
>>> Use `gcc-dg-runtest' test driver rather than `dg-runtest' to run the
>>> RISC-V testsuite as several targets already do.  Adjust test options
>>> across individual test cases accordingly where required.
>>>
>>> As some tests want to be run at `-Og', add a suitable optimization
>>> variant via ADDITIONAL_TORTURE_OPTIONS, and include the moderately
>>> recent `-Oz' variant as well.
>>>
>>>  * testsuite/gcc.target/riscv/riscv.exp: Use `gcc-dg-runtest'
>>>  rather than `dg-runtest'.  Add `-Og -g' and `-Oz' variants via
>>>  ADDITIONAL_TORTURE_OPTIONS.
>>>
>>>   As to adding `-Og -g' and `-Oz', this should probably be done globally in
>>> gcc-dg.exp, but such a change would affect all the interested targets at
>>> once and would require a huge one-by-one test case review.  Therefore I
>>> think adding targets one by one instead is more feasible, and then we can
>>> switch once all the targets have.
>>>
>>> --- gcc.orig/gcc/testsuite/gcc.target/riscv/riscv.exp
>>> +++ gcc/gcc/testsuite/gcc.target/riscv/riscv.exp
>>> @@ -21,6 +21,8 @@ if ![istarget riscv*-*-*] then {
>>> return
>>>   }
>>>
>>> +lappend ADDITIONAL_TORTURE_OPTIONS {-Og -g} {-Oz}
>>> +
>>>   # Load support procs.
>>>   load_lib gcc-dg.exp
>
> Per my understanding, that is not the correct way to do this.  See
> 'gcc/doc/sourcebuild.texi':
>
>  [...] add to the default list by defining
>  @var{ADDITIONAL_TORTURE_OPTIONS}.  Define these in a @file{.dejagnurc}
>  file or add them to the @file{site.exp} file; for example [...]
>
> Notice '.dejagnurc' or 'site.exp', that is: globally.  (Doing this
> "globally in gcc-dg.exp" -- as you'd mentioned above -- would work too,
> conditionalized to '[istarget riscv*-*-*]' only, for now, as you
> suggested.)
>
> Otherwise, per what we've now got, either of the following two happens:
> before any other 'load_lib gcc-dg.exp', 'gcc.target/riscv/riscv.exp'
> happens to be read first, does set 'ADDITIONAL_TORTURE_OPTIONS', then
> does its 'load_lib gcc-dg.exp' -- which then incorporates
> 'ADDITIONAL_TORTURE_OPTIONS' for *all* (!) following '*.exp' files as
> part of that 'runtest' instance.  Alternatively, any other '*.exp' file's
> 'load_lib gcc-dg.exp' comes first (without the desired
> 'ADDITIONAL_TORTURE_OPTIONS' being set), and once
> 'gcc.target/riscv/riscv.exp' is read, while it then does set
> 'ADDITIONAL_TORTURE_OPTIONS', its 'load_lib gcc-dg.exp' is a no-op, as
> that one has already been loaded, and therefore the
> 'ADDITIONAL_TORTURE_OPTIONS' aren't incorporated.
>
> Instead, I suggest to do this locally: do 'load_lib torture-options.exp',
> 'torture-init', 'set-torture-options [...]' (where that includes your
> special options), 'gcc-dg-runtest', 'torture-finish'.  See other '*.exp'
> files.
>
> (No, I didn't invent this interface.)
>
> (I however don't see yet how this would be related to the current
> "ERROR: torture-init: torture_without_loops is not empty as expected"
> discussion, as has, kind of, been claimed in
>
> "RISC-V: Add missing torture-init and torture-finish for rvv.exp" and the
> following.)


Thanks for taking a look at this. Please don't get me wrong, I never meant 
to vilify the patches above, and I should have verified first (by 
reverting them) whether they caused the issue at all. It just seemed 
that we started seeing these relatively recently and the timing of your 
changes seemed to coincide. As you say above, RV likely has existing 
less-than-ideal constructs which somehow used to work before, but not in 
the new regime.


Anyhow the goal is to get this fixed for RV and any more help will be 
appreciated since I'm really a TCL noob.


It seems my claim yesterday that adding torture-{init,finish} fixed the RV 
issues was premature. I was trying a mix of running the full suite 
vs. just RUNTESTFLAGS="riscv.exp", and in some cases the latter can give a 
false positive (I was making sure dejagnu got rebuilt and restarted etc., 
but anyhow that is a different issue).


I'm currently removing the ADDITIONAL_TORTURE_OPTIONS to see if this 
helps cure it, and will then try the new sequence you pointed to above.


FWIW if you want to test this out at your end, it is super easy.

git clone https://github.com/riscv-collab/riscv-gnu-toolchain toolchain-upstream
cd toolchain-upstream
git submodule init
git submodule update
./configure --with-arch=rv64imafdc --with-abi=lp64d --enable-multilib --enable-linux --prefix=
make -j
make report-linux SIM=qemu

Thx,
-Vineet

Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-05-25 Thread Joseph Myers

What happens if the field giving the number of elements is in a contained 
anonymous structure or union?

struct s {
  struct { size_t count; };
  int array[] __attribute__ ((element_count ("count")));
};

This ought to work - a general principle in C is that anonymous structures 
and unions are transparent as far as name lookup for fields is concerned.  
But I don't see any testcases for it and I'm not sure it would work with 
the present code.
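
As a point of comparison, the transparency of anonymous structures for ordinary member access can be shown with standard C11 alone. The sketch below deliberately leaves out the proposed attribute (whose handling of this case is exactly what is in question), and s_alloc is a hypothetical helper introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same layout as the example above, minus the proposed attribute.  */
struct s {
  struct { size_t count; };   /* C11 anonymous structure */
  int array[];                /* flexible array member */
};

/* `count` is found by member lookup as if it were declared directly
   in struct s.  */
static_assert (sizeof (((struct s *) 0)->count) == sizeof (size_t),
               "count is reachable through the anonymous structure");

/* Hypothetical helper: allocate a struct s with room for N elements
   and record N in the count field.  Returns NULL on allocation failure.  */
struct s *
s_alloc (size_t n)
{
  struct s *p = malloc (sizeof *p + n * sizeof p->array[0]);
  if (p != NULL)
    p->count = n;   /* Resolved through the anonymous structure.  */
  return p;
}
```

An attribute that names "count" would presumably want to resolve the field with the same lookup rules, which is the behaviour being asked about here.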

What if the string is a wide string?  I don't expect that to work (either 
as a matter of interface design, or in the present code), but I think that 
case should have a specific check and error.

What happens in the case where -fexec-charset specifies a 
non-ASCII-compatible character set?  I expect that to work OK with the 
existing code, because translation of string literals to the execution 
character set is disabled in __attribute__ parsing, but having a testcase 
for it would be good.

What happens if the field referenced for the element count does not have 
integer type?  I'd expect an error, but don't see one in the code or tests 
here.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] RISC-V/testsuite: Run target testing over all the usual optimization levels

2023-05-25 Thread Thomas Schwinge via Gcc-patches

Hi!

On 2022-02-08T00:22:37+0800, Kito Cheng via Gcc-patches 
 wrote:
> Hi Maciej:
>
> Thanks for doing this, OK to trunk.
>
> On Tue, Feb 1, 2022 at 7:04 AM Maciej W. Rozycki  wrote:
>>
>> Use `gcc-dg-runtest' test driver rather than `dg-runtest' to run the
>> RISC-V testsuite as several targets already do.  Adjust test options
>> across individual test cases accordingly where required.
>>
>> As some tests want to be run at `-Og', add a suitable optimization
>> variant via ADDITIONAL_TORTURE_OPTIONS, and include the moderately
>> recent `-Oz' variant as well.
>>
>> * testsuite/gcc.target/riscv/riscv.exp: Use `gcc-dg-runtest'
>> rather than `dg-runtest'.  Add `-Og -g' and `-Oz' variants via
>> ADDITIONAL_TORTURE_OPTIONS.

>>  As to adding `-Og -g' and `-Oz', this should probably be done globally in
>> gcc-dg.exp, but such a change would affect all the interested targets at
>> once and would require a huge one-by-one test case review.  Therefore I
>> think adding targets one by one instead is more feasible, and then we can
>> switch once all the targets have.

>> --- gcc.orig/gcc/testsuite/gcc.target/riscv/riscv.exp
>> +++ gcc/gcc/testsuite/gcc.target/riscv/riscv.exp
>> @@ -21,6 +21,8 @@ if ![istarget riscv*-*-*] then {
>>return
>>  }
>>
>> +lappend ADDITIONAL_TORTURE_OPTIONS {-Og -g} {-Oz}
>> +
>>  # Load support procs.
>>  load_lib gcc-dg.exp

Per my understanding, that is not the correct way to do this.  See
'gcc/doc/sourcebuild.texi':

[...] add to the default list by defining
@var{ADDITIONAL_TORTURE_OPTIONS}.  Define these in a @file{.dejagnurc}
file or add them to the @file{site.exp} file; for example [...]

Notice '.dejagnurc' or 'site.exp', that is: globally.  (Doing this
"globally in gcc-dg.exp" -- as you'd mentioned above -- would work too,
conditionalized to '[istarget riscv*-*-*]' only, for now, as you
suggested.)

Otherwise, per what we've now got, either of the following two happens:
before any other 'load_lib gcc-dg.exp', 'gcc.target/riscv/riscv.exp'
happens to be read first, does set 'ADDITIONAL_TORTURE_OPTIONS', then
does its 'load_lib gcc-dg.exp' -- which then incorporates
'ADDITIONAL_TORTURE_OPTIONS' for *all* (!) following '*.exp' files as
part of that 'runtest' instance.  Alternatively, any other '*.exp' file's
'load_lib gcc-dg.exp' comes first (without the desired
'ADDITIONAL_TORTURE_OPTIONS' being set), and once
'gcc.target/riscv/riscv.exp' is read, while it then does set
'ADDITIONAL_TORTURE_OPTIONS', its 'load_lib gcc-dg.exp' is a no-op, as
that one has already been loaded, and therefore the
'ADDITIONAL_TORTURE_OPTIONS' aren't incorporated.
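
The two orderings described above can be illustrated with a toy model in C. This is not DejaGnu code: struct harness, load_lib, run_riscv_exp, run_other_exp and extras_applied are all hypothetical names, and the only assumption modelled is that the library body runs once per run and captures the extra options at that moment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one 'runtest' instance.  All names are hypothetical.  */
struct harness {
  bool lib_loaded;            /* Has "gcc-dg.exp" been loaded yet?  */
  const char *additional;     /* Stands in for ADDITIONAL_TORTURE_OPTIONS.  */
  const char *torture_extra;  /* Extra options captured at load time.  */
};

/* Model of 'load_lib gcc-dg.exp': the body runs only on the first load
   and captures whatever extra options are set at that point.  */
void
load_lib (struct harness *h)
{
  assert (h != NULL);
  if (h->lib_loaded)
    return;                   /* Already loaded: a no-op.  */
  h->torture_extra = h->additional;
  h->lib_loaded = true;
}

/* Model of riscv.exp: set the extra options, then load the library.  */
void
run_riscv_exp (struct harness *h)
{
  h->additional = "-Og -g -Oz";
  load_lib (h);
}

/* Model of any other '*.exp' file that merely loads the library.  */
void
run_other_exp (struct harness *h)
{
  load_lib (h);
}

/* True if the extra options ended up in the torture list.  */
bool
extras_applied (const struct harness *h)
{
  return h->torture_extra[0] != '\0';
}
```

Running run_riscv_exp before run_other_exp leaves the extras applied for everything that follows; the opposite order leaves them unapplied even for riscv.exp itself, which is the order dependence being objected to.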

Instead, I suggest to do this locally: do 'load_lib torture-options.exp',
'torture-init', 'set-torture-options [...]' (where that includes your
special options), 'gcc-dg-runtest', 'torture-finish'.  See other '*.exp'
files.

(No, I didn't invent this interface.)

(I however don't see yet how this would be related to the current
"ERROR: torture-init: torture_without_loops is not empty as expected"
discussion, as has, kind of, been claimed in

"RISC-V: Add missing torture-init and torture-finish for rvv.exp" and the
following.)

Grüße
 Thomas

>> @@ -34,7 +36,7 @@ if ![info exists DEFAULT_CFLAGS] then {
>>  dg-init
>>
>>  # Main loop.
>> -dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] \
>> +gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] \
>> "" $DEFAULT_CFLAGS
>>
>>  # All done.

Re: [V8][PATCH 2/2] Update documentation to clarify a GCC extension [PR77650]

2023-05-25 Thread Joseph Myers

The documentation in this case is OK, though claims about how a future 
version will behave have a poor track record (we tend to end up with such 
claims persisting in the documentation even though the change in question 
didn't get made and might sometimes no longer be considered desirable).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: RISC-V Test Errors and Failures

2023-05-25 Thread Thomas Schwinge via Gcc-patches

Hi!

On 2023-05-17T09:52:13+0200, Andreas Schwab via Gcc-patches 
 wrote:
> On May 16 2023, Vineet Gupta wrote:
>
>> Yes I was seeing similar tcl errors and such - and in my case an even
>> higher count.
>
> They are coming from commit d6654a4be3b.

I call FUD.  Until you prove otherwise, of course.


Grüße
 Thomas

Re: [PATCH] RISC-V: Add missing torture-init and torture-finish for rvv.exp

2023-05-25 Thread Thomas Schwinge via Gcc-patches

Hi!

On 2023-05-24T15:13:19-0700, Vineet Gupta  wrote:
> On 5/24/23 13:34, Thomas Schwinge wrote:
>> Yeah, at this point I'm not sure whether my recent changes really are
>> related/relevant here.
>>
>>> Apparently in addition to Kito's patch below, If I comment out the
>>> additional torture options, failures go down drastically.
>> Meaning that *all* those ERRORs disappear?
>
> No but they reduced significantly. Anyhow I think the issue should be 
> simple enough for someone familiar with how the tcl stuff works...

I'm here to help -- but you'll have to help me to help you, please.

>>> diff --git a/gcc/testsuite/gcc.target/riscv/riscv.exp
>>> b/gcc/testsuite/gcc.target/riscv/riscv.exp
>>>
>>> -lappend ADDITIONAL_TORTURE_OPTIONS {-Og -g} {-Oz}
>>> +#lappend ADDITIONAL_TORTURE_OPTIONS {-Og -g} {-Oz}
>>>
>>> @Thomas, do you have some thoughts on how to fix riscv.exp properly in
>>> light of recent changes to exp files.
>> I'm trying to understand this, but so far don't.  Can I please see a
>> complete 'gcc.log' file where the ERRORs are visible?

> So we are at bleeding edge gcc from today
>   2023-05-24 ec2e86274427 Fortran: reject bad DIM argument of SIZE 
> intrinsic in simplification [PR104350]
>
> With an additional fix from Kito along the lines of..
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
> b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>
>   dg-init
> +torture-init
>
>   # All done.
> +torture-finish
>   dg-finish

That shouldn't be necessary here?

> I'm pasting a snippet of gcc.log. Issue is indeed triggered by rvv.exp 
> which needs some love.

I'd intentionally asked to "see a complete 'gcc.log' file where the
ERRORs are visible".

On 2023-05-24T16:12:20-0700, Vineet Gupta  wrote:
> On 5/24/23 15:13, Vineet Gupta wrote:
>>
>> PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
>> PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times mul\t 1
>> PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-not div\t
>> PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-not rem\t
>> testcase /scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp completed in 60 seconds
>> Running /scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp ...
>> ERROR: tcl error sourcing /scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp.
>> ERROR: tcl error code NONE
>> ERROR: torture-init: torture_without_loops is not empty as expected

I'd seen this before, in your earlier emails.

> Never mind. Looks like I found the issue - with just trial and error and 
> no idea of how this stuff works.

Instead of "magic", let's please try to properly work this out.

> The torture-{init,finish} needs to be in riscv.exp not rvv.exp
> Running full tests now.

I still don't understand this.

My current theory would be that some other '*.exp' file runs
'torture-init' and then prematurely ends without 'torture-finish', and
thus the torture testing state bleeds into the next '*.exp' file(s).  I'd
hoped that I could pinpoint that via "a complete 'gcc.log' file where the
ERRORs are visible".


Grüße
 Thomas

Re: [committed] testsuite: Require trampolines for nestev-vla tests

2023-05-25 Thread Martin Uecker via Gcc-patches



Thanks! I will try to not forget this next time.

On Thursday, 25.05.2023 at 21:20 +0300, Dimitar Dimitrov wrote:
> Three recent test cases declare nested C functions, so they fail on
> targets lacking support for trampolines. Fix by adding the necessary
> filter.
> 
> Committed as obvious.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/nested-vla-1.c: Require effective target trampolines.
>   * gcc.dg/nested-vla-2.c: Ditto.
>   * gcc.dg/nested-vla-3.c: Ditto.
> 
> CC: Martin Uecker 
> Signed-off-by: Dimitar Dimitrov 
> ---
>  gcc/testsuite/gcc.dg/nested-vla-1.c | 1 +
>  gcc/testsuite/gcc.dg/nested-vla-2.c | 1 +
>  gcc/testsuite/gcc.dg/nested-vla-3.c | 1 +
>  3 files changed, 3 insertions(+)
> 
> diff --git a/gcc/testsuite/gcc.dg/nested-vla-1.c 
> b/gcc/testsuite/gcc.dg/nested-vla-1.c
> index 5b62c2c213a..d1b3dc3c5f8 100644
> --- a/gcc/testsuite/gcc.dg/nested-vla-1.c
> +++ b/gcc/testsuite/gcc.dg/nested-vla-1.c
> @@ -1,5 +1,6 @@
>  /* { dg-do run } */
>  /* { dg-options "-std=gnu99" } */
> +/* { dg-require-effective-target trampolines } */
>  
> 
>  
> 
>  int main()
> diff --git a/gcc/testsuite/gcc.dg/nested-vla-2.c 
> b/gcc/testsuite/gcc.dg/nested-vla-2.c
> index d83c90a0b16..294b01d370e 100644
> --- a/gcc/testsuite/gcc.dg/nested-vla-2.c
> +++ b/gcc/testsuite/gcc.dg/nested-vla-2.c
> @@ -1,5 +1,6 @@
>  /* { dg-do run } */
>  /* { dg-options "-std=gnu99" } */
> +/* { dg-require-effective-target trampolines } */
>  
> 
>  
> 
>  int main()
> diff --git a/gcc/testsuite/gcc.dg/nested-vla-3.c 
> b/gcc/testsuite/gcc.dg/nested-vla-3.c
> index 1ffb482da3b..d2ba04adab8 100644
> --- a/gcc/testsuite/gcc.dg/nested-vla-3.c
> +++ b/gcc/testsuite/gcc.dg/nested-vla-3.c
> @@ -1,5 +1,6 @@
>  /* { dg-do run } */
>  /* { dg-options "-std=gnu99" } */
> +/* { dg-require-effective-target trampolines } */
>  
> 
>  
> 
>  int main()

Re: [PATCH] libstdc++: use using instead of typedef for type_traits

2023-05-25 Thread Jonathan Wakely via Gcc-patches

On Thu, 25 May 2023 at 19:32, Patrick Palka via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> On Tue, May 23, 2023 at 3:42 PM Ken Matsui via Gcc-patches
>  wrote:
> >
> > Since the type_traits header is a C++11 header file, using can be used
> > instead of typedef. This patch provides more readability, especially for
> > long type names.
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/std/type_traits: Use using instead of typedef
>
> LGTM, thanks!
>

Agreed.

+Reviewed-by: Jonathan Wakely 

Patrick, please could you push it at your convenience, thanks.



> > ---
> >  libstdc++-v3/include/std/type_traits | 158 +--
> >  1 file changed, 79 insertions(+), 79 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/std/type_traits
> b/libstdc++-v3/include/std/type_traits
> > index bc6982f9e64..0e7a9c9c7f3 100644
> > --- a/libstdc++-v3/include/std/type_traits
> > +++ b/libstdc++-v3/include/std/type_traits
> > @@ -61,9 +61,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >template
> >  struct integral_constant
> >  {
> > -  static constexpr _Tp  value = __v;
> > -  typedef _Tp   value_type;
> > -  typedef integral_constant<_Tp, __v>   type;
> > +  static constexpr _Tp value = __v;
> > +  using value_type = _Tp;
> > +  using type = integral_constant<_Tp, __v>;
> >constexpr operator value_type() const noexcept { return value; }
> >  #if __cplusplus > 201103L
> >
> > @@ -109,7 +109,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >// Partial specialization for true.
> >template
> >  struct enable_if
> > -{ typedef _Tp type; };
> > +{ using type = _Tp; };
> >
> >// __enable_if_t (std::enable_if_t for C++11)
> >template
> > @@ -946,7 +946,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >  struct __is_destructible_impl
> >  : public __do_is_destructible_impl
> >  {
> > -  typedef decltype(__test<_Tp>(0)) type;
> > +  using type = decltype(__test<_Tp>(0));
> >  };
> >
> >template
> > @@ -1000,7 +1000,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >  struct __is_nt_destructible_impl
> >  : public __do_is_nt_destructible_impl
> >  {
> > -  typedef decltype(__test<_Tp>(0)) type;
> > +  using type = decltype(__test<_Tp>(0));
> >  };
> >
> >template
> > @@ -1252,7 +1252,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >  struct __is_implicitly_default_constructible_impl
> >  : public __do_is_implicitly_default_constructible_impl
> >  {
> > -  typedef decltype(__test(declval<_Tp>())) type;
> > +  using type = decltype(__test(declval<_Tp>()));
> >  };
> >
> >template
> > @@ -1422,7 +1422,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >  is_array<_To>>::value>
> >  struct __is_convertible_helper
> >  {
> > -  typedef typename is_void<_To>::type type;
> > +  using type = typename is_void<_To>::type;
> >  };
> >
> >  #pragma GCC diagnostic push
> > @@ -1443,7 +1443,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > __test(...);
> >
> >  public:
> > -  typedef decltype(__test<_From, _To>(0)) type;
> > +  using type = decltype(__test<_From, _To>(0));
> >  };
> >  #pragma GCC diagnostic pop
> >
> > @@ -1521,20 +1521,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >/// remove_const
> >template
> >  struct remove_const
> > -{ typedef _Tp type; };
> > +{ using type = _Tp; };
> >
> >template
> >  struct remove_const<_Tp const>
> > -{ typedef _Tp type; };
> > +{ using type = _Tp; };
> >
> >/// remove_volatile
> >template
> >  struct remove_volatile
> > -{ typedef _Tp type; };
> > +{ using type = _Tp; };
> >
> >template
> >  struct remove_volatile<_Tp volatile>
> > -{ typedef _Tp type; };
> > +{ using type = _Tp; };
> >
> >/// remove_cv
> >  #if __has_builtin(__remove_cv)
> > @@ -1658,83 +1658,83 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >
> >template
> >  struct __cv_selector<_Unqualified, false, false>
> > -{ typedef _Unqualified __type; };
> > +{ using __type = _Unqualified; };
> >
> >template
> >  struct __cv_selector<_Unqualified, false, true>
> > -{ typedef volatile _Unqualified __type; };
> > +{ using __type = volatile _Unqualified; };
> >
> >template
> >  struct __cv_selector<_Unqualified, true, false>
> > -{ typedef const _Unqualified __type; };
> > +{ using __type = const _Unqualified; };
> >
> >template
> >  struct __cv_selector<_Unqualified, true, true>
> > -{ typedef const volatile _Unqualified __type; };
> > +{ using __type = const volatile _Unqualified; };
> >
> >template
> >bool _IsConst = is_const<_Qualified>::value,
> >bool _IsVol = is_volatile<_Qualified>::value>
> >  class __match_cv_qualifiers
> >  {
> > -  typedef __cv_selector<_Unqualified, _IsConst, _IsVol> __match;
> > +  using __match =

Re: [PATCH] libstdc++: use using instead of typedef for type_traits

2023-05-25 Thread Ken Matsui via Gcc-patches

On Thu, May 25, 2023 at 11:31 AM Patrick Palka  wrote:
>
> On Tue, May 23, 2023 at 3:42 PM Ken Matsui via Gcc-patches
>  wrote:
> >
> > Since the type_traits header is a C++11 header file, using can be used
> > instead of typedef. This patch provides more readability, especially for
> > long type names.
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/std/type_traits: Use using instead of typedef
>
> LGTM, thanks!

Thank you!

> > ---
> >  libstdc++-v3/include/std/type_traits | 158 +--
> >  1 file changed, 79 insertions(+), 79 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/std/type_traits 
> > b/libstdc++-v3/include/std/type_traits
> > index bc6982f9e64..0e7a9c9c7f3 100644
> > --- a/libstdc++-v3/include/std/type_traits
> > +++ b/libstdc++-v3/include/std/type_traits
> > @@ -61,9 +61,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >template
> >  struct integral_constant
> >  {
> > -  static constexpr _Tp  value = __v;
> > -  typedef _Tp   value_type;
> > -  typedef integral_constant<_Tp, __v>   type;
> > +  static constexpr _Tp value = __v;
> > +  using value_type = _Tp;
> > +  using type = integral_constant<_Tp, __v>;
> >constexpr operator value_type() const noexcept { return value; }
> >  #if __cplusplus > 201103L
> >
> > @@ -109,7 +109,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >// Partial specialization for true.
> >template
> >  struct enable_if
> > -{ typedef _Tp type; };
> > +{ using type = _Tp; };
> >
> >// __enable_if_t (std::enable_if_t for C++11)
> >template
> > @@ -946,7 +946,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >  struct __is_destructible_impl
> >  : public __do_is_destructible_impl
> >  {
> > -  typedef decltype(__test<_Tp>(0)) type;
> > +  using type = decltype(__test<_Tp>(0));
> >  };
> >
> >template
> > @@ -1000,7 +1000,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >  struct __is_nt_destructible_impl
> >  : public __do_is_nt_destructible_impl
> >  {
> > -  typedef decltype(__test<_Tp>(0)) type;
> > +  using type = decltype(__test<_Tp>(0));
> >  };
> >
> >template
> > @@ -1252,7 +1252,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >  struct __is_implicitly_default_constructible_impl
> >  : public __do_is_implicitly_default_constructible_impl
> >  {
> > -  typedef decltype(__test(declval<_Tp>())) type;
> > +  using type = decltype(__test(declval<_Tp>()));
> >  };
> >
> >template
> > @@ -1422,7 +1422,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >  is_array<_To>>::value>
> >  struct __is_convertible_helper
> >  {
> > -  typedef typename is_void<_To>::type type;
> > +  using type = typename is_void<_To>::type;
> >  };
> >
> >  #pragma GCC diagnostic push
> > @@ -1443,7 +1443,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > __test(...);
> >
> >  public:
> > -  typedef decltype(__test<_From, _To>(0)) type;
> > +  using type = decltype(__test<_From, _To>(0));
> >  };
> >  #pragma GCC diagnostic pop
> >
> > @@ -1521,20 +1521,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >/// remove_const
> >template
> >  struct remove_const
> > -{ typedef _Tp type; };
> > +{ using type = _Tp; };
> >
> >template
> >  struct remove_const<_Tp const>
> > -{ typedef _Tp type; };
> > +{ using type = _Tp; };
> >
> >/// remove_volatile
> >template
> >  struct remove_volatile
> > -{ typedef _Tp type; };
> > +{ using type = _Tp; };
> >
> >template
> >  struct remove_volatile<_Tp volatile>
> > -{ typedef _Tp type; };
> > +{ using type = _Tp; };
> >
> >/// remove_cv
> >  #if __has_builtin(__remove_cv)
> > @@ -1658,83 +1658,83 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >
> >template
> >  struct __cv_selector<_Unqualified, false, false>
> > -{ typedef _Unqualified __type; };
> > +{ using __type = _Unqualified; };
> >
> >template
> >  struct __cv_selector<_Unqualified, false, true>
> > -{ typedef volatile _Unqualified __type; };
> > +{ using __type = volatile _Unqualified; };
> >
> >template
> >  struct __cv_selector<_Unqualified, true, false>
> > -{ typedef const _Unqualified __type; };
> > +{ using __type = const _Unqualified; };
> >
> >template
> >  struct __cv_selector<_Unqualified, true, true>
> > -{ typedef const volatile _Unqualified __type; };
> > +{ using __type = const volatile _Unqualified; };
> >
> >template
> >bool _IsConst = is_const<_Qualified>::value,
> >bool _IsVol = is_volatile<_Qualified>::value>
> >  class __match_cv_qualifiers
> >  {
> > -  typedef __cv_selector<_Unqualified, _IsConst, _IsVol> __match;
> > +  using __match = __cv_selector<_Unqualified, _IsConst, _IsVol>;
> >
> >  public:
> > -  typedef typename __match::__type __type;
> > +

Re: [PATCH] libstdc++: use using instead of typedef for type_traits

2023-05-25 Thread Patrick Palka via Gcc-patches

On Tue, May 23, 2023 at 3:42 PM Ken Matsui via Gcc-patches
 wrote:
>
> Since the type_traits header is a C++11 header file, using can be used instead
> of typedef. This patch provides more readability, especially for long type
> names.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/type_traits: Use using instead of typedef

LGTM, thanks!

> ---
>  libstdc++-v3/include/std/type_traits | 158 +--
>  1 file changed, 79 insertions(+), 79 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/type_traits 
> b/libstdc++-v3/include/std/type_traits
> index bc6982f9e64..0e7a9c9c7f3 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -61,9 +61,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>template
>  struct integral_constant
>  {
> -  static constexpr _Tp  value = __v;
> -  typedef _Tp   value_type;
> -  typedef integral_constant<_Tp, __v>   type;
> +  static constexpr _Tp value = __v;
> +  using value_type = _Tp;
> +  using type = integral_constant<_Tp, __v>;
>constexpr operator value_type() const noexcept { return value; }
>  #if __cplusplus > 201103L
>
> @@ -109,7 +109,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>// Partial specialization for true.
>template
>  struct enable_if
> -{ typedef _Tp type; };
> +{ using type = _Tp; };
>
>// __enable_if_t (std::enable_if_t for C++11)
>template
> @@ -946,7 +946,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  struct __is_destructible_impl
>  : public __do_is_destructible_impl
>  {
> -  typedef decltype(__test<_Tp>(0)) type;
> +  using type = decltype(__test<_Tp>(0));
>  };
>
>template
> @@ -1000,7 +1000,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  struct __is_nt_destructible_impl
>  : public __do_is_nt_destructible_impl
>  {
> -  typedef decltype(__test<_Tp>(0)) type;
> +  using type = decltype(__test<_Tp>(0));
>  };
>
>template
> @@ -1252,7 +1252,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  struct __is_implicitly_default_constructible_impl
>  : public __do_is_implicitly_default_constructible_impl
>  {
> -  typedef decltype(__test(declval<_Tp>())) type;
> +  using type = decltype(__test(declval<_Tp>()));
>  };
>
>template
> @@ -1422,7 +1422,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  is_array<_To>>::value>
>  struct __is_convertible_helper
>  {
> -  typedef typename is_void<_To>::type type;
> +  using type = typename is_void<_To>::type;
>  };
>
>  #pragma GCC diagnostic push
> @@ -1443,7 +1443,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> __test(...);
>
>  public:
> -  typedef decltype(__test<_From, _To>(0)) type;
> +  using type = decltype(__test<_From, _To>(0));
>  };
>  #pragma GCC diagnostic pop
>
> @@ -1521,20 +1521,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>/// remove_const
>template
>  struct remove_const
> -{ typedef _Tp type; };
> +{ using type = _Tp; };
>
>template
>  struct remove_const<_Tp const>
> -{ typedef _Tp type; };
> +{ using type = _Tp; };
>
>/// remove_volatile
>template
>  struct remove_volatile
> -{ typedef _Tp type; };
> +{ using type = _Tp; };
>
>template
>  struct remove_volatile<_Tp volatile>
> -{ typedef _Tp type; };
> +{ using type = _Tp; };
>
>/// remove_cv
>  #if __has_builtin(__remove_cv)
> @@ -1658,83 +1658,83 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>template
>  struct __cv_selector<_Unqualified, false, false>
> -{ typedef _Unqualified __type; };
> +{ using __type = _Unqualified; };
>
>template
>  struct __cv_selector<_Unqualified, false, true>
> -{ typedef volatile _Unqualified __type; };
> +{ using __type = volatile _Unqualified; };
>
>template
>  struct __cv_selector<_Unqualified, true, false>
> -{ typedef const _Unqualified __type; };
> +{ using __type = const _Unqualified; };
>
>template
>  struct __cv_selector<_Unqualified, true, true>
> -{ typedef const volatile _Unqualified __type; };
> +{ using __type = const volatile _Unqualified; };
>
>template
>bool _IsConst = is_const<_Qualified>::value,
>bool _IsVol = is_volatile<_Qualified>::value>
>  class __match_cv_qualifiers
>  {
> -  typedef __cv_selector<_Unqualified, _IsConst, _IsVol> __match;
> +  using __match = __cv_selector<_Unqualified, _IsConst, _IsVol>;
>
>  public:
> -  typedef typename __match::__type __type;
> +  using __type = typename __match::__type;
>  };
>
>// Utility for finding the unsigned versions of signed integral types.
>template
>  struct __make_unsigned
> -{ typedef _Tp __type; };
> +{ using __type = _Tp; };
>
>template<>
>  struct __make_unsigned
> -{ typedef unsigned char __type; };
> +{ using __type = unsigned char; };

[committed] testsuite: Require trampolines for nestev-vla tests

2023-05-25 Thread Dimitar Dimitrov

Three recent test cases declare nested C functions, so they fail on
targets lacking support for trampolines. Fix by adding the necessary
filter.

Committed as obvious.

gcc/testsuite/ChangeLog:

* gcc.dg/nested-vla-1.c: Require effective target trampolines.
* gcc.dg/nested-vla-2.c: Ditto.
* gcc.dg/nested-vla-3.c: Ditto.

CC: Martin Uecker 
Signed-off-by: Dimitar Dimitrov 
---
 gcc/testsuite/gcc.dg/nested-vla-1.c | 1 +
 gcc/testsuite/gcc.dg/nested-vla-2.c | 1 +
 gcc/testsuite/gcc.dg/nested-vla-3.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/nested-vla-1.c b/gcc/testsuite/gcc.dg/nested-vla-1.c
index 5b62c2c213a..d1b3dc3c5f8 100644
--- a/gcc/testsuite/gcc.dg/nested-vla-1.c
+++ b/gcc/testsuite/gcc.dg/nested-vla-1.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-std=gnu99" } */
+/* { dg-require-effective-target trampolines } */
 
 
 int main()
diff --git a/gcc/testsuite/gcc.dg/nested-vla-2.c b/gcc/testsuite/gcc.dg/nested-vla-2.c
index d83c90a0b16..294b01d370e 100644
--- a/gcc/testsuite/gcc.dg/nested-vla-2.c
+++ b/gcc/testsuite/gcc.dg/nested-vla-2.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-std=gnu99" } */
+/* { dg-require-effective-target trampolines } */
 
 
 int main()
diff --git a/gcc/testsuite/gcc.dg/nested-vla-3.c b/gcc/testsuite/gcc.dg/nested-vla-3.c
index 1ffb482da3b..d2ba04adab8 100644
--- a/gcc/testsuite/gcc.dg/nested-vla-3.c
+++ b/gcc/testsuite/gcc.dg/nested-vla-3.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-std=gnu99" } */
+/* { dg-require-effective-target trampolines } */
 
 
 int main()
-- 
2.40.1

Re: [PATCH] RISC-V: In pipeline scheduling, insns should not be fusion in different BB blocks.

2023-05-25 Thread Jeff Law via Gcc-patches





On 5/25/23 02:32, Jin Ma wrote:

When the last insn of BB1 (insn1) and the first insn of BB2 (insn2) are fused,
chain_to_prev_insn clears all of insn2's dependencies.  As a result, insn2 may
be moved to any BB, and the program computes a wrong result.

gcc/ChangeLog:

* sched-deps.cc (sched_macro_fuse_insns): Insns should not be fusion
in different BB blocks
I've pushed this to the trunk.  After a week or so I'll push it to the 
active release branches.


jeff

[avr,committed]: Implement PR104327 for avr

2023-05-25 Thread Georg-Johann Lay





On 25.05.23 at 17:07, Richard Biener wrote:

On 25.05.2023 at 16:22, Georg-Johann Lay wrote:

On 25.05.23 at 08:35, Richard Biener wrote:

On Wed, May 24, 2023 at 5:44 PM Georg-Johann Lay wrote:
On 24.05.23 at 11:38, Richard Biener wrote:

On Tue, May 23, 2023 at 2:56 PM Georg-Johann Lay wrote:


PR target/104327 not only affects s390 but also avr:
The avr backend pre-sets some options depending on optimization level.
The inliner then thinks that always_inline functions are not eligible
for inlining and terminates with an error.
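As a rough illustration of the kind of source this affects (a sketch only, not the reproducer from PR104327; the names add1 and caller are made up, and whether this exact snippet triggers the error on avr is an assumption): an always_inline callee is called from a function whose optimization level is changed with the optimize attribute, so caller and callee can end up with different option sets.

```c
/* Hypothetical example; add1 and caller are illustrative names and do
   not come from the PR or the patch.  */

/* Callee that the user insists must be inlined.  */
static inline __attribute__ ((always_inline)) int
add1 (int x)
{
  return x + 1;
}

/* Caller compiled at a different optimization level.  On a target whose
   backend pre-sets options depending on the optimization level, caller
   and callee can carry different target option sets, which is what the
   default can_inline_p hook compares.  */
__attribute__ ((optimize ("O0"))) int
caller (int x)
{
  return add1 (x);
}
```

On targets where the option sets are compatible this simply compiles and caller (1) returns 2.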

Proposing the following patch that implements TARGET_CAN_INLINE_P.

Ok to apply?

Johann

target/104327: Allow more inlining between different optimization levels.

avr-common.cc introduces the following options that are set depending
on optimization level: -mgas-isr-prologues, -mmain-is-OS-task and
-fsplit-wide-types-early.  The inliner thinks that different options
disallow cross-optimization inlining, so provide can_inline_p.

gcc/
  PR target/104327
  * config/avr/avr.cc (avr_can_inline_p): New static function.
  (TARGET_CAN_INLINE_P): Define to that function.
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 9fa50ca230d..55b48f63865 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -1018,6 +1018,22 @@ avr_no_gccisr_function_p (tree func)
  return avr_lookup_function_attribute1 (func, "no_gccisr");
}

+
+/* Implement `TARGET_CAN_INLINE_P'.  */
+/* Some options like -mgas-isr-prologues depend on optimization level,
+   and the inliner might think that due to different options, inlining
+   is not permitted; see PR104327.  */
+
+static bool
+avr_can_inline_p (tree /* caller */, tree callee)
+{
+  // For now, don't allow inlining ISRs.  If the user actually wants
+  // to inline ISR code, they have to turn the body of the ISR into an
+  // ordinary function.
+
+  return ! avr_interrupt_function_p (callee);


I'm not sure if AVR has ISA extensions but the above will likely break
things like

void __attribute__((target("-mX"))) foo () { asm ("isa X opcode");
stmt-that-generates-X-ISA; }


This yields

warning: target attribute is not supported on this machine [-Wattributes]

Ah, that's an interesting fact.  So that indeed leaves
__attribute__((optimize(...)))
influencing the set of active target attributes via the generic option target
hooks like in your case the different defaults.

avr has -mmcu= target options, but switching them in mid-air
won't work because the file prologue might already be different
and incompatible across different architectures.  And I never
saw any user requesting such a thing, and I can't imagine
any reasonable use case...  If the warning is not strong enough,
maybe it can be turned into an error, but -Wattributes is not
specific enough for that.

Note the target attribute is then simply ignored.

void bar ()
{
if (cpu-has-X)
  foo ();
}

if always-inlines are the concern you can use

bool always_inline
  = (DECL_DISREGARD_INLINE_LIMITS (callee)
 && lookup_attribute ("always_inline",
  DECL_ATTRIBUTES (callee)));
/* Do what the user says.  */
if (always_inline)
  return true;

return default_target_can_inline_p (caller, callee);


The default implementation of can_inline_p worked fine for avr.
As far as I understand, the new behavior is due to clean-up
of global states for options?

I think the last change was r8-2658-g9b25e12d2d940a which
for targets without target attribute support made it more likely
to run into the default hook actually comparing the options.
Previously the "default" was oddly special-cased but you
could have still run into compares with two different sets of
defaults when there's another "default" default.  Say, compile
with -O2 and have one optimize(0) and one optimize(Os)
function it would compare the optimize(0) and optimize(Os)
set if they were distinct from the -O2 set.  That probably never
happened for AVR.

So I need to take into account inlining costs and decide on that
whether it's preferred to inline a function or not?

No, the hook isn't about cost, it's about full incompatibility.  So
if the different -m options that could be in effect for AVR in
a single TU for different functions never should prevent inlining
then simply make the hook return true.  If there's a specific
option (that can differ from what specified on the compiler
command line!) that should, then you should compare the
setting of that option from the DECL_FUNCTION_SPECIFIC_TARGET
of the caller and the callee.
But as far as I can see simply returning true should be correct
for AVR, or like your patch handle interrupts differently (though
the -Winline diagnostic will tell the user there's a mismatch in
target options which might be confusing).


Ok, simply "true" sounds reasonable.  Is that change ok then?


Yes.

Richard


Committed as https://gcc.gnu.org/r14-1245

Johann


--- a/gcc/config/avr/avr.cc
+++

[COMMITTED] i386: Use 2x-wider modes when emulating QImode vector instructions

2023-05-25 Thread Uros Bizjak via Gcc-patches

Rewrite ix86_expand_vecop_qihi2 to expand to 2x-wider (e.g. V16QI -> V16HImode)
instructions when available.  Currently, the compiler generates the following
assembly for V16QImode multiplication (-mavx2):

vpunpcklbw    %xmm0, %xmm0, %xmm3
vpunpcklbw    %xmm1, %xmm1, %xmm2
vpunpckhbw    %xmm0, %xmm0, %xmm0
movl          $255, %eax
vpunpckhbw    %xmm1, %xmm1, %xmm1
vpmullw       %xmm3, %xmm2, %xmm2
vmovd         %eax, %xmm3
vpmullw       %xmm0, %xmm1, %xmm1
vpbroadcastw  %xmm3, %xmm3
vpand         %xmm2, %xmm3, %xmm0
vpand         %xmm1, %xmm3, %xmm3
vpackuswb     %xmm3, %xmm0, %xmm0

and only with -mavx512bw -mavx512vl generates:

vpmovzxbw   %xmm1, %ymm1
vpmovzxbw   %xmm0, %ymm0
vpmullw %ymm1, %ymm0, %ymm0
vpmovwb %ymm0, %xmm0

The patched compiler generates better code that multiplies in the 2x-wider
mode, even where the missing truncate instruction has to be emulated with a
permutation (-mavx2):

vpmovzxbw     %xmm0, %ymm0
vpmovzxbw     %xmm1, %ymm1
movl          $255, %eax
vpmullw       %ymm1, %ymm0, %ymm1
vmovd         %eax, %xmm0
vpbroadcastw  %xmm0, %ymm0
vpand         %ymm1, %ymm0, %ymm0
vpackuswb     %ymm0, %ymm0, %ymm0
vpermq        $216, %ymm0, %ymm0

The patch also adjusts cost calculation of V*QImode emulations to account
for generation of 2x-wider mode instructions.
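For readers who want the arithmetic spelled out, here is a scalar C model of what the 2x-wider expansion computes for a V16QI multiply. This is only a sketch of the semantics, not code from the patch; the name mul_v16qi_via_hi and the LANES constant are made up for illustration.

```c
#include <stdint.h>

enum { LANES = 16 };  /* V16QImode: 16 byte-sized lanes.  */

/* Scalar model of a byte-vector multiply done in the 2x-wider mode:
   zero-extend every byte to 16 bits, multiply in 16 bits, then keep
   only the low byte of each product.  */
static void
mul_v16qi_via_hi (uint8_t dst[LANES],
                  const uint8_t a[LANES], const uint8_t b[LANES])
{
  uint16_t wide[LANES];

  /* Corresponds to vpmovzxbw on both inputs followed by vpmullw.  */
  for (int i = 0; i < LANES; i++)
    wide[i] = (uint16_t) ((uint16_t) a[i] * (uint16_t) b[i]);

  /* Corresponds to the truncation step: a native truncate insn where
     available, otherwise masking with 255 and packing/permuting.  */
  for (int i = 0; i < LANES; i++)
    dst[i] = (uint8_t) (wide[i] & 0xff);
}
```

The result in each lane equals the byte product modulo 256, which is why the low half of the 16-bit product is all that has to survive the truncation.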

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Rewrite to expand to 2x-wider (e.g. V16QI -> V16HImode)
instructions when available.  Emulate truncation via
ix86_expand_vec_perm_const_1 when native truncate insn
is not available.
(ix86_expand_vecop_qihi_partial) : Use pmovzx
when available.  Trivially rename some variables.
(ix86_expand_vecop_qihi): Unconditionally call ix86_expand_vecop_qihi2.
* config/i386/i386.cc (ix86_multiplication_cost): Rewrite cost
calculation of V*QImode emulations to account for generation of
2x-wider mode instructions.
(ix86_shift_rotate_cost): Update cost calculation of V*QImode
emulations to account for generation of 2x-wider mode instructions.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512vl-pr95488-1.c: Revert 2023-05-18 change.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 5a57be82e98..0d8953b8c75 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -23106,68 +23106,6 @@ ix86_expand_vec_interleave (rtx targ, rtx op0, rtx op1, bool high_p)
   gcc_assert (ok);
 }
 
-/* This function is similar as ix86_expand_vecop_qihi,
-   but optimized under AVX512BW by using vpmovwb.
-   For example, optimize vector MUL generation like
-
-   vpmovzxbw ymm2, xmm0
-   vpmovzxbw ymm3, xmm1
-   vpmullw   ymm4, ymm2, ymm3
-   vpmovwb   xmm0, ymm4
-
-   it would take less instructions than ix86_expand_vecop_qihi.
-   Return true if success.  */
-
-static bool
-ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, rtx op1, rtx op2)
-{
-  machine_mode himode, qimode = GET_MODE (dest);
-  rtx hop1, hop2, hdest;
-  rtx (*gen_truncate)(rtx, rtx);
-  bool uns_p = (code == ASHIFTRT) ? false : true;
-
-  /* There are no V64HImode instructions.  */
-  if (qimode == V64QImode)
-return false;
-
-  /* vpmovwb only available under AVX512BW.  */
-  if (!TARGET_AVX512BW)
-return false;
-
-  if (qimode == V16QImode && !TARGET_AVX512VL)
-return false;
-
-  /* Do not generate ymm/zmm instructions when
- target prefers 128/256 bit vector width.  */
-  if ((qimode == V16QImode && TARGET_PREFER_AVX128)
-  || (qimode == V32QImode && TARGET_PREFER_AVX256))
-return false;
-
-  switch (qimode)
-{
-case E_V16QImode:
-  himode = V16HImode;
-  gen_truncate = gen_truncv16hiv16qi2;
-  break;
-case E_V32QImode:
-  himode = V32HImode;
-  gen_truncate = gen_truncv32hiv32qi2;
-  break;
-default:
-  gcc_unreachable ();
-}
-
-  hop1 = gen_reg_rtx (himode);
-  hop2 = gen_reg_rtx (himode);
-  hdest = gen_reg_rtx (himode);
-  emit_insn (gen_extend_insn (hop1, op1, himode, qimode, uns_p));
-  emit_insn (gen_extend_insn (hop2, op2, himode, qimode, uns_p));
-  emit_insn (gen_rtx_SET (hdest, simplify_gen_binary (code, himode,
- hop1, hop2)));
-  emit_insn (gen_truncate (dest, hdest));
-  return true;
-}
-
 /* Expand a vector operation shift by constant for a V*QImode in terms of the
same operation on V*HImode. Return true if success. */
 static bool
@@ -23272,9 +23210,9 @@ void
 ix86_expand_vecop_qihi_partial (enum rtx_code code, rtx dest, rtx op1, rtx op2)
 {
   machine_mode qimode = GET_MODE (dest);
-  rtx qop1, qop2, hop1, hop2, qdest, hres;
+  rtx qop1, qop2, hop1, hop2, qdest, hdest;
   bool op2vec = GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT;
-  bool uns_p = true;
+  bool uns_p = code != ASHIFTRT;
 
   switch (qimode)
 {
@@

[avr,committed] PR82931: Improve single-bit transfers between registers.

2023-05-25 Thread Georg-Johann Lay


Applied this patch that makes one insn more generic so it can handle
more bit positions than just 0.

Johann

--

target/82931: Make a pattern more generic to match more bit-transfers.

There is already a pattern in avr.md that matches single-bit transfers
from one register to another one, but it only handled bit 0 of 8-bit
registers.  This change makes that pattern more generic so it matches
more of similar single-bit transfers.
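As a side note, the two source forms exercised by the new test (the mask-and-or form and the XOR form) describe the same single-bit transfer. A small sketch, with the hypothetical helper names merge_mask and merge_xor, makes that explicit:

```c
#include <stdint.h>

/* Copy bit BIT of SRC into the same bit of DST, mask-and-or form.
   BIT is assumed to be in the range 0..7.  */
static uint8_t
merge_mask (uint8_t dst, uint8_t src, unsigned bit)
{
  uint8_t mask = (uint8_t) (1u << bit);
  return (uint8_t) ((src & mask) | (dst & (uint8_t) ~mask));
}

/* The same transfer written in XOR form: flip the bit in DST only
   where DST and SRC differ.  */
static uint8_t
merge_xor (uint8_t dst, uint8_t src, unsigned bit)
{
  uint8_t mask = (uint8_t) (1u << bit);
  return (uint8_t) (dst ^ ((dst ^ src) & mask));
}
```

Both helpers return the same value for every input, so ideally both end up as one bst/bld pair on avr; that part is what the scan-assembler-times directives in the test check.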

gcc/
PR target/82931
* config/avr/avr.md (*movbitqi.0): Rename to *movbit.0-6.
Handle any bit position and use mode QISI.
* config/avr/avr.cc (avr_rtx_costs_1) [IOR]: Return a cost
of 2 insns for bit-transfer of respective style.

gcc/testsuite/
PR target/82931
* gcc.target/avr/pr82931.c: New test.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 4fa6f5309b2..31706964eb1 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -10843,6 +10843,15 @@ avr_rtx_costs_1 (rtx x, machine_mode mode, int outer_code,
 *total += COSTS_N_INSNS (1);
   return true;
 }
+  if (IOR == code
+  && AND == GET_CODE (XEXP (x, 0))
+  && AND == GET_CODE (XEXP (x, 1))
+  && single_zero_operand (XEXP (XEXP (x, 0), 1), mode))
+{
+  // Open-coded bit transfer.
+  *total = COSTS_N_INSNS (2);
+  return true;
+}
   *total = COSTS_N_INSNS (GET_MODE_SIZE (mode));
   *total += avr_operand_rtx_cost (XEXP (x, 0), mode, code, 0, speed);
   if (!CONST_INT_P (XEXP (x, 1)))
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index a79c6824fad..371965938a6 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -9096,16 +9096,20 @@ (define_insn "*movbitqi.1-6.b"
   "bst %3,0\;bld %0,%4"
   [(set_attr "length" "2")])

-;; Move bit $3.0 into bit $0.0.
-;; For bit 0, combiner generates slightly different pattern.
-(define_insn "*movbitqi.0"
-  [(set (match_operand:QI 0 "register_operand" "=r")
-(ior:QI (and:QI (match_operand:QI 1 "register_operand"  "0")
-(match_operand:QI 2 "single_zero_operand"   "n"))
-(and:QI (match_operand:QI 3 "register_operand"  "r")
-(const_int 1))))]
-  "0 == exact_log2 (~INTVAL(operands[2]) & GET_MODE_MASK (QImode))"
-  "bst %3,0\;bld %0,0"
+;; Move bit $3.x into bit $0.x.
+(define_insn "*movbit.0-6"
+  [(set (match_operand:QISI 0 "register_operand" "=r")
+(ior:QISI (and:QISI (match_operand:QISI 1 "register_operand" "0")
+(match_operand:QISI 2 "single_zero_operand" "n"))
+  (and:QISI (match_operand:QISI 3 "register_operand" "r")
+(match_operand:QISI 4 "single_one_operand" "n"))))]
+  "GET_MODE_MASK(<MODE>mode)
+   == (GET_MODE_MASK(<MODE>mode) & (INTVAL(operands[2]) ^ INTVAL(operands[4])))"
+  {
+auto bitmask = GET_MODE_MASK (<MODE>mode) & UINTVAL (operands[4]);
+operands[4] = GEN_INT (exact_log2 (bitmask));
+return "bst %T3%T4" CR_TAB "bld %T0%T4";
+  }
   [(set_attr "length" "2")])

 ;; Move bit $2.0 into bit $0.7.
diff --git a/gcc/testsuite/gcc.target/avr/pr82931.c b/gcc/testsuite/gcc.target/avr/pr82931.c
new file mode 100644
index 00000000000..477284fa127
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/pr82931.c
@@ -0,0 +1,29 @@
+/* { dg-options "-Os" } */
+/* { dg-final { scan-assembler-times "bst" 4 } } */
+/* { dg-final { scan-assembler-times "bld" 4 } } */
+
+typedef __UINT8_TYPE__ uint8_t;
+typedef __UINT16_TYPE__ uint16_t;
+
+#define BitMask (1u << 14)
+#define Bit8Mask ((uint8_t) (1u << 4))
+
+void merge1_8 (uint8_t *dst, const uint8_t *src)
+{
+*dst = (*src & Bit8Mask) | (*dst & ~ Bit8Mask);
+}
+
+void merge2_8 (uint8_t *dst, const uint8_t *src)
+{
+*dst ^= (*dst ^ *src) & Bit8Mask;
+}
+
+void merge1_16 (uint16_t *dst, const uint16_t *src)
+{
+*dst = (*src & BitMask) | (*dst & ~ BitMask);
+}
+
+void merge2_16 (uint16_t *dst, const uint16_t *src)
+{
+*dst ^= (*dst ^ *src) & BitMask;
+}

Re: [i386 PATCH] A minor code clean-up: Use NULL_RTX instead of nullptr

2023-05-25 Thread Bernhard Reutner-Fischer via Gcc-patches

On Wed, 24 May 2023 18:54:06 +0100
"Roger Sayle"  wrote:

> My understanding is that GCC's preferred null value for rtx is NULL_RTX
> (and for tree is NULL_TREE), and by being typed allows strict type checking,
> and use with function polymorphism and template instantiation.
> C++'s nullptr is preferred over NULL and 0 for pointer types that don't
> have a defined null of the correct type.
> 
> This minor clean-up uses NULL_RTX consistently in i386-expand.cc.

Oh. Well, I can't resist cleanups :)

Given
$ cat /tmp/inp0.c ; echo EOF
rtx myfunc (int i, int j)
{
  rtx ret;
  if (i)
return NULL;
  if (j)
   ret = NULL;
  if (ret == NULL) {
ret = NULL_RTX;
  }
  if (!ret)
return (rtx)2;
  return NULL_RTX;
}
EOF
$ spatch --c++=11 --smpl-spacing --in-place --sp-file ~/coccinelle/rtx-null.0.cocci /tmp/inp0.c
init_defs_builtins: /usr/bin/../lib/coccinelle/standard.h
--- /tmp/inp0.c
+++ /tmp/cocci-output-76891-58af4a-inp0.c
@@ -2,10 +2,10 @@ rtx myfunc (int i, int j)
 {
   rtx ret;
   if (i)
-return NULL;
+return NULL_RTX;
   if (j)
-   ret = NULL;
-  if (ret == NULL) {
+   ret = NULL_RTX;
+  if (ret == NULL_RTX) {
 ret = NULL_RTX;
   }
   if (!ret)
HANDLING: /tmp/inp0.c
diff = 

So if you feel like it, someone could run
find ./ \( -name "testsuite" -o -name "contrib" -o -name "examples" -o -name ".git" -o -name "zlib" -o -name "intl" \) -prune -o \( -name "*.[chpx]*" -a -type f \) -exec spatch --c++=11 --smpl-spacing --in-place $opts --sp-file ~/coccinelle/rtx-null.0.cocci {} \;
with the attached rtx-null coccinelle script.
(and handle nullptr too, and the same game for tree)

Just a thought..


rtx-null.0.cocci
Description: Binary data

[PATCH] btf: improve -dA comments for testsuite

2023-05-25 Thread David Faust via Gcc-patches

Many BTF type kinds refer to other types via index to the final types
list. However, the order of the final types list is not guaranteed to
remain the same for the same source program between different runs of
the compiler, making it difficult to test inter-type references.

This patch updates the assembler comments output when writing a
given BTF record to include minimal information about the referenced
type, if any. This allows for the regular expressions used in the gcc
testsuite to do some basic integrity checks on inter-type references.

For example, for the type

unsigned int *

Assembly comments like the following are written with -dA:

.4byte  0   ; TYPE 2 BTF_KIND_PTR ''
.4byte  0x2000000   ; btt_info: kind=2, kflag=0, vlen=0
.4byte  0x1 ; btt_type: (BTF_KIND_INT 'unsigned int')
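To make the btt_info comment easier to check by hand, here is a small sketch of how the kind, kflag and vlen fields are packed into that 32-bit word according to the BTF format (vlen in bits 0-15, kind in bits 24-28, kflag in bit 31). The helper names are hypothetical and are not functions from btfout.cc.

```c
#include <stdint.h>

/* Hypothetical helpers illustrating the BTF "info" word layout:
   bits 0-15 hold vlen, bits 24-28 hold the kind, bit 31 is kflag.  */

static uint32_t
btf_info_pack (uint32_t kind, uint32_t kflag, uint32_t vlen)
{
  return ((kflag & 0x1) << 31) | ((kind & 0x1f) << 24) | (vlen & 0xffff);
}

static uint32_t
btf_info_kind (uint32_t info)
{
  return (info >> 24) & 0x1f;
}

static uint32_t
btf_info_kflag (uint32_t info)
{
  return info >> 31;
}

static uint32_t
btf_info_vlen (uint32_t info)
{
  return info & 0xffff;
}
```

With this layout, a pointer type (kind=2, kflag=0, vlen=0) packs to 0x02000000, and unpacking that word gives back the three values named in the comment.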

Several BTF tests which can immediately be made more robust with this
change are updated. It will also be useful in new tests for the upcoming
btf_type_tag support.

Tested on BPF and x86_64, no known regressions.
OK for trunk?

Thanks.

gcc/

* btfout.cc (btf_kind_names): New.
(btf_kind_name): New.
(btf_absolute_var_id): New utility function.
(btf_relative_var_id): Likewise.
(btf_relative_func_id): Likewise.
(btf_absolute_datasec_id): Likewise.
(btf_asm_type_ref): New.
(btf_asm_type): Update asm comments and use btf_asm_type_ref ().
(btf_asm_array): Likewise. Accept ctf_container_ref parameter.
(btf_asm_varent): Likewise.
(btf_asm_func_arg): Likewise.
(btf_asm_datasec_entry): Likewise.
(btf_asm_datasec_type): Likewise.
(btf_asm_func_type): Likewise. Add index parameter.
(btf_asm_sou_member): Likewise.
(output_btf_vars): Update btf_asm_* call accordingly.
(output_asm_btf_sou_fields): Likewise.
(output_asm_btf_func_args_list): Likewise.
(output_asm_btf_vlen_bytes): Likewise.
(output_btf_func_types): Add ctf_container_ref parameter.
Pass it to btf_asm_func_type.
(output_btf_datasec_types): Update btf_asm_datsec_type call similarly.
(btf_output): Update output_btf_func_types call similarly.

gcc/testsuite/

* gcc.dg/debug/btf/btf-array-1.c: Use new BTF asm comments
in scan-assembler expressions where useful.
* gcc.dg/debug/btf/btf-anonymous-struct-1.c: Likewise.
* gcc.dg/debug/btf/btf-anonymous-union-1.c: Likewise.
* gcc.dg/debug/btf/btf-bitfields-2.c: Likewise.
* gcc.dg/debug/btf/btf-bitfields-3.c: Likewise.
* gcc.dg/debug/btf/btf-function-6.c: Likewise.
* gcc.dg/debug/btf/btf-pointers-1.c: Likewise.
* gcc.dg/debug/btf/btf-struct-1.c: Likewise.
* gcc.dg/debug/btf/btf-struct-2.c: Likewise.
* gcc.dg/debug/btf/btf-typedef-1.c: Likewise.
* gcc.dg/debug/btf/btf-union-1.c: Likewise.
* gcc.dg/debug/btf/btf-variables-1.c: Likewise.
* gcc.dg/debug/btf/btf-variables-2.c: Likewise. Update outdated comment.
* gcc.dg/debug/btf/btf-function-3.c: Update outdated comment.
---
 gcc/btfout.cc | 220 ++
 .../gcc.dg/debug/btf/btf-anonymous-struct-1.c |   3 +-
 .../gcc.dg/debug/btf/btf-anonymous-union-1.c  |   4 +-
 gcc/testsuite/gcc.dg/debug/btf/btf-array-1.c  |   3 +
 .../gcc.dg/debug/btf/btf-bitfields-2.c|   2 +-
 .../gcc.dg/debug/btf/btf-bitfields-3.c|   2 +-
 .../gcc.dg/debug/btf/btf-function-3.c |   2 +-
 .../gcc.dg/debug/btf/btf-function-6.c |   4 +-
 .../gcc.dg/debug/btf/btf-pointers-1.c |   3 +
 gcc/testsuite/gcc.dg/debug/btf/btf-struct-1.c |   4 +-
 gcc/testsuite/gcc.dg/debug/btf/btf-struct-2.c |   2 +-
 .../gcc.dg/debug/btf/btf-typedef-1.c  |  14 +-
 gcc/testsuite/gcc.dg/debug/btf/btf-union-1.c  |   2 +-
 .../gcc.dg/debug/btf/btf-variables-1.c|   6 +
 .../gcc.dg/debug/btf/btf-variables-2.c|   7 +-
 15 files changed, 215 insertions(+), 63 deletions(-)

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index 497c1ca06e6..8960acfbbaa 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -114,6 +114,23 @@ static unsigned int num_types_added = 0;
CTF types.  */
 static unsigned int num_types_created = 0;
 
+/* Name strings for BTF kinds.
+   Note: the indices here must match the type defines in btf.h.  */
+static const char *const btf_kind_names[] =
+  {
+"UNKN", "INT", "PTR", "ARRAY", "STRUCT", "UNION", "ENUM", "FWD",
+"TYPEDEF", "VOLATILE", "CONST", "RESTRICT", "FUNC", "FUNC_PROTO",
+"VAR", "DATASEC", "FLOAT", "DECL_TAG", "TYPE_TAG", "ENUM64"
+  };
+
+/* Return a name string for the given BTF_KIND.  */
+
+static const char *
+btf_kind_name (uint32_t btf_kind)
+{
+  return btf_kind_names[btf_kind];
+}
+
 /* Map a CTF type kind to the corresponding BTF type kind.  */
 
 static uint32_t
@@ -141,6 +158,57 @@ get_btf_kind (uint32_t ctf_kind)
   return BTF_KIND_UNKN;
 }
 
+/* Helper

[V1][PATCH 3/3] Use the element_count attribute information in bound sanitizer [PR108896]

2023-05-25 Thread Qing Zhao via Gcc-patches

2023-05-17 Qing Zhao 

gcc/c-family/ChangeLog:

PR C/108896
* c-ubsan.cc (ubsan_instrument_bounds): Use element_count attribute
information.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/ubsan/flex-array-element-count-bounds.c: New test.
---
 gcc/c-family/c-ubsan.cc   | 16 +++
 .../ubsan/flex-array-element-count-bounds.c   | 46 +++
 2 files changed, 62 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-element-count-bounds.c

diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index cfb7cbf389c..04eb05b2c24 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -362,6 +362,10 @@ ubsan_instrument_bounds (location_t loc, tree array, tree *index,
 {
   tree type = TREE_TYPE (array);
   tree domain = TYPE_DOMAIN (type);
+  /* whether the array ref is a flexible array member with valid element_count
+ attribute.  */
+  bool fam_has_count_attr = false;
+  tree element_count = NULL_TREE;
 
   if (domain == NULL_TREE)
 return NULL_TREE;
@@ -375,6 +379,17 @@ ubsan_instrument_bounds (location_t loc, tree array, tree *index,
  && COMPLETE_TYPE_P (type)
  && integer_zerop (TYPE_SIZE (type)))
bound = build_int_cst (TREE_TYPE (TYPE_MIN_VALUE (domain)), -1);
+  /* If the array ref is to a flexible array member field that has the
+element_count attribute, use the information from the attribute
+as the bound to instrument the reference.  */
+  else if ((element_count = component_ref_get_element_count (array))
+   != NULL_TREE)
+   {
+ fam_has_count_attr = true;
+ bound = fold_build2 (MINUS_EXPR, TREE_TYPE (element_count),
+  element_count,
+  build_int_cst (TREE_TYPE (element_count), 1));
+   }
   else
return NULL_TREE;
 }
@@ -387,6 +402,7 @@ ubsan_instrument_bounds (location_t loc, tree array, tree *index,
  -fsanitize=bounds-strict.  */
   tree base = get_base_address (array);
   if (!sanitize_flags_p (SANITIZE_BOUNDS_STRICT)
+  && !fam_has_count_attr
   && TREE_CODE (array) == COMPONENT_REF
   && base && (INDIRECT_REF_P (base) || TREE_CODE (base) == MEM_REF))
 {
diff --git a/gcc/testsuite/gcc.dg/ubsan/flex-array-element-count-bounds.c b/gcc/testsuite/gcc.dg/ubsan/flex-array-element-count-bounds.c
new file mode 100644
index 00000000000..be5ee352144
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ubsan/flex-array-element-count-bounds.c
@@ -0,0 +1,46 @@
+/* test the attribute element_count and its usage in
+   bounds sanitizer.  */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+
+#include <stdlib.h>
+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((element_count ("b")));
+} *array_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int annotated_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
++ normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+ + annotated_count *  sizeof (int));
+  array_annotated->b = annotated_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test (int normal_index, int annotated_index)
+{
+  array_flex->c[normal_index] = 1;
+  array_annotated->c[annotated_index] = 2;
+}
+
+int main(int argc, char *argv[])
+{
+  setup (10,10);   
+  test (10,10);
+  return 0;
+}
+
+/* { dg-output "36:21: runtime error: index 10 out of bounds for type" } */
-- 
2.31.1

[V1][PATCH 2/3] Use the element_count attribute info in builtin object size [PR108896].

2023-05-25 Thread Qing Zhao via Gcc-patches

2023-05-17 Qing Zhao 

gcc/ChangeLog:

PR C/108896
* tree-object-size.cc (addr_object_size): Use the element_count
attribute info.
* tree.cc (component_ref_has_element_count_p): New function.
(component_ref_get_element_count): New function.
* tree.h (component_ref_has_element_count_p): New prototype.
(component_ref_get_element_count): New prototype.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-element-count-2.c: New test.
---
 .../gcc.dg/flex-array-element-count-2.c   | 56 +++
 gcc/tree-object-size.cc   | 37 ++--
 gcc/tree.cc   | 93 +++
 gcc/tree.h| 10 ++
 4 files changed, 189 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-element-count-2.c

diff --git a/gcc/testsuite/gcc.dg/flex-array-element-count-2.c b/gcc/testsuite/gcc.dg/flex-array-element-count-2.c
new file mode 100644
index 00000000000..5a280e8c731
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-element-count-2.c
@@ -0,0 +1,56 @@
+/* test the attribute element_count and its usage in
+ * __builtin_dynamic_object_size.  */ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+#define expect(p, _v) do { \
+size_t v = _v; \
+if (p == v) \
+   __builtin_printf ("ok:  %s == %zd\n", #p, p); \
+else \
+   {  \
+ __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v); \
+ FAIL (); \
+   } \
+} while (0);
+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((element_count ("b")));
+} *array_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
+  + normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+   + attr_count *  sizeof (int));
+  array_annotated->b = attr_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test ()
+{
+expect(__builtin_dynamic_object_size(array_flex->c, 1), -1);
+expect(__builtin_dynamic_object_size(array_annotated->c, 1),
+  array_annotated->b * sizeof (int));
+}
+
+int main(int argc, char *argv[])
+{
+  setup (10,10);   
+  test ();
+  DONE ();
+}
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 9a936a91983..f9aadd59054 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -585,6 +585,7 @@ addr_object_size (struct object_size_info *osi, const_tree ptr,
   if (pt_var != TREE_OPERAND (ptr, 0))
 {
   tree var;
+  tree element_count_ref = NULL_TREE;
 
   if (object_size_type & OST_SUBOBJECT)
{
@@ -600,11 +601,12 @@ addr_object_size (struct object_size_info *osi, const_tree ptr,
var = TREE_OPERAND (var, 0);
  if (var != pt_var && TREE_CODE (var) == ARRAY_REF)
var = TREE_OPERAND (var, 0);
- if (! TYPE_SIZE_UNIT (TREE_TYPE (var))
+ if (! component_ref_has_element_count_p (var)
+&& ((! TYPE_SIZE_UNIT (TREE_TYPE (var))
  || ! tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (var)))
  || (pt_var_size && TREE_CODE (pt_var_size) == INTEGER_CST
  && tree_int_cst_lt (pt_var_size,
- TYPE_SIZE_UNIT (TREE_TYPE (var)
+ TYPE_SIZE_UNIT (TREE_TYPE (var)))
var = pt_var;
  else if (var != pt_var && TREE_CODE (pt_var) == MEM_REF)
{
@@ -612,6 +614,7 @@ addr_object_size (struct object_size_info *osi, const_tree ptr,
  /* For >fld, compute object size if fld isn't a flexible array
 member.  */
  bool is_flexible_array_mem_ref = false;
+
  while (v && v != pt_var)
switch (TREE_CODE (v))
  {
@@ -639,6 +642,8 @@ addr_object_size (struct object_size_info *osi, const_tree ptr,
break;
  }
is_flexible_array_mem_ref = array_ref_flexible_size_p (v);
+   element_count_ref = component_ref_get_element_count (v);
+
while (v != pt_var && TREE_CODE (v) == COMPONENT_REF)
  if (TREE_CODE (TREE_TYPE (TREE_OPERAND (v, 0)))
  != UNION_TYPE
@@ -652,8 +657,11 @@ addr_object_size (struct object_size_info *osi, const_tree ptr,
   == RECORD_TYPE)
  {
/* compute object size only if v is not a
-  flexible array member.  */
-   if (!is_flexible_array_mem_ref)
+  flexible array

[V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-05-25 Thread Qing Zhao via Gcc-patches

'element_count ("COUNT")'
 The 'element_count' attribute may be attached to the flexible array
 member of a structure.  It indicates that the number of elements
 of the array is given by the field named "COUNT" in the same
 structure as the flexible array member.  GCC uses this information
 to improve the results of '__builtin_dynamic_object_size' and the
 array bound sanitizer.

 For instance, the following declaration:

  struct P {
size_t count;
int array[] __attribute__ ((element_count ("count")));
  };

 specifies that 'array' is a flexible array member whose number of
 elements is given by the field 'count' in the same structure.

The number of elements information provided by this attribute can be
used by __builtin_dynamic_object_size and array bound sanitizer to detect
out-of-bound errors for flexible array member references.
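A minimal usage sketch follows, assuming a compiler with this patch series applied. Since element_count exists only in this proposal, the example guards it behind a macro so that it still builds elsewhere; the names ELEMENT_COUNT, alloc_p and p_store are made up for illustration and are not part of the patch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Use the proposed attribute only if the compiler knows it; otherwise
   expand to nothing.  ELEMENT_COUNT is a hypothetical helper macro.  */
#if defined(__has_attribute)
# if __has_attribute(element_count)
#  define ELEMENT_COUNT(field) __attribute__ ((element_count (#field)))
# endif
#endif
#ifndef ELEMENT_COUNT
# define ELEMENT_COUNT(field)
#endif

struct P {
  size_t count;                       /* Number of elements in array.  */
  int array[] ELEMENT_COUNT (count);  /* Flexible array member.  */
};

/* Allocate a struct P with room for N elements and record N in the
   count field, so the annotation describes the real allocation.
   (Overflow of the size computation is not checked in this sketch.)  */
static struct P *
alloc_p (size_t n)
{
  struct P *p = malloc (sizeof (struct P) + n * sizeof (int));
  if (p)
    p->count = n;
  return p;
}

/* Store V at index IDX with a manual check against count.  This is the
   check the bound sanitizer could perform automatically once it knows
   the element count.  Returns 0 on success, -1 if IDX is out of range.  */
static int
p_store (struct P *p, size_t idx, int v)
{
  if (idx >= p->count)
    return -1;
  p->array[idx] = v;
  return 0;
}
```

The important point is that count must be kept in sync with the actual allocation; the attribute only tells the compiler where to look for the size.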

2023-05-17 Qing Zhao 

gcc/c-family/ChangeLog:

PR C/108896
* c-attribs.cc (handle_element_count_attribute): New function.
* c-common.cc (c_flexible_array_member_type_p): To this.
* c-common.h (c_flexible_array_member_type_p): New prototype.

gcc/c/ChangeLog:

PR C/108896
* c-decl.cc (flexible_array_member_type_p): Renamed and moved to...
(add_flexible_array_elts_to_size): Use renamed function.
(is_flexible_array_member_p): Use renamed function.
(verify_element_count_attribute): New function.
(finish_struct): Use renamed function and verify element count
attribute.

gcc/ChangeLog:

PR C/108896
* doc/extend.texi: Document attribute element_count.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-element-count.c: New test.
---
 gcc/c-family/c-attribs.cc | 51 
 gcc/c-family/c-common.cc  | 13 
 gcc/c-family/c-common.h   |  1 +
 gcc/c/c-decl.cc   | 61 ++-
 gcc/doc/extend.texi   | 21 +++
 .../gcc.dg/flex-array-element-count.c | 27 
 6 files changed, 158 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-element-count.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 072cfb69147..d45d11077c3 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -103,6 +103,8 @@ static tree handle_warn_if_not_aligned_attribute (tree *, tree, tree,
  int, bool *);
 static tree handle_strict_flex_array_attribute (tree *, tree, tree,
 int, bool *);
+static tree handle_element_count_attribute (tree *, tree, tree,
+  int, bool *);
 static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_noplt_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_alias_ifunc_attribute (bool, tree *, tree, tree, bool *);
@@ -373,6 +375,8 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_warn_if_not_aligned_attribute, NULL },
   { "strict_flex_array",  1, 1, true, false, false, false,
  handle_strict_flex_array_attribute, NULL },
+  { "element_count", 1, 1, true, false, false, false,
+ handle_element_count_attribute, NULL },
   { "weak",   0, 0, true,  false, false, false,
  handle_weak_attribute, NULL },
   { "noplt",   0, 0, true,  false, false, false,
@@ -2555,6 +2559,53 @@ handle_strict_flex_array_attribute (tree *node, tree 
name,
   return NULL_TREE;
 }
 
+/* Handle a "element_count" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_element_count_attribute (tree *node, tree name,
+   tree args, int ARG_UNUSED (flags),
+   bool *no_add_attrs)
+{
+  tree decl = *node;
+  tree argval = TREE_VALUE (args);
+
+  /* This attribute only applies to field decls of a structure.  */
+  if (TREE_CODE (decl) != FIELD_DECL)
+{
+  error_at (DECL_SOURCE_LOCATION (decl),
+   "%qE attribute may not be specified for non-field"
+   " declaration %q+D", name, decl);
+  *no_add_attrs = true;
+}
+  /* This attribute only applies to field with array type.  */
+  else if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE)
+{
+  error_at (DECL_SOURCE_LOCATION (decl),
+   "%qE attribute may not be specified for a non-array field",
+   name);
+  *no_add_attrs = true;
+}
+  /* This attribute only applies to a C99 flexible array member type.  */
+  else if (! c_flexible_array_member_type_p (TREE_TYPE (decl)))
+{
+  error_at (DECL_SOURCE_LOCATION (decl),
+   "%qE attribute

[V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

2023-05-25 Thread Qing Zhao via Gcc-patches

Hi,

This patch set introduces a new attribute "element_count" to annotate bounds 
for C99 flexible array member.

A gcc bugzilla PR108896 has been created to record this task:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896

A nice writeup, "Bounded Flexible Arrays in C"
(https://people.kernel.org/kees/bounded-flexible-arrays-in-c),
written by Kees Cook of the Kernel Self-Protection Project, provides solid
background and motivation for this new attribute:

"With flexible arrays now a first-class citizen in Linux and the compilers,
it becomes possible to extend their available diagnostics.  What the compiler
is missing is knowledge of how the length of a given flexible array is tracked.
For well-described flexible array structs, this means associating the member 
holding the element count with the flexible array member. This idea is not new,
though prior implementation 
(https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2660.pdf)
proposals have wanted to make changes to the C language syntax. A simpler
approach is the addition of struct member attributes, and is under discussion
 and early development by both the GCC and Clang developer communities."

The basic idea is to annotate the flexible array member with a new attribute
 "element_count" to track its number of elements to another field in the same
 structure, for example:

struct object {
..
 size_t count;  /* carries the number of elements info for the FAM flex.  */
 int flex[]; 
};

will become:

struct object {
..
 size_t count;  /* carries the number of elements info for the FAM flex.  */
 int flex[] __attribute__((element_count ("count")));
};

GCC will pass the number of elements info from the attached attribute to both 
__builtin_dynamic_object_size and bounds sanitizer to check the out-of-bounds
or dynamic object size issues during runtime for flexible array members.

This new feature will provide nice protection to flexible array members (which
currently are completely ignored by both __builtin_dynamic_object_size and
bounds sanitizers).
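
As a rough illustration of the annotation described above, here is a minimal
sketch.  It assumes the attribute is spelled "element_count" and takes the
field name as a string, as in this patch set; the ELEMENT_COUNT macro and the
object_alloc/object_get helpers are hypothetical names used only for this
example.  On compilers without the attribute the macro expands to nothing, so
the manual bounds check in object_get stands in for what the sanitizer could
derive from the annotation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: expands to the proposed attribute only when the
   compiler reports support for it, and to nothing otherwise.  */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif
#if __has_attribute (element_count)
# define ELEMENT_COUNT(field) __attribute__ ((element_count (#field)))
#else
# define ELEMENT_COUNT(field)
#endif

struct object {
  size_t count;                      /* number of elements in flex[] */
  int flex[] ELEMENT_COUNT (count);  /* bounded by 'count' when supported */
};

/* Allocate an object with room for N elements and record N in 'count',
   so the annotation matches the real allocation.  */
static struct object *
object_alloc (size_t n)
{
  struct object *p = calloc (1, sizeof (*p) + n * sizeof (p->flex[0]));
  if (p)
    p->count = n;
  return p;
}

/* Hand-written form of the check the annotation makes possible:
   returns 0 and stores the element, or -1 if I is out of range.  */
static int
object_get (const struct object *p, size_t i, int *out)
{
  assert (p != NULL && out != NULL);
  if (i >= p->count)
    return -1;
  *out = p->flex[i];
  return 0;
}
```

The point of keeping 'count' and the allocation size in one helper is that
the attribute is only useful if the field really does describe the array.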

Possible future additions to this initial work include supporting counts from
a variable outside the structure, or a field in the outer structure if needed.  

If the GCC extension works well, this feature might be promoted into new C
 standard in the future.

Clang has a similar initial implementation, which is under review:

https://reviews.llvm.org/D148381

Linux kernel also has a patch to use this new feature:

https://lore.kernel.org/lkml/20230504211827.GA1666363@dev-arch.thelio-3990X/T/

The patch set includes 3 patches:

1/3: Provide element_count attribute to flexible array member field (PR108896)
2/3: Use the element_count attribute info in builtin object size [PR108896].
3/3: Use the element_count attribute information in bound sanitizer[PR108896]

bootstrapped and regression tested on aarch64 and x86.

Let me know if you have any comment or suggestion.

Thanks.

Qing

Re: [PATCH v2] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-05-25 Thread Fangrui Song via Gcc-patches


On 2023-05-25, Jan Beulich wrote:

On 25.05.2023 17:16, Fangrui Song wrote:

--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -32942,9 +32942,10 @@ the cache line size.  @samp{compat} is the default.

 @opindex mlarge-data-threshold
 @item -mlarge-data-threshold=@var{threshold}
-When @option{-mcmodel=medium} is specified, data objects larger than
-@var{threshold} are placed in the large data section.  This value must be the
-same across all objects linked into the binary, and defaults to 65535.
+When @option{-mcmodel=medium} or @option{-mcmodel=large} is specified, data
+objects larger than @var{threshold} are placed in large data sections.  This
+value must be the same across all objects linked into the binary, and defaults
+to 65535.


Where's the "must be the same" requirement coming from?


It's an existing requirement.  I think it may be related to discouraging
different COMDAT section names due to different -mlarge-data-threshold=.
I don't think it makes sense but do not feel strongly about dropping it.

Happy to drop the requirement if I revise this patch.


As to the default - to remain compatible with earlier versions, shouldn't
large model code default to "infinity"?

Jan


I have thought about this compatibility need and feel that it is very
unlikely to be needed.  GNU ld has supported large data sections since
2005
(https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=3b22753a67cf616514de804ef6d5ed5e90a7d883).
Users' programs with the internal linker scripts will still be working
and -fdata-sections sections will be combined.

First, -mcmodel=large use cases are rare enough.  Perhaps
-mcmodel=large was considered a theoretical exercise in
trying to reach feature completion
(https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU/m/NNuA0P7pAQAJ);
without this patch, -mcmodel=large object files don't interact well with
existing -mcmodel=small object files.
Moreover, if a user expects a specific section prefix with
-mcmodel=large, that's a brittle assumption. I think it's fair to say
that the fault is on the user side and GCC doesn't need to work around
their issues.
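
For readers unfamiliar with the option, a small sketch of the size rule in
the documentation hunk above.  The object names and the exceeds_threshold
helper are made up for this example; 65535 is the default quoted in the
hunk.  The section placement mentioned in the comments is what one would
expect on x86-64 with -mcmodel=medium (and, with this patch, -mcmodel=large),
and is best confirmed with "readelf -S" on the object file rather than taken
from this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Default from the quoted invoke.texi text.  */
#define DEFAULT_LARGE_DATA_THRESHOLD 65535

/* 1 MiB: larger than the default threshold, so a candidate for a large
   data section (for example .lbss) under the affected code models.  */
char big_table[1 << 20];

/* 64 bytes: below the threshold, so expected to stay in a regular
   data section.  */
char small_table[64];

/* The documented rule: objects *larger than* the threshold qualify.  */
static int
exceeds_threshold (size_t object_size, size_t threshold)
{
  return object_size > threshold;
}

/* Sanity check of the sizes chosen for the two demo objects.  */
static void
check_demo_objects (void)
{
  assert (exceeds_threshold (sizeof big_table,
                             DEFAULT_LARGE_DATA_THRESHOLD));
  assert (!exceeds_threshold (sizeof small_table,
                              DEFAULT_LARGE_DATA_THRESHOLD));
}
```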

Re: [PATCH] RISC-V: Add the option "-mdisable-multilib-check" to avoid multilib checks breaking the compilation.

2023-05-25 Thread Kito Cheng via Gcc-patches

> When testing a extension, it is often necessary for a certain program not to
> need some kind of extension, such as the bitmanip extension, to evaluate the
> performance or codesize of the extension. However, the current multilib rules
> will report an error when it is not a superset of the MULTILIB_REQUIRED list,
> which will cause the program to be unable to link normally, thus failing to
> achieve the expected purpose.
>
> Therefore, the compilation option is added to avoid riscv_multi_lib_check()
> interruption of compilation.

I think it's dangerous to remove the check, but I can understand there
are some cases where we really do not want the `security lock`,
so I am OK with introducing a new option to disable that.
But please add documentation for the option to gcc/doc/invoke.texi
and mention that it will just use the default multilib and MIGHT not be
correct in some cases.

> --- a/gcc/config/riscv/riscv.opt
> +++ b/gcc/config/riscv/riscv.opt
> @@ -295,3 +295,7 @@ Enum(riscv_autovec_lmul) String(m8) Value(RVV_M8)
>  -param=riscv-autovec-lmul=
>  Target RejectNegative Joined Enum(riscv_autovec_lmul) 
> Var(riscv_autovec_lmul) Init(RVV_M1)
>  -param=riscv-autovec-lmul= Set the RVV LMUL of 
> auto-vectorization in the RISC-V port.
> +
> +mdisable-multilib-check

I would like to rename that to mno-multilib-check since that is more
like all other gcc option conventions.

> +Target Bool Var(riscv_disable_multilib_check) Init(0)

You don't need Var(riscv_disable_multilib_check) and Init(0) since no
one checks the value.
And you need RejectNegative to prevent gcc from accepting
-mno-disable-multilib-check.

> +Disable multilib checking by riscv_multi_lib_check().

`riscv_multi_lib_check()` is meaningless for users since it's an
implementation detail.  I would suggest:
`Disable multilib checking; use the default multilib if a compatible
one is not found.`

Re: [PATCH] stor-layout, aarch64: Express SRA intrinsics with RTL codes

2023-05-25 Thread Richard Sandiford via Gcc-patches

Kyrylo Tkachov via Gcc-patches  writes:
> Hi all,
>
> This patch expresses the intrinsics for the SRA and RSRA instructions with
> standard RTL codes rather than relying on UNSPECs.
> These instructions perform a vector shift right plus accumulate with an
> optional rounding constant addition for the RSRA variant.
> There are a number of interesting points:
>
> * The scalar-in-SIMD-registers variant for DImode SRA e.g. ssra d0, d1, #N
> is left using the UNSPECs. Expressing it as a DImode plus+shift led to all
> kinds of trouble as it started matching the existing define_insns for
> "add x0, x0, asr #N" instructions and adding the SRA form as an extra
> alternative required a significant amount of deduplication of iterators and
> things still didn't work out well. I decided not to tackle that case in
> this patch. It can be attempted later.
>
> * For the RSRA variants that add a rounding constant (1 << (shift-1)) the
> addition is notionally performed in a wider mode than the input types so that
> overflow is handled properly. In RTL this can be represented with an 
> appropriate
> extend operation followed by a truncate back to the original modes.
> However for 128-bit input modes such as V4SI we don't have appropriate modes
> defined for this widening i.e. we'd need a V4DI mode to represent the
> intermediate widened result.  This patch defines such modes for
> V16HI,V8SI,V4DI,V2TI. These will come handy in the future too as we have
> more Advanced SIMD instruction that have similar intermediate widening
> semantics.
>
> * The above new modes led to a problem with stor-layout.cc. The new modes only
> exist for the sake of the RTL optimisers understanding the semantics of the
> instruction but are not indended to be moved to and from register or memory,
> assigned to types, used as TYPE_MODE or participate in auto-vectorisation.
> This is expressed in aarch64 by aarch64_classify_vector_mode returning zero
> for these new modes. However, the code in stor-layout.cc:
> explicitly doesn't check this when picking a TYPE_MODE due to modes being made
> potentially available later through target switching (PR38240).
> This led to these modes being picked as TYPE_MODE for declarations such as:
> typedef int16_t vnx8hi __attribute__((vector_size (32))) when 256-bit
> fixed-length SVE modes are available and vector_type_mode later struggling
> to rectify this.
> This issue is addressed with the new target hook
> TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P that is intended to check if a
> vector mode can be used in any legal target attribute configuration of the
> port, as opposed to the existing TARGET_VECTOR_MODE_SUPPORTED_P that checks
> only the initial target configuration. This allows a simple adjustment in
> stor-layout.cc that still disqualifies these limited modes early on while
> allowing consideration of modes that can be turned on in the future with
> target attributes.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for the non-aarch64 parts?

Yes, thanks.

Since we'd discussed this approach off-list, I wanted to leave a gap in
case others objected to it.  But I guess they would have spoken up by
now if so.

Richard
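
For reference, the rounding behaviour discussed in the quoted description
(add 1 << (shift-1) in a wider type, shift, truncate, then accumulate) can be
modelled per lane in plain C.  This is only a scalar sketch for one signed
32-bit lane; the function name is invented, and it relies on GCC's
arithmetic right shift of negative values and modular narrowing conversion,
both of which are implementation-defined in ISO C.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 32-bit lane of a signed rounding
   shift-right-and-accumulate.  The rounding constant is added in a
   64-bit intermediate so the addition cannot overflow, which mirrors
   the extend/truncate pair described for the RTL.  */
static int32_t
rsra_lane_s32 (int32_t acc, int32_t x, unsigned shift)
{
  assert (shift >= 1 && shift <= 32);
  int64_t wide = (int64_t) x + ((int64_t) 1 << (shift - 1));
  /* Narrow the shifted value and accumulate with wrap-around.  */
  return (int32_t) ((uint32_t) acc + (uint32_t) (wide >> shift));
}
```

With x = INT32_MAX and shift = 1 the rounding addition would overflow a
32-bit lane, which is the case the widened intermediate mode exists to
handle.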

[V8][PATCH 2/2] Update documentation to clarify a GCC extension [PR77650]

2023-05-25 Thread Qing Zhao via Gcc-patches

on a structure with a C99 flexible array member being nested in
another structure.

"The GCC extension accepts a structure containing an ISO C99 "flexible array
member", or a union containing such a structure (possibly recursively)
to be a member of a structure.

 There are two situations:

   * A structure containing a C99 flexible array member, or a union
 containing such a structure, is the last field of another structure,
 for example:

  struct flex  { int length; char data[]; };
  union union_flex { int others; struct flex f; };

  struct out_flex_struct { int m; struct flex flex_data; };
  struct out_flex_union { int n; union union_flex flex_data; };

 In the above, both 'out_flex_struct.flex_data.data[]' and
 'out_flex_union.flex_data.f.data[]' are considered as flexible
 arrays too.

   * A structure containing a C99 flexible array member, or a union
 containing such a structure, is not the last field of another structure,
 for example:

  struct flex  { int length; char data[]; };

  struct mid_flex { int m; struct flex flex_data; int n; };

 In the above, accessing a member of the array 'mid_flex.flex_data.data[]'
 might have undefined behavior.  Compilers do not handle such a case
 consistently.  Any code relying on this case should be modified to ensure
 that flexible array members only end up at the ends of structures.

 Please use the warning option '-Wflex-array-member-not-at-end' to
 identify all such cases in the source code and modify them.  This
 warning will be on by default starting from GCC 15.
"

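To make the second situation concrete, a minimal sketch of the kind of
source change the documentation asks for: the member containing the flexible
array is moved to the end of the enclosing structure.  The struct name
end_flex and the helper end_flex_alloc are hypothetical and used only for
this example; nesting a structure with a flexible array member is itself the
GCC extension being documented, so pedantic modes may warn about it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct flex { int length; char data[]; };

/* Problematic layout from the text, shown only as a comment:
     struct mid_flex { int m; struct flex flex_data; int n; };  */

/* Reordered so that the member containing the flexible array is last.  */
struct end_flex { int m; int n; struct flex flex_data; };

/* Allocate an end_flex with LEN bytes available in flex_data.data[].  */
static struct end_flex *
end_flex_alloc (int len)
{
  assert (len >= 0);
  struct end_flex *p = calloc (1, sizeof (*p) + (size_t) len);
  if (p)
    p->flex_data.length = len;
  return p;
}
```
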
gcc/c-family/ChangeLog:

* c.opt: New option -Wflex-array-member-not-at-end.

gcc/c/ChangeLog:

* c-decl.cc (finish_struct): Issue warnings for new option.

gcc/ChangeLog:

* doc/extend.texi: Document GCC extension on a structure containing
a flexible array member to be a member of another structure.

gcc/testsuite/ChangeLog:

* gcc.dg/variable-sized-type-flex-array.c: New test.
---
 gcc/c-family/c.opt|  5 +++
 gcc/c/c-decl.cc   |  9 
 gcc/doc/extend.texi   | 44 ++-
 .../gcc.dg/variable-sized-type-flex-array.c   | 31 +
 4 files changed, 88 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/variable-sized-type-flex-array.c

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index cddeece..c26d9801b63 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -737,6 +737,11 @@ Wformat-truncation=
 C ObjC C++ LTO ObjC++ Joined RejectNegative UInteger Var(warn_format_trunc) 
Warning LangEnabledBy(C ObjC C++ LTO ObjC++,Wformat=, warn_format >= 1, 0) 
IntegerRange(0, 2)
 Warn about calls to snprintf and similar functions that truncate output.
 
+Wflex-array-member-not-at-end
+C C++ Var(warn_flex_array_member_not_at_end) Warning
+Warn when a structure containing a C99 flexible array member as the last
+field is not at the end of another structure.
+
 Wif-not-aligned
 C ObjC C++ ObjC++ Var(warn_if_not_aligned) Init(1) Warning
 Warn when the field in a struct is not aligned.
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index e14f514cb6e..ecd10ebb69c 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9278,6 +9278,15 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
TYPE_INCLUDES_FLEXARRAY (t)
  = is_last_field && TYPE_INCLUDES_FLEXARRAY (TREE_TYPE (x));
 
+  if (warn_flex_array_member_not_at_end
+ && !is_last_field
+ && RECORD_OR_UNION_TYPE_P (TREE_TYPE (x))
+ && TYPE_INCLUDES_FLEXARRAY (TREE_TYPE (x)))
+   warning_at (DECL_SOURCE_LOCATION (x),
+   OPT_Wflex_array_member_not_at_end,
+   "structure containing a flexible array member"
+   " is not at the end of another structure");
+
   if (DECL_NAME (x)
  || RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
saw_named_field = true;
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index f9d13b495ad..17ef80e75cc 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1751,7 +1751,49 @@ Flexible array members may only appear as the last 
member of a
 A structure containing a flexible array member, or a union containing
 such a structure (possibly recursively), may not be a member of a
 structure or an element of an array.  (However, these uses are
-permitted by GCC as extensions.)
+permitted by GCC as extensions, see details below.)
+@end itemize
+
+The GCC extension accepts a structure containing an ISO C99 @dfn{flexible array
+member}, or a union containing such a structure (possibly recursively)
+to be a member of a structure.
+
+There are two situations:
+
+@itemize @bullet
+@item
+A structure containing a C99 flexible array member, or a union containing
+such a structure, is the last field of another structure, for example:
+

Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-25 Thread Carl Love via Gcc-patches

Peter, Kewen:

On Thu, 2023-05-25 at 13:28 +0800, Kewen.Lin wrote:
> on 2023/5/24 23:20, Carl Love wrote:
> > On Wed, 2023-05-24 at 13:32 +0800, Kewen.Lin wrote:
> > > on 2023/5/24 06:30, Peter Bergner wrote:
> > > > On 5/23/23 12:24 AM, Kewen.Lin wrote:
> > > > > on 2023/5/23 01:31, Carl Love wrote:
> > > > > > The builtins were requested for use in GLibC.  As of
> > > > > > version
> > > > > > 2.31 they
> > > > > > were added as inline asm.  They requested a builtin so the
> > > > > > asm
> > > > > > could be
> > > > > > removed.
> > > > > 
> > > > > So IMHO we also want the similar support for mffscrn, that is
> > > > > to
> > > > > make
> > > > > use of mffscrn and mffscrni on Power9 and later, but falls
> > > > > back
> > > > > to 
> > > > > __builtin_set_fpscr_rn + mffs similar on older platforms.
> > > > 
> > > > So __builtin_set_fpscr_rn everything we want (sets the RN bits)
> > > > and
> > > > uses mffscrn/mffscrni on P9 and later and uses older insns on
> > > > pre-
> > > > P9.
> > > > The only problem is we don't return the current FPSCR bits, as
> > > > the
> > > > bif
> > > > is defined to return void.
> > > 
> > > Yes.
> > > 
> > > > Crazy idea, but could we extend the built-in
> > > > with an overload that returns the FPSCR bits?  
> > > 
> > > So you agree that we should make this proposed new bif handle
> > > pre-P9
> > > just
> > > like some other existing bifs. :)  I think extending it is good
> > > and
> > > doable,
> > > but the only concern here is the bif name
> > > "__builtin_set_fpscr_rn",
> > > which
> > > matches the existing behavior (only set rounding) but doesn't
> > > match
> > > the
> > > proposed extending behavior (set rounding and get some env bits
> > > back).
> > > Maybe it's not a big deal if the documentation clarify it well.
> > 
> > Extending the builtin to pre Power 9 is straight forward and I
> > agree
> > would make good sense to do.
> > 
> > I am a bit concerned on how to extend __builtin_set_fpscr_rn to add
> > the
> > new functionality.  Peter suggests overloading the builtin to
> > either
> > return void or returns FPSCR bits.  It is my understanding that the
> > return value for a given builtin had to be the same, i.e. you can't
> > overload the return value. Maybe you can with Bill's new
> > infrastructure?  I recall having problems trying to overload the
> > return
> > value in the past and Bill said you couldn't do it.  I play with
> > this
> > and see if I can overload the return value.
> 
> Your understanding on that we fail to overload this for just
> different
> return types is correct.  But previously I interpreted the extending
> proposal as to extend
> 
>   void __builtin_set_fpscr_rn (int);
> 
> to 
> 
>   void __builtin_set_fpscr_rn (int, double*);
> 
> The related address taken and store here can be optimized out
> normally.

I don't think that is correct.   The current definition of the builtin
is:

 void __builtin_set_fpscr_rn (int);

The proposal by Peter was to change the return type to double, i.e.

 double __builtin_set_fpscr_rn (int);

Peter also said the following:

   The built-in machinery can see that the usage is expecting a return
   value or not and for the pre-P9 code, can skip generating the ending
   mffs if we don't want the return value.

Which I don't think we want.  The mffscrn and mffscrni instructions
return the contents of the control bits in the FPSCR, that is, bits
29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), are placed
into the corresponding bits in register FRT. All other bits in register
FRT are set to 0.  

The instructions also update the current RN field of the FPSCR with
the new RN supplied by the second argument of the instruction.  So, the
instructions update the RN field just like the __builtin_set_fpscr_rn. 
So, we can use the existing __builtin_set_fpscr_rn to update the RN for
all ISAs, we just need to have __builtin_set_fpscr_rn always return a
double with the desired fields from the FPSCR (the current RN).  This
will then emulate the behavior of the mffscrn and mffscrni
instructions.  The current uses of __builtin_set_fpscr_rn will just
ignore the return value which is not a problem.  The return value can
be stored in the places where the user is currently using the inline asm
for the mffscrn and mffscrni instructions.

The __builtin_set_fpscr_rn builtin is currently using the mffscrn and
mffscrni on Power 9 and throwing away the result from the instruction. 
We just need to change __builtin_set_fpscr_rn to return the value
instead.  For the pre Power 9 code, the builtin will need to read the
full FPSCR, mask off the desired fields and return the fields.

So, there is no need for the builtin to have to determine if the user
is storing the result of the __builtin_set_fpscr_rn.  The RN bits will
always be updated by the __builtin_set_fpscr_rn builtin and the
existing fields of the FPSCR will always be returned by the builtin.
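
To make the intended behaviour concrete, here is a small software model in C.
It does not touch the real FPSCR: the register is just a uint64_t, and the
function name model_set_fpscr_rn is invented for this sketch.  The masks
follow the bit numbers given above using the Power convention that bit 0 is
the most significant bit of the 64-bit register, and the sketch assumes the
returned value reflects the control bits as they were before RN is updated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bits 62:63 (RN) are the two least significant bits.  */
#define FPSCR_RN_MASK    0x3ULL
/* Bits 29:31 (DRN) plus bits 56:63 (VE, OE, UE, ZE, XE, NI, RN).  */
#define FPSCR_CTRL_MASK  ((0x7ULL << 32) | 0xFFULL)

/* Model of "set RN and hand back the control bits": all other bits of
   the result are zero, and only the RN field of the register changes.  */
static uint64_t
model_set_fpscr_rn (uint64_t *fpscr, unsigned new_rn)
{
  assert (fpscr != NULL);
  uint64_t ctrl = *fpscr & FPSCR_CTRL_MASK;
  *fpscr = (*fpscr & ~FPSCR_RN_MASK) | ((uint64_t) new_rn & FPSCR_RN_MASK);
  return ctrl;
}
```

A caller that only wants the old behaviour simply ignores the return value.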

Please let me know if you agree.  I think I have this sorted out

Re: [PATCH] Only use NO_REGS in cost calculation when !hard_regno_mode_ok for GENERAL_REGS and mode.

2023-05-25 Thread Segher Boessenkool

On Thu, May 25, 2023 at 10:29:47AM -0400, Vladimir Makarov wrote:
> 
> On 5/17/23 02:57, liuhongt wrote:
> >r14-172-g0368d169492017 replaces GENERAL_REGS with NO_REGS in cost
> >calculation when the preferred register class are not known yet.
> >It regressed powerpc PR109610 and PR109858, it looks too aggressive to use
> >NO_REGS when mode can be allocated with GENERAL_REGS.
> >The patch takes a step back, still use GENERAL_REGS when
> >hard_regno_mode_ok for mode and GENERAL_REGS, otherwise uses NO_REGS.
> >Kewen confirmed the patch fixed PR109858, I vefiried it also fixed 
> >PR109610.
> >
> >Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> >No big performance impact for SPEC2017 on icelake server.
> >Ok for trunk?
> >
> >gcc/ChangeLog:
> >
> > * ira-costs.cc (scan_one_insn): Only use NO_REGS in cost
> > calculation when !hard_regno_mode_ok for GENERAL_REGS and
> > mode, otherwise still use GENERAL_REGS.
> 
> Thank you for the patch.  It looks good for me.  It is ok to commit it 
> into the trunk.

Thanks everyone involved for fixing this nasty regression!  Much
appreciated.


Segher

Re: [PATCH] [testsuite] [powerpc] adjust -m32 counts for fold-vec-extract*

2023-05-25 Thread Segher Boessenkool

Hi Alex,

On Thu, May 25, 2023 at 10:55:37AM -0300, Alexandre Oliva wrote:
> On May 25, 2023, Segher Boessenkool  wrote:
> > Fwiw, updating the insn counts blindly like this
> 
> ... is a claim that carries a wildly incorrect and insulting underlying
> assumption:

Sorry you feel that way.  I'm not even assuming anything :-(

> I've actually identified the corresponding change to the
> lp64 tests, compared the effects of the codegen changes, and concluded
> the tests needed this changing for ilp32 to keep on testing for the same
> thing after code changes brought about by changes that AFAICT had been
> well understood when making the lp64 adjustments.

But you didn't explain any of that (saying it is so is not the same
thing at all as explaining it!)

> > If it is not possible to keep these tests up-to-date easily
> 
> The counts have been stable for a couple of release cycles already.
> 
> The change that caused the codegen differences is identified and
> understood; the PR confirmed my findings, naming the root cause and the
> incomplete testsuite adjustment.

Oh, was this discussed in some PR?  The patch submission should have
carried the conclusions from the discussions there then :-)

> I suspect there may also be ABI-related assumptions implied by the 'add'
> counts, but I don't know enough about all the ppc variants to be sure.

The compiler can and will create all kinds of code for wildly unexpected
reasons.  "add" is dangerous to count already, but it is not as bad as
"addi" :-)

> Now, if your implied claim is correct that counting 'add/addi'
> instructions in these tests is fragile, dropping the checks for those
> would probably be best.

The same is true for almost all instructions.  You can only sanely count
instructions if either you count only unusual insns, or if you test only
*tiny* functions (say five insns, including the blr at the end!)

> But if ppc maintainers seem to have different
> opinions as to how to deal with the fallout of that one-time codegen
> change, it would be foolish for me to get pulled into the cross fire.

There is no crossfire.  I did not dis-approve the patch, just said this
is a high maintenance direction to proceed in.  There has been a lot of
that the last few years, we should improve on that.  It is not about
this patch (only).

> Here's the patch that corrects the long-broken counts, with the
> requested adjustments, retested with ppc- and ppc64-vx7r2.  Ok?

> Codegen changes caused add instruction count mismatches on
> ppc-*-linux-gnu and other 32-bit ppc targets.  At some point the
> expected counts were adjusted for lp64, but ilp32 differences
> remained, and published test results confirm it.

... and this is not something that can be confirmed like this.  Just
spend a few minutes more to put *actual numbers* here, with some
indication this is good and correct codegen, so that it is bloody easy
for a reviewer to review and for a maintainer to approve!

>  /* -m32 target has an 'add' in place of one of the 'addi'. */
> -/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 { target lp64 } } 
> } */
> -/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 3 { target ilp32 } } 
> } */
> +/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 } } */

Just {\madd} or more conservative {\maddi?\M} then?

Segher

[V8][PATCH 0/2]Accept and Handle the case when a structure including a FAM nested in another structure

2023-05-25 Thread Qing Zhao via Gcc-patches

(Resending because the previous patches didn't include the version number.)
Hi,

This is the 8th version of the patch, which is rebased on the latest trunk.
This is an important patch needed by the Linux Kernel security project.

Compared to the 7th version, the major changes are:
1. Update the documentation wording based on Joseph's suggestions.
2. Change the name of the new macro TYPE_INCLUDE_FLEXARRAY to
   TYPE_INCLUDES_FLEXARRAY.

All others remain the same as in version 7.

The 7th version is here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619033.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619034.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619036.html

bootstrapped and regression tested on aarch64 and x86.

Okay for commit?

thanks a lot.

Qing

[V8][PATCH 1/2] Handle component_ref to a structure/union field including flexible array member [PR101832]

2023-05-25 Thread Qing Zhao via Gcc-patches

GCC extension accepts the case when a struct with a C99 flexible array member
is embedded into another struct or union (possibly recursively) as the last
field.
__builtin_object_size should treat such struct as flexible size.
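
A minimal sketch of the case being described, with hypothetical names
(struct outer, outer_alloc, outer_flex_object_size).  The query is made on
the trailing member that contains the flexible array; no particular result
is claimed here, since the value depends on the compiler version,
optimization level, and what the compiler can see about the allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct flex { int length; char data[]; };

/* The structure containing the flexible array member is the last field.  */
struct outer { int m; struct flex f; };

/* Allocate an outer with LEN bytes available in f.data[].  */
static struct outer *
outer_alloc (int len)
{
  assert (len >= 0);
  struct outer *p = calloc (1, sizeof (*p) + (size_t) len);
  if (p)
    p->f.length = len;
  return p;
}

/* Ask for the size of the trailing member; with the change described
   above it is meant to be treated as having flexible size rather than
   exactly sizeof (struct flex).  */
static size_t
outer_flex_object_size (struct outer *p)
{
  return __builtin_object_size (&p->f, 1);
}
```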

gcc/c/ChangeLog:

PR tree-optimization/101832
* c-decl.cc (finish_struct): Set TYPE_INCLUDES_FLEXARRAY for
struct/union type.

gcc/lto/ChangeLog:

PR tree-optimization/101832
* lto-common.cc (compare_tree_sccs_1): Compare bit
TYPE_NO_NAMED_ARGS_STDARG_P or TYPE_INCLUDES_FLEXARRAY properly
for its corresponding type.

gcc/ChangeLog:

PR tree-optimization/101832
* print-tree.cc (print_node): Print new bit type_includes_flexarray.
* tree-core.h (struct tree_type_common): Use bit no_named_args_stdarg_p
as type_includes_flexarray for RECORD_TYPE or UNION_TYPE.
* tree-object-size.cc (addr_object_size): Handle structure/union type
when it has flexible size.
* tree-streamer-in.cc (unpack_ts_type_common_value_fields): Stream
in bit no_named_args_stdarg_p properly for its corresponding type.
* tree-streamer-out.cc (pack_ts_type_common_value_fields): Stream
out bit no_named_args_stdarg_p properly for its corresponding type.
* tree.h (TYPE_INCLUDES_FLEXARRAY): New macro TYPE_INCLUDES_FLEXARRAY.

gcc/testsuite/ChangeLog:

PR tree-optimization/101832
* gcc.dg/builtin-object-size-pr101832.c: New test.
---
 gcc/c/c-decl.cc   |  11 ++
 gcc/lto/lto-common.cc |   5 +-
 gcc/print-tree.cc |   5 +
 .../gcc.dg/builtin-object-size-pr101832.c | 134 ++
 gcc/tree-core.h   |   2 +
 gcc/tree-object-size.cc   |  23 ++-
 gcc/tree-streamer-in.cc   |   5 +-
 gcc/tree-streamer-out.cc  |   5 +-
 gcc/tree.h|   7 +-
 9 files changed, 192 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 1af51c4acfc..e14f514cb6e 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9267,6 +9267,17 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
   /* Set DECL_NOT_FLEXARRAY flag for FIELD_DECL x.  */
   DECL_NOT_FLEXARRAY (x) = !is_flexible_array_member_p (is_last_field, x);
 
+  /* Set TYPE_INCLUDES_FLEXARRAY for the context of x, t.
+when x is an array and is the last field.  */
+  if (TREE_CODE (TREE_TYPE (x)) == ARRAY_TYPE)
+   TYPE_INCLUDES_FLEXARRAY (t)
+ = is_last_field && flexible_array_member_type_p (TREE_TYPE (x));
+  /* Recursively set TYPE_INCLUDES_FLEXARRAY for the context of x, t
+when x is an union or record and is the last field.  */
+  else if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
+   TYPE_INCLUDES_FLEXARRAY (t)
+ = is_last_field && TYPE_INCLUDES_FLEXARRAY (TREE_TYPE (x));
+
   if (DECL_NAME (x)
  || RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
saw_named_field = true;
diff --git a/gcc/lto/lto-common.cc b/gcc/lto/lto-common.cc
index 537570204b3..f6b85bbc6f7 100644
--- a/gcc/lto/lto-common.cc
+++ b/gcc/lto/lto-common.cc
@@ -1275,7 +1275,10 @@ compare_tree_sccs_1 (tree t1, tree t2, tree **map)
   if (AGGREGATE_TYPE_P (t1))
compare_values (TYPE_TYPELESS_STORAGE);
   compare_values (TYPE_EMPTY_P);
-  compare_values (TYPE_NO_NAMED_ARGS_STDARG_P);
+  if (FUNC_OR_METHOD_TYPE_P (t1))
+   compare_values (TYPE_NO_NAMED_ARGS_STDARG_P);
+  if (RECORD_OR_UNION_TYPE_P (t1))
+   compare_values (TYPE_INCLUDES_FLEXARRAY);
   compare_values (TYPE_PACKED);
   compare_values (TYPE_RESTRICT);
   compare_values (TYPE_USER_ALIGN);
diff --git a/gcc/print-tree.cc b/gcc/print-tree.cc
index ccecd3dc6a7..62451b6cf4e 100644
--- a/gcc/print-tree.cc
+++ b/gcc/print-tree.cc
@@ -632,6 +632,11 @@ print_node (FILE *file, const char *prefix, tree node, int 
indent,
  && TYPE_CXX_ODR_P (node))
fputs (" cxx-odr-p", file);
 
+  if ((code == RECORD_TYPE
+  || code == UNION_TYPE)
+ && TYPE_INCLUDES_FLEXARRAY (node))
+   fputs (" includes-flexarray", file);
+
   /* The transparent-union flag is used for different things in
 different nodes.  */
   if ((code == UNION_TYPE || code == RECORD_TYPE)
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c 
b/gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c
new file mode 100644
index 000..60078e11634
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c
@@ -0,0 +1,134 @@
+/* PR 101832: 
+   GCC extension accepts the case when a struct with a C99 flexible array
+   member is embedded into another struct (possibly recursively).
+   __builtin_object_size will treat such struct as flexible size.

Re: [PATCH v2] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-05-25 Thread Jan Beulich via Gcc-patches

On 25.05.2023 17:16, Fangrui Song wrote:
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -32942,9 +32942,10 @@ the cache line size.  @samp{compat} is the default.
>  
>  @opindex mlarge-data-threshold
>  @item -mlarge-data-threshold=@var{threshold}
> -When @option{-mcmodel=medium} is specified, data objects larger than
> -@var{threshold} are placed in the large data section.  This value must be the
> -same across all objects linked into the binary, and defaults to 65535.
> +When @option{-mcmodel=medium} or @option{-mcmodel=large} is specified, data
> +objects larger than @var{threshold} are placed in large data sections.  This
> +value must be the same across all objects linked into the binary, and 
> defaults
> +to 65535.

Where's the "must be the same" requirement coming from?

As to the default - to remain compatible with earlier versions, shouldn't
large model code default to "infinity"?

Jan

[PATCH] VECT: Add SELECT_VL support

2023-05-25 Thread juzhe . zhong

From: Ju-Zhe Zhong 

This patch adds SELECT_VL support to the middle end, so that a target
can apply target-dependent optimization to the length calculation.

This patch is inspired by RVV ISA and LLVM:
https://reviews.llvm.org/D99750

SELECT_VL has the same behavior as LLVM's "get_vector_length", with
the following properties:

1. It applies only to single-rgroup loops.
2. It applies only to non-SLP loops.
3. It adjusts the loop control IV.
4. It adjusts the data reference IVs.
5. It allows a non-final iteration to process fewer than VF elements.
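
The properties above can be sketched in plain C. This is only a scalar model under stated assumptions, not the vectorizer's output: select_vl () is a hypothetical stand-in for the target hook, and its "split the tail evenly" policy is just one choice that RVV's vsetvli permits. A plain MIN_EXPR would always return min (remaining, vf).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the value a target returns for SELECT_VL:
   any length in [1, min (remaining, vf)] while remaining > 0.  */
static size_t
select_vl (size_t remaining, size_t vf)
{
  if (remaining <= vf)
    return remaining;
  if (remaining < 2 * vf)
    return (remaining + 1) / 2;	/* Fewer than vf in a non-final iteration.  */
  return vf;
}

/* z[i] = x[i] + y[i], strip-mined with a decrementing IV.
   Returns the number of loop iterations that were executed.  */
size_t
vvaddint32 (size_t n, const int *x, const int *y, int *z, size_t vf)
{
  size_t iterations = 0;
  size_t remaining = n;		/* Loop control IV, counts down to zero.  */

  while (remaining > 0)
    {
      size_t len = select_vl (remaining, vf);
      assert (len >= 1 && len <= vf && len <= remaining);

      /* Models one length-controlled load/add/store group.  */
      for (size_t i = 0; i < len; i++)
	z[i] = x[i] + y[i];

      /* Data reference IVs advance by len, not by a fixed vf.  */
      x += len;
      y += len;
      z += len;
      remaining -= len;
      iterations++;
    }
  return iterations;
}
```

With n = 10 and vf = 4 this model runs three iterations of lengths 4, 3 and 3, so the second, non-final iteration handles fewer than vf elements.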

Code:
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i
-  _36 = MIN_EXPR ;
+  _36 = (MIN_EXPR | SELECT_VL) ;
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
@@ -551,9 +551,14 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   /* Create decrement IV.  */
   create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
 insert_after, &index_before_incr, &index_after_incr);
-  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
-   index_before_incr,
-   nitems_step));
+  tree len = NULL_TREE;
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+   len = gimple_build (header_seq, IFN_SELECT_VL, iv_type,
+   index_before_incr, nitems_step);
+  else
+   len = gimple_build (header_seq, MIN_EXPR, iv_type, index_before_incr,
+   nitems_step);
+  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, len));
   *iv_step = step;
   return index_after_incr;
 }
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 5b7a0da0034..f67340976c8 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -974,6 +974,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
vec_info_shared *shared)
 can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
 using_partial_vectors_p (false),
 using_decrementing_iv_p (false),
+using_select_vl_p (false),
 epil_using_partial_vectors_p (false),
 partial_load_store_bias (0),
 peeling_for_gaps (false),
@@ -2737,6 +2738,14 @@ start_over:
LOOP_VINFO_VECT_FACTOR (loop_vinfo
 LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
 
+  /* If we're using decrement IV and SELECT_VL is supported by the target.
+ Use output of SELECT_VL to adjust IV of loop control and data reference.
+ Note: We only use SELECT_VL on single-rgroup control.  */
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
+  && LOOP_VINFO_LENS (loop_vinfo).length () == 1
+  && !slp)
+LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
+
   /* If we're vectorizing an epilogue loop, the vectorized loop either needs
  to be able to handle fewer than VF scalars, or needs to have a lower VF
  than the main loop.  */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 127b987cd62..8e8b0f71a4a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3147,6 +3147,61 @@ vect_get_data_ptr_increment (vec_info *vinfo,
   return iv_step;
 }
 
+/* Prepare the pointer IVs which need to be updated by a variable amount.
+   That variable amount is the outcome of .SELECT_VL.  In this case, we can
+   allow each iteration to process a flexible number of elements, as long as
+   the number is <= vf elements.
+
+   Return data reference according to SELECT_VL.
+   If new statements are needed, insert them before GSI.  */
+
+static tree
+get_select_vl_data_ref_ptr (vec_info *vinfo, stmt_vec_info stmt_info,
+   tree aggr_type, class loop *at_loop, tree offset,
+   tree *dummy, gimple_stmt_iterator *gsi,
+   bool simd_lane_access_p, vec_loop_lens *loop_lens,
+   dr_vec_info *dr_info,
+   vect_memory_access_type memory_access_type)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  tree step = vect_dr_behavior (vinfo, dr_info)->step;
+
+  /* TODO: We don't support gather/scatter or load_lanes/store_lanes when
+ pointer IVs are updated by a variable amount, but we will support them in
+ the future.  */
+  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
+ && memory_access_type != VMAT_LOAD_STORE_LANES);
+
+  /* When we support SELECT_VL pattern, we dynamically adjust
+ the memory address by the .SELECT_VL result.
+
+ The result of .SELECT_VL is the number of elements to
+ be processed in each iteration. So the memory address
+ adjustment operation should be:
+
+ bytesize = GET_MODE_SIZE (element_mode (aggr_type));
+ addr = addr + .SELECT_VL (ARG..) * bytesize;
+  */
+  gimple *ptr_incr;
+  tree loop_len
+= vect_get_loop_len (loop_vinfo, gsi, loop_lens, 1, aggr_type, 0, 0);
+  tree len_type = TREE_TYPE (loop_len);
+

Re: [V7][PATCH 1/2] Handle component_ref to a structre/union field including flexible array member [PR101832]

2023-05-25 Thread Qing Zhao via Gcc-patches




> On May 25, 2023, at 1:41 AM, Bernhard Reutner-Fischer  
> wrote:
> 
> On 24 May 2023 16:09:21 CEST, Qing Zhao  wrote:
>> Bernhard,
>> 
>> Thanks a lot for your comments.
>> 
>>> On May 19, 2023, at 7:11 PM, Bernhard Reutner-Fischer 
>>>  wrote:
>>> 
>>> On Fri, 19 May 2023 20:49:47 +
>>> Qing Zhao via Gcc-patches  wrote:
>>> 
>>>> GCC extension accepts the case when a struct with a flexible array member
>>>> is embedded into another struct or union (possibly recursively).
>>> 
>>> Do you mean TYPE_TRAILING_FLEXARRAY()?
>> 
>> The following might be a more accurate description:
>> 
>> GCC extension accepts the case when a struct with a flexible array member
>> is embedded into another struct or union (possibly recursively) as the last 
>> field.
>> 
>> 
>> 
>>> 
>>>> diff --git a/gcc/tree.h b/gcc/tree.h
>>>> index 0b72663e6a1..237644e788e 100644
>>>> --- a/gcc/tree.h
>>>> +++ b/gcc/tree.h
>>>> @@ -786,7 +786,12 @@ extern void omp_clause_range_check_failed 
>>>> (const_tree, const char *, int,
>>>>   (...) prototype, where arguments can be accessed with va_start and
>>>>   va_arg), as opposed to an unprototyped function.  */
>>>> #define TYPE_NO_NAMED_ARGS_STDARG_P(NODE) \
>>>> -  (TYPE_CHECK (NODE)->type_common.no_named_args_stdarg_p)
>>>> +  (FUNC_OR_METHOD_CHECK (NODE)->type_common.no_named_args_stdarg_p)
>>>> +
>>>> +/* True if this RECORD_TYPE or UNION_TYPE includes a flexible array member
>>>> +   at the last field recursively.  */
>>>> +#define TYPE_INCLUDE_FLEXARRAY(NODE) \
>>>> +  (RECORD_OR_UNION_CHECK (NODE)->type_common.no_named_args_stdarg_p)
>>> 
>>> Until i read the description above i read TYPE_INCLUDE_FLEXARRAY as an
>>> option to include or not include something. The description hints more
>>> at TYPE_INCLUDES_FLEXARRAY (with an S) to be a type which has at least
>>> one member which has a trailing flexible array or which itself has a
>>> trailing flexible array.
>> 
>> Yes, TYPE_INCLUDES_FLEXARRAY (with an S is maybe a better name) means the 
>> structure/union TYPE includes a flexible array member or includes a struct 
>> with a flexible array member as the last field.
>> 
> 
> So ANY_TRAILING_FLEXARRAY or TYPE_CONTAINS_FLEXARRAY, TYPE_INCLUDES_FLEXARRAY 
> or something like that would be more clear, i don't know.
> I'd probably use the first, but that's enough bike shedding for me now. Let's 
> see what others think.

Thanks. I changed it to TYPE_INCLUDES_FLEXARRAY.

Qing
> 
> thanks,
> 
>> Hope this is clear.
>> thanks.
>> 
>> Qing
>>> 
 
>>>> /* In an IDENTIFIER_NODE, this means that assemble_name was called with
>>>>   this string as an argument.  */
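
For readers following the naming discussion, the layout that TYPE_INCLUDES_FLEXARRAY is meant to describe can be sketched as below. The type and function names (inner, outer, make_outer) are made up for illustration and are not taken from the patch or its testcase. The sketch only relies on the GNU extension mentioned in the thread: a struct ending in a C99 flexible array member may itself be the last field of another struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* C99: the flexible array member must be the last field.  */
struct inner
{
  size_t len;
  char data[];
};

/* GNU extension: a struct with a flexible array member embedded as the
   last field, so "outer" has a flexible size as well.  */
struct outer
{
  int tag;
  struct inner in;
};

/* Allocate an "outer" with room for PAYLOAD (including its NUL) in the
   trailing array.  Returns NULL if the allocation fails.  */
struct outer *
make_outer (int tag, const char *payload)
{
  size_t len = strlen (payload) + 1;
  struct outer *o = malloc (offsetof (struct outer, in.data) + len);
  if (o == NULL)
    return NULL;
  o->tag = tag;
  o->in.len = len;
  memcpy (o->in.data, payload, len);
  return o;
}
```

The point of the flag is that the size of such an object is not bounded by sizeof (struct outer); the sketch makes no claim about what __builtin_object_size returns for it.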

[PATCH v2] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-05-25 Thread Fangrui Song via Gcc-patches

When using -mcmodel=medium, data objects larger than the
-mlarge-data-threshold value are placed into large data sections
(.lrodata, .ldata, .lbss and some variants).  GNU ld and ld.lld 17 place
.l* sections into separate output sections.  If small and medium code
model object files are mixed, the .l* sections won't exert relocation
overflow pressure on sections in object files built with -mcmodel=small.

However, when using -mcmodel=large, -mlarge-data-threshold doesn't
apply.  This means that the .rodata/.data/.bss sections may exert
relocation overflow pressure on sections in -mcmodel=small object files.

This patch allows -mcmodel=large to generate .l* sections.
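
The effect on the placement decision can be modelled roughly as follows. This is a simplified sketch and not the actual ix86_in_large_data_p, which also looks at the declaration itself; the enum, the helper names and the DEFAULT_THRESHOLD macro are assumptions made for illustration, with 65535 taken from the documented default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the code models tested in the patch.  */
enum cmodel
{
  CM_SMALL,
  CM_MEDIUM,
  CM_MEDIUM_PIC,
  CM_LARGE,
  CM_LARGE_PIC
};

/* Documented default of -mlarge-data-threshold.  */
#define DEFAULT_THRESHOLD 65535u

/* After the patch, the large code models are treated like the medium ones.  */
static bool
uses_large_sections (enum cmodel model)
{
  return (model == CM_MEDIUM || model == CM_MEDIUM_PIC
	  || model == CM_LARGE || model == CM_LARGE_PIC);
}

/* True if an object of SIZE bytes would go into a .l* section.  */
bool
in_large_data_p (enum cmodel model, size_t size, size_t threshold)
{
  return uses_large_sections (model) && size > threshold;
}
```

Note that the comparison is strict: an object of exactly the threshold size stays in the regular section.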

Link: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
("Large data sections for the large code model")

Signed-off-by: Fangrui Song 

---
Changes from v1 
(https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616947.html):
* Clarify commit message. Add link to 
https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
---
 gcc/config/i386/i386.cc| 15 +--
 gcc/config/i386/i386.opt   |  2 +-
 gcc/doc/invoke.texi|  7 ---
 gcc/testsuite/gcc.target/i386/large-data.c | 13 +
 4 files changed, 27 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/large-data.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 202abf0b39c..3568da4f053 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -637,7 +637,8 @@ ix86_can_inline_p (tree caller, tree callee)
 static bool
 ix86_in_large_data_p (tree exp)
 {
-  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC)
+  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC &&
+  ix86_cmodel != CM_LARGE && ix86_cmodel != CM_LARGE_PIC)
 return false;
 
   if (exp == NULL_TREE)
@@ -848,8 +849,9 @@ x86_elf_aligned_decl_common (FILE *file, tree decl,
const char *name, unsigned HOST_WIDE_INT size,
unsigned align)
 {
-  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
-  && size > (unsigned int)ix86_section_threshold)
+  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
+  ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
+ size > (unsigned int)ix86_section_threshold)
 {
   switch_to_section (get_named_section (decl, ".lbss", 0));
   fputs (LARGECOMM_SECTION_ASM_OP, file);
@@ -869,9 +871,10 @@ void
 x86_output_aligned_bss (FILE *file, tree decl, const char *name,
unsigned HOST_WIDE_INT size, unsigned align)
 {
-  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
-  && size > (unsigned int)ix86_section_threshold)
-switch_to_section (get_named_section (decl, ".lbss", 0));
+  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
+   ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
+  size > (unsigned int)ix86_section_threshold)
+switch_to_section(get_named_section(decl, ".lbss", 0));
   else
 switch_to_section (bss_section);
   ASM_OUTPUT_ALIGN (file, floor_log2 (align / BITS_PER_UNIT));
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index d74f6b1f8fc..de8e722cd62 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -282,7 +282,7 @@ Branches are this expensive (arbitrary units).
 
 mlarge-data-threshold=
 Target RejectNegative Joined UInteger Var(ix86_section_threshold) 
Init(DEFAULT_LARGE_SECTION_THRESHOLD)
--mlarge-data-threshold=Data greater than given threshold will 
go into .ldata section in x86-64 medium model.
+-mlarge-data-threshold=Data greater than given threshold will 
go into a large data section in x86-64 medium and large code models.
 
 mcmodel=
 Target RejectNegative Joined Enum(cmodel) Var(ix86_cmodel) Init(CM_32)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ee78591c73e..4b5391e12b5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -32942,9 +32942,10 @@ the cache line size.  @samp{compat} is the default.
 
 @opindex mlarge-data-threshold
 @item -mlarge-data-threshold=@var{threshold}
-When @option{-mcmodel=medium} is specified, data objects larger than
-@var{threshold} are placed in the large data section.  This value must be the
-same across all objects linked into the binary, and defaults to 65535.
+When @option{-mcmodel=medium} or @option{-mcmodel=large} is specified, data
+objects larger than @var{threshold} are placed in large data sections.  This
+value must be the same across all objects linked into the binary, and defaults
+to 65535.
 
 @opindex mrtd
 @item -mrtd
diff --git a/gcc/testsuite/gcc.target/i386/large-data.c 
b/gcc/testsuite/gcc.target/i386/large-data.c
new file mode 100644
index 000..09a917431d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/large-data.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target

[PATCH 3/3] xtensa: Rework 'setmemsi' insn pattern

2023-05-25 Thread Takayuki 'January June' Suwa via Gcc-patches

In order to reject voodoo estimation logic with lots of magic numbers,
this patch revises the code to measure the costs of the three memset
methods based on the actual emission size of the insn sequence
corresponding to each method and choose the smallest one.
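
The selection step described above can be sketched as follows. This is only a model under stated assumptions: the enum, the function name and the convention that a size of 0 means "this method could not be expanded" are hypothetical, and the real code measures emitted insn sequences rather than taking sizes as input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the three memset expansion methods.  */
enum method
{
  METHOD_LIBCALL,
  METHOD_UNROLLED,
  METHOD_SMALL_LOOP,
  METHOD_COUNT
};

/* SIZE[m] is the emitted size in bytes of method m, or 0 if that method
   could not be expanded.  Returns the smallest applicable method; on a
   tie the earlier method is kept.  */
enum method
pick_smallest (const size_t size[METHOD_COUNT])
{
  enum method best = METHOD_LIBCALL;

  /* The library call is assumed to be always available.  */
  assert (size[METHOD_LIBCALL] > 0);

  for (int m = METHOD_UNROLLED; m < METHOD_COUNT; m++)
    if (size[m] != 0 && size[m] < size[best])
      best = (enum method) m;
  return best;
}
```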

gcc/ChangeLog:

* config/xtensa/xtensa-protos.h
(xtensa_expand_block_set_unrolled_loop,
xtensa_expand_block_set_small_loop): Remove.
(xtensa_expand_block_set): New prototype.
* config/xtensa/xtensa.cc
(xtensa_expand_block_set_libcall): New subfunction.
(xtensa_expand_block_set_unrolled_loop,
xtensa_expand_block_set_small_loop): Rewrite as subfunctions.
(xtensa_expand_block_set): New function that calls the above
subfunctions.
* config/xtensa/xtensa.md (memsetsi): Change to invoke only
xtensa_expand_block_set().
---
 gcc/config/xtensa/xtensa-protos.h |   3 +-
 gcc/config/xtensa/xtensa.cc   | 319 --
 gcc/config/xtensa/xtensa.md   |   4 +-
 3 files changed, 172 insertions(+), 154 deletions(-)

diff --git a/gcc/config/xtensa/xtensa-protos.h 
b/gcc/config/xtensa/xtensa-protos.h
index ec715b44e4d..b0b15a42799 100644
--- a/gcc/config/xtensa/xtensa-protos.h
+++ b/gcc/config/xtensa/xtensa-protos.h
@@ -42,8 +42,7 @@ extern void xtensa_expand_conditional_branch (rtx *, 
machine_mode);
 extern int xtensa_expand_conditional_move (rtx *, int);
 extern int xtensa_expand_scc (rtx *, machine_mode);
 extern int xtensa_expand_block_move (rtx *);
-extern int xtensa_expand_block_set_unrolled_loop (rtx *);
-extern int xtensa_expand_block_set_small_loop (rtx *);
+extern int xtensa_expand_block_set (rtx *);
 extern void xtensa_split_operand_pair (rtx *, machine_mode);
 extern int xtensa_constantsynth (rtx, HOST_WIDE_INT);
 extern int xtensa_emit_move_sequence (rtx *, machine_mode);
diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 46ab9f36b56..3b5d25b660a 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -57,6 +57,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl-iter.h"
 #include "insn-attr.h"
 #include "tree-pass.h"
+#include "print-rtl.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -1530,77 +1531,61 @@ xtensa_expand_block_move (rtx *operands)
 }
 
 
-/* Try to expand a block set operation to a sequence of RTL move
-   instructions.  If not optimizing, or if the block size is not a
-   constant, or if the block is too large, or if the value to
-   initialize the block with is not a constant, the expansion
-   fails and GCC falls back to calling memset().
+/* Worker function for xtensa_expand_block_set().
 
-   operands[0] is the destination
-   operands[1] is the length
-   operands[2] is the initialization value
-   operands[3] is the alignment */
+   Expand into an insn sequence that calls the "memset" function.  */
 
-static int
-xtensa_sizeof_MOVI (HOST_WIDE_INT imm)
+static rtx_insn *
+xtensa_expand_block_set_libcall (rtx dst_mem,
+HOST_WIDE_INT value,
+HOST_WIDE_INT bytes)
 {
-  return (TARGET_DENSITY && IN_RANGE (imm, -32, 95)) ? 2 : 3;
+  rtx reg;
+  rtx_insn *seq;
+
+  start_sequence ();
+
+  reg = XEXP (dst_mem, 0);
+  if (! REG_P (reg))
+reg = XEXP (replace_equiv_address (dst_mem,
+  force_reg (Pmode, reg)), 0);
+  emit_library_call (gen_rtx_SYMBOL_REF (Pmode, "memset"),
+LCT_NORMAL, VOIDmode,
+reg, SImode,
+GEN_INT (value), SImode,
+GEN_INT (bytes), SImode);
+
+  seq = get_insns ();
+  end_sequence ();
+
+  return seq;
 }
 
-int
-xtensa_expand_block_set_unrolled_loop (rtx *operands)
+/* Worker function for xtensa_expand_block_set().
+
+   Expand into an insn sequence of one constant load followed by multiple
+   memory stores.  Returns NULL if the conditions for expansion are not
+   met.  */
+
+static rtx_insn *
+xtensa_expand_block_set_unrolled_loop (rtx dst_mem,
+  HOST_WIDE_INT value,
+  HOST_WIDE_INT bytes,
+  HOST_WIDE_INT align)
 {
-  rtx dst_mem = operands[0];
-  HOST_WIDE_INT bytes, value, align;
-  int expand_len, funccall_len;
-  rtx x, reg;
+  rtx reg;
   int offset;
+  rtx_insn *seq;
 
-  if (!CONST_INT_P (operands[1]) || !CONST_INT_P (operands[2]))
-return 0;
+  if (bytes > 64)
+return NULL;
 
-  bytes = INTVAL (operands[1]);
-  if (bytes <= 0)
-return 0;
-  value = (int8_t)INTVAL (operands[2]);
-  align = INTVAL (operands[3]);
-  if (align > MOVE_MAX)
-align = MOVE_MAX;
-
-  /* Insn expansion: holding the init value.
- Either MOV(.N) or L32R w/litpool.  */
-  if (align == 1)
-expand_len = xtensa_sizeof_MOVI (value);
-  else if (value == 0 || value == -1)
-expand_len = TARGET_DENSITY ? 2 : 3;
-

[PATCH 1/3] xtensa: Addendum of the commit e33d2dcb463161a110ac345a451132ce8b2b23d9

2023-05-25 Thread Takayuki 'January June' Suwa via Gcc-patches

gcc/ChangeLog:

* config/xtensa/xtensa.md (*extzvsi-1bit_ashlsi3):
Retract excessive line folding, and correct the value of
the "length" insn attribute related to TARGET_DENSITY.
(*extzvsi-1bit_addsubx): Ditto.
---
 gcc/config/xtensa/xtensa.md | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 6c1d8ee8f81..11258125165 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -1009,8 +1009,7 @@
(ashift:SI (match_dup 0)
   (match_dup 3)))]
 {
-  int pos = INTVAL (operands[2]),
-  shift = floor_log2 (INTVAL (operands[3]));
+  int pos = INTVAL (operands[2]), shift = floor_log2 (INTVAL (operands[3]));
   switch (GET_CODE (operands[4]))
 {
 case ASHIFT:
@@ -1029,7 +1028,10 @@
 }
   [(set_attr "type""arith")
(set_attr "mode""SI")
-   (set_attr "length"  "6")])
+   (set (attr "length")
+(if_then_else (match_test "TARGET_DENSITY && INTVAL (operands[3]) == 
2")
+ (const_int 5)
+ (const_int 6)))])
 
 (define_insn_and_split "*extzvsi-1bit_addsubx"
   [(set (match_operand:SI 0 "register_operand" "=a")
@@ -1053,8 +1055,7 @@
(match_dup 4))
 (match_dup 2)]))]
 {
-  int pos = INTVAL (operands[3]),
-  shift = floor_log2 (INTVAL (operands[4]));
+  int pos = INTVAL (operands[3]), shift = floor_log2 (INTVAL (operands[4]));
   switch (GET_CODE (operands[6]))
 {
 case ASHIFT:
-- 
2.30.2

[PATCH 2/3] xtensa: Add 'subtraction from constant' insn pattern

2023-05-25 Thread Takayuki 'January June' Suwa via Gcc-patches

This patch tries to eliminate the temporary pseudo used for
'(minus:SI (const_int) (reg:SI))' when the addition of the negated constant
can be emitted in a single machine instruction.

/* example */
int test0(int x) {
  return 1 - x;
}
int test1(int x) {
  return 100 - x;
}
int test2(int x) {
  return 25600 - x;
}

;; before
test0:
movi.n  a9, 1
sub a2, a9, a2
ret.n
test1:
movi    a9, 0x64
sub a2, a9, a2
ret.n
test2:
movi.n  a9, 0x19
slli    a9, a9, 10
sub a2, a9, a2
ret.n

;; after
test0:
addi.n  a2, a2, -1
neg a2, a2
ret.n
test1:
addi    a2, a2, -100
neg a2, a2
ret.n
test2:
addmi   a2, a2, -0x6400
neg a2, a2
ret.n
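
The rewrite in the "after" listing relies on the identity C - x == -(x + (-C)). A small C model of it follows. It is a sketch only: the helper names are made up for illustration, and the two range checks are written to match the ADDI (signed 8-bit) and ADDMI (signed 8-bit scaled by 256) immediates as I understand them, which is an assumption rather than a copy of xtensa_simm8 and xtensa_simm8x256. Unsigned arithmetic is used so that wraparound is well defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ADDI immediate range: signed 8-bit.  */
static bool
fits_simm8 (int32_t v)
{
  return v >= -128 && v <= 127;
}

/* Assumed ADDMI immediate range: signed 8-bit value scaled by 256.  */
static bool
fits_simm8x256 (int32_t v)
{
  return v % 256 == 0 && v >= -32768 && v <= 32512;
}

/* True if "C - x" can be done as one add of -C followed by a negate.  */
bool
can_sub_from_const (int32_t c)
{
  int32_t neg = (int32_t) (0u - (uint32_t) c);
  return fits_simm8 (neg) || fits_simm8x256 (neg);
}

/* Computes C - X the way the split sequence does.  */
int32_t
sub_from_const (int32_t c, int32_t x)
{
  uint32_t sum = (uint32_t) x + (0u - (uint32_t) c);	/* addi / addmi  */
  return (int32_t) (0u - sum);				/* neg  */
}
```

The three constants from the example (1, 100 and 25600) all pass the range check in this model, while a constant such as 1000 does not and would still need a temporary.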

gcc/ChangeLog:

* config/xtensa/xtensa-protos.h (xtensa_m1_or_1_thru_15):
New prototype.
* config/xtensa/xtensa.cc (xtensa_m1_or_1_thru_15):
New function.
* config/xtensa/constraints.md (O):
Change to use the above function.
* config/xtensa/xtensa.md (*subsi3_from_const):
New insn_and_split pattern.
---
 gcc/config/xtensa/constraints.md  |  2 +-
 gcc/config/xtensa/xtensa-protos.h |  1 +
 gcc/config/xtensa/xtensa.cc   |  7 +++
 gcc/config/xtensa/xtensa.md   | 24 
 4 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/gcc/config/xtensa/constraints.md b/gcc/config/xtensa/constraints.md
index 53e4d0d8dd1..5cade1db8ff 100644
--- a/gcc/config/xtensa/constraints.md
+++ b/gcc/config/xtensa/constraints.md
@@ -108,7 +108,7 @@
 (define_constraint "O"
  "An integer constant that can be used in ADDI.N instructions."
  (and (match_code "const_int")
-  (match_test "ival == -1 || IN_RANGE (ival, 1, 15)")))
+  (match_test "xtensa_m1_or_1_thru_15 (ival)")))
 
 (define_constraint "P"
  "An integer constant that can be used as a mask value in an EXTUI
diff --git a/gcc/config/xtensa/xtensa-protos.h 
b/gcc/config/xtensa/xtensa-protos.h
index 64cbf27c248..ec715b44e4d 100644
--- a/gcc/config/xtensa/xtensa-protos.h
+++ b/gcc/config/xtensa/xtensa-protos.h
@@ -27,6 +27,7 @@ extern bool xtensa_simm8x256 (HOST_WIDE_INT);
 extern bool xtensa_simm12b (HOST_WIDE_INT);
 extern bool xtensa_b4const_or_zero (HOST_WIDE_INT);
 extern bool xtensa_b4constu (HOST_WIDE_INT);
+extern bool xtensa_m1_or_1_thru_15 (HOST_WIDE_INT);
 extern bool xtensa_mask_immediate (HOST_WIDE_INT);
 extern bool xtensa_mem_offset (unsigned, machine_mode);
 
diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index e3af78cd228..46ab9f36b56 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -471,6 +471,13 @@ xtensa_b4constu (HOST_WIDE_INT v)
 }
 
 
+bool
+xtensa_m1_or_1_thru_15 (HOST_WIDE_INT v)
+{
+  return v == -1 || IN_RANGE (v, 1, 15);
+}
+
+
 bool
 xtensa_mask_immediate (HOST_WIDE_INT v)
 {
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 11258125165..113b313026e 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -216,6 +216,30 @@
(set_attr "mode""SI")
(set_attr "length"  "3")])
 
+(define_insn_and_split "*subsi3_from_const"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+   (minus:SI (match_operand:SI 1 "const_int_operand" "i")
+ (match_operand:SI 2 "register_operand" "r")))]
+  "xtensa_simm8 (-INTVAL (operands[1]))
+   || xtensa_simm8x256 (-INTVAL (operands[1]))"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (plus:SI (match_dup 2)
+(match_dup 1)))
+   (set (match_dup 0)
+   (neg:SI (match_dup 0)))]
+{
+  operands[1] = GEN_INT (-INTVAL (operands[1]));
+}
+  [(set_attr "type""arith")
+   (set_attr "mode""SI")
+   (set (attr "length")
+   (if_then_else (match_test "TARGET_DENSITY
+  && xtensa_m1_or_1_thru_15 (-INTVAL 
(operands[1]))")
+ (const_int 5)
+ (const_int 6)))])
+
 (define_insn "subsf3"
   [(set (match_operand:SF 0 "register_operand" "=f")
(minus:SF (match_operand:SF 1 "register_operand" "f")
-- 
2.30.2

Re: [patch]: Implement PR104327 for avr

2023-05-25 Thread Richard Biener via Gcc-patches




> Am 25.05.2023 um 16:22 schrieb Georg-Johann Lay :
> 
> 
> 
>> Am 25.05.23 um 08:35 schrieb Richard Biener:
>>> On Wed, May 24, 2023 at 5:44 PM Georg-Johann Lay  wrote:
>>> Am 24.05.23 um 11:38 schrieb Richard Biener:
>>>> On Tue, May 23, 2023 at 2:56 PM Georg-Johann Lay  wrote:
> 
> PR target/104327 not only affects s390 but also avr:
> The avr backend pre-sets some options depending on optimization level.
> The inliner then thinks that always_inline functions are not eligible
> for inlining and terminates with an error.
> 
> Proposing the following patch that implements TARGET_CAN_INLINE_P.
> 
> Ok to apply?
> 
> Johann
> 
> target/104327: Allow more inlining between different optimization levels.
> 
> avr-common.cc introduces the following options that are set depending
> on optimization level: -mgas-isr-prologues, -mmain-is-OS-task and
> -fsplit-wide-types-early.  The inliner thinks that different options
> disallow cross-optimization inlining, so provide can_inline_p.
> 
> gcc/
>  PR target/104327
>  * config/avr/avr.cc (avr_can_inline_p): New static function.
>  (TARGET_CAN_INLINE_P): Define to that function.
> diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
> index 9fa50ca230d..55b48f63865 100644
> --- a/gcc/config/avr/avr.cc
> +++ b/gcc/config/avr/avr.cc
> @@ -1018,6 +1018,22 @@ avr_no_gccisr_function_p (tree func)
>  return avr_lookup_function_attribute1 (func, "no_gccisr");
>}
> 
> +
> +/* Implement `TARGET_CAN_INLINE_P'.  */
> +/* Some options like -mgas_isr_prologues depend on optimization level,
> +   and the inliner might think that due to different options, inlining
> +   is not permitted; see PR104327.  */
> +
> +static bool
> +avr_can_inline_p (tree /* caller */, tree callee)
> +{
> +  // For now, don't allow to inline ISRs.  If the user actually wants
> +  // to inline ISR code, they have to turn the body of the ISR into an
> +  // ordinary function.
> +
> +  return ! avr_interrupt_function_p (callee);
 
>>>> I'm not sure if AVR has ISA extensions but the above will likely break
>>>> things like
>>>> 
>>>> void __attribute__((target("-mX"))) foo () { asm ("isa X opcode");
>>>> stmt-that-generates-X-ISA; }
>>> 
>>> This yields
>>> 
>>> warning: target attribute is not supported on this machine [-Wattributes]
>> Ah, that's an interesting fact.  So that indeed leaves
>> __attribute__((optimize(...)))
>> influencing the set of active target attributes via the generic option target
>> hooks like in your case the different defaults.
>>> avr has -mmcu= target options, but switching them in mid-air
>>> won't work because the file prologue might already be different
>>> and incompatible across different architectures.  And I never
>>> saw any user requesting such a thing, and I can't imagine
>>> any reasonable use case...  If the warning is not strong enough,
>>> may be it can be turned into an error, but -Wattributes is not
>>> specific enough for that.
>> Note the target attribute is then simply ignored.
>>>> void bar ()
>>>> {
>>>>    if (cpu-has-X)
>>>>      foo ();
>>>> }
>>>> 
>>>> if always-inlines are the concern you can use
>>>> 
>>>>    bool always_inline
>>>>      = (DECL_DISREGARD_INLINE_LIMITS (callee)
>>>>         && lookup_attribute ("always_inline",
>>>>                              DECL_ATTRIBUTES (callee)));
>>>>    /* Do what the user says.  */
>>>>    if (always_inline)
>>>>      return true;
>>>> 
>>>>    return default_target_can_inline_p (caller, callee);
>>> 
>>> The default implementation of can_inline_p worked fine for avr.
>>> As far as I understand, the new behavior is due to clean-up
>>> of global states for options?
>> I think the last change was r8-2658-g9b25e12d2d940a which
>> for targets without target attribute support made it more likely
>> to run into the default hook actually comparing the options.
>> Previously the "default" was oddly special-cased but you
>> could have still run into compares with two different set of
>> defaults when there's another "default" default.  Say, compile
>> with -O2 and have one optimize(0) and one optimize(Os)
>> function it would compare the optimize(0) and optimize(Os)
>> set if they were distinct from the -O2 set.  That probably never
>> happened for AVR.
>>> So I need to take into account inlining costs and decide on that
>>> whether it's preferred to inline a function or not?
>> No, the hook isn't about cost, it's about full incompatibility.  So
>> if the different -m options that could be in effect for AVR in
>> a single TU for different functions never should prevent inlining
>> then simply make the hook return true.  If there's a specific
>> option (that can differ from what specified on the compiler
>> command line!) that should, then you should compare the
>> setting of that option from the

RE: [PATCH V16] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-25 Thread Li, Pan2 via Gcc-patches

Committed, thanks Richard.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Richard Biener via Gcc-patches
Sent: Thursday, May 25, 2023 9:06 PM
To: Richard Sandiford 
Cc: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH V16] VECT: Add decrement IV iteration loop control by 
variable amount support

On Thu, 25 May 2023, Richard Sandiford wrote:

> This looks good to me.  Just a couple of very minor cosmetic things:
> 
> juzhe.zh...@rivai.ai writes:
> > @@ -753,17 +846,35 @@ vect_set_loop_condition_partial_vectors (class loop 
> > *loop,
> >   continue;
> >   }
> >  
> > -   /* See whether zero-based IV would ever generate all-false masks
> > -  or zero length before wrapping around.  */
> > -   bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc);
> > -
> > -   /* Set up all controls for this group.  */
> > -   test_ctrl = vect_set_loop_controls_directly (loop, loop_vinfo,
> > -_seq,
> > -_seq,
> > -loop_cond_gsi, rgc,
> > -niters, niters_skip,
> > -might_wrap_p);
> > +   if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) || !iv_rgc
> > +   || (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
> > +   != rgc->max_nscalars_per_iter * rgc->factor))
> 
> Coding style is to put each subcondition on a separate line when the 
> whole condition doesn't fit on a single line.  So:
> 
>   if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
>   || !iv_rgc
>   || (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
>   != rgc->max_nscalars_per_iter * rgc->factor))
> 
> > @@ -2725,6 +2726,17 @@ start_over:
> >&& !vect_verify_loop_lens (loop_vinfo))
> >  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> >  
> > +  /* If we're vectorizing an loop that uses length "controls" and
> 
> s/an loop/a loop/(Sorry for not noticing earlier.)
> 
> OK for trunk from my POV with those changes; no need to repost unless 
> your policies require it.  Please give Richi a chance to comment too 
> though.

LGTM as well.

Thanks,
Richard.

Re: [PATCH v2] RISC-V: Implement autovec abs, vneg, vnot.

2023-05-25 Thread Robin Dapp via Gcc-patches

> Beside, V2 patch should change this:
> emit_vlmax_masked_insn (unsigned icode, int op_num, rtx *ops)
> 
> change it into emit_vlmax_masked_mu_insn .

V3 is inline with these changes.

This patch implements abs<mode>2, vneg<mode>2 and vnot<mode>2 expanders
for integer vector registers and adds tests for them.

gcc/ChangeLog:

* config/riscv/autovec.md (<optab><mode>2): Add vneg/vnot.
(abs<mode>2): Add.
* config/riscv/riscv-protos.h (emit_vlmax_masked_mu_insn):
Declare.
* config/riscv/riscv-v.cc (emit_vlmax_masked_mu_insn): New
function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add unop tests.
* gcc.target/riscv/rvv/autovec/unop/abs-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/abs-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-template.h: New test.
---
 gcc/config/riscv/autovec.md   | 43 ++-
 gcc/config/riscv/riscv-protos.h   |  2 +
 gcc/config/riscv/riscv-v.cc   | 16 +++
 .../riscv/rvv/autovec/unop/abs-run.c  | 39 +
 .../riscv/rvv/autovec/unop/abs-rv32gcv.c  |  8 
 .../riscv/rvv/autovec/unop/abs-rv64gcv.c  |  8 
 .../riscv/rvv/autovec/unop/abs-template.h | 26 +++
 .../riscv/rvv/autovec/unop/vneg-run.c | 29 +
 .../riscv/rvv/autovec/unop/vneg-rv32gcv.c |  6 +++
 .../riscv/rvv/autovec/unop/vneg-rv64gcv.c |  6 +++
 .../riscv/rvv/autovec/unop/vneg-template.h| 18 
 .../riscv/rvv/autovec/unop/vnot-run.c | 43 +++
 .../riscv/rvv/autovec/unop/vnot-rv32gcv.c |  6 +++
 .../riscv/rvv/autovec/unop/vnot-rv64gcv.c |  6 +++
 .../riscv/rvv/autovec/unop/vnot-template.h| 22 ++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  2 +
 16 files changed, 279 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-template.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-template.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vnot-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vnot-rv32gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vnot-rv64gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vnot-template.h

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7fe4d94de39..38216d9812f 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -145,7 +145,7 @@ (define_expand "3"
 })
 
 ;; -
-;;  [INT] Binary shifts by scalar.
+;;  [INT] Binary shifts by vector.
 ;; -
 ;; Includes:
 ;; - vsll.vv/vsra.vv/vsrl.vv
@@ -373,3 +373,44 @@ (define_expand "vcondu"
 DONE;
   }
 )
+
+;; =
+;; == Unary arithmetic
+;; =
+
+;; ---
+;;  [INT] Unary operations
+;; ---
+;; Includes:
+;; - vneg.v/vnot.v
+;; ---
+(define_expand "2"
+  [(set (match_operand:VI 0 "register_operand")
+(any_int_unop:VI
+ (match_operand:VI 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
+;; ---
+;; - ABS expansion to vmslt and vneg
+;;

Re: [PATCH] Only use NO_REGS in cost calculation when !hard_regno_mode_ok for GENERAL_REGS and mode.

2023-05-25 Thread Vladimir Makarov via Gcc-patches




On 5/17/23 02:57, liuhongt wrote:

r14-172-g0368d169492017 replaces GENERAL_REGS with NO_REGS in cost
calculation when the preferred register class is not known yet.
It regressed powerpc PR109610 and PR109858; it looks too aggressive to use
NO_REGS when the mode can be allocated with GENERAL_REGS.
The patch takes a step back: it still uses GENERAL_REGS when
hard_regno_mode_ok holds for the mode and GENERAL_REGS, and otherwise uses NO_REGS.
Kewen confirmed the patch fixed PR109858; I verified it also fixed PR109610.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
No big performance impact for SPEC2017 on icelake server.
Ok for trunk?

gcc/ChangeLog:

* ira-costs.cc (scan_one_insn): Only use NO_REGS in cost
calculation when !hard_regno_mode_ok for GENERAL_REGS and
mode, otherwise still use GENERAL_REGS.
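
The three behaviours mentioned in the description (before r14-172, after r14-172, and with this patch) can be summarised in a small decision function.  This is a toy model only; the enum names and the boolean parameter are made up for illustration, with the boolean standing in for the result of a hard_regno_mode_ok-style query for GENERAL_REGS and the mode.

```cpp
// Hypothetical stand-ins for the two register classes discussed here.
enum class reg_class_model { NO_REGS, GENERAL_REGS };

// Which behaviour to model.
enum class policy { BEFORE_R14_172, R14_172, PATCHED };

// Class used in the cost calculation while the preferred class is unknown.
// `general_regs_hold_mode` is an assumption of this sketch: it stands in
// for "hard_regno_mode_ok for GENERAL_REGS and mode".
reg_class_model
class_for_cost (policy p, bool general_regs_hold_mode)
{
  switch (p)
    {
    case policy::BEFORE_R14_172:
      return reg_class_model::GENERAL_REGS;
    case policy::R14_172:
      return reg_class_model::NO_REGS;
    case policy::PATCHED:
      return general_regs_hold_mode ? reg_class_model::GENERAL_REGS
                                    : reg_class_model::NO_REGS;
    }
  return reg_class_model::NO_REGS;   // not reached for valid policies
}
```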


Thank you for the patch.  It looks good to me.  It is OK to commit it 
into the trunk.

RE: [PATCH 1/1] arm: merge MVE_5 and MVE_6 iterators

2023-05-25 Thread Kyrylo Tkachov via Gcc-patches




> -Original Message-
> From: Christophe Lyon 
> Sent: Thursday, May 25, 2023 1:25 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov 
> Cc: Christophe Lyon 
> Subject: [PATCH 1/1] arm: merge MVE_5 and MVE_6 iterators
> 
> MVE_5 and MVE_6 iterators are the same: this patch replaces MVE_6 with
> MVE_5 everywhere in mve.md and removes MVE_6 from iterators.md.
> 

Ok from me. I'd consider these kinds of cleanups obvious changes.
Thanks,
Kyrill

> 2023-05-25  Christophe Lyon 
> 
>   gcc/
>   * config/arm/iterators.md (MVE_6): Remove.
>   * config/arm/mve.md: Replace MVE_6 with MVE_5.
> ---
>  gcc/config/arm/iterators.md |  1 -
>  gcc/config/arm/mve.md   | 68 ++---
>  2 files changed, 34 insertions(+), 35 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 597c1dae640..9e77af55d60 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -272,7 +272,6 @@
>  (define_mode_iterator MVE_3 [V16QI V8HI])
>  (define_mode_iterator MVE_2 [V16QI V8HI V4SI])
>  (define_mode_iterator MVE_5 [V8HI V4SI])
> -(define_mode_iterator MVE_6 [V8HI V4SI])
>  (define_mode_iterator MVE_7 [V16BI V8BI V4BI V2QI])
>  (define_mode_iterator MVE_7_HI [HI V16BI V8BI V4BI V2QI])
>  (define_mode_iterator MVE_V8HF [V8HF])
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 9e3570c5264..74909ce47e1 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -3732,9 +3732,9 @@
>  ;; [vldrhq_gather_offset_s vldrhq_gather_offset_u]
>  ;;
>  (define_insn "mve_vldrhq_gather_offset_"
> -  [(set (match_operand:MVE_6 0 "s_register_operand" "=")
> - (unspec:MVE_6 [(match_operand: 1
> "memory_operand" "Us")
> -(match_operand:MVE_6 2 "s_register_operand" "w")]
> +  [(set (match_operand:MVE_5 0 "s_register_operand" "=")
> + (unspec:MVE_5 [(match_operand: 1
> "memory_operand" "Us")
> +(match_operand:MVE_5 2 "s_register_operand" "w")]
>   VLDRHGOQ))
>]
>"TARGET_HAVE_MVE"
> @@ -3755,9 +3755,9 @@
>  ;; [vldrhq_gather_offset_z_s vldrhq_gather_offset_z_u]
>  ;;
>  (define_insn "mve_vldrhq_gather_offset_z_"
> -  [(set (match_operand:MVE_6 0 "s_register_operand" "=")
> - (unspec:MVE_6 [(match_operand: 1
> "memory_operand" "Us")
> -(match_operand:MVE_6 2 "s_register_operand" "w")
> +  [(set (match_operand:MVE_5 0 "s_register_operand" "=")
> + (unspec:MVE_5 [(match_operand: 1
> "memory_operand" "Us")
> +(match_operand:MVE_5 2 "s_register_operand" "w")
>  (match_operand: 3
> "vpr_register_operand" "Up")
>   ]VLDRHGOQ))
>]
> @@ -3780,9 +3780,9 @@
>  ;; [vldrhq_gather_shifted_offset_s vldrhq_gather_shifted_offset_u]
>  ;;
>  (define_insn "mve_vldrhq_gather_shifted_offset_"
> -  [(set (match_operand:MVE_6 0 "s_register_operand" "=")
> - (unspec:MVE_6 [(match_operand: 1
> "memory_operand" "Us")
> -(match_operand:MVE_6 2 "s_register_operand" "w")]
> +  [(set (match_operand:MVE_5 0 "s_register_operand" "=")
> + (unspec:MVE_5 [(match_operand: 1
> "memory_operand" "Us")
> +(match_operand:MVE_5 2 "s_register_operand" "w")]
>   VLDRHGSOQ))
>]
>"TARGET_HAVE_MVE"
> @@ -3803,9 +3803,9 @@
>  ;; [vldrhq_gather_shifted_offset_z_s vldrhq_gather_shited_offset_z_u]
>  ;;
>  (define_insn "mve_vldrhq_gather_shifted_offset_z_"
> -  [(set (match_operand:MVE_6 0 "s_register_operand" "=")
> - (unspec:MVE_6 [(match_operand: 1
> "memory_operand" "Us")
> -(match_operand:MVE_6 2 "s_register_operand" "w")
> +  [(set (match_operand:MVE_5 0 "s_register_operand" "=")
> + (unspec:MVE_5 [(match_operand: 1
> "memory_operand" "Us")
> +(match_operand:MVE_5 2 "s_register_operand" "w")
>  (match_operand: 3
> "vpr_register_operand" "Up")
>   ]VLDRHGSOQ))
>]
> @@ -3828,8 +3828,8 @@
>  ;; [vldrhq_s, vldrhq_u]
>  ;;
>  (define_insn "mve_vldrhq_"
> -  [(set (match_operand:MVE_6 0 "s_register_operand" "=w")
> - (unspec:MVE_6 [(match_operand: 1
> "mve_memory_operand" "Ux")]
> +  [(set (match_operand:MVE_5 0 "s_register_operand" "=w")
> + (unspec:MVE_5 [(match_operand: 1
> "mve_memory_operand" "Ux")]
>VLDRHQ))
>]
>"TARGET_HAVE_MVE"
> @@ -3870,8 +3870,8 @@
>  ;; [vldrhq_z_s vldrhq_z_u]
>  ;;
>  (define_insn "mve_vldrhq_z_"
> -  [(set (match_operand:MVE_6 0 "s_register_operand" "=w")
> - (unspec:MVE_6 [(match_operand: 1
> "mve_memory_operand" "Ux")
> +  [(set (match_operand:MVE_5 0 "s_register_operand" "=w")
> + (unspec:MVE_5 [(match_operand: 1
> "mve_memory_operand" "Ux")
>   (match_operand: 2 "vpr_register_operand" "Up")]
>VLDRHQ))
>]
> @@ -4449,7 +4449,7 @@
>  (define_insn "mve_vstrhq_p_"
>[(set (match_operand: 0 "mve_memory_operand" "=Ux")
>   (unspec:
> -  [(match_operand:MVE_6 1 "s_register_operand" "w")
> +

Re: [patch]: Implement PR104327 for avr

2023-05-25 Thread Georg-Johann Lay





On 25.05.23 at 08:35, Richard Biener wrote:

On Wed, May 24, 2023 at 5:44 PM Georg-Johann Lay  wrote:

On 24.05.23 at 11:38, Richard Biener wrote:

On Tue, May 23, 2023 at 2:56 PM Georg-Johann Lay  wrote:


PR target/104327 not only affects s390 but also avr:
The avr backend pre-sets some options depending on optimization level.
The inliner then thinks that always_inline functions are not eligible
for inlining and terminates with an error.

Proposing the following patch that implements TARGET_CAN_INLINE_P.

Ok to apply?

Johann

target/104327: Allow more inlining between different optimization levels.

avr-common.cc introduces the following options that are set depending
on optimization level: -mgas-isr-prologues, -mmain-is-OS-task and
-fsplit-wide-types-early.  The inliner thinks that different options
disallow cross-optimization inlining, so provide can_inline_p.

gcc/
  PR target/104327
  * config/avr/avr.cc (avr_can_inline_p): New static function.
  (TARGET_CAN_INLINE_P): Define to that function.
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 9fa50ca230d..55b48f63865 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -1018,6 +1018,22 @@ avr_no_gccisr_function_p (tree func)
  return avr_lookup_function_attribute1 (func, "no_gccisr");
}

+
+/* Implement `TARGET_CAN_INLINE_P'.  */
+/* Some options like -mgas-isr-prologues depend on optimization level,
+   and the inliner might think that due to different options, inlining
+   is not permitted; see PR104327.  */
+
+static bool
+avr_can_inline_p (tree /* caller */, tree callee)
+{
+  // For now, don't allow to inline ISRs.  If the user actually wants
+  // to inline ISR code, they have to turn the body of the ISR into an
+  // ordinary function.
+
+  return ! avr_interrupt_function_p (callee);


I'm not sure if AVR has ISA extensions but the above will likely break
things like

void __attribute__((target("-mX"))) foo () { asm ("isa X opcode");
stmt-that-generates-X-ISA; }


This yields

warning: target attribute is not supported on this machine [-Wattributes]


Ah, that's an interesting fact.  So that indeed leaves
__attribute__((optimize(...)))
influencing the set of active target attributes via the generic option target
hooks like in your case the different defaults.


avr has -mmcu= target options, but switching them in mid-air
won't work because the file prologue might already be different
and incompatible across different architectures.  And I never
saw any user requesting such a thing, and I can't imagine
any reasonable use case...  If the warning is not strong enough,
maybe it can be turned into an error, but -Wattributes is not
specific enough for that.


Note the target attribute is then simply ignored.


void bar ()
{
if (cpu-has-X)
  foo ();
}

if always-inlines are the concern you can use

bool always_inline
  = (DECL_DISREGARD_INLINE_LIMITS (callee)
 && lookup_attribute ("always_inline",
  DECL_ATTRIBUTES (callee)));
/* Do what the user says.  */
if (always_inline)
  return true;

return default_target_can_inline_p (caller, callee);


The default implementation of can_inline_p worked fine for avr.
As far as I understand, the new behavior is due to clean-up
of global states for options?


I think the last change was r8-2658-g9b25e12d2d940a which
for targets without target attribute support made it more likely
to run into the default hook actually comparing the options.
Previously the "default" was oddly special-cased but you
could have still run into compares with two different sets of
defaults when there's another "default" default.  Say, compile
with -O2 and have one optimize(0) and one optimize(Os)
function it would compare the optimize(0) and optimize(Os)
set if they were distinct from the -O2 set.  That probably never
happened for AVR.


So I need to take into account inlining costs and decide on that
whether it's preferred to inline a function or not?


No, the hook isn't about cost, it's about full incompatibility.  So
if the different -m options that could be in effect for AVR in
a single TU for different functions never should prevent inlining
then simply make the hook return true.  If there's a specific
option (that can differ from what is specified on the compiler
command line!) that should, then you should compare the
setting of that option from the DECL_FUNCTION_SPECIFIC_TARGET
of the caller and the callee.

But as far as I can see simply returning true should be correct
for AVR, or like your patch handle interrupts differently (though
the -Winline diagnostic will tell the user there's a mismatch in
target options which might be confusing).
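
To make the alternatives discussed in this thread concrete, here is a toy model of the three hook variants: always return true, refuse ISRs as in the posted patch, and honour always_inline before falling back to an option comparison.  It is a sketch under stated assumptions, not GCC code: fn_model and its fields are hypothetical, and the plain equality test merely stands in for whatever the default hook compares.

```cpp
// Hypothetical per-function record; GCC keeps this information on the
// function decl, which this sketch does not model.
struct fn_model
{
  bool always_inline;     // user asked for always_inline
  bool is_interrupt;      // ISR-style function
  unsigned target_opts;   // bitmask standing in for the target option state
};

// Variant 1: what the thread converges on for AVR.  No difference in
// target options makes caller and callee incompatible.
bool
can_inline_always_true (const fn_model &, const fn_model &)
{
  return true;
}

// Variant 2: the originally posted patch.  Everything but ISRs may be inlined.
bool
can_inline_no_isr (const fn_model &, const fn_model &callee)
{
  return !callee.is_interrupt;
}

// Variant 3: honour always_inline, otherwise fall back to comparing options.
// The equality test is an assumption standing in for the default hook.
bool
can_inline_honor_always_inline (const fn_model &caller, const fn_model &callee)
{
  if (callee.always_inline)
    return true;                 // do what the user says
  return caller.target_opts == callee.target_opts;
}
```

The point made above is that the hook answers a yes/no compatibility question; inlining cost is decided elsewhere.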


Ok, simply "true" sounds reasonable.  Is that change ok then?

Johann



Richard.


Johann


+}
+
/* Implement `TARGET_SET_CURRENT_FUNCTION'.  */
/* Sanity cheching for above function attributes.  */

@@ -14713,6 +14729,9 @@

Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-05-25 Thread Philipp Tomsich

On Thu, 25 May 2023 at 16:14, Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 5/25/23 07:50, Richard Biener wrote:
> > On Thu, May 25, 2023 at 3:32 PM Jeff Law via Gcc-patches
> >  wrote:
> >>
> >>
> >>
> >> On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:
> >>> On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis  
> >>> wrote:
> 
>  Implementation of the new RISC-V optimization pass for memory offset
>  calculations, documentation and testcases.
> >>>
> > >>> Why do fwprop or combine not do what you want?

At least for stack variables, the virtual-stack-vars is not resolved
until reload.
So combine will be running much too early to be of any use (and I
haven't recently looked at whether one of the propagation passes runs
after).

Philipp.

> >> I think a lot of them end up coming from register elimination.
> >
> > Why isn't this a problem for other targets then?  Or maybe it is and this
> > shouldn't be a machine specific pass?  Maybe postreload-gcse should
> > perform strength reduction (I can't think of any other post reload pass
> > that would do something even remotely related).
> It is to some degree.  I ran into similar problems at my prior employer.
>   We ended up working around it in the target files in a different way
> -- which didn't work when I quickly tried it on RISC-V.
>
> Seems like it would be worth another investigative step as part of the
> evaluation of this patch.  I wasn't at 100% when I did that poking
> around many months ago.
>
> Jeff

Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-05-25 Thread Jeff Law via Gcc-patches





On 5/25/23 07:50, Richard Biener wrote:

On Thu, May 25, 2023 at 3:32 PM Jeff Law via Gcc-patches
 wrote:




On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:

On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis  wrote:


Implementation of the new RISC-V optimization pass for memory offset
calculations, documentation and testcases.


Why do fwprop or combine not do what you want?

I think a lot of them end up coming from register elimination.


Why isn't this a problem for other targets then?  Or maybe it is and this
shouldn't be a machine specific pass?  Maybe postreload-gcse should
perform strength reduction (I can't think of any other post reload pass
that would do something even remotely related).
It is to some degree.  I ran into similar problems at my prior employer. 
 We ended up working around it in the target files in a different way 
-- which didn't work when I quickly tried it on RISC-V.


Seems like it would be worth another investigative step as part of the 
evaluation of this patch.  I wasn't at 100% when I did that poking 
around many months ago.


Jeff

Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-05-25 Thread Manolis Tsamis

On Thu, May 25, 2023 at 4:53 PM Richard Biener via Gcc-patches
 wrote:
>
> On Thu, May 25, 2023 at 3:32 PM Jeff Law via Gcc-patches
>  wrote:
> >
> >
> >
> > On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:
> > > On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis  
> > > wrote:
> > >>
> > >> Implementation of the new RISC-V optimization pass for memory offset
> > >> calculations, documentation and testcases.
> > >
> > > Why do fwprop or combine not do what you want?
> > I think a lot of them end up coming from register elimination.
>
> Why isn't this a problem for other targets then?  Or maybe it is and this
> shouldn't be a machine specific pass?  Maybe postreload-gcse should
> perform strength reduction (I can't think of any other post reload pass
> that would do something even remotely related).
>
> Richard.
>

It should be a problem for other targets as well (especially RISC-style ISAs).

It can be easily seen by comparing the generated code for the
testcases: Example for testcase-2 on AArch64:
https://godbolt.org/z/GMT1K7Ebr
Although the patterns in the test cases are the simple ones (the
complex ones only manifest in complex programs), the point still
holds.
The code for this pass is quite generic and could work for most/all
targets if that would be interesting.

Manolis

> > jeff

[PATCH][committed] aarch64: PR target/99195 Annotate complex FP patterns for vec-concat-zero

2023-05-25 Thread Kyrylo Tkachov via Gcc-patches

Hi all,

This patch annotates the complex add and mla patterns for vec-concat-zero.
Testing showed an interesting bug in our MD patterns where they were defined to
match:
(plus:VHSDF (match_operand:VHSDF 1 "register_operand" "0")
            (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand" "w")
                           (match_operand:VHSDF 3 "register_operand" "w")
                           (match_operand:SI 4 "const_int_operand" "n")]
                          FCMLA))

but the canonicalisation rules for PLUS require the more "complex" operand to
be first, so during combine, when the new substituted patterns were being
formed, combine/recog would try to match:
(plus:V2SF (unspec:V2SF [
(reg:V2SF 100)
(reg:V2SF 101)
(const_int 0 [0])
] UNSPEC_FCMLA270)
(reg:V2SF 99))
instead. This patch fixes the operands of the PLUS RTX in these patterns.
Similar patterns for the dot-product instructions already used the right order.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_fcadd): Rename to...
(aarch64_fcadd): ... This.
Fix canonicalization of PLUS operands.
(aarch64_fcmla): Rename to...
(aarch64_fcmla): ... This.
Fix canonicalization of PLUS operands.
(aarch64_fcmla_lane): Rename to...
(aarch64_fcmla_lane): ... This.
Fix canonicalization of PLUS operands.
(aarch64_fcmla_laneqv4hf): Rename to...
(aarch64_fcmla_laneqv4hf): ... This.
Fix canonicalization of PLUS operands.
(aarch64_fcmlaq_lane): Fix canonicalization of PLUS operands.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_9.c: New test.


cmplx-vcz.patch
Description: cmplx-vcz.patch

Re: [COMMITTED 4/4] - Gimple range PHI analyzer and testcases

2023-05-25 Thread Andrew MacLeod via Gcc-patches




On 5/25/23 03:03, Richard Biener wrote:

On Wed, May 24, 2023 at 11:21 PM Andrew MacLeod via Gcc-patches


   There is about a 1.5% slowdown to VRP to invoke and utilize the
analyzer in all 3 passes of VRP.  Overall compile time is 0.06% slower.

Bootstraps on x86_64-pc-linux-gnu  with no regressions.  Pushed.
Hm.  What I've noticed the last time looking at how ranger deals
with PHIs is that it diverts to SCEV analysis for all of them but
it could restrict itself to analyze PHIs in loop headers
(bb->loop_father->header == bb).  That only handles natural
loops of course but that was good enough for the old VRP implementation.
That might also help to keep the PHI analyzer leaner by having fewer entries.




I've only quickly looked at the PHI analyzer and I failed to understand
how you discover cycles.  I'm pointing you to the SCC value-numbering
cycle finding which you can find for example on the GCC 7 branch
(it's gone for quite some time) in tree-ssa-sccvn.c:DFS - that collects
strongly connected SSA components (it walks all uses, you probably
want to ignore virtuals).  SCEV also has its own cycle finding
(well, sort of) with the scev_dfs class and it restricts itself to
operations it handles (so it's more close to what you do).

I fear you're developing sth very ad-hoc here.


Not something Ad-hoc in this compiler!

This is primarily an initial value estimator.  There is no attempt to do 
any loop analysis or anything like that.


It doesn't look for cycles per se, merely PHI nodes which feed each 
other and are modified in a straightforward way, i.e. initialized on one 
edge and modified via one statement that we can then look at to decide 
how it affects the range of all the PHI nodes. This can eventually be 
changed to a sequence of a few statements, but one gets us started with 
the simple cases. All the rest of the PHI arguments come from PHI nodes 
and share the same value.  This can allow us to project a range which is 
better than VARYING.  SCEV doesn't seem to help much in these cases.


It's pretty straightforward, which is why it isn't much code.  It's all 
handled in phi_analyzer::process_phi().  Add the PHI node to the worklist 
and examine each argument: if it is a PHI def, add it to the worklist if 
it hasn't been processed; otherwise it's an external input to the group, 
and we bail if we get more than 2 of these.
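
The worklist walk described above can be sketched on a toy SSA model.  This is not the code in phi_analyzer::process_phi(); the types, the function name and the choice to count distinct external names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Toy SSA model: every name is an int.  The map gives the arguments of each
// PHI result; any name that is not a key is defined by a non-PHI statement.
using phi_map = std::map<int, std::vector<int>>;

struct phi_group_model
{
  bool valid;                // false if we bailed out
  std::set<int> members;     // PHI results that feed each other
  std::set<int> externals;   // inputs coming from outside the group
};

// Collect the group of PHIs reachable from `start` through PHI arguments,
// giving up once more than `max_externals` external inputs are seen.
phi_group_model
collect_phi_group (const phi_map &phi_args, int start,
                   std::size_t max_externals = 2)
{
  assert (phi_args.count (start) != 0);   // must start from a PHI result

  phi_group_model g{true, {}, {}};
  std::vector<int> worklist{start};
  g.members.insert (start);

  while (!worklist.empty ())
    {
      int phi = worklist.back ();
      worklist.pop_back ();
      for (int arg : phi_args.at (phi))
        {
          if (phi_args.count (arg) != 0)
            {
              // Argument is itself a PHI def: queue it if not yet processed.
              if (g.members.insert (arg).second)
                worklist.push_back (arg);
            }
          else
            {
              // Otherwise it is an external input to the group.
              g.externals.insert (arg);
              if (g.externals.size () > max_externals)
                {
                  g.valid = false;
                  return g;
                }
            }
        }
    }
  return g;
}
```

With two PHIs feeding each other plus one initial value and one modifier, the walk ends with a two-member group and two external inputs.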


Andrew


[PATCH][committed] arm: Implement ACLE Data Intrinsics

2023-05-25 Thread Kyrylo Tkachov via Gcc-patches

Hi all,

This patch implements a number of scalar data processing intrinsics from ACLE
that were requested by some users. Some of these have fast single-instruction
sequences for Armv6 and later, but even for earlier versions they can still emit
an inline sequence or a call to libgcc (and ACLE recommends that they be
unconditionally available).
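
For reference, the semantics of a few of these intrinsics can be written as portable functions.  These are illustrative models of what __ror, __rev, __rev16 and __rbit compute on 32-bit values, with made-up ref_ names; they are not the definitions added to arm_acle.h.

```cpp
#include <cstdint>

// Rotate right by n bits; the count is reduced modulo 32.
uint32_t
ref_ror (uint32_t x, uint32_t n)
{
  n &= 31;
  if (n == 0)
    return x;                  // avoid shifting by the full width
  return (x >> n) | (x << (32 - n));
}

// Reverse the byte order of a 32-bit word.
uint32_t
ref_rev (uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

// Reverse the byte order within each 16-bit halfword.
uint32_t
ref_rev16 (uint32_t x)
{
  return ((x & 0xff00ff00u) >> 8) | ((x & 0x00ff00ffu) << 8);
}

// Reverse the bit order of a 32-bit word.
uint32_t
ref_rbit (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; ++i)
    {
      r = (r << 1) | (x & 1u);
      x >>= 1;
    }
  return r;
}
```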

Chris Sidebottom wrote most of the patch; I just cleaned it up, wired up some
builtins and adjusted the tests.

Bootstrapped and tested on arm-none-linux-gnueabihf.
Pushing to trunk.
Thanks,
Kyrill

Co-authored-by: Chris Sidebottom 

gcc/ChangeLog:

2023-05-24  Chris Sidebottom  
Kyrylo Tkachov  

* config/arm/arm.md (rbitsi2): Rename to...
(arm_rbit): ... This.
(ctzsi2): Adjust for the above.
(arm_rev16si2): Convert to define_expand.
(arm_rev16si2_alt1): New pattern.
(arm_rev16si2_alt): Rename to...
(*arm_rev16si2_alt2): ... This.
* config/arm/arm_acle.h (__ror, __rorl, __rorll, __clz, __clzl, __clzll,
__cls, __clsl, __clsll, __revsh, __rev, __revl, __revll, __rev16,
__rev16l, __rev16ll, __rbit, __rbitl, __rbitll): Define intrinsics.
* config/arm/arm_acle_builtins.def (rbit, rev16si2): Define builtins.

gcc/testsuite/ChangeLog:

* gcc.target/arm/acle/data-intrinsics-armv6.c: New test.
* gcc.target/arm/acle/data-intrinsics-assembly.c: New test.
* gcc.target/arm/acle/data-intrinsics-rbit.c: New test.
* gcc.target/arm/acle/data-intrinsics.c: New test.


arm-acle.patch
Description: arm-acle.patch

Re: [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations.

2023-05-25 Thread Manolis Tsamis

On Thu, May 25, 2023 at 4:42 PM Jeff Law  wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> >
> > This pass tries to optimize memory offset calculations by moving them
> > from add immediate instructions to the memory loads/stores.
> > For example it can transform this:
> >
> >addi t4,sp,16
> >add  t2,a6,t4
> >shl  t3,t2,1
> >ld   a2,0(t3)
> >addi a2,1
> >sd   a2,8(t2)
> >
> > into the following (one instruction less):
> >
> >add  t2,a6,sp
> >shl  t3,t2,1
> >ld   a2,32(t3)
> >addi a2,1
> >sd   a2,24(t2)
> >
> > Although there are places where this is done already, this pass is more
> > powerful and can handle the more difficult cases that are currently not
> > optimized. Also, it runs late enough and can optimize away unnecessary
> > stack pointer calculations.
> >
> > The first patch in the series contains the implementation of this pass
> > while the second is a minor change that enables cprop_hardreg's
> > propagation of the stack pointer, because this pass depends on cprop
> > to do the propagation of optimized operations. If preferred I can split
> > this into two different patches (in which case some of the testcases
> > included will fail temporarily).
> Thanks Manolis.  Do you happen to know if this includes the fixes I
> passed along to Philipp a few months back?  My recollection is one fixed
> stale DF data which prevented an ICE during bootstrapping, the other
> needed to ignore debug insns in one or two places so that the behavior
> didn't change based on the existence of debug insns.
>

Hi Jeff,

Yes this does include your fixes for DF and debug insns, along with
some other minor improvements.
Also, thanks for catching these!

Manolis

>
> Jeff

Re: [PATCH] [testsuite] [powerpc] adjust -m32 counts for fold-vec-extract*

2023-05-25 Thread Alexandre Oliva via Gcc-patches

On May 25, 2023, Segher Boessenkool  wrote:

> Fwiw, updating the insn counts blindly like this

... is a claim that carries a wildly incorrect and insulting underlying
assumption: I've actually identified the corresponding change to the
lp64 tests, compared the effects of the codegen changes, and concluded
the tests needed this changing for ilp32 to keep on testing for the same
thing after code changes brought about by changes that AFAICT had been
well understood when making the lp64 adjustments.

> If it is not possible to keep these tests up-to-date easily

The counts have been stable for a couple of release cycles already.

The change that caused the codegen differences is identified and
understood; the PR confirmed my findings, naming the root cause and the
incomplete testsuite adjustment.

I suspect there may also be ABI-related assumptions implied by the 'add'
counts, but I don't know enough about all the ppc variants to be sure.

Now, if your implied claim is correct that counting 'add/addi'
instructions in these tests is fragile, dropping the checks for those
would probably be best.  But if ppc maintainers seem to have different
opinions as to how to deal with the fallout of that one-time codegen
change, it would be foolish for me to get pulled into the cross fire.

Here's the patch that corrects the long-broken counts, with the
requested adjustments, retested with ppc- and ppc64-vx7r2.  Ok?

[testsuite] [powerpc] adjust -m32 counts for fold-vec-extract*

Codegen changes caused add instruction count mismatches on
ppc-*-linux-gnu and other 32-bit ppc targets.  At some point the
expected counts were adjusted for lp64, but ilp32 differences
remained, and published test results confirm it.

for  gcc/testsuite/ChangeLog

PR testsuite/101169
* gcc.target/powerpc/fold-vec-extract-char.p7.c: Adjust addi
counts for ilp32.
* gcc.target/powerpc/fold-vec-extract-double.p7.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-float.p7.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-float.p8.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-int.p7.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-short.p7.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise.
---
 .../gcc.target/powerpc/fold-vec-extract-char.p7.c  |3 ++-
 .../powerpc/fold-vec-extract-double.p7.c   |3 +--
 .../gcc.target/powerpc/fold-vec-extract-float.p7.c |3 +--
 .../gcc.target/powerpc/fold-vec-extract-float.p8.c |2 +-
 .../gcc.target/powerpc/fold-vec-extract-int.p7.c   |3 +--
 .../gcc.target/powerpc/fold-vec-extract-int.p8.c   |2 +-
 .../gcc.target/powerpc/fold-vec-extract-short.p7.c |3 +--
 .../gcc.target/powerpc/fold-vec-extract-short.p8.c |2 +-
 8 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c
index 29a8aa84db282..c6647431d09c9 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c
@@ -11,7 +11,8 @@
 /* one extsb (extend sign-bit) instruction generated for each test against
unsigned types */

-/* { dg-final { scan-assembler-times {\maddi\M} 9 } } */
+/* { dg-final { scan-assembler-times {\maddi\M} 9 { target { lp64 } } } } */
+/* { dg-final { scan-assembler-times {\maddi\M} 6 { target { ilp32 } } } } */
 /* { dg-final { scan-assembler-times {\mli\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mstxvw4x\M|\mstvx\M|\mstxv\M} 6 } } */
 /* -m32 target uses rlwinm in place of rldicl. */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c
index 3cae644b90b71..cbf6cffbeba17 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c
@@ -13,8 +13,7 @@
 /* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mli\M} 1 } } */
 /* -m32 target has an 'add' in place of one of the 'addi'. */
-/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 3 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 } } */
 /* -m32 target has a rlwinm in place of a rldic .  */
 /* { dg-final { scan-assembler-times {\mrldic\M|\mrlwinm\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mstxvd2x\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c
index 59a4979457dcb..c9abb6c1f352c 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c
@@ -12,8

Re: RISC-V Bootstrap problems

2023-05-25 Thread Jeff Law





On 5/24/23 22:19, juzhe.zh...@rivai.ai wrote:

 >> It's highly unlikely we'll switch from the mechanisms we're using.

They're pretty deeply embedded into how all the ports are developed and
work.


We just took a look at the build file. It seems that a great many functions 
are generated by define_insn. Do we have a chance to optimize that?
I believe the tablegen mechanism in LLVM is well optimized with respect to 
generated files and functions, so that they aren't affected too much as the 
number of instructions goes up.
Any define_insn or define_expand with a name that does not begin with a 
'*' will result in a function in insn-emit.


Those functions allow us to generate those patterns either from generic 
parts of the compiler or from within the target.  So if I have


(define_insn "fubar" ...)

I can say

rtx x = gen_fubar (...);

to construct RTL matching the pattern for "fubar".


So if there are patterns with names that are not used in this way, you 
can prefix their name with '*' to suppress creation of the generator 
function.  I have not looked to see if this would help our situation.
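
The naming rule can be stated as a tiny function.  This is just an illustration of the rule as described above, not how the generator programs are written; generator_name is a made-up helper, and treating an empty name like a '*' name is an extra assumption of this sketch.

```cpp
#include <string>

// Return the name of the generator function a pattern would get, or an
// empty string if no generator function is created for it.
std::string
generator_name (const std::string &pattern_name)
{
  // Names starting with '*' are internal-only.  Treating an empty name the
  // same way is an assumption of this sketch.
  if (pattern_name.empty () || pattern_name[0] == '*')
    return "";
  return "gen_" + pattern_name;
}
```

So "fubar" maps to gen_fubar, while "*fubar" yields no generator at all.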


jeff

Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-05-25 Thread Richard Biener via Gcc-patches

On Thu, May 25, 2023 at 3:32 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:
> > On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis  
> > wrote:
> >>
> >> Implementation of the new RISC-V optimization pass for memory offset
> >> calculations, documentation and testcases.
> >
> > Why do fwprop or combine not do what you want?
> I think a lot of them end up coming from register elimination.

Why isn't this a problem for other targets then?  Or maybe it is and this
shouldn't be a machine specific pass?  Maybe postreload-gcse should
perform strength reduction (I can't think of any other post reload pass
that would do something even remotely related).

Richard.

> jeff

Re: [PATCH] RISC-V: In pipeline scheduling, insns should not be fusion in different BB blocks.

2023-05-25 Thread Jeff Law via Gcc-patches





On 5/25/23 03:22, Richard Sandiford wrote:

Jin Ma  writes:

When insn1, the last insn of BB1, and insn2, the first insn of BB2, are fused,
chain_to_prev_insn clears all of insn2's dependencies. As a result insn2 may
be moved to any BB, and the program computes a wrong result.

gcc/ChangeLog:

* sched-deps.cc (sched_macro_fuse_insns): Insns should not be fusion
in different BB blocks
---
  gcc/sched-deps.cc | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index 2aa6623ad2e..998fe930804 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -2833,7 +2833,7 @@ sched_macro_fuse_insns (rtx_insn *insn)
   compile time complexity.  */
if (DEBUG_INSN_P (insn))
  return;
-  prev = prev_nonnote_nondebug_insn (insn);
+  prev = prev_nonnote_nondebug_insn_bb (insn);
if (!prev)
  return;


Huh, kind-of impressed we managed to go so long without hitting this.
Yea.  I suspect a lot of the fusion cases we've handled until now have 
been more focused on things like compare+branch on x86 which are going 
to be in the same block.
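
A minimal sketch of the difference the one-line change makes, using a toy insn chain in C. The struct and function names here are hypothetical; the real functions work on rtx_insn chains and also skip notes and debug insns.

```c
#include <stddef.h>

/* Toy insn: just a basic-block id and a link to the previous insn.  */
struct toy_insn
{
  int bb;
  struct toy_insn *prev;
};

/* Models the old behaviour: the previous insn, even across a block
   boundary.  */
struct toy_insn *
toy_prev_insn (struct toy_insn *insn)
{
  return insn->prev;
}

/* Models the new behaviour: the previous insn only if it lives in the
   same basic block, otherwise NULL (so no fusion candidate is found).  */
struct toy_insn *
toy_prev_insn_bb (struct toy_insn *insn)
{
  struct toy_insn *prev = insn->prev;
  if (prev == NULL || prev->bb != insn->bb)
    return NULL;
  return prev;
}
```

With the second variant, the first insn of BB2 never sees the last insn of BB1 as a fusion partner, which is the case the patch rules out.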




The patch is OK, thanks (and for branches too if necessary).
I don't think Jin has commit privs, so I'll push it to the trunk and 
active release branches.


jeff

Re: [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations.

2023-05-25 Thread Jeff Law via Gcc-patches





On 5/25/23 06:35, Manolis Tsamis wrote:


This pass tries to optimize memory offset calculations by moving them
from add immediate instructions to the memory loads/stores.
For example it can transform this:

   addi t4,sp,16
   add  t2,a6,t4
   shl  t3,t2,1
   ld   a2,0(t3)
   addi a2,1
   sd   a2,8(t2)

into the following (one instruction less):

   add  t2,a6,sp
   shl  t3,t2,1
   ld   a2,32(t3)
   addi a2,1
   sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.
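
The arithmetic behind the example above can be checked with a small C sketch. This is only a model of the address computations (the type and function names are made up for illustration), not code from the pass: the constant 16 passes through the shift by 1 and becomes 32 for the load, and is added to the existing 8 for the store.

```c
#include <stdint.h>

/* Effective addresses used by the load and the store in the example.  */
typedef struct
{
  uint64_t load_addr;
  uint64_t store_addr;
} toy_addrs;

/* Original sequence: the +16 is applied by a separate addi.  */
toy_addrs
addrs_before (uint64_t sp, uint64_t a6)
{
  uint64_t t4 = sp + 16;   /* addi t4,sp,16 */
  uint64_t t2 = a6 + t4;   /* add  t2,a6,t4 */
  uint64_t t3 = t2 << 1;   /* shl  t3,t2,1  */
  toy_addrs r = { t3 + 0, t2 + 8 };   /* ld 0(t3), sd 8(t2) */
  return r;
}

/* Folded sequence: the addi is gone, the offsets absorb the constant.  */
toy_addrs
addrs_after (uint64_t sp, uint64_t a6)
{
  uint64_t t2 = a6 + sp;   /* add  t2,a6,sp */
  uint64_t t3 = t2 << 1;   /* shl  t3,t2,1  */
  toy_addrs r = { t3 + 32, t2 + 24 };   /* ld 32(t3), sd 24(t2) */
  return r;
}
```

Both functions return the same pair of addresses for any sp and a6, since (x + 16) << 1 equals (x << 1) + 32 in modular arithmetic.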

The first patch in the series contains the implementation of this pass
while the second is a minor change that enables cprop_hardreg's
propagation of the stack pointer, because this pass depends on cprop
to do the propagation of optimized operations. If preferred I can split
this into two different patches (in which case some of the testcases
included will fail temporarily).
Thanks Manolis.  Do you happen to know if this includes the fixes I 
passed along to Philipp a few months back?  My recollection is one fixed 
stale DF data which prevented an ICE during bootstrapping, the other 
needed to ignore debug insns in one or two places so that the behavior 
didn't change based on the existence of debug insns.



Jeff

Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-05-25 Thread Jeff Law via Gcc-patches





On 5/25/23 06:35, Manolis Tsamis wrote:

Propagation of the stack pointer in cprop_hardreg is currently forbidden
in all cases, due to maybe_mode_change returning NULL. Relax this
restriction and allow propagation when no mode change is requested.

gcc/ChangeLog:

 * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.
I can't see how this can be correct given the stack pointer equality 
tests elsewhere in the compiler, particularly the various targets.


The problem is if you change the mode then you end up with multiple REG 
expressions that reference the stack pointer.


See rev: d1446456c3fcaa7be628726c9de4a877729490ca and the thread around 
the change which introduced this code.
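
To make the concern concrete, here is a toy C model of why a second REG for the stack pointer is a problem when code tests for the stack pointer by pointer identity. The struct, the register number and the function names are all hypothetical; this is not GCC's rtx representation.

```c
#include <stdbool.h>

/* Toy register expression: a register number and a mode tag.  */
struct toy_reg
{
  unsigned regno;
  int mode;
};

/* The single shared object standing in for the stack pointer.
   The regno value 2 is arbitrary.  */
static struct toy_reg toy_stack_pointer = { 2, 0 };

/* Identity test, in the style of comparing against one shared object.  */
bool
is_sp_by_identity (const struct toy_reg *x)
{
  return x == &toy_stack_pointer;
}

/* Structural test: same register number, whichever object holds it.  */
bool
is_sp_by_regno (const struct toy_reg *x)
{
  return x->regno == toy_stack_pointer.regno;
}
```

A freshly created expression with the same register number but a different mode passes the structural test and fails the identity test, so any code relying on identity would no longer recognise it as the stack pointer.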



Jeff

Re: [PATCH] [x86] reenable dword MOVE_MAX for better memmove inlining

2023-05-25 Thread Richard Biener via Gcc-patches

On Thu, May 25, 2023 at 3:25 PM Alexandre Oliva  wrote:
>
> On May 25, 2023, Richard Biener  wrote:
>
> > On Thu, May 25, 2023 at 1:10 PM Alexandre Oliva  wrote:
> >>
> >> On May 25, 2023, Richard Biener  wrote:
> >>
> >> > I mean we could do what RTL expansion would do later and do
> >> > by-pieces, thus emit multiple loads/stores but not n loads and then
> >> > n stores but interleaved.
> >>
> >> That wouldn't help e.g. gcc.dg/memcpy-6.c's fold_move_8, because
> >> MOVE_MAX and MOVE_MAX_PIECES currently limits inline expansion to 4
> >> bytes on x86 without SSE, both in gimple and RTL, and interleaved loads
> >> and stores wouldn't help with memmove.  We can't fix that by changing
> >> code that uses MOVE_MAX and/or MOVE_MAX_PIECES, when these limits are
> >> set too low.
>
> > Btw, there was a short period where the MOVE_MAX limit was restricted
> > but that had fallout and we've reverted since then.
>
> Erhm...  Are we even talking about the same issue?
>
> i386/i386.h reduced the 32-bit non-SSE MOVE_MAX from 16 to 4, which
> broke this test; I'm proposing to bounce it back up to 8, so that we get
> a little more memmove inlining, enough for tests that expect that much
> to pass.
>
> You may be focusing on the gimple-fold bit, because I mentioned it, but
> even the rtl expander is failing to expand the memmove because of the
> setting, as evidenced by the test's failure in the scan for memmove in
> the final dump.

So indeed fold_move_8 expands to the following, even with -minline-all-stringops

fold_move_8:
.LFB5:
        .cfi_startproc
        pushl   %ebp
        .cfi_def_cfa_offset 8
        .cfi_offset 5, -8
        movl    %esp, %ebp
        .cfi_def_cfa_register 5
        subl    $8, %esp
        movl    $a+3, %eax
        subl    $4, %esp
        pushl   $8
        pushl   $a
        pushl   %eax
        call    memmove
        addl    $16, %esp
        nop

I do think it's still up to RTL expansion or the target to decide whether
it's worth spending two registers to handle the overlap or maybe
emit a compare & jump to do forward and backward variants.

Yes, increasing MOVE_MAX to 8 makes this expand at the GIMPLE
level already, which I believe is premature and difficult to undo.
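
For reference, a small C sketch of the "two registers" option for an 8-byte move with 4-byte pieces, next to the interleaved form that is only safe for memcpy. The function names are hypothetical and this is an illustration of the trade-off, not what the expander emits.

```c
#include <stdint.h>
#include <string.h>

/* Overlap-safe 8-byte move: load both 4-byte pieces into temporaries
   (two registers) before storing either one.  */
void
move8_overlap_safe (void *dst, const void *src)
{
  uint32_t lo, hi;
  memcpy (&lo, src, 4);
  memcpy (&hi, (const char *) src + 4, 4);
  memcpy (dst, &lo, 4);
  memcpy ((char *) dst + 4, &hi, 4);
}

/* Interleaved load/store using one temporary: fine for non-overlapping
   buffers, but the first store can clobber bytes the second load still
   needs when dst overlaps src from above.  */
void
move8_interleaved (void *dst, const void *src)
{
  uint32_t tmp;
  memcpy (&tmp, src, 4);
  memcpy (dst, &tmp, 4);
  memcpy (&tmp, (const char *) src + 4, 4);
  memcpy ((char *) dst + 4, &tmp, 4);
}
```

For the memmove (a + 3, a, 8) call shown in the dump above, the first variant matches memmove while the interleaved one does not, because its first store overwrites a[4..6] before they are loaded.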

> That MOVE_MAX change was a significant regression in codegen for 32-bit
> non-SSE x86, and I'm proposing to fix that.  Compensating for that
> regression elsewhere doesn't seem desirable to me: MOVE_MAX can be much
> higher even on other x86 variants, so the effects of such attempts may
> harm quite significantly more modern CPUs.
>
> Conversely, I don't expect the reduction of MOVE_MAX on SSE-less x86 a
> couple of years ago to have been measured for performance effects, given
> the little overall relevance of such CPUs, and the very visible and
> undesirable effects on codegen that change brought onto them.  And yet,
> I'm being very conservative in the proposed reversion, because
> benchmarking such targets in any meaningful way would be somewhat
> challenging for myself as well.
>
> So, could we please have this narrow fix of this limited regression at
> the spot where it was introduced accepted, rather than debating
> tangents?
>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>    Free Software Activist                   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about

Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-05-25 Thread Jeff Law via Gcc-patches





On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:

On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis  wrote:


Implementation of the new RISC-V optimization pass for memory offset
calculations, documentation and testcases.


Why do fwprop or combine not do what you want?

I think a lot of them end up coming from register elimination.

jeff

Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-05-25 Thread Manolis Tsamis

On Thu, May 25, 2023 at 4:03 PM Richard Biener
 wrote:
>
> On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis  
> wrote:
> >
> > Implementation of the new RISC-V optimization pass for memory offset
> > calculations, documentation and testcases.
>
> Why do fwprop or combine not do what you want?
>

Hi Richard,

At least from my experiments, the existing mechanisms (fwprop,
combine, ...) cannot handle the more difficult cases for which this
was created.

As can be seen in the example presented in the cover letter, this pass
is designed to work with partially-overlapping offset calculations,
multiple memory operations sharing some intermediate calculations
while also taking care of the offset range restrictions.
Also, some offset calculations are introduced late enough (mostly
involving the stack pointer, local variables, etc.) that I think fwprop
cannot do anything about them. Please correct me if I am wrong.

Prior to implementing this I did analyze the code generated for
benchmarks and found out that a lot of the harder cases are missed,
but they require powerful analysis and cannot be handled with combine.
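
As an illustration of the offset range restriction mentioned above: RISC-V loads and stores take a 12-bit signed immediate, so a folded offset has to stay within [-2048, 2047]. The sketch below is a hypothetical helper in C, not code from the patch, and it assumes the operands are small enough that the addition cannot overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if OFFSET fits the 12-bit signed immediate of a RISC-V
   load/store.  */
bool
fits_simm12 (int64_t offset)
{
  return offset >= -2048 && offset <= 2047;
}

/* Try to fold DELTA into an existing memory offset.  On success the
   offset is updated and true is returned; otherwise it is left alone
   and the separate add has to stay.  */
bool
try_fold_offset (int64_t *offset, int64_t delta)
{
  int64_t folded = *offset + delta;
  if (!fits_simm12 (folded))
    return false;
  *offset = folded;
  return true;
}
```

Folding 16 into an offset of 8 succeeds and gives 24, while folding 2040 into the same offset would give 2048 and is rejected.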

Thanks,
Manolis

On Thu, May 25, 2023 at 4:03 PM Richard Biener
 wrote:
>
> On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis  
> wrote:
> >
> > Implementation of the new RISC-V optimization pass for memory offset
> > calculations, documentation and testcases.
>
> Why do fwprop or combine not do what you want?
>
> > gcc/ChangeLog:
> >
> > * config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
> > * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
> > pass.
> > * config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
> > * config/riscv/riscv.opt: New options.
> > * config/riscv/t-riscv: New build rule.
> > * doc/invoke.texi: Document new option.
> > * config/riscv/riscv-fold-mem-offsets.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> > * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> > * gcc.target/riscv/fold-mem-offsets-3.c: New test.
> >
> > Signed-off-by: Manolis Tsamis 
> > ---
> >
> >  gcc/config.gcc|   2 +-
> >  gcc/config/riscv/riscv-fold-mem-offsets.cc| 637 ++
> >  gcc/config/riscv/riscv-passes.def |   1 +
> >  gcc/config/riscv/riscv-protos.h   |   1 +
> >  gcc/config/riscv/riscv.opt|   4 +
> >  gcc/config/riscv/t-riscv  |   4 +
> >  gcc/doc/invoke.texi   |   8 +
> >  .../gcc.target/riscv/fold-mem-offsets-1.c |  16 +
> >  .../gcc.target/riscv/fold-mem-offsets-2.c |  24 +
> >  .../gcc.target/riscv/fold-mem-offsets-3.c |  17 +
> >  10 files changed, 713 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/config/riscv/riscv-fold-mem-offsets.cc
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
> >
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index d88071773c9..5dffd21b4c8 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -529,7 +529,7 @@ pru-*-*)
> > ;;
> >  riscv*)
> > cpu_type=riscv
> > -   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
> > riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
> > +   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
> > riscv-shorten-memrefs.o riscv-fold-mem-offsets.o riscv-selftests.o 
> > riscv-v.o riscv-vsetvl.o"
> > extra_objs="${extra_objs} riscv-vector-builtins.o 
> > riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
> > extra_objs="${extra_objs} thead.o"
> > d_target_objs="riscv-d.o"
> > diff --git a/gcc/config/riscv/riscv-fold-mem-offsets.cc 
> > b/gcc/config/riscv/riscv-fold-mem-offsets.cc
> > new file mode 100644
> > index 000..81325bb3beb
> > --- /dev/null
> > +++ b/gcc/config/riscv/riscv-fold-mem-offsets.cc
> > @@ -0,0 +1,637 @@
> > +/* Fold memory offsets pass for RISC-V.
> > +   Copyright (C) 2022 Free Software Foundation, Inc.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify
> > +it under the terms of the GNU General Public License as published by
> > +the Free Software Foundation; either version 3, or (at your option)
> > +any later version.
> > +
> > +GCC is distributed in the hope that it will be useful,
> > +but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +GNU General Public License for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3.  If not see
> > +.  */
> > +
> > +#define

Re: [PATCH] [x86] reenable dword MOVE_MAX for better memmove inlining

2023-05-25 Thread Alexandre Oliva via Gcc-patches

On May 25, 2023, Richard Biener  wrote:

> On Thu, May 25, 2023 at 1:10 PM Alexandre Oliva  wrote:
>> 
>> On May 25, 2023, Richard Biener  wrote:
>> 
>> > I mean we could do what RTL expansion would do later and do
>> > by-pieces, thus emit multiple loads/stores but not n loads and then
>> > n stores but interleaved.
>> 
>> That wouldn't help e.g. gcc.dg/memcpy-6.c's fold_move_8, because
>> MOVE_MAX and MOVE_MAX_PIECES currently limits inline expansion to 4
>> bytes on x86 without SSE, both in gimple and RTL, and interleaved loads
>> and stores wouldn't help with memmove.  We can't fix that by changing
>> code that uses MOVE_MAX and/or MOVE_MAX_PIECES, when these limits are
>> set too low.

> Btw, there was a short period where the MOVE_MAX limit was restricted
> but that had fallout and we've reverted since then.

Erhm...  Are we even talking about the same issue?

i386/i386.h reduced the 32-bit non-SSE MOVE_MAX from 16 to 4, which
broke this test; I'm proposing to bounce it back up to 8, so that we get
a little more memmove inlining, enough for tests that expect that much
to pass.

You may be focusing on the gimple-fold bit, because I mentioned it, but
even the rtl expander is failing to expand the memmove because of the
setting, as evidenced by the test's failure in the scan for memmove in
the final dump.

That MOVE_MAX change was a significant regression in codegen for 32-bit
non-SSE x86, and I'm proposing to fix that.  Compensating for that
regression elsewhere doesn't seem desirable to me: MOVE_MAX can be much
higher even on other x86 variants, so the effects of such attempts may
harm quite significantly more modern CPUs.

Conversely, I don't expect the reduction of MOVE_MAX on SSE-less x86 a
couple of years ago to have been measured for performance effects, given
the little overall relevance of such CPUs, and the very visible and
undesirable effects on codegen that change brought onto them.  And yet,
I'm being very conservative in the proposed reversion, because
benchmarking such targets in any meaningful way would be somewhat
challenging for myself as well.

So, could we please have this narrow fix of this limited regression at
the spot where it was introduced accepted, rather than debating
tangents?

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about

Re: [PATCH V16] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-25 Thread Richard Biener via Gcc-patches

On Thu, 25 May 2023, Richard Sandiford wrote:

> This looks good to me.  Just a couple of very minor cosmetic things:
> 
> juzhe.zh...@rivai.ai writes:
> > @@ -753,17 +846,35 @@ vect_set_loop_condition_partial_vectors (class loop 
> > *loop,
> >   continue;
> >   }
> >  
> > -   /* See whether zero-based IV would ever generate all-false masks
> > -  or zero length before wrapping around.  */
> > -   bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc);
> > -
> > -   /* Set up all controls for this group.  */
> > -   test_ctrl = vect_set_loop_controls_directly (loop, loop_vinfo,
> > -_seq,
> > -_seq,
> > -loop_cond_gsi, rgc,
> > -niters, niters_skip,
> > -might_wrap_p);
> > +   if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) || !iv_rgc
> > +   || (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
> > +   != rgc->max_nscalars_per_iter * rgc->factor))
> 
> Coding style is to put each subcondition on a separate line when the
> whole condition doesn't fit on a single line.  So:
> 
>   if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
>   || !iv_rgc
>   || (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
>   != rgc->max_nscalars_per_iter * rgc->factor))
> 
> > @@ -2725,6 +2726,17 @@ start_over:
> >&& !vect_verify_loop_lens (loop_vinfo))
> >  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> >  
> > +  /* If we're vectorizing an loop that uses length "controls" and
> 
> s/an loop/a loop/  (Sorry for not noticing earlier.)
> 
> OK for trunk from my POV with those changes; no need to repost unless
> your policies require it.  Please give Richi a chance to comment too
> though.

LGTM as well.

Thanks,
Richard.

Re: Re: [PATCH v2] RISC-V: Implement autovec abs, vneg, vnot.

2023-05-25 Thread 钟居哲

>> Yes, this is the emitted sequence, but the vsetvli mask is indeed
>> wrong.  Just got lucky there.  Or what else did you mean with
>> logically incorrect?
Oh, sorry. I didn't mean that this patch is logically incorrect.
I mean that MASK_ANY is logically incorrect.
This patch is OK with me as long as you change the mask policy to MASK_UNDISTURBED.

Besides, the V2 patch should change this:
emit_vlmax_masked_insn (unsigned icode, int op_num, rtx *ops)

into emit_vlmax_masked_mu_insn.


Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-25 20:32
To: juzhe.zh...@rivai.ai; gcc-patches; kito.cheng; palmer; jeffreyalaw
CC: rdapp.gcc
Subject: Re: [PATCH v2] RISC-V: Implement autovec abs, vneg, vnot.
> I think it's logically incorrect.  For ABS, you want:
> 
> operands[0] = operands[1] > 0 ? operands[1] :  (-operands[1])
> So you should do this following sequence:
> 
> vmslt v0,v1,0
> vneg v1,v1,v0.t (should use Mask undisturbed)
 
Yes, this is the emitted sequence, but the vsetvli mask is indeed
wrong.  Just got lucky there.  Or what else did you mean with
logically incorrect?
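
A scalar emulation in C may help show why the mask policy matters for this sequence. It is only a sketch under assumptions: the enum and function names are made up, and the "agnostic" branch models one outcome the RVV spec permits for mask-agnostic inactive elements (overwritten with all 1s); an implementation may also leave them unchanged, which is why the wrong policy can appear to work.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy tags for the emulation.  */
typedef enum
{
  TOY_MASK_UNDISTURBED,      /* inactive lanes keep their old value */
  TOY_MASK_AGNOSTIC_ALL_ONES /* inactive lanes overwritten with all 1s */
} toy_mask_policy;

/* Emulates "vmslt v0,v1,0" followed by a masked in-place negate.  */
void
emulated_abs (int32_t *v, size_t n, toy_mask_policy policy)
{
  for (size_t i = 0; i < n; i++)
    {
      int active = v[i] < 0;   /* mask bit from vmslt */
      if (active)
        /* Negate through unsigned arithmetic to avoid signed overflow.  */
        v[i] = (int32_t) (0u - (uint32_t) v[i]);
      else if (policy == TOY_MASK_AGNOSTIC_ALL_ONES)
        v[i] = -1;             /* all bits set */
      /* TOY_MASK_UNDISTURBED: leave the non-negative value as is.  */
    }
}
```

With the undisturbed policy {3, -4, 0, -7} becomes {3, 4, 0, 7}; under the all-ones outcome the non-negative lanes are lost and the result is {-1, 4, -1, 7}.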
 
> Here I see you set:
> e.set_policy (MASK_ANY); which is incorrect.
> You should use e.set_policy (MASK_UNDISTURBED); instead.
> 
> Your testcases fail to catch this issue (you should create a testcase
> to catch this bug with this patch implementation.)
 
Added a regex to look for "ta,mu".
 
> You should not use RVV_UNOP+2. Instead, you should add an enum call
> RVV_UNOP_MU and replace it.
 
I was a bit wary of adding yet another, would rather have that
unified somehow, but well ;) Another time.  Adjusted locally.

Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-05-25 Thread Richard Biener via Gcc-patches

On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis  wrote:
>
> Implementation of the new RISC-V optimization pass for memory offset
> calculations, documentation and testcases.

Why do fwprop or combine not do what you want?

> gcc/ChangeLog:
>
> * config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
> * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
> pass.
> * config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
> * config/riscv/riscv.opt: New options.
> * config/riscv/t-riscv: New build rule.
> * doc/invoke.texi: Document new option.
> * config/riscv/riscv-fold-mem-offsets.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> * gcc.target/riscv/fold-mem-offsets-3.c: New test.
>
> Signed-off-by: Manolis Tsamis 
> ---
>
>  gcc/config.gcc|   2 +-
>  gcc/config/riscv/riscv-fold-mem-offsets.cc| 637 ++
>  gcc/config/riscv/riscv-passes.def |   1 +
>  gcc/config/riscv/riscv-protos.h   |   1 +
>  gcc/config/riscv/riscv.opt|   4 +
>  gcc/config/riscv/t-riscv  |   4 +
>  gcc/doc/invoke.texi   |   8 +
>  .../gcc.target/riscv/fold-mem-offsets-1.c |  16 +
>  .../gcc.target/riscv/fold-mem-offsets-2.c |  24 +
>  .../gcc.target/riscv/fold-mem-offsets-3.c |  17 +
>  10 files changed, 713 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/config/riscv/riscv-fold-mem-offsets.cc
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index d88071773c9..5dffd21b4c8 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -529,7 +529,7 @@ pru-*-*)
> ;;
>  riscv*)
> cpu_type=riscv
> -   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
> riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
> +   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
> riscv-shorten-memrefs.o riscv-fold-mem-offsets.o riscv-selftests.o riscv-v.o 
> riscv-vsetvl.o"
> extra_objs="${extra_objs} riscv-vector-builtins.o 
> riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
> extra_objs="${extra_objs} thead.o"
> d_target_objs="riscv-d.o"
> diff --git a/gcc/config/riscv/riscv-fold-mem-offsets.cc 
> b/gcc/config/riscv/riscv-fold-mem-offsets.cc
> new file mode 100644
> index 000..81325bb3beb
> --- /dev/null
> +++ b/gcc/config/riscv/riscv-fold-mem-offsets.cc
> @@ -0,0 +1,637 @@
> +/* Fold memory offsets pass for RISC-V.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +
> +#define IN_TARGET_CODE 1
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "rtl.h"
> +#include "tree.h"
> +#include "expr.h"
> +#include "backend.h"
> +#include "regs.h"
> +#include "target.h"
> +#include "memmodel.h"
> +#include "emit-rtl.h"
> +#include "insn-config.h"
> +#include "recog.h"
> +#include "predict.h"
> +#include "df.h"
> +#include "tree-pass.h"
> +#include "cfgrtl.h"
> +
> +/* This pass tries to optimize memory offset calculations by moving them
> +   from add immediate instructions to the memory loads/stores.
> +   For example it can transform this:
> +
> + addi t4,sp,16
> + add  t2,a6,t4
> + shl  t3,t2,1
> + ld   a2,0(t3)
> + addi a2,1
> + sd   a2,8(t2)
> +
> +   into the following (one instruction less):
> +
> + add  t2,a6,sp
> + shl  t3,t2,1
> + ld   a2,32(t3)
> + addi a2,1
> + sd   a2,24(t2)
> +
> +   Usually, the code generated from the previous passes tries to have the
> +   offsets in the memory instructions but this pass is still beneficial
> +   because:
> +
> +- There are cases where add instructions are added in a late rtl pass
> +  and the rest of the pipeline cannot eliminate them.  Specifically,
> +  arrays and structs allocated on the stack can result in multiple
> +  unnecessary add instructions that cannot be

[PATCH] RISC-V: Add ZVFHMIN extension to the -march= option

2023-05-25 Thread Pan Li via Gcc-patches

From: Pan Li 

This patch adds the new sub-extension ZVFHMIN to the -march= option.
To keep it simple, this patch covers only the sub-extension itself; the
underlying FP16-related RVV intrinsic API depends on TARGET_ZVFHMIN.

More information about ZVFHMIN is available in the spec document below.

https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#zvfhmin-vector-extension-for-minimal-half-precision-floating-point
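
The flag handling in this patch follows the usual one-bit-per-extension scheme. Below is a standalone C sketch of that scheme using the mask values from the diff; the helper function and its parameter are hypothetical, since in GCC the TARGET_* macros test a global option variable instead.

```c
#include <stdbool.h>

/* Bit assignments as in the patch.  */
#define MASK_ZFHMIN  (1 << 0)
#define MASK_ZFH     (1 << 1)
#define MASK_ZVFHMIN (1 << 2)

/* Hypothetical helper: test one extension bit in a flag word.  This
   mirrors the shape of the TARGET_* macros, with the flag word passed
   in explicitly.  */
bool
toy_has_ext (int zf_subext, int mask)
{
  return (zf_subext & mask) != 0;
}
```

Setting MASK_ZVFHMIN alone leaves the ZFH and ZFHMIN bits clear, which matches what predef-26.c expects for -march=rv64i_zvfhmin.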

Signed-off-by: Pan Li 

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc:
(riscv_implied_info): Add zvfhmin item.
(riscv_ext_version_table): Ditto.
(riscv_ext_flag_table): Ditto.
* config/riscv/riscv-opts.h (MASK_ZVFHMIN): New macro.
(TARGET_ZFHMIN): Align indent.
(TARGET_ZFH): Ditto.
(TARGET_ZVFHMIN): New macro.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-20.c: New test.
* gcc.target/riscv/predef-26.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc|  3 ++
 gcc/config/riscv/riscv-opts.h  |  6 ++-
 gcc/testsuite/gcc.target/riscv/arch-20.c   |  5 +++
 gcc/testsuite/gcc.target/riscv/predef-26.c | 51 ++
 4 files changed, 63 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-20.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-26.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index c2ec74b9d92..72f2f8f2753 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -104,6 +104,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
 
   {"zfh", "zfhmin"},
   {"zfhmin", "f"},
+  {"zvfhmin", "f"},
 
   {"zhinx", "zhinxmin"},
   {"zhinxmin", "zfinx"},
@@ -216,6 +217,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
 
   {"zfh",   ISA_SPEC_CLASS_NONE, 1, 0},
   {"zfhmin",ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zvfhmin",   ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"zmmul", ISA_SPEC_CLASS_NONE, 1, 0},
 
@@ -1259,6 +1261,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
 
   {"zfhmin",&gcc_options::x_riscv_zf_subext, MASK_ZFHMIN},
   {"zfh",   &gcc_options::x_riscv_zf_subext, MASK_ZFH},
+  {"zvfhmin",   &gcc_options::x_riscv_zf_subext, MASK_ZVFHMIN},
 
   {"zmmul", _options::x_riscv_zm_subext, MASK_ZMMUL},
 
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 2a16402265a..f34ca993689 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -200,9 +200,11 @@ enum riscv_entity
 
 #define MASK_ZFHMIN   (1 << 0)
 #define MASK_ZFH  (1 << 1)
+#define MASK_ZVFHMIN  (1 << 2)
 
-#define TARGET_ZFHMIN ((riscv_zf_subext & MASK_ZFHMIN) != 0)
-#define TARGET_ZFH((riscv_zf_subext & MASK_ZFH) != 0)
+#define TARGET_ZFHMIN  ((riscv_zf_subext & MASK_ZFHMIN) != 0)
+#define TARGET_ZFH ((riscv_zf_subext & MASK_ZFH) != 0)
+#define TARGET_ZVFHMIN ((riscv_zf_subext & MASK_ZVFHMIN) != 0)
 
 #define MASK_ZMMUL  (1 << 0)
 #define TARGET_ZMMUL((riscv_zm_subext & MASK_ZMMUL) != 0)
diff --git a/gcc/testsuite/gcc.target/riscv/arch-20.c 
b/gcc/testsuite/gcc.target/riscv/arch-20.c
new file mode 100644
index 000..8f8da1ecd65
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-20.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32gcv_zvfhmin -mabi=ilp32 -mcmodel=medlow" } */
+int foo()
+{
+}
diff --git a/gcc/testsuite/gcc.target/riscv/predef-26.c 
b/gcc/testsuite/gcc.target/riscv/predef-26.c
new file mode 100644
index 000..285f64bd6c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-26.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64i_zvfhmin -mabi=lp64f -mcmodel=medlow 
-misa-spec=20191213" } */
+
+int main () {
+
+#ifndef __riscv_arch_test
+#error "__riscv_arch_test"
+#endif
+
+#if __riscv_xlen != 64
+#error "__riscv_xlen"
+#endif
+
+#if !defined(__riscv_i)
+#error "__riscv_i"
+#endif
+
+#if !defined(__riscv_f)
+#error "__riscv_f"
+#endif
+
+#if !defined(__riscv_zvfhmin)
+#error "__riscv_zvfhmin"
+#endif
+
+#if defined(__riscv_v)
+#error "__riscv_v"
+#endif
+
+#if defined(__riscv_d)
+#error "__riscv_d"
+#endif
+
+#if defined(__riscv_c)
+#error "__riscv_c"
+#endif
+
+#if defined(__riscv_a)
+#error "__riscv_a"
+#endif
+
+#if defined(__riscv_zfh)
+#error "__riscv_zfh"
+#endif
+
+#if defined(__riscv_zfhmin)
+#error "__riscv_zfhmin"
+#endif
+
+  return 0;
+}
-- 
2.34.1

Re: Re: [PATCH V16] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-25 Thread 钟居哲

Thanks so much, Richard.
I have sent the V17 patch for commit (with the formatting fixed as you suggested).
There is no need to reply to it.

I am waiting for Richi's final approval.

Thanks.

juzhe.zh...@rivai.ai

From: Richard Sandiford
Date: 2023-05-25 20:36
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V16] VECT: Add decrement IV iteration loop control by 
variable amount support
This looks good to me.  Just a couple of very minor cosmetic things:

juzhe.zh...@rivai.ai writes:
> @@ -753,17 +846,35 @@ vect_set_loop_condition_partial_vectors (class loop 
> *loop,
>continue;
>}
>  
> - /* See whether zero-based IV would ever generate all-false masks
> -or zero length before wrapping around.  */
> - bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc);
> -
> - /* Set up all controls for this group.  */
> - test_ctrl = vect_set_loop_controls_directly (loop, loop_vinfo,
> -  _seq,
> -  _seq,
> -  loop_cond_gsi, rgc,
> -  niters, niters_skip,
> -  might_wrap_p);
> + if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) || !iv_rgc
> + || (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
> + != rgc->max_nscalars_per_iter * rgc->factor))

Coding style is to put each subcondition on a separate line when the
whole condition doesn't fit on a single line.  So:

if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
|| !iv_rgc
|| (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
!= rgc->max_nscalars_per_iter * rgc->factor))

> @@ -2725,6 +2726,17 @@ start_over:
>&& !vect_verify_loop_lens (loop_vinfo))
>  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>  
> +  /* If we're vectorizing an loop that uses length "controls" and

s/an loop/a loop/  (Sorry for not noticing earlier.)

OK for trunk from my POV with those changes; no need to repost unless
your policies require it.  Please give Richi a chance to comment too
though.

Thanks for your patience with the review process.  The final result
seems pretty clean to me.

Richard

[PATCH V17] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-25 Thread juzhe . zhong

From: Ju-Zhe Zhong 

Fix format for Richard.

This patch adds support for a decrementing IV, following the flow designed by Richard:

(1) In vect_set_loop_condition_partial_vectors, for the first iteration of:
call vect_set_loop_controls_directly.

(2) vect_set_loop_controls_directly calculates "step" as in your patch.
If rgc has 1 control, this step is the SSA name created for that control.
Otherwise the step is a fresh SSA name, as in your patch.

(3) vect_set_loop_controls_directly stores this step somewhere for later
use, probably in LOOP_VINFO.  Let's use "S" to refer to this stored step.

(4) After the vect_set_loop_controls_directly call above, and outside
the "if" statement that now contains vect_set_loop_controls_directly,
check whether rgc->controls.length () > 1.  If so, use
vect_adjust_loop_lens_control to set the controls based on S.

Then the only caller of vect_adjust_loop_lens_control is
vect_set_loop_condition_partial_vectors.  And the starting
step for vect_adjust_loop_lens_control is always S.
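
For readers unfamiliar with the idea, here is a scalar C emulation of a loop controlled by a decrementing IV with a variable step. It is only a sketch: TOY_VF, the min-based length selection and the function names are assumptions for illustration, whereas the patch builds the equivalent control flow in GIMPLE.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical maximum number of elements handled per iteration.  */
#define TOY_VF 4

/* Length for this iteration: whatever is left, capped at TOY_VF.  */
static size_t
toy_select_len (size_t remaining)
{
  return remaining < TOY_VF ? remaining : TOY_VF;
}

/* Adds 1 to each of the N elements of X and returns the number of
   "vector" iterations taken.  REMAINING is the decrementing IV: it
   starts at N and drops by the length actually processed.  */
size_t
add_one_decrementing_iv (int32_t *x, size_t n)
{
  size_t iterations = 0;
  size_t remaining = n;
  while (remaining > 0)
    {
      size_t len = toy_select_len (remaining);
      /* Stands in for one length-controlled vector operation.  */
      for (size_t k = 0; k < len; k++)
        x[k] += 1;
      x += len;
      remaining -= len;   /* the step is the length just used */
      iterations++;
    }
  return iterations;
}
```

With 10 elements and TOY_VF of 4 the lengths are 4, 4 and 2, so the loop runs three times and exits when the IV reaches zero.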

This patch has been well tested for single-rgroup and multiple-rgroup (SLP)
and passes all testcases in the RISC-V port.

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_adjust_loop_lens_control): New function.
(vect_set_loop_controls_directly): Add decrement IV support.
(vect_set_loop_condition_partial_vectors): Ditto.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): New variable.
* tree-vectorizer.h (LOOP_VINFO_USING_DECREMENTING_IV_P): New macro.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-4.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-4.c: New test.

---
 .../rvv/autovec/partial/multiple_rgroup-3.c   | 288 ++
 .../rvv/autovec/partial/multiple_rgroup-4.c   |  75 +
 .../autovec/partial/multiple_rgroup_run-3.c   |  36 +++
 .../autovec/partial/multiple_rgroup_run-4.c   |  15 +
 gcc/tree-vect-loop-manip.cc   | 136 -
 gcc/tree-vect-loop.cc |  12 +
 gcc/tree-vectorizer.h |   8 +
 7 files changed, 558 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-4.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c
new file mode 100644
index 000..9579749c285
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c
@@ -0,0 +1,288 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=fixed-vlmax" } */
+
+#include 
+
+void __attribute__ ((noinline, noclone))
+f0 (int8_t *__restrict x, int16_t *__restrict y, int n)
+{
+  for (int i = 0, j = 0; i < n; i += 4, j += 8)
+{
+  x[i + 0] += 1;
+  x[i + 1] += 2;
+  x[i + 2] += 3;
+  x[i + 3] += 4;
+  y[j + 0] += 1;
+  y[j + 1] += 2;
+  y[j + 2] += 3;
+  y[j + 3] += 4;
+  y[j + 4] += 5;
+  y[j + 5] += 6;
+  y[j + 6] += 7;
+  y[j + 7] += 8;
+}
+}
+
+void __attribute__ ((optimize (0)))
+f0_init (int8_t *__restrict x, int8_t *__restrict x2, int16_t *__restrict y,
+int16_t *__restrict y2, int n)
+{
+  for (int i = 0, j = 0; i < n; i += 4, j += 8)
+{
+  x[i + 0] = i % 120;
+  x[i + 1] = i % 78;
+  x[i + 2] = i % 55;
+  x[i + 3] = i % 27;
+  y[j + 0] = j % 33;
+  y[j + 1] = j % 44;
+  y[j + 2] = j % 66;
+  y[j + 3] = j % 88;
+  y[j + 4] = j % 99;
+  y[j + 5] = j % 39;
+  y[j + 6] = j % 49;
+  y[j + 7] = j % 101;
+
+  x2[i + 0] = i % 120;
+  x2[i + 1] = i % 78;
+  x2[i + 2] = i % 55;
+  x2[i + 3] = i % 27;
+  y2[j + 0] = j % 33;
+  y2[j + 1] = j % 44;
+  y2[j + 2] = j % 66;
+  y2[j + 3] = j % 88;
+  y2[j + 4] = j % 99;
+  y2[j + 5] = j % 39;
+  y2[j + 6] = j % 49;
+  y2[j + 7] = j % 101;
+}
+}
+
+void __attribute__ ((optimize (0)))
+f0_golden (int8_t *__restrict x, int16_t *__restrict y, int n)
+{
+  for (int i = 0, j = 0; i < n; i += 4, j += 8)
+{
+  x[i + 0] += 1;
+  x[i + 1] += 2;
+  x[i + 2] += 3;
+  x[i + 3] += 4;
+  y[j + 0] += 1;
+  y[j + 1] += 2;
+  y[j + 2] += 3;
+  y[j + 3] += 4;
+  y[j + 4] += 5;
+  y[j + 5] += 6;
+  y[j + 6] += 7;
+  y[j + 7] += 8;
+}
+}
+
+void __attribute__ ((optimize (0)))
+f0_check (int8_t *__restrict x,

Re: [PATCH v4] libgfortran: Replace mutex with rwlock

2023-05-25 Thread Zhu, Lipeng via Gcc-patches





On 1/1/1970 8:00 AM, Thomas Koenig wrote:

Hi Lipeng,


May I know any comment or concern on this patch, thanks for your time :)



Thanks for your patience in getting this reviewed.

A few remarks / questions.

Which strategy is used in this implementation, read-preferring or
write-preferring?  And if read-preferring is used, is there a danger of
deadlock if people do unreasonable things?
Maybe you could explain that, also in a comment in the code.

Can you add some sort of torture test case(s) which does a lot of
opening/closing/reading/writing, possibly with asynchronous I/O and/or
pthreads, to catch possible problems?  If there is a system dependency or
some race condition, chances are that regression testers will catch this.


Hi Thomas,

Thanks for your time for the review.
Sure, I will add test cases according to your suggestions and update the
comment to describe the "read-preferring" strategy of the implementation.


Thanks,
Lipeng Zhu


With this, the libgfortran parts are OK, unless somebody else has more
comments, so give this a couple of days.  I cannot approve the libgcc
parts, that would be somebody else (Jakub?)

Best regards

Thomas

Re: [PATCH V16] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-25 Thread Richard Sandiford via Gcc-patches

This looks good to me.  Just a couple of very minor cosmetic things:

juzhe.zh...@rivai.ai writes:
> @@ -753,17 +846,35 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
> continue;
> }
>  
> - /* See whether zero-based IV would ever generate all-false masks
> -or zero length before wrapping around.  */
> - bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc);
> -
> - /* Set up all controls for this group.  */
> - test_ctrl = vect_set_loop_controls_directly (loop, loop_vinfo,
> -  &preheader_seq,
> -  &header_seq,
> -  loop_cond_gsi, rgc,
> -  niters, niters_skip,
> -  might_wrap_p);
> + if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) || !iv_rgc
> + || (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
> + != rgc->max_nscalars_per_iter * rgc->factor))

Coding style is to put each subcondition on a separate line when the
whole condition doesn't fit on a single line.  So:

if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
|| !iv_rgc
|| (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
!= rgc->max_nscalars_per_iter * rgc->factor))

> @@ -2725,6 +2726,17 @@ start_over:
>&& !vect_verify_loop_lens (loop_vinfo))
>  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>  
> +  /* If we're vectorizing an loop that uses length "controls" and

s/an loop/a loop/  (Sorry for not noticing earlier.)

OK for trunk from my POV with those changes; no need to repost unless
your policies require it.  Please give Richi a chance to comment too
though.

Thanks for your patience with the review process.  The final result
seems pretty clean to me.

Richard

[PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-05-25 Thread Manolis Tsamis

Implementation of the new RISC-V optimization pass for memory offset
calculations, documentation and testcases.

gcc/ChangeLog:

* config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
pass.
* config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
* config/riscv/riscv.opt: New options.
* config/riscv/t-riscv: New build rule.
* doc/invoke.texi: Document new option.
* config/riscv/riscv-fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.

Signed-off-by: Manolis Tsamis 
---

 gcc/config.gcc|   2 +-
 gcc/config/riscv/riscv-fold-mem-offsets.cc| 637 ++
 gcc/config/riscv/riscv-passes.def |   1 +
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv.opt|   4 +
 gcc/config/riscv/t-riscv  |   4 +
 gcc/doc/invoke.texi   |   8 +
 .../gcc.target/riscv/fold-mem-offsets-1.c |  16 +
 .../gcc.target/riscv/fold-mem-offsets-2.c |  24 +
 .../gcc.target/riscv/fold-mem-offsets-3.c |  17 +
 10 files changed, 713 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/riscv/riscv-fold-mem-offsets.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index d88071773c9..5dffd21b4c8 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -529,7 +529,7 @@ pru-*-*)
;;
 riscv*)
cpu_type=riscv
-   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
+   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-fold-mem-offsets.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o"
d_target_objs="riscv-d.o"
diff --git a/gcc/config/riscv/riscv-fold-mem-offsets.cc b/gcc/config/riscv/riscv-fold-mem-offsets.cc
new file mode 100644
index 000..81325bb3beb
--- /dev/null
+++ b/gcc/config/riscv/riscv-fold-mem-offsets.cc
@@ -0,0 +1,637 @@
+/* Fold memory offsets pass for RISC-V.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tree.h"
+#include "expr.h"
+#include "backend.h"
+#include "regs.h"
+#include "target.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "insn-config.h"
+#include "recog.h"
+#include "predict.h"
+#include "df.h"
+#include "tree-pass.h"
+#include "cfgrtl.h"
+
+/* This pass tries to optimize memory offset calculations by moving them
+   from add immediate instructions to the memory loads/stores.
+   For example it can transform this:
+
+ addi t4,sp,16
+ add  t2,a6,t4
+ shl  t3,t2,1
+ ld   a2,0(t3)
+ addi a2,1
+ sd   a2,8(t2)
+
+   into the following (one instruction less):
+
+ add  t2,a6,sp
+ shl  t3,t2,1
+ ld   a2,32(t3)
+ addi a2,1
+ sd   a2,24(t2)
+
+   Usually, the code generated from the previous passes tries to have the
+   offsets in the memory instructions but this pass is still beneficial
+   because:
+
+- There are cases where add instructions are added in a late rtl pass
+  and the rest of the pipeline cannot eliminate them.  Specifically,
+  arrays and structs allocated on the stack can result in multiple
+  unnecessary add instructions that cannot be eliminated easily
+  otherwise.
+
+- The existing mechanisms that move offsets to memory instructions
+  usually apply only to specific patterns or have other limitations.
+  This pass is very generic and can fold offsets through complex
+  calculations with multiple memory uses and partially overlapping
+  calculations.  As a result

[PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations.

2023-05-25 Thread Manolis Tsamis



This pass tries to optimize memory offset calculations by moving them
from add immediate instructions to the memory loads/stores.
For example it can transform this:

  addi t4,sp,16
  add  t2,a6,t4
  shl  t3,t2,1
  ld   a2,0(t3)
  addi a2,1
  sd   a2,8(t2)

into the following (one instruction less):

  add  t2,a6,sp
  shl  t3,t2,1
  ld   a2,32(t3)
  addi a2,1
  sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.
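
The address arithmetic behind the example can be checked with a small C model. This is only a sketch: the function and struct names are hypothetical, the registers are modelled as plain 64-bit integers, and `shl` is taken to be a left shift as in the listing above. The constant 16 added to `sp` becomes 32 after the shift by one and stays 16 on the unshifted path, which is where the folded offsets 32 and 8 + 16 = 24 come from.

```c
#include <stdint.h>

/* Addresses touched by the load and the store in the example.  */
struct addrs
{
  uint64_t load;
  uint64_t store;
};

/* Model of the original sequence (names are hypothetical):
     addi t4,sp,16 ; add t2,a6,t4 ; shl t3,t2,1 ; ld 0(t3) ; sd 8(t2)  */
static struct addrs
original_sequence (uint64_t sp, uint64_t a6)
{
  uint64_t t4 = sp + 16;
  uint64_t t2 = a6 + t4;
  uint64_t t3 = t2 << 1;
  return (struct addrs) { t3 + 0, t2 + 8 };
}

/* Model of the folded sequence:
     add t2,a6,sp ; shl t3,t2,1 ; ld 32(t3) ; sd 24(t2)
   The addi is gone and its constant has moved into the memory offsets.  */
static struct addrs
folded_sequence (uint64_t sp, uint64_t a6)
{
  uint64_t t2 = a6 + sp;
  uint64_t t3 = t2 << 1;
  return (struct addrs) { t3 + 32, t2 + 24 };
}
```

Both functions return the same pair of addresses for any `sp` and `a6`, since unsigned arithmetic wraps identically on both paths.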

The first patch in the series contains the implementation of this pass
while the second is a minor change that enables cprop_hardreg's
propagation of the stack pointer, because this pass depends on cprop
to do the propagation of optimized operations. If preferred I can split
this into two different patches (in which case some of the testcases
included will fail temporarily).



Manolis Tsamis (2):
  Implementation of new RISCV optimizations pass: fold-mem-offsets.
  cprop_hardreg: Enable propagation of the stack pointer if possible.

 gcc/config.gcc|   2 +-
 gcc/config/riscv/riscv-fold-mem-offsets.cc| 637 ++
 gcc/config/riscv/riscv-passes.def |   1 +
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv.opt|   4 +
 gcc/config/riscv/t-riscv  |   4 +
 gcc/doc/invoke.texi   |   8 +
 gcc/regcprop.cc   |   7 +-
 .../gcc.target/riscv/fold-mem-offsets-1.c |  16 +
 .../gcc.target/riscv/fold-mem-offsets-2.c |  24 +
 .../gcc.target/riscv/fold-mem-offsets-3.c |  17 +
 11 files changed, 719 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-fold-mem-offsets.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c

-- 
2.34.1

[PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-05-25 Thread Manolis Tsamis

Propagation of the stack pointer in cprop_hardreg is currently forbidden
in all cases, due to maybe_mode_change returning NULL. Relax this
restriction and allow propagation when no mode change is requested.

gcc/ChangeLog:

* regcprop.cc (maybe_mode_change): Enable stack pointer propagation.

Signed-off-by: Manolis Tsamis 
---

 gcc/regcprop.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc
index f426f4fedcd..6cbfadb181f 100644
--- a/gcc/regcprop.cc
+++ b/gcc/regcprop.cc
@@ -422,7 +422,12 @@ maybe_mode_change (machine_mode orig_mode, machine_mode copy_mode,
 
  It's unclear if we need to do the same for other special registers.  */
   if (regno == STACK_POINTER_REGNUM)
-return NULL_RTX;
+{
+  if (orig_mode == new_mode)
+   return stack_pointer_rtx;
+  else
+   return NULL_RTX;
+}
 
   if (orig_mode == new_mode)
 return gen_raw_REG (new_mode, regno);
-- 
2.34.1

Re: [PATCH v2] RISC-V: Implement autovec abs, vneg, vnot.

2023-05-25 Thread Robin Dapp via Gcc-patches

> I think it's logically incorrect.  For ABS, you want:
> 
> operands[0] = operands[1] > 0 ? operands[1] :  (-operands[1])
> So you should do this following sequence:
> 
> vmslt v0,v1,0
> vneg v1,v1,v0.t (should use Mask undisturbed)

Yes, this is the emitted sequence, but the vsetvli mask is indeed
wrong.  Just got lucky there.  Or what else did you mean with
logically incorrect?

> Here I see you set:
> e.set_policy (MASK_ANY); which is incorrect.
> You should use e.set_policy (MASK_UNDISTURBED); instead.
>
> Your testcases fail to catch this issue (you should create a testcase
> to catch this bug with this patch implementation.)

Added a regex to look for "ta,mu".
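
For readers following along, the difference between the two mask policies can be modelled in scalar C. This is a hedged sketch only: `model_abs` and the `MASK_POLICY_*` names are hypothetical and are not the enumerators used in the port, and the "agnostic" branch picks one behaviour the hardware is allowed to choose (overwriting inactive elements with all ones).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for this sketch; not the GCC enumerators.  */
enum mask_policy
{
  MASK_POLICY_UNDISTURBED,
  MASK_POLICY_AGNOSTIC
};

/* Model of "vmslt v0,v1,0" followed by a masked "vneg v1,v1,v0.t".
   Elements equal to INT32_MIN are not handled by this sketch.  */
static void
model_abs (int32_t *v, size_t n, enum mask_policy policy)
{
  for (size_t i = 0; i < n; i++)
    {
      /* vmslt: the mask bit is set for negative elements.  */
      int active = v[i] < 0;
      if (active)
        v[i] = -v[i];
      else if (policy == MASK_POLICY_AGNOSTIC)
        /* Agnostic: an inactive element may be overwritten with all ones.  */
        v[i] = -1;
      /* Undisturbed: an inactive element keeps its old value, which is
         what ABS needs for the non-negative inputs.  */
    }
}
```

With the input {-3, 5, 0, -7}, the undisturbed model yields {3, 5, 0, 7}, while the agnostic model may yield {3, -1, -1, 7}, which is why a test that happens to pass on one implementation does not prove the policy is right.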

> You should not use RVV_UNOP+2. Instead, you should add an enum call
> RVV_UNOP_MU and replace it.

I was a bit wary of adding yet another, would rather have that
unified somehow, but well ;) Another time.  Adjusted locally.

RE: [PATCH] arm: Fix ICE due to infinite splitting [PR109800]

2023-05-25 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Kyrylo
> Tkachov via Gcc-patches
> Sent: Thursday, May 25, 2023 11:48 AM
> To: Alex Coplan 
> Cc: gcc-patches@gcc.gnu.org; ni...@redhat.com; Richard Earnshaw
> ; Ramana Radhakrishnan
> 
> Subject: RE: [PATCH] arm: Fix ICE due to infinite splitting [PR109800]
> 
> 
> 
> > -Original Message-
> > From: Alex Coplan 
> > Sent: Thursday, May 25, 2023 11:26 AM
> > To: Kyrylo Tkachov 
> > Cc: gcc-patches@gcc.gnu.org; ni...@redhat.com; Richard Earnshaw
> > ; Ramana Radhakrishnan
> > 
> > Subject: Re: [PATCH] arm: Fix ICE due to infinite splitting [PR109800]
> >
> > Hi Kyrill,
> >
> > On 23/05/2023 11:14, Kyrylo Tkachov wrote:
> > > Hi Alex,
> > > diff --git a/gcc/testsuite/gcc.target/arm/pr109800.c
> > b/gcc/testsuite/gcc.target/arm/pr109800.c
> > > new file mode 100644
> > > index 000..71d1ede13dd
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/arm/pr109800.c
> > > @@ -0,0 +1,3 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -march=armv7-m -mfloat-abi=hard -mfpu=fpv4-sp-
> > d16 -mbig-endian -mpure-code" } */
> > > +double f() { return 5.0; }
> > >
> > > ... The arm testsuite options are kinda hard to get right with all the
> > > effective targets and multilibs and such hardcoded abi and march options
> > > tend to break in some target.
> > > I suggest you put this testcase in gcc.target/arm/pure-code and add a
> > > dg-skip-if to skip the test if the multilib options specify a different
> > > float-abi.
> >
> > How about this instead:
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/pure-code/pr109800.c
> > b/gcc/testsuite/gcc.target/arm/pure-code/pr109800.c
> > new file mode 100644
> > index 000..d797b790232
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/arm/pure-code/pr109800.c
> > @@ -0,0 +1,4 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target arm_hard_ok } */
> > +/* { dg-options "-O2 -march=armv7-m -mfloat-abi=hard -mfpu=fpv4-sp-
> d16 -
> > mbig-endian -mpure-code" } */
> > +double f() { return 5.0; }
> >
> > Full v2 patch attached.
> 
> Thanks, looks better but I think you'll still want to have a dg-skip-if to
> avoid explicit -mfloat-abi=soft and -mfloat-abi=softfp in the multilib
> options. You can grep in that test directory for examples

Actually, as discussed offline this patch is okay as it has the arm_hard_ok
check.
Thanks,
Kyrill

> Kyrill
> 
> >
> > Thanks,
> > Alex

[PATCH 1/1] arm: merge MVE_5 and MVE_6 iterators

2023-05-25 Thread Christophe Lyon via Gcc-patches

MVE_5 and MVE_6 iterators are the same: this patch replaces MVE_6 with
MVE_5 everywhere in mve.md and removes MVE_6 from iterators.md.

2023-05-25  Christophe Lyon 

gcc/
* config/arm/iterators.md (MVE_6): Remove.
* config/arm/mve.md: Replace MVE_6 with MVE_5.
---
 gcc/config/arm/iterators.md |  1 -
 gcc/config/arm/mve.md   | 68 ++---
 2 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 597c1dae640..9e77af55d60 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -272,7 +272,6 @@
 (define_mode_iterator MVE_3 [V16QI V8HI])
 (define_mode_iterator MVE_2 [V16QI V8HI V4SI])
 (define_mode_iterator MVE_5 [V8HI V4SI])
-(define_mode_iterator MVE_6 [V8HI V4SI])
 (define_mode_iterator MVE_7 [V16BI V8BI V4BI V2QI])
 (define_mode_iterator MVE_7_HI [HI V16BI V8BI V4BI V2QI])
 (define_mode_iterator MVE_V8HF [V8HF])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 9e3570c5264..74909ce47e1 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -3732,9 +3732,9 @@
 ;; [vldrhq_gather_offset_s vldrhq_gather_offset_u]
 ;;
 (define_insn "mve_vldrhq_gather_offset_"
-  [(set (match_operand:MVE_6 0 "s_register_operand" "=")
-   (unspec:MVE_6 [(match_operand: 1 "memory_operand" "Us")
-  (match_operand:MVE_6 2 "s_register_operand" "w")]
+  [(set (match_operand:MVE_5 0 "s_register_operand" "=")
+   (unspec:MVE_5 [(match_operand: 1 "memory_operand" "Us")
+  (match_operand:MVE_5 2 "s_register_operand" "w")]
VLDRHGOQ))
   ]
   "TARGET_HAVE_MVE"
@@ -3755,9 +3755,9 @@
 ;; [vldrhq_gather_offset_z_s vldrhq_gather_offset_z_u]
 ;;
 (define_insn "mve_vldrhq_gather_offset_z_"
-  [(set (match_operand:MVE_6 0 "s_register_operand" "=")
-   (unspec:MVE_6 [(match_operand: 1 "memory_operand" "Us")
-  (match_operand:MVE_6 2 "s_register_operand" "w")
+  [(set (match_operand:MVE_5 0 "s_register_operand" "=")
+   (unspec:MVE_5 [(match_operand: 1 "memory_operand" "Us")
+  (match_operand:MVE_5 2 "s_register_operand" "w")
   (match_operand: 3 "vpr_register_operand" "Up")
]VLDRHGOQ))
   ]
@@ -3780,9 +3780,9 @@
 ;; [vldrhq_gather_shifted_offset_s vldrhq_gather_shifted_offset_u]
 ;;
 (define_insn "mve_vldrhq_gather_shifted_offset_"
-  [(set (match_operand:MVE_6 0 "s_register_operand" "=")
-   (unspec:MVE_6 [(match_operand: 1 "memory_operand" "Us")
-  (match_operand:MVE_6 2 "s_register_operand" "w")]
+  [(set (match_operand:MVE_5 0 "s_register_operand" "=")
+   (unspec:MVE_5 [(match_operand: 1 "memory_operand" "Us")
+  (match_operand:MVE_5 2 "s_register_operand" "w")]
VLDRHGSOQ))
   ]
   "TARGET_HAVE_MVE"
@@ -3803,9 +3803,9 @@
 ;; [vldrhq_gather_shifted_offset_z_s vldrhq_gather_shited_offset_z_u]
 ;;
 (define_insn "mve_vldrhq_gather_shifted_offset_z_"
-  [(set (match_operand:MVE_6 0 "s_register_operand" "=")
-   (unspec:MVE_6 [(match_operand: 1 "memory_operand" "Us")
-  (match_operand:MVE_6 2 "s_register_operand" "w")
+  [(set (match_operand:MVE_5 0 "s_register_operand" "=")
+   (unspec:MVE_5 [(match_operand: 1 "memory_operand" "Us")
+  (match_operand:MVE_5 2 "s_register_operand" "w")
   (match_operand: 3 "vpr_register_operand" "Up")
]VLDRHGSOQ))
   ]
@@ -3828,8 +3828,8 @@
 ;; [vldrhq_s, vldrhq_u]
 ;;
 (define_insn "mve_vldrhq_"
-  [(set (match_operand:MVE_6 0 "s_register_operand" "=w")
-   (unspec:MVE_6 [(match_operand: 1 "mve_memory_operand" "Ux")]
+  [(set (match_operand:MVE_5 0 "s_register_operand" "=w")
+   (unspec:MVE_5 [(match_operand: 1 "mve_memory_operand" "Ux")]
 VLDRHQ))
   ]
   "TARGET_HAVE_MVE"
@@ -3870,8 +3870,8 @@
 ;; [vldrhq_z_s vldrhq_z_u]
 ;;
 (define_insn "mve_vldrhq_z_"
-  [(set (match_operand:MVE_6 0 "s_register_operand" "=w")
-   (unspec:MVE_6 [(match_operand: 1 "mve_memory_operand" "Ux")
+  [(set (match_operand:MVE_5 0 "s_register_operand" "=w")
+   (unspec:MVE_5 [(match_operand: 1 "mve_memory_operand" "Ux")
(match_operand: 2 "vpr_register_operand" "Up")]
 VLDRHQ))
   ]
@@ -4449,7 +4449,7 @@
 (define_insn "mve_vstrhq_p_"
   [(set (match_operand: 0 "mve_memory_operand" "=Ux")
(unspec:
-[(match_operand:MVE_6 1 "s_register_operand" "w")
+[(match_operand:MVE_5 1 "s_register_operand" "w")
  (match_operand: 2 "vpr_register_operand" "Up")
  (match_dup 0)]
 VSTRHQ))
@@ -4470,8 +4470,8 @@
 ;;
 (define_expand "mve_vstrhq_scatter_offset_p_"
   [(match_operand: 0 "mve_scatter_memory")
-   (match_operand:MVE_6 1 "s_register_operand")
-   (match_operand:MVE_6 2 "s_register_operand")
+   (match_operand:MVE_5 1 "s_register_operand")
+   (match_operand:MVE_5 2 "s_register_operand")
(match_operand: 3

RE: [PATCH] RISC-V: Add autovec sign/zero extension and truncation.

2023-05-25 Thread Li, Pan2 via Gcc-patches

The zero-scratch-regs-3.c comes from below PATCH. 

https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615494.html

Hi Yanzhang,

Could you please help to double check the issue reported by Robin? Aka:
"zero-scratch-regs-3.c seems to FAIL in vcondu but that already happens on
trunk."

Thanks a lot.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Robin Dapp via Gcc-patches
Sent: Thursday, May 25, 2023 5:03 PM
To: gcc-patches ; Kito Cheng ; 
palmer ; juzhe.zh...@rivai.ai; jeffreyalaw 

Cc: rdapp@gmail.com
Subject: [PATCH] RISC-V: Add autovec sign/zero extension and truncation.

Hi,

this patch implements the autovec expanders for sign and zero extension 
patterns as well as the accompanying truncations.  In order to use them 
additional mode_attr iterators as well as vectorizer hooks are required.
Using these hooks we can e.g. vectorize with VNx4QImode as base mode and extend 
VNx4SI to VNx4DI.  They are still going to be expanded in the future.

vf4 and vf8 truncations are emulated by truncating two and three times 
respectively.
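
The emulation relies on the fact that truncating in several halving steps gives the same result as truncating once. A scalar C sketch of that idea follows; the helper names are hypothetical and unsigned types are used so that every narrowing conversion is well defined.

```c
#include <stdint.h>

/* One halving step per width (hypothetical helper names).  */
static uint32_t trunc_half_64 (uint64_t x) { return (uint32_t) x; }
static uint16_t trunc_half_32 (uint32_t x) { return (uint16_t) x; }
static uint8_t trunc_half_16 (uint16_t x) { return (uint8_t) x; }

/* vf4-style truncation (64 -> 16 bits): two halving steps.  */
static uint16_t
trunc_vf4 (uint64_t x)
{
  return trunc_half_32 (trunc_half_64 (x));
}

/* vf8-style truncation (64 -> 8 bits): three halving steps.  */
static uint8_t
trunc_vf8 (uint64_t x)
{
  return trunc_half_16 (trunc_half_32 (trunc_half_64 (x)));
}

/* Returns 1 when the stepwise results agree with direct casts.  */
static int
truncation_matches_direct (uint64_t x)
{
  return trunc_vf4 (x) == (uint16_t) x && trunc_vf8 (x) == (uint8_t) x;
}
```

Each step only discards high bits, so the low 16 or 8 bits survive unchanged regardless of how many steps are taken.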

The patch also adds tests and changes some expectations for already existing 
ones.

Combine does not yet handle binary operations of two widened operands as we are 
missing the necessary split/rewrite patterns.  These will be added at a later 
time.

Co-authored-by: Juzhe Zhong 

riscv.exp testsuite is unchanged.  zero-scratch-regs-3.c seems to FAIL in 
vcondu but that already happens on trunk.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md (2): New
expander.
(2): Ditto.
(2): Ditto.
(trunc2): Ditto.
(trunc2): Ditto.
(trunc2): Ditto.
* config/riscv/riscv-protos.h (riscv_v_ext_mode_p): Declare.
(vectorize_related_mode): Define.
(autovectorize_vector_modes): Define.
* config/riscv/riscv-v.cc (vectorize_related_mode): Implement
hook.
(autovectorize_vector_modes): Implement hook.
* config/riscv/riscv.cc (riscv_v_ext_tuple_mode_p): Export.
(riscv_autovectorize_vector_modes): Implement target hook.
(riscv_vectorize_related_mode): Implement target hook.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
(TARGET_VECTORIZE_RELATED_MODE): Define.
* config/riscv/vector-iterators.md: Add lowercase versions of
mode_attr iterators.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c: Adjust
expectation.
* gcc.target/riscv/rvv/autovec/binop/shift-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x-2.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add new conversion tests.
* gcc.target/riscv/rvv/vsetvl/avl_single-38.c: Do not vectorize.
* gcc.target/riscv/rvv/vsetvl/avl_single-47.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-48.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-49.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_switch-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vsext-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vsext-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vsext-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vsext-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vzext-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vzext-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vzext-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vzext-template.h: New test.
---
 gcc/config/riscv/autovec.md   | 104 ++
 gcc/config/riscv/riscv-protos.h   |   5 +
 gcc/config/riscv/riscv-v.cc   |  83 ++
 gcc/config/riscv/riscv.cc |  31 +-
 gcc/config/riscv/vector-iterators.md  |  33 +-
 .../riscv/rvv/autovec/binop/shift-rv32gcv.c   |   1 -
 .../riscv/rvv/autovec/binop/shift-rv64gcv.c   |

Re: Re: [PATCH V15] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-25 Thread 钟居哲

Thank you so much for your patience.
Could you take a look at the V16 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619652.html
and let me know whether it is OK for trunk?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-25 18:19
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V15] VECT: Add decrement IV iteration loop control by 
variable amount support
"juzhe.zh...@rivai.ai"  writes:
> Hi, Richard. Thanks for the comments.
>
>>> if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
>>>     || !iv_rgc
>>>     || (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
>>>         != rgc->max_nscalars_per_iter * rgc->factor))
>>>   {
>>>     /* See whether zero-based IV would ever generate all-false masks
>>>        or zero length before wrapping around.  */
>>>     bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc);
>>>
>>>     /* Set up all controls for this group.  */
>>>     test_ctrl = vect_set_loop_controls_directly (loop, loop_vinfo,
>>>                                                  &preheader_seq,
>>>                                                  &header_seq,
>>>                                                  loop_cond_gsi, rgc,
>>>                                                  niters, niters_skip,
>>>                                                  might_wrap_p);
>>>
>>>     iv_rgc = rgc;
>>>   }
>
>
> Could you tell me why you add:
> (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
>>> != rgc->max_nscalars_per_iter * rgc->factor) ?
 
The patch creates IVs with the following step:
 
  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
index_before_incr,
nitems_step));
 
If nitems_step is the same for two IVs, those IVs will always be equal.
 
So having multiple IVs with the same nitems_step is redundant.
 
nitems_step is calculated as follows:
 
  unsigned int nitems_per_iter = rgc->max_nscalars_per_iter * rgc->factor;
  ...
  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
  ...
 
  if (nitems_per_iter != 1)
{
  ...
  tree iv_factor = build_int_cst (iv_type, nitems_per_iter);
  ...
  nitems_step = gimple_build (preheader_seq, MULT_EXPR, iv_type,
  nitems_step, iv_factor);
  ...
}
 
so nitems_step is equal to:
 
  rgc->max_nscalars_per_iter * rgc->factor * VF
 
VF is fixed for a loop, so nitems_step is equal for two different
rgroup_controls if:
 
  rgc->max_nscalars_per_iter * rgc->factor
 
is the same for those rgroup_controls.
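
The argument can be modelled in a few lines of C. This is a simplified sketch, not vectorizer code: `rgroup_model`, `nitems_step_for` and `ivs_stay_equal` are hypothetical names, and the model assumes that both IVs start from the same value.

```c
/* Hypothetical stand-in for the two rgroup_controls fields used above.  */
struct rgroup_model
{
  unsigned max_nscalars_per_iter;
  unsigned factor;
};

/* nitems_step = max_nscalars_per_iter * factor * VF.  */
static unsigned long
nitems_step_for (struct rgroup_model rgc, unsigned long vf)
{
  return (unsigned long) rgc.max_nscalars_per_iter * rgc.factor * vf;
}

/* The per-iteration step of a decrementing IV: MIN (iv, nitems_step).  */
static unsigned long
next_step (unsigned long iv, unsigned long nitems_step)
{
  return iv < nitems_step ? iv : nitems_step;
}

/* Returns 1 if two IVs that both start at START (an assumption of this
   sketch) select identical steps in every iteration, 0 otherwise.  */
static int
ivs_stay_equal (struct rgroup_model a, struct rgroup_model b,
                unsigned long vf, unsigned long start)
{
  unsigned long step_a = nitems_step_for (a, vf);
  unsigned long step_b = nitems_step_for (b, vf);
  if (step_a == 0 || step_b == 0)
    return 0;
  unsigned long iv_a = start, iv_b = start;
  while (iv_a > 0 || iv_b > 0)
    {
      unsigned long sa = next_step (iv_a, step_a);
      unsigned long sb = next_step (iv_b, step_b);
      if (sa != sb)
        return 0;
      iv_a -= sa;
      iv_b -= sb;
    }
  return 1;
}
```

For example, rgroups with (max_nscalars_per_iter, factor) of (2, 1) and (1, 2) both give a step of 8 when VF is 4, so the second IV would only duplicate the first.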
 
Please try the example I posted earlier today. I think you'll see that,
without the:
 
  (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
   != rgc->max_nscalars_per_iter * rgc->factor)
 
you'll have two IVs with the same step (because their MIN_EXPRs have
the same bound).
 
Thanks,
Richard
