Re: [Patch, fortran] PR87477 - (associate) - [meta-bug] [F03] issues concerning the ASSOCIATE statement

2023-06-07 Thread Paul Richard Thomas via Gcc-patches
Hi Harald,

In answer to your question:
void
gfc_replace_expr (gfc_expr *dest, gfc_expr *src)
{
  free_expr0 (dest);
  *dest = *src;
  free (src);
}
So it does indeed do the job.

I should perhaps have remarked that, following the divide error,
gfc_simplify_expr was returning a mutilated version of the expression
and this was somehow connected with successfully simplifying the
parentheses. Copying and replacing on no errors deals with the
problem.
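
For readers following along, here is a minimal sketch of the "simplify a copy, replace on no errors" pattern. It is not the gfortran code: toy_expr and the toy_* helpers are hypothetical stand-ins for gfc_expr, gfc_copy_expr, gfc_simplify_expr and gfc_replace_expr, and the toy simplifier is assumed to damage its argument before it notices the divide error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy expression node standing in for gfc_expr (hypothetical): num / den.  */
typedef struct
{
  bool folded;   /* Set once the simplifier has touched the node.  */
  int num, den;  /* Operands of the division.  */
  int value;     /* Result, valid only after a successful fold.  */
} toy_expr;

static toy_expr *
toy_copy_expr (const toy_expr *e)
{
  toy_expr *p = malloc (sizeof *p);
  assert (p != NULL);
  *p = *e;
  return p;
}

/* Toy simplifier: modifies the node before it notices the divide error,
   so a failed call leaves a damaged node behind.  */
static bool
toy_simplify_expr (toy_expr *e)
{
  e->folded = true;
  if (e->den == 0)
    return false;
  e->value = e->num / e->den;
  return true;
}

/* Same shape as gfc_replace_expr quoted above; the toy node owns no
   sub-objects, so there is nothing for a free_expr0 step to release.  */
static void
toy_replace_expr (toy_expr *dest, toy_expr *src)
{
  *dest = *src;
  free (src);
}

/* Simplify a copy; commit it only on success, otherwise free the copy.  */
bool
toy_simplify_safely (toy_expr *e)
{
  toy_expr *p = toy_copy_expr (e);
  if (toy_simplify_expr (p))
    {
      toy_replace_expr (e, p);
      return true;
    }
  free (p);
  return false;
}
```

With this shape the original node is only ever overwritten by a fully simplified copy, and the failure path frees the copy, which is the leak Harald asked about.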

Thanks

Paul

On Wed, 7 Jun 2023 at 19:38, Harald Anlauf  wrote:
>
> Hi Paul!
>
> On 6/7/23 18:10, Paul Richard Thomas via Gcc-patches wrote:
> > Hi All,
> >
> > Three more fixes for PR87477. Please note that PR99350 was a blocker
> > but, as pointed out in comment #5 of the PR, this has nothing to do
> > with the associate construct.
> >
> > All three fixes are straightforward and the .diff + ChangeLog suffice
> > to explain them. 'rankguessed' was made redundant by the last PR87477
> > fix.
> >
> > Regtests on x86_64 - good for mainline?
> >
> > Paul
> >
> > Fortran: Fix some more blockers in associate meta-bug [PR87477]
> >
> > 2023-06-07  Paul Thomas  
> >
> > gcc/fortran
> > PR fortran/99350
> > * decl.cc (char_len_param_value): Simplify a copy of the expr
> > and replace the original if there is no error.
>
> This seems to lack a gfc_free_expr (p) in case the gfc_replace_expr
> is not executed, leading to a possible memleak.  Can you check?
>
> @@ -1081,10 +1082,10 @@ char_len_param_value (gfc_expr **expr, bool *deferred)
>    if (!gfc_expr_check_typed (*expr, gfc_current_ns, false))
>      return MATCH_ERROR;
>
> -  /* If gfortran gets an EXPR_OP, try to simplify it.  This catches things
> -     like CHARACTER(([1])).   */
> -  if ((*expr)->expr_type == EXPR_OP)
> -    gfc_simplify_expr (*expr, 1);
> +  /* Try to simplify the expression to catch things like CHARACTER(([1])).   */
> +  p = gfc_copy_expr (*expr);
> +  if (gfc_is_constant_expr (p) && gfc_simplify_expr (p, 1))
> +    gfc_replace_expr (*expr, p);
>    else
>      gfc_free_expr (p);
>
> > * gfortran.h : Remove the redundant field 'rankguessed' from
> > 'gfc_association_list'.
> > * resolve.cc (resolve_assoc_var): Remove refs to 'rankguessed'.
> >
> > PR fortran/107281
> > * resolve.cc (resolve_variable): Associate names with constant
> > or structure constructor targets cannot have array refs.
> >
> > PR fortran/109451
> > * trans-array.cc (gfc_conv_expr_descriptor): Guard expression
> > character length backend decl before using it. Suppress the
> > assignment if lhs equals rhs.
> > * trans-io.cc (gfc_trans_transfer): Scalarize transfer of
> > associate variables pointing to a variable. Add comment.
> > * trans-stmt.cc (trans_associate_var): Remove requirement that
> > the character length be deferred before assigning the value
> > returned by gfc_conv_expr_descriptor. Also, guard the backend
> > decl before testing with VAR_P.
> >
> > gcc/testsuite/
> > PR fortran/99350
> > * gfortran.dg/pr99350.f90 : New test.
> >
> > PR fortran/107281
> > * gfortran.dg/associate_5.f03 : Changed error message.
> > * gfortran.dg/pr107281.f90 : New test.
> >
> > PR fortran/109451
> > * gfortran.dg/associate_61.f90 : New test
>
> Otherwise LGTM.
>
> Thanks for the patch!
>
> Harald
>
>


-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: [PATCH] optabs: Implement double-word ctz and ffs expansion

2023-06-07 Thread Richard Biener via Gcc-patches



> On 07.06.2023 at 18:59, Jakub Jelinek via Gcc-patches wrote:
> 
> Hi!
> 
> We have had expand_doubleword_clz for a couple of years, where we emit
> double-word CLZ as if (high_word == 0) return CLZ (low_word) + word_size;
> else return CLZ (high_word);
> We can do something similar for CTZ and FFS IMHO, just with the 2
> words swapped.  So if (low_word == 0) return CTZ (high_word) + word_size;
> else return CTZ (low_word); for CTZ and
> if (low_word == 0) return high_word ? FFS (high_word) + word_size : 0;
> else return FFS (low_word); for FFS.
> 
> The following patch implements that.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 
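
The double-word scheme described in the quoted text can be sketched in plain C, assuming a 32-bit word and a 64-bit double word. The word_* helpers are illustrative stand-ins for the word_mode operations, not GCC functions; like the builtins, CLZ and CTZ are treated as undefined for a zero argument.

```c
#include <assert.h>
#include <stdint.h>

enum { WORD_BITS = 32 };

/* Word-sized primitives; CLZ and CTZ are undefined for 0, hence the asserts.  */
static int
word_clz (uint32_t w)
{
  assert (w != 0);
  int n = 0;
  while (!(w & 0x80000000u))
    {
      w <<= 1;
      n++;
    }
  return n;
}

static int
word_ctz (uint32_t w)
{
  assert (w != 0);
  int n = 0;
  while (!(w & 1u))
    {
      w >>= 1;
      n++;
    }
  return n;
}

static int
word_ffs (uint32_t w)
{
  return w ? word_ctz (w) + 1 : 0;
}

/* CLZ: pick the word based on whether the high word is zero.  */
int
doubleword_clz (uint64_t x)
{
  uint32_t hi = (uint32_t) (x >> WORD_BITS), lo = (uint32_t) x;
  assert (x != 0);
  if (hi == 0)
    return word_clz (lo) + WORD_BITS;
  return word_clz (hi);
}

/* CTZ: same idea with the two words swapped.  */
int
doubleword_ctz (uint64_t x)
{
  uint32_t hi = (uint32_t) (x >> WORD_BITS), lo = (uint32_t) x;
  assert (x != 0);
  if (lo == 0)
    return word_ctz (hi) + WORD_BITS;
  return word_ctz (lo);
}

/* FFS: as CTZ, except that the result is 0 when both words are zero.  */
int
doubleword_ffs (uint64_t x)
{
  uint32_t hi = (uint32_t) (x >> WORD_BITS), lo = (uint32_t) x;
  if (lo == 0)
    return hi ? word_ffs (hi) + WORD_BITS : 0;
  return word_ffs (lo);
}
```

For example, doubleword_ffs (1ULL << 32) takes the "low word is zero" path and returns FFS (1) + 32 = 33.
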
> Note, on some targets which implement both word_mode ctz and ffs patterns,
> it might be better to incrementally implement those double-word ffs expansion
> patterns in md files, because we aren't able to optimize it correctly;
> nothing can detect we have just made sure that argument is not 0 and so
> don't need to bother with handling that case.  So, on ia32 just using
> CTZ patterns would be better there, but I think we can even do better and
> instead of doing the comparisons of the operands against 0 do the CTZ
> expansion followed by testing of flags.
> 
> 2023-06-07  Jakub Jelinek  
> 
>	* optabs.cc (expand_ffs): Add forward declaration.
>	(expand_doubleword_clz): Rename to ...
>	(expand_doubleword_clz_ctz_ffs): ... this.  Add UNOPTAB argument,
>	handle also doubleword CTZ and FFS in addition to CLZ.
>	(expand_unop): Adjust caller.  Also call it for doubleword
>	ctz_optab and ffs_optab.
> 
>	* gcc.target/i386/ctzll-1.c: New test.
>	* gcc.target/i386/ffsll-1.c: New test.
> 
> --- gcc/optabs.cc.jj	2023-06-07 09:42:14.701130305 +0200
> +++ gcc/optabs.cc	2023-06-07 14:35:04.909879272 +0200
> @@ -2697,10 +2697,14 @@ expand_clrsb_using_clz (scalar_int_mode
>   return temp;
> }
> 
> -/* Try calculating clz of a double-word quantity as two clz's of word-sized
> -   quantities, choosing which based on whether the high word is nonzero.  */
> +static rtx expand_ffs (scalar_int_mode, rtx, rtx);
> +
> +/* Try calculating clz, ctz or ffs of a double-word quantity as two clz, ctz or
> +   ffs operations on word-sized quantities, choosing which based on whether the
> +   high (for clz) or low (for ctz and ffs) word is nonzero.  */
> static rtx
> -expand_doubleword_clz (scalar_int_mode mode, rtx op0, rtx target)
> +expand_doubleword_clz_ctz_ffs (scalar_int_mode mode, rtx op0, rtx target,
> +   optab unoptab)
> {
>   rtx xop0 = force_reg (mode, op0);
>   rtx subhi = gen_highpart (word_mode, xop0);
> @@ -2709,6 +2713,7 @@ expand_doubleword_clz (scalar_int_mode m
>   rtx_code_label *after_label = gen_label_rtx ();
>   rtx_insn *seq;
>   rtx temp, result;
> +  int addend = 0;
> 
>   /* If we were not given a target, use a word_mode register, not a
>  'mode' register.  The result will fit, and nobody is expecting
> @@ -2721,6 +2726,9 @@ expand_doubleword_clz (scalar_int_mode m
>  'target' to tag a REG_EQUAL note on.  */
>   result = gen_reg_rtx (word_mode);
> 
> +  if (unoptab != clz_optab)
> +std::swap (subhi, sublo);
> +
>   start_sequence ();
> 
>   /* If the high word is not equal to zero,
> @@ -2728,7 +2736,13 @@ expand_doubleword_clz (scalar_int_mode m
>   emit_cmp_and_jump_insns (subhi, CONST0_RTX (word_mode), EQ, 0,
>   word_mode, true, hi0_label);
> 
> -  temp = expand_unop_direct (word_mode, clz_optab, subhi, result, true);
> +  if (optab_handler (unoptab, word_mode) != CODE_FOR_nothing)
> +temp = expand_unop_direct (word_mode, unoptab, subhi, result, true);
> +  else
> +{
> +  gcc_assert (unoptab == ffs_optab);
> +  temp = expand_ffs (word_mode, subhi, result);
> +}
>   if (!temp)
> goto fail;
> 
> @@ -2739,14 +2753,32 @@ expand_doubleword_clz (scalar_int_mode m
>   emit_barrier ();
> 
>   /* Else clz of the full value is clz of the low word plus the number
> - of bits in the high word.  */
> + of bits in the high word.  Similarly for ctz/ffs of the high word,
> + except that ffs should be 0 when both words are zero.  */
>   emit_label (hi0_label);
> 
> -  temp = expand_unop_direct (word_mode, clz_optab, sublo, 0, true);
> +  if (unoptab == ffs_optab)
> +{
> +  convert_move (result, const0_rtx, true);
> +  emit_cmp_and_jump_insns (sublo, CONST0_RTX (word_mode), EQ, 0,
> +   word_mode, true, after_label);
> +}
> +
> +  if (optab_handler (unoptab, word_mode) != CODE_FOR_nothing)
> +temp = expand_unop_direct (word_mode, unoptab, sublo, NULL_RTX, true);
> +  else
> +{
> +  gcc_assert (unoptab == ffs_optab);
> +  temp = expand_unop_direct (word_mode, ctz_optab, sublo, NULL_RTX, true);
> +  addend = 1;
> +}
> +
>   if (!temp)
> goto fail;
> +
>   temp = expand_binop (word_mode, add_optab, temp,
> -   gen_int_mode (GET_MODE_BITSIZE (word_mode), word_mode),
> +

Re: [PATCH] i386: Fix endless recursion in ix86_expand_vector_init_general with MMX [PR110152]

2023-06-07 Thread Richard Biener via Gcc-patches



> On 07.06.2023 at 18:52, Jakub Jelinek via Gcc-patches wrote:
> 
> Hi!
> 
> I'm getting
> +FAIL: gcc.target/i386/3dnow-1.c (internal compiler error: Segmentation fault signal terminated program cc1)
> +FAIL: gcc.target/i386/3dnow-1.c (test for excess errors)
> +FAIL: gcc.target/i386/3dnow-2.c (internal compiler error: Segmentation fault signal terminated program cc1)
> +FAIL: gcc.target/i386/3dnow-2.c (test for excess errors)
> +FAIL: gcc.target/i386/mmx-1.c (internal compiler error: Segmentation fault signal terminated program cc1)
> +FAIL: gcc.target/i386/mmx-1.c (test for excess errors)
> +FAIL: gcc.target/i386/mmx-2.c (internal compiler error: Segmentation fault signal terminated program cc1)
> +FAIL: gcc.target/i386/mmx-2.c (test for excess errors)
> regressions on i686-linux since r14-1166.  The problem is when
> ix86_expand_vector_init_general is called with mmx_ok = true and
> mode = V4HImode, it newly recurses with mmx_ok = false and mode = V2SImode,
> but as mmx_ok is false and !TARGET_SSE, we recurse again with the same
> arguments (ok, fresh new tmp and vals) infinitely.
> The following patch fixes that by passing mmx_ok to that recursive call.
> For n_words == 4 it isn't needed, because we only care about mmx_ok for
> V2SImode or V2SFmode and no other modes.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Richard 

> 2023-06-07  Jakub Jelinek  
> 
>	PR target/110152
>	* config/i386/i386-expand.cc (ix86_expand_vector_init_general): For
>	n_words == 2 recurse with mmx_ok as first argument rather than false.
> 
> --- gcc/config/i386/i386-expand.cc.jj	2023-06-03 15:32:04.489410367 +0200
> +++ gcc/config/i386/i386-expand.cc	2023-06-07 10:31:34.715981752 +0200
> @@ -16371,7 +16371,7 @@ quarter:
>  machine_mode concat_mode = tmp_mode == DImode ? V2DImode : V2SImode;
>  rtx tmp = gen_reg_rtx (concat_mode);
>  vals = gen_rtx_PARALLEL (concat_mode, gen_rtvec_v (2, words));
> -  ix86_expand_vector_init_general (false, concat_mode, tmp, vals);
> +  ix86_expand_vector_init_general (mmx_ok, concat_mode, tmp, vals);
>  emit_move_insn (target, gen_lowpart (mode, tmp));
>}
>   else if (n_words == 4)
> 
>Jakub
> 


Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-06-07 Thread Jeff Law via Gcc-patches




On 5/25/23 06:35, Manolis Tsamis wrote:

Implementation of the new RISC-V optimization pass for memory offset
calculations, documentation and testcases.

gcc/ChangeLog:

* config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
pass.
* config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
* config/riscv/riscv.opt: New options.
* config/riscv/t-riscv: New build rule.
* doc/invoke.texi: Document new option.
* config/riscv/riscv-fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.

So not going into the guts of the patch yet.

From a benchmark standpoint, the only two that fall outside the +-0.05%
range are mcf and deepsjeng (measured by dynamic instruction count).  So
for evaluation purposes we can probably focus our efforts there.
And as we know, mcf is actually memory bound, so while improving its
dynamic instruction count is good, the end performance improvement may
be marginal.


As I mentioned to Philipp many months ago this reminds me a lot of a 
problem I've seen before.  Basically register elimination emits code 
that can be terrible in some circumstances.  So I went and poked at this 
again.


I think the key difference between now and what I was dealing with 
before is for the cases that really matter for rv64 we have a shNadd 
insn in the sequence.  That private port I was working on before did not 
have shNadd (don't ask, I probably can't tell).  Our target also had 
reg+reg addressing modes.  What I can't remember was if we were trying 
harder to fold the constant terms into the memory reference or if we 
were more focused on the reg+reg.  Ultimately it's probably not that 
important to remember -- the key is there are very significant 
differences in the target's capabilities which impact how we should be 
generating code in this case.  Those differences affect the code we 
generate *and* the places where we can potentially get control and do 
some address rewriting.


A key sequence in mcf looks something like this in IRA; others have
similar structure:



(insn 237 234 239 26 (set (reg:DI 377)
        (plus:DI (ashift:DI (reg:DI 200 [ _173 ])
                (const_int 3 [0x3]))
            (reg/f:DI 65 frame))) "pbeampp.c":139:15 333 {*shNadd}
     (nil))
(insn 239 237 235 26 (set (reg/f:DI 380)
        (plus:DI (reg:DI 513)
            (reg:DI 377))) "pbeampp.c":139:15 5 {adddi3}
     (expr_list:REG_DEAD (reg:DI 377)
        (expr_list:REG_EQUAL (plus:DI (reg:DI 377)
                (const_int -32768 [0xffffffffffff8000]))
            (nil))))

[ ... ]

(insn 240 235 255 26 (set (reg/f:DI 204 [ _177 ])
        (mem/f:DI (plus:DI (reg/f:DI 380)
                (const_int 280 [0x118])) [7 *_176+0 S8 A64])) "pbeampp.c":139:15 179 {*movdi_64bit}
     (expr_list:REG_DEAD (reg/f:DI 380)
        (nil)))



The key here is insn 237.  It's generally going to be bad to have FP
show up in a shadd insn because it's going to be eliminated into
sp+offset.  That'll generate an input reload before insn 237 and we
can't do any combination with the constant in insn 239.


After LRA it looks like this:


(insn 1540 234 1541 26 (set (reg:DI 11 a1 [750])
        (const_int 32768 [0x8000])) "pbeampp.c":139:15 179 {*movdi_64bit}
     (nil))
(insn 1541 1540 1611 26 (set (reg:DI 12 a2 [749])
        (plus:DI (reg:DI 11 a1 [750])
            (const_int -272 [0xfffffffffffffef0]))) "pbeampp.c":139:15 5 {adddi3}
     (expr_list:REG_EQUAL (const_int 32496 [0x7ef0])
        (nil)))
(insn 1611 1541 1542 26 (set (reg:DI 29 t4 [795])
        (plus:DI (reg/f:DI 2 sp)
            (const_int 64 [0x40]))) "pbeampp.c":139:15 5 {adddi3}
     (nil))
(insn 1542 1611 237 26 (set (reg:DI 12 a2 [749])
        (plus:DI (reg:DI 12 a2 [749])
            (reg:DI 29 t4 [795]))) "pbeampp.c":139:15 5 {adddi3}
     (nil))
(insn 237 1542 239 26 (set (reg:DI 12 a2 [377])
        (plus:DI (ashift:DI (reg:DI 14 a4 [orig:200 _173 ] [200])
                (const_int 3 [0x3]))
            (reg:DI 12 a2 [749]))) "pbeampp.c":139:15 333 {*shNadd}
     (nil))
(insn 239 237 235 26 (set (reg/f:DI 12 a2 [380])
        (plus:DI (reg:DI 10 a0 [513])
            (reg:DI 12 a2 [377]))) "pbeampp.c":139:15 5 {adddi3}
     (expr_list:REG_EQUAL (plus:DI (reg:DI 12 a2 [377])
            (const_int -32768 [0xffffffffffff8000]))
        (nil)))

[ ... ]

(insn 240 235 255 26 (set (reg/f:DI 14 a4 [orig:204 _177 ] [204])
        (mem/f:DI (plus:DI (reg/f:DI 12 a2 [380])
                (const_int 280 [0x118])) [7 *_176+0 S8 A64])) "pbeampp.c":139:15 179 {*movdi_64bit}
     (nil))



Reload/LRA made an absolute mess of that code.
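
To make the cost concrete, here is a small sketch that replays the address arithmetic of the post-LRA dump above and compares it with a single folded constant. It rests on one assumption not stated outright in the dump: that a0 (reg 513) holds -32768, as the REG_EQUAL note on insn 239 suggests. The function names are illustrative only, and the folded form is just the arithmetic sum of the constants, not a claim about what any pass would emit.

```c
#include <assert.h>
#include <stdint.h>

/* Replays the post-LRA insns one by one.  SP is the stack pointer and
   IDX is the value of a4 (pseudo 200).  */
uint64_t
address_as_emitted (uint64_t sp, uint64_t idx)
{
  uint64_t a1 = 32768;                  /* insn 1540 */
  uint64_t a2 = a1 + (uint64_t) -272;   /* insn 1541: 32496 */
  uint64_t t4 = sp + 64;                /* insn 1611 */
  a2 = a2 + t4;                         /* insn 1542 */
  a2 = (idx << 3) + a2;                 /* insn 237, shNadd */
  uint64_t a0 = (uint64_t) -32768;      /* assumed from the REG_EQUAL note */
  a2 = a0 + a2;                         /* insn 239 */
  return a2 + 280;                      /* address used by insn 240 */
}

/* The same address with all constant terms summed up front.  */
uint64_t
address_folded (uint64_t sp, uint64_t idx)
{
  assert (64 + 32496 - 32768 + 280 == 72);
  return sp + (idx << 3) + 72;
}
```

Under that assumption the constants 32496, 64, -32768 and 280 collapse to 72, so the five address-forming insns before the load compute nothing more than sp + (idx << 3) + 72.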

But before we add a new pass (target specific or generic), I think it 
may be in our best 

[PATCH v6] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.

2023-06-07 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch refactors the requirements of both ZVFH and ZVFHMIN.
By default, ZVFHMIN enables FP16 for all the RVV iterators, and
ZVFH then uses one function as the gate for whether FP16 is
supported.

Please note that ZVFH covers the ZVFHMIN instructions. This patch
adds one test for this.

Signed-off-by: Pan Li 
Co-Authored by: Juzhe-Zhong 

gcc/ChangeLog:

* config/riscv/riscv-protos.h (float_point_mode_supported_p):
New function to check whether a floating-point mode is supported
by the extension.
* config/riscv/riscv-v.cc (float_point_mode_supported_p):
Ditto.
* config/riscv/vector-iterators.md: Fix V_WHOLE and V_FRACT.
* config/riscv/vector.md: Add condition to FP define insn.
---
 gcc/config/riscv/riscv-protos.h  |   1 +
 gcc/config/riscv/riscv-v.cc  |  12 +++
 gcc/config/riscv/vector-iterators.md |  23 +++--
 gcc/config/riscv/vector.md   | 144 +++
 4 files changed, 105 insertions(+), 75 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index ebbaac255f9..e4881786b53 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -177,6 +177,7 @@ rtx expand_builtin (unsigned int, tree, rtx);
 bool check_builtin_call (location_t, vec, unsigned int,
   tree, unsigned int, tree *);
 bool const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
+bool float_point_mode_supported_p (machine_mode mode);
 bool legitimize_move (rtx, rtx);
 void emit_vlmax_vsetvl (machine_mode, rtx);
 void emit_hard_vlmax_vsetvl (machine_mode, rtx);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 49752cd8899..1cc157f1858 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -418,6 +418,18 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
  && IN_RANGE (INTVAL (elt), minval, maxval));
 }
 
+/* Return true if the inner of mode is HFmode when ZVFH enabled, or other
+   float point machine mode.  */
+bool
+float_point_mode_supported_p (machine_mode mode)
+{
+  machine_mode inner_mode = GET_MODE_INNER (mode);
+
+  gcc_assert (FLOAT_MODE_P (inner_mode));
+
+  return inner_mode == HFmode ? TARGET_ZVFH : true;
+}
+
 /* Return true if VEC is a constant in which every element is in the range
[MINVAL, MAXVAL].  The elements do not need to have the same value.
 
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index f4946d84449..234b712bc9d 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -453,9 +453,8 @@ (define_mode_iterator V_WHOLE [
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI "TARGET_VECTOR_ELEN_64")
   (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
 
-  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
-  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
-  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN == 32")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN == 64")
   (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
@@ -477,7 +476,11 @@ (define_mode_iterator V_WHOLE [
 (define_mode_iterator V_FRACT [
   (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI (VNx4QI "TARGET_MIN_VLEN > 32") (VNx8QI "TARGET_MIN_VLEN >= 128")
   (VNx1HI "TARGET_MIN_VLEN < 128") (VNx2HI "TARGET_MIN_VLEN > 32") (VNx4HI "TARGET_MIN_VLEN >= 128")
-  (VNx1HF "TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_MIN_VLEN > 32") (VNx4HF "TARGET_MIN_VLEN >= 128")
+
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
+
   (VNx1SI "TARGET_MIN_VLEN > 32 && TARGET_MIN_VLEN < 128") (VNx2SI "TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32 && TARGET_MIN_VLEN < 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
@@ -497,12 +500,12 @@ (define_mode_iterator VWEXTI [
 ])
 
 (define_mode_iterator VWEXTF [
-  (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
-  (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
-  (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
-  (VNx8SF "TARGET_VECTOR_ELEN_FP_32")
-  (VNx16SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
-  (VNx32SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
+  (VNx1SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
+  (VNx2SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32")
+  (VNx4SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32")
+  (VNx8SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32")
+  (VNx16SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32 && 
TARGET_MIN_VLEN > 

[PATCH v2] LoongArch: Modify the register constraints for template "jumptable" and "indirect_jump" from "r" to "e" [PR110136]

2023-06-07 Thread Lulu Cheng
Micro-architecture unconditionally treats a "jr $ra" as "return from subroutine",
hence doing "jr $ra" would interfere with both subroutine return prediction and
the more general indirect branch prediction.

Therefore, a problem like PR110136 can cause a significant increase in the
branch misprediction rate and hurt performance. The same problem exists with
"indirect_jump".

gcc/ChangeLog:

* config/loongarch/loongarch.md: Modify the register constraints for
template "jumptable" and "indirect_jump" from "r" to "e".

Co-authored-by: Andrew Pinski 
---
v1 -> v2:
  1. Modify the description
  2. Modify the register constraints of the template "indirect_jump".
---
 gcc/config/loongarch/loongarch.md | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 816a943d155..43a2ecc8957 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2895,6 +2895,10 @@ (define_insn "*jump_pic"
 }
   [(set_attr "type" "branch")])
 
+;; Micro-architecture unconditionally treats a "jr $ra" as "return from subroutine",
+;; hence doing "jr $ra" would interfere with both subroutine return prediction and
+;; the more general indirect branch prediction.
+
 (define_expand "indirect_jump"
   [(set (pc) (match_operand 0 "register_operand"))]
   ""
@@ -2905,7 +2909,7 @@ (define_expand "indirect_jump"
 })
 
 (define_insn "@indirect_jump"
-  [(set (pc) (match_operand:P 0 "register_operand" "r"))]
+  [(set (pc) (match_operand:P 0 "register_operand" "e"))]
   ""
   "jr\t%0"
   [(set_attr "type" "jump")
@@ -2928,7 +2932,7 @@ (define_expand "tablejump"
 
 (define_insn "@tablejump"
   [(set (pc)
-   (match_operand:P 0 "register_operand" "r"))
+   (match_operand:P 0 "register_operand" "e"))
(use (label_ref (match_operand 1 "" "")))]
   ""
   "jr\t%0"
-- 
2.31.1



[PATCH V5] VECT: Add SELECT_VL support

2023-06-07 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Co-authored-by: Richard Sandiford
Co-authored-by: Richard Biener 

This patch addresses comments from Richard and Richi and rebases to trunk.

This patch adds SELECT_VL middle-end support, allowing targets
to apply target-dependent optimizations to the length
calculation.

This patch is inspired by RVV ISA and LLVM:
https://reviews.llvm.org/D99750

SELECT_VL has the same behavior as LLVM's "get_vector_length", with
the following properties:

1. Only applies to a single rgroup.
2. Non-SLP only.
3. Adjusts the loop control IV.
4. Adjusts the data reference IV.
5. Allows a non-final iteration to process a number of elements other than VF.
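
The properties above can be sketched as a scalar C model of such a loop. This is only an illustration: toy_select_vl is a hypothetical stand-in for what a target's select_vl might return (never more than MIN (remaining, vf), sometimes less), and the tail-splitting policy in it is an assumption chosen to show a non-final iteration handling fewer than VF elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical target policy: never exceed MIN (remaining, vf), and
   split a tail of between vf and 2*vf elements roughly in half.  */
static size_t
toy_select_vl (size_t remaining, size_t vf)
{
  assert (vf > 0);
  size_t vl = remaining < vf ? remaining : vf;
  if (remaining > vf && remaining < 2 * vf)
    vl = (remaining + 1) / 2;
  return vl;
}

/* z[i] = x[i] + y[i] for i in [0, n).  Both the loop control IV
   (remaining) and the data reference IVs (x, y, z) advance by the
   variable step VL.  Returns the number of "vector" iterations.  */
size_t
vvadd_select_vl (size_t n, const int *x, const int *y, int *z, size_t vf)
{
  size_t iterations = 0;
  size_t remaining = n;
  while (remaining > 0)
    {
      size_t vl = toy_select_vl (remaining, vf);
      for (size_t i = 0; i < vl; i++)   /* Stands in for len_load/len_store.  */
        z[i] = x[i] + y[i];
      x += vl;
      y += vl;
      z += vl;
      remaining -= vl;
      iterations++;
    }
  return iterations;
}
```

With n = 10 and vf = 4 the model processes 4, 3 and 3 elements, so the second, non-final iteration handles fewer than VF elements.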

Code:
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
   # { for (size_t i=0; i [ ... ]
Co-authored-by: Richard Biener

---
 gcc/doc/md.texi | 22 ++
 gcc/internal-fn.def |  1 +
 gcc/optabs.def  |  1 +
 gcc/tree-vect-loop-manip.cc | 32 ++
 gcc/tree-vect-loop.cc   | 72 +++
 gcc/tree-vect-stmts.cc  | 86 -
 gcc/tree-vectorizer.h   |  6 +++
 7 files changed, 201 insertions(+), 19 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 6a435eb4461..95f7fe1f802 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
 @end smallexample
 
+@cindex @code{select_vl@var{m}} instruction pattern
+@item @code{select_vl@var{m}}
+Set operand 0 to the number of scalar iterations that should be handled
+by one iteration of a vector loop.  Operand 1 is the total number of
+scalar iterations that the loop needs to process and operand 2 is a
+maximum bound on the result (also known as the maximum ``vectorization
+factor'').
+
+The maximum value of operand 0 is given by:
+@smallexample
+operand0 = MIN (operand1, operand2)
+@end smallexample
+However, targets might choose a lower value than this, based on
+target-specific criteria.  Each iteration of the vector loop might
+therefore process a different number of scalar iterations, which in turn
+means that induction variables will have a variable step.  Because of
+this, it is generally not useful to define this instruction if it will
+always calculate the maximum value.
+
+This optab is only useful on targets that implement @samp{len_load_@var{m}}
+and/or @samp{len_store_@var{m}}.
+
 @cindex @code{check_raw_ptrs@var{m}} instruction pattern
 @item @samp{check_raw_ptrs@var{m}}
 Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 3ac9d82aace..5d638de6d06 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -177,6 +177,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
 DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
 
 DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
+DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary)
 DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
   check_raw_ptrs, check_ptrs)
 DEF_INTERNAL_OPTAB_FN (CHECK_WAR_PTRS, ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 6c064ff4993..f31b69c5d85 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -488,3 +488,4 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
 OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
 OPTAB_D (len_load_optab, "len_load_$a")
 OPTAB_D (len_store_optab, "len_store_$a")
+OPTAB_D (select_vl_optab, "select_vl$a")
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 3f735945e67..1c8100c1a1c 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -534,7 +534,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   _10 = (unsigned long) count_12(D);
   ...
   # ivtmp_9 = PHI 
-  _36 = MIN_EXPR ;
+  _36 = (MIN_EXPR | SELECT_VL) ;
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
@@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
-	     &incr_gsi, insert_after, &index_before_incr,
-	     &index_after_incr);
-  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
-   index_before_incr,
-   nitems_step));
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+   {
+ create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
+	    insert_after, &index_before_incr, &index_after_incr);
+ tree len = 

[PATCH 3/4] rs6000: build constant via li/lis;rldicl/rldicr

2023-06-07 Thread Jiufu Guo via Gcc-patches
Hi,

This patch checks whether a constant can be obtained by rotating a negative
"li/lis" value and then clearing its left or right bits.  If so, we can build
the constant through "li/lis ; rldicl/rldicr".
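
As a rough illustration of the idea, the following sketch models the second instruction as "rotate left, then AND with a mask": a mask of trailing ones corresponds to rldicl (clear left), a mask of leading ones to rldicr (clear right). The helper names are hypothetical, and the sketch only emulates the resulting value; it does not try to reproduce how the patch searches for the shift and mask.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate V left by N bits, 0 <= N < 64.  */
static uint64_t
rotl64 (uint64_t v, int n)
{
  assert (n >= 0 && n < 64);
  return n == 0 ? v : (v << n) | (v >> (64 - n));
}

/* Value left in a 64-bit register by 'li' with a signed 16-bit immediate.  */
static uint64_t
li_value (int16_t imm)
{
  return (uint64_t) (int64_t) imm;
}

/* Models the second instruction: rotate left by SHIFT, then keep only
   the bits selected by MASK.  */
uint64_t
rotate_and_mask (uint64_t v, int shift, uint64_t mask)
{
  return rotl64 (v, shift) & mask;
}
```

For instance, starting from li -2 (0xfffffffffffffffe), rotating left by 4 and clearing the top 8 bits yields 0x00ffffffffffffef, a constant with leading zeros that a plain rotate of an li value cannot produce.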

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): New
function.
(can_be_built_by_li_lis_and_rldicr): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_lis_and_rldicr and
can_be_built_by_li_lis_and_rldicl.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.
---
 gcc/config/rs6000/rs6000.cc   | 61 ++-
 .../gcc.target/powerpc/const-build.c  | 44 +
 2 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 03cd9d5e952..2a3fa733b45 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10332,6 +10332,61 @@ can_be_built_by_li_lis_and_rotldi (HOST_WIDE_INT c, int *shift,
   return false;
 }
 
+/* Check if value C can be built by 2 instructions: one is 'li or lis',
+   another is rldicl.
+
+   If so, *SHIFT is set to the shift operand of rldicl, and *MASK is set to
+   the mask operand of rldicl, and return true.
+   Return false otherwise.  */
+
+static bool
+can_be_built_by_li_lis_and_rldicl (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  /* Leading zeros may be cleaned by rldicl with a mask.  Change leading zeros
+ to ones and then recheck it.  */
+  int lz = clz_hwi (c);
+  HOST_WIDE_INT unmask_c
+= c | (HOST_WIDE_INT_M1U << (HOST_BITS_PER_WIDE_INT - lz));
+  int n;
+  if (can_be_rotated_to_negative_li (unmask_c, &n)
+  || can_be_rotated_to_negative_lis (unmask_c, &n))
+{
+  *mask = HOST_WIDE_INT_M1U >> lz;
+  *shift = n == 0 ? 0 : HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
+/* Check if value C can be built by 2 instructions: one is 'li or lis',
+   another is rldicr.
+
+   If so, *SHIFT is set to the shift operand of rldicr, and *MASK is set to
+   the mask operand of rldicr, and return true.
+   Return false otherwise.  */
+
+static bool
+can_be_built_by_li_lis_and_rldicr (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  /* Tailing zeros may be cleaned by rldicr with a mask.  Change tailing zeros
+ to ones and then recheck it.  */
+  int tz = ctz_hwi (c);
+  HOST_WIDE_INT unmask_c = c | ((HOST_WIDE_INT_1U << tz) - 1);
+  int n;
+  if (can_be_rotated_to_negative_li (unmask_c, &n)
+  || can_be_rotated_to_negative_lis (unmask_c, &n))
+{
+  *mask = HOST_WIDE_INT_M1U << tz;
+  *shift = HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10378,7 +10433,9 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0x) << 16)));
 }
-  else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask))
+  else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask)
+  || can_be_built_by_li_lis_and_rldicl (c, &shift, &mask)
+  || can_be_built_by_li_lis_and_rldicr (c, &shift, &mask))
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
   unsigned HOST_WIDE_INT imm = (c | ~mask);
@@ -10387,6 +10444,8 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (temp, GEN_INT (imm));
   if (shift != 0)
temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
+  if (mask != HOST_WIDE_INT_M1)
+   temp = gen_rtx_AND (DImode, temp, GEN_INT (mask));
   emit_move_insn (dest, temp);
 }
   else if (ud3 == 0 && ud4 == 0)
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c b/gcc/testsuite/gcc.target/powerpc/const-build.c
index c38a1dd91f2..8c209921d41 100644
--- a/gcc/testsuite/gcc.target/powerpc/const-build.c
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -46,6 +46,42 @@ lis_rotldi_6 (void)
   return 0x5318LL;
 }
 
+long long NOIPA
+li_rldicl_7 (void)
+{
+  return 0x3ffa1LL;
+}
+
+long long NOIPA
+li_rldicl_8 (void)
+{
+  return 0xff8531LL;
+}
+
+long long NOIPA
+lis_rldicl_9 (void)
+{
+  return 0x00ff8531LL;
+}
+
+long long NOIPA
+li_rldicr_10 (void)
+{
+  return 0x8531fff0LL;
+}
+
+long long NOIPA
+li_rldicr_11 (void)
+{
+  return 0x21f0LL;
+}
+
+long long NOIPA
+lis_rldicr_12 (void)
+{
+  return 0x5310LL;
+}
+
 struct fun arr[] = {
   {li_rotldi_1, 0x75310LL},
   {li_rotldi_2, 0x2164LL},
@@ -53,9 +89,17 @@ struct fun arr[] = {
   {li_rotldi_4, 0x2194LL},
   

[PATCH 2/4] rs6000: build constant via lis;rotldi

2023-06-07 Thread Jiufu Guo via Gcc-patches
Hi,

This patch checks whether a constant can be rotated to/from a negative
value loadable by "lis".  If so, we can use "lis;rotldi" to build it.
The positive values of "lis" do not need to be analyzed, because if a
constant can be rotated from a positive value of "lis", it can also be
rotated from a positive value of "li".
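
As a rough illustration of the claim above (not the code in the patch, which
uses clz/ctz rather than a loop), the sketch below brute-forces all 64
rotations.  The helper names rotl64, is_li_value, is_lis_value and
some_rotation are made up for this example, and it assumes "li" loads a
sign-extended 16-bit immediate while "lis" loads that immediate shifted left
by 16.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate X left by N bits (0 <= N < 64).  */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  return n == 0 ? x : (x << n) | (x >> (64 - n));
}

/* Value of 'li IMM': a sign-extended 16-bit immediate.  */
static bool
is_li_value (uint64_t v)
{
  int64_t s = (int64_t) v;
  return s >= -32768 && s <= 32767;
}

/* Value of 'lis IMM': a sign-extended 16-bit immediate shifted left by 16.  */
static bool
is_lis_value (uint64_t v)
{
  int64_t s = (int64_t) v;
  return (v & 0xffff) == 0 && s >= -2147483648LL && s <= 2147483647LL;
}

/* True if some rotation of C satisfies PRED (hypothetical helper).  */
static bool
some_rotation (uint64_t c, bool (*pred) (uint64_t))
{
  for (unsigned n = 0; n < 64; n++)
    if (pred (rotl64 (c, n)))
      return true;
  return false;
}
```

With these helpers, a positive "lis" value such as 0x12340000 is already
reachable by rotating the "li" value 0x1234, whereas a negative "lis" value
such as 0xFFFFFFFF85310000 has no rotation that "li" can load, so it needs
the separate check.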

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_rotated_to_negative_lis): New
function.
(can_be_built_by_li_and_rotldi): Rename to ...
(can_be_built_by_li_lis_and_rotldi): ... this function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_lis_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.
---
 gcc/config/rs6000/rs6000.cc   | 42 ---
 .../gcc.target/powerpc/const-build.c  | 16 ++-
 2 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 1dd0072350a..03cd9d5e952 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10278,19 +10278,51 @@ can_be_rotated_to_negative_li (HOST_WIDE_INT c, int 
*rot)
   return can_be_rotated_to_lowbits (~c, 15, rot);
 }
 
-/* Check if value C can be built by 2 instructions: one is 'li', another is
-   rotldi.
+/* Check if C can be rotated to a negative value which 'lis' instruction is
+   able to load: 1..1xx0..0.  If so, set *ROT to the number by which C is
+   rotated, and return true.  Return false otherwise.  */
+
+static bool
+can_be_rotated_to_negative_lis (HOST_WIDE_INT c, int *rot)
+{
+  /* case a. 1..1xxx0..01..1: up to 15 x's, at least 16 0's.  */
+  int leading_ones = clz_hwi (~c);
+  int tailing_ones = ctz_hwi (~c);
+  int middle_zeros = ctz_hwi (c >> tailing_ones);
+  if (middle_zeros >= 16 && leading_ones + tailing_ones >= 33)
+{
+  *rot = HOST_BITS_PER_WIDE_INT - tailing_ones;
+  return true;
+}
+
+  /* case b. xx0..01..1xx: some of 15 x's (and some of 16 0's) are
+ rotated over the highest bit.  */
+  int pos_one = clz_hwi ((c << 16) >> 16);
+  middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_one));
+  int middle_ones = clz_hwi (~(c << pos_one));
+  if (middle_zeros >= 16 && middle_ones >= 33)
+{
+  *rot = pos_one;
+  return true;
+}
+
+  return false;
+}
+
+/* Check if value C can be built by 2 instructions: one is 'li or lis',
+   another is rotldi.
 
If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
is set to -1, and return true.  Return false otherwise.  */
 
 static bool
-can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
+can_be_built_by_li_lis_and_rotldi (HOST_WIDE_INT c, int *shift,
   HOST_WIDE_INT *mask)
 {
   int n;
   if (can_be_rotated_to_positive_li (c, &n)
-  || can_be_rotated_to_negative_li (c, &n))
+  || can_be_rotated_to_negative_li (c, &n)
+  || can_be_rotated_to_negative_lis (c, &n))
 {
   *mask = HOST_WIDE_INT_M1;
   *shift = HOST_BITS_PER_WIDE_INT - n;
@@ -10346,7 +10378,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0x) << 16)));
 }
-  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
+  else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask))
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
   unsigned HOST_WIDE_INT imm = (c | ~mask);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c 
b/gcc/testsuite/gcc.target/powerpc/const-build.c
index 70f095f6bf2..c38a1dd91f2 100644
--- a/gcc/testsuite/gcc.target/powerpc/const-build.c
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -34,14 +34,28 @@ li_rotldi_4 (void)
   return 0x2194LL;
 }
 
+long long NOIPA
+lis_rotldi_5 (void)
+{
+  return 0x8531LL;
+}
+
+long long NOIPA
+lis_rotldi_6 (void)
+{
+  return 0x5318LL;
+}
+
 struct fun arr[] = {
   {li_rotldi_1, 0x75310LL},
   {li_rotldi_2, 0x2164LL},
   {li_rotldi_3, 0x8531LL},
   {li_rotldi_4, 0x2194LL},
+  {lis_rotldi_5, 0x8531LL},
+  {lis_rotldi_6, 0x5318LL},
 };
 
-/* { dg-final { scan-assembler-times {\mrotldi\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mrotldi\M} 6 } } */
 
 int
 main ()
-- 
2.39.1



[PATCH 4/4] rs6000: build constant via li/lis;rldic

2023-06-07 Thread Jiufu Guo via Gcc-patches
Hi,

This patch checks whether a constant can be built by "li;rldic".
Only the "negative li" form needs to be handled; the other forms need no check.
For example, "negative lis" is just a "negative li" with an additional shift.
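
For readers following the emit side shared by these patches (imm = c | ~mask,
rotate right by shift, then rotate left and AND with the mask), here is a
small stand-alone model of that arithmetic.  It is only a sketch: the function
names are invented for this example, and it models the two insns as plain
64-bit operations rather than RTL.

```c
#include <stdint.h>

/* Rotate X left by N bits (0 <= N < 64).  */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  return n == 0 ? x : (x << n) | (x >> (64 - n));
}

/* Immediate for the first insn: set the bits outside MASK, then rotate
   right by SHIFT, mirroring 'imm = (c | ~mask)' followed by the rotate
   in the patch.  */
static uint64_t
first_insn_imm (uint64_t c, unsigned shift, uint64_t mask)
{
  uint64_t imm = c | ~mask;
  return shift == 0 ? imm : (imm >> shift) | (imm << (64 - shift));
}

/* Effect of the second insn: rotate left by SHIFT, then AND with MASK.  */
static uint64_t
second_insn_result (uint64_t imm, unsigned shift, uint64_t mask)
{
  return rotl64 (imm, shift) & mask;
}
```

For example, with c = 0x00FFFFF853100000, shift = 20 and
mask = 0x00FFFFFFFFF00000, the first function yields 0xFFFFFFFFFFFF8531
(a negative 16-bit immediate), and the second maps it back to c.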

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_and_rldic): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rldic.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.
---
 gcc/config/rs6000/rs6000.cc   | 61 ++-
 .../gcc.target/powerpc/const-build.c  | 28 +
 2 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 2a3fa733b45..cd04b6b5c82 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10387,6 +10387,64 @@ can_be_built_by_li_lis_and_rldicr (HOST_WIDE_INT c, 
int *shift,
   return false;
 }
 
+/* Check if value C can be built by 2 instructions: one is 'li', another is
+   rldic.
+
+   If so, *SHIFT is set to the 'shift' operand of rldic; and *MASK is set
+   to the mask value about the 'mb' operand of rldic; and return true.
+   Return false otherwise.  */
+
+static bool
+can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
+{
+  /* There are 49 successive ones in the negative value of 'li'.  */
+  int ones = 49;
+
+  /* 1..1xx1..1: negative value of li --> 0..01..1xx0..0:
+ right bits are shifted as 0's, and left 1's(and x's) are cleaned.  */
+  int tz = ctz_hwi (c);
+  int lz = clz_hwi (c);
+  int middle_ones = clz_hwi (~(c << lz));
+  if (tz + lz + middle_ones >= ones)
+{
+  *mask = ((1LL << (HOST_BITS_PER_WIDE_INT - tz - lz)) - 1LL) << tz;
+  *shift = tz;
+  return true;
+}
+
+  /* 1..1xx1..1 --> 1..1xx0..01..1: some 1's(following x's) are cleaned. */
+  int leading_ones = clz_hwi (~c);
+  int tailing_ones = ctz_hwi (~c);
+  int middle_zeros = ctz_hwi (c >> tailing_ones);
+  if (leading_ones + tailing_ones + middle_zeros >= ones)
+{
+  *mask = ~(((1ULL << middle_zeros) - 1ULL) << tailing_ones);
+  *shift = tailing_ones + middle_zeros;
+  return true;
+}
+
+  /* xx1..1xx: --> xx0..01..1xx: some 1's(following x's) are cleaned. */
+  /* Get the position for the first bit of successive 1.
+ The 24th bit would be in successive 0 or 1.  */
+  HOST_WIDE_INT low_mask = (1LL << 24) - 1LL;
+  int pos_first_1 = ((c & (low_mask + 1)) == 0)
+ ? clz_hwi (c & low_mask)
+ : HOST_BITS_PER_WIDE_INT - ctz_hwi (~(c | low_mask));
+  middle_ones = clz_hwi (~c << pos_first_1);
+  middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_first_1));
+  if (pos_first_1 < HOST_BITS_PER_WIDE_INT
+  && middle_ones + middle_zeros < HOST_BITS_PER_WIDE_INT
+  && middle_ones + middle_zeros >= ones)
+{
+  *mask = ~(((1ULL << middle_zeros) - 1LL)
+   << (HOST_BITS_PER_WIDE_INT - pos_first_1));
+  *shift = HOST_BITS_PER_WIDE_INT - pos_first_1 + middle_zeros;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10435,7 +10493,8 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 }
   else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask)
   || can_be_built_by_li_lis_and_rldicl (c, &shift, &mask)
-  || can_be_built_by_li_lis_and_rldicr (c, &shift, &mask))
+  || can_be_built_by_li_lis_and_rldicr (c, &shift, &mask)
+  || can_be_built_by_li_and_rldic (c, &shift, &mask))
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
   unsigned HOST_WIDE_INT imm = (c | ~mask);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c 
b/gcc/testsuite/gcc.target/powerpc/const-build.c
index 8c209921d41..b503ee31c7c 100644
--- a/gcc/testsuite/gcc.target/powerpc/const-build.c
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -82,6 +82,29 @@ lis_rldicr_12 (void)
   return 0x5310LL;
 }
 
+long long NOIPA
+li_rldic_13 (void)
+{
+  return 0x000f8531LL;
+}
+long long NOIPA
+li_rldic_14 (void)
+{
+  return 0x853100ffLL;
+}
+
+long long NOIPA
+li_rldic_15 (void)
+{
+  return 0x8031LL;
+}
+
+long long NOIPA
+li_rldic_16 (void)
+{
+  return 0x8f31LL;
+}
+
 struct fun arr[] = {
   {li_rotldi_1, 0x75310LL},
   {li_rotldi_2, 0x2164LL},
@@ -95,11 +118,16 @@ struct fun arr[] = {
   {li_rldicr_10, 0x8531fff0LL},
   {li_rldicr_11, 0x21f0LL},
   {lis_rldicr_12, 0x5310LL},
+  {li_rldic_13, 0x000f8531LL},
+  {li_rldic_14, 0x853100ffLL},
+  {li_rldic_15, 0x8031LL},
+  {li_rldic_16, 0x8f31LL}
 };
 
 /* { dg-final { scan-assembler-times 

[PATCH 1/4] rs6000: build constant via li;rotldi

2023-06-07 Thread Jiufu Guo via Gcc-patches
Hi,

This patch checks whether a constant can be rotated to/from a positive
or negative value loadable by "li". If so, we can use "li;rotldi" to build it.
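
To make the idea concrete, the following sketch searches for an immediate and
a rotate count by brute force.  It is not the implementation in the patch
(which relies on can_be_rotated_to_lowbits); the names rotl64, li_loadable and
find_li_rotldi are hypothetical, and "li" is assumed to load a sign-extended
16-bit immediate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate X left by N bits (0 <= N < 64).  */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  return n == 0 ? x : (x << n) | (x >> (64 - n));
}

/* True if V is a value a single 'li' can load, i.e. a sign-extended
   16-bit immediate.  */
static bool
li_loadable (uint64_t v)
{
  int64_t s = (int64_t) v;
  return s >= -32768 && s <= 32767;
}

/* Brute-force search: find IMM and ROT such that rotl64 (IMM, ROT) == C
   and IMM is loadable by 'li'.  */
static bool
find_li_rotldi (uint64_t c, int64_t *imm, unsigned *rot)
{
  for (unsigned n = 0; n < 64; n++)
    {
      /* Rotating C left by 64 - N undoes a left rotation by N.  */
      uint64_t v = rotl64 (c, (64 - n) % 64);
      if (li_loadable (v))
        {
          *imm = (int64_t) v;
          *rot = n;
          return true;
        }
    }
  return false;
}
```

For c = 0x7531000000000000 this finds imm = 0x7531 with a rotate count of 48,
while a constant such as 0x0123456789ABCDEF has no suitable rotation and
would have to fall through to the longer sequences.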

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_rotated_to_positive_li): New function.
(can_be_rotated_to_negative_li): New function.
(can_be_built_by_li_and_rotldi): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: New test.
---
 gcc/config/rs6000/rs6000.cc   | 64 +--
 .../gcc.target/powerpc/const-build.c  | 54 
 2 files changed, 112 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/const-build.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 42f49e4a56b..1dd0072350a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10258,6 +10258,48 @@ rs6000_emit_set_const (rtx dest, rtx source)
   return true;
 }
 
+/* Check if C can be rotated to a positive value which 'li' instruction
+   is able to load.  If so, set *ROT to the number by which C is rotated,
+   and return true.  Return false otherwise.  */
+
+static bool
+can_be_rotated_to_positive_li (HOST_WIDE_INT c, int *rot)
+{
+  /* 49 leading zeros and 15 low bits on the positive value
+ generated by 'li' instruction.  */
+  return can_be_rotated_to_lowbits (c, 15, rot);
+}
+
+/* Like can_be_rotated_to_positive_li, but check the negative value of 'li'.  
*/
+
+static bool
+can_be_rotated_to_negative_li (HOST_WIDE_INT c, int *rot)
+{
+  return can_be_rotated_to_lowbits (~c, 15, rot);
+}
+
+/* Check if value C can be built by 2 instructions: one is 'li', another is
+   rotldi.
+
+   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
+   is set to -1, and return true.  Return false otherwise.  */
+
+static bool
+can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  int n;
+  if (can_be_rotated_to_positive_li (c, &n)
+  || can_be_rotated_to_negative_li (c, &n))
+{
+  *mask = HOST_WIDE_INT_M1;
+  *shift = HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10266,15 +10308,14 @@ static void
 rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 {
   rtx temp;
+  int shift;
+  HOST_WIDE_INT mask;
   HOST_WIDE_INT ud1, ud2, ud3, ud4;
 
   ud1 = c & 0xffff;
-  c = c >> 16;
-  ud2 = c & 0xffff;
-  c = c >> 16;
-  ud3 = c & 0xffff;
-  c = c >> 16;
-  ud4 = c & 0xffff;
+  ud2 = (c >> 16) & 0xffff;
+  ud3 = (c >> 32) & 0xffff;
+  ud4 = (c >> 48) & 0xffff;
 
   if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
   || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
@@ -10305,6 +10346,17 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0x) << 16)));
 }
+  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
+{
+  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
+  unsigned HOST_WIDE_INT imm = (c | ~mask);
+  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
+
+  emit_move_insn (temp, GEN_INT (imm));
+  if (shift != 0)
+   temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
+  emit_move_insn (dest, temp);
+}
   else if (ud3 == 0 && ud4 == 0)
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c 
b/gcc/testsuite/gcc.target/powerpc/const-build.c
new file mode 100644
index 000..70f095f6bf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+#define NOIPA __attribute__ ((noipa))
+
+struct fun
+{
+  long long (*f) (void);
+  long long val;
+};
+
+long long NOIPA
+li_rotldi_1 (void)
+{
+  return 0x75310LL;
+}
+
+long long NOIPA
+li_rotldi_2 (void)
+{
+  return 0x2164LL;
+}
+
+long long NOIPA
+li_rotldi_3 (void)
+{
+  return 0x8531LL;
+}
+
+long long NOIPA
+li_rotldi_4 (void)
+{
+  return 0x2194LL;
+}
+
+struct fun arr[] = {
+  {li_rotldi_1, 0x75310LL},
+  {li_rotldi_2, 0x2164LL},
+  {li_rotldi_3, 0x8531LL},
+  {li_rotldi_4, 0x2194LL},
+};
+
+/* { dg-final { scan-assembler-times {\mrotldi\M} 4 } } */
+
+int
+main ()
+{
+  for (int i = 0; i < sizeof (arr) / sizeof (arr[0]); i++)
+if ((*arr[i].f) () != arr[i].val)
+  

[PATCH V2 0/4] rs6000: build constant via li/lis;rldicX

2023-06-07 Thread Jiufu Guo via Gcc-patches
Hi,

These patches are just minor changes based on previous version/comments.
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611286.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620489.html
And also update the wording for patches in this series.

For a given constant, it is profitable if we can build it with 2 insns.
This series enables more constants to be built with 2 insns: one is "li or lis",
the other is 'rldicl, rldicr or rldic'.
By checking and analyzing the characteristics of the insns "li/lis;rldicX",
all the possible constant values are considered by this series.

Considering the functionality and size, the series is split into 4 patches as below:
1. Support the constants which can be built by "li;rotldi"
   Both positive and negative values from insn "li" are analyzed.
2. Support the constants which can be built by "lis;rotldi"
   We only need to analyze the negative value from "lis".
   And this patch uses more code to check leading 1s and tailing 0s from "lis".
3. Support the constants which can be built by "li/lis;rldicl/rldicr":
   Leveraging the APIs defined/analyzed in patches 1 and 2,
   this patch checks the characteristics of the mask of "rldicl/rldicr"
   to support more constants.
4. Support the constants which can be built by "li/lis;rldic":
   The mask of "rldic" is relatively complicated, it is analyzed in this
   patch to support more constants.

BR,
Jeff (Jiufu)


Re: [PATCH] In the pipeline, UNRECOG INSN is not executed in advance if it starts a live range.

2023-06-07 Thread Jin Ma via Gcc-patches
ping: https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619951.html

Ref: 
http://patchwork.ozlabs.org/project/gcc/patch/20230323080734.423-1-ji...@linux.alibaba.com/

Re: Followup on PR/109279: large constants on RISCV

2023-06-07 Thread Jeff Law via Gcc-patches




On 6/1/23 20:38, Vineet Gupta wrote:

Hi Jeff,

I finally got around to collecting various observations on PR/109279 - 
more importantly the state of large constants in RV backend, apologies 
in advance for the long email.


It seems the various commits in area have improved the original test 
case of 0x1010101_01010101


   Before 2e886eef7f2b  |   With 2e886eef7f2b   | With 0530254413f8 | With c104ef4b5eb1
Right.  The handling of that constant shows a nice progression.  On our 
architecture the latter two versions are probably equivalent from a 
latency standpoint, but the last is obviously best as it's smaller and 
probably better on in-order architectures as well.





But same commits seem to have regressed Andrew's test from same PR 
(which is the theme of this email).

The seemingly contrived test turned out to be much more than I'd hoped for.

    long long f(void)
    {
  unsigned t = 0x101_0101;
  long long t1 = t;
  long long t2 = ((unsigned long long )t) << 32;
  asm("":"+r"(t1));
  return t1 | t2;
    }

[ ... ]
It may be more instructions, but I suspect they end up being the same 
performance for us across all three variants.  Fusion and out-of-order 
execution save the day.  But I realize there may be targets where the 
first is going to be preferred.






   Before 2e886eef7f2b  |   With 2e886eef7f2b    | With 0530254413f8
     (ideal code)       | define_insn_and_split  | "splitter relaxed new
                        |                        |  pseudos"
    li   a0,0x101       |    li   a5,0x101       |    li   a0,0x101_
    addi a0,a0,0x101    |    addi a5,a5,0x101    |    addi a0,a0,0x101
    slli a5,a0,32       |    mv   a0,a5          |    li   a5,0x101_
    or   a0,a0,a5       |    slli a5,a5,32       |    slli a0,a0,32
    ret                 |    or   a0,a0,a5       |    addi a5,a5,0x101
                        |    ret                 |    or   a0,a5,a0
                        |                        |    ret

As a baseline, RTL just before cse1 (in 260r.dfinit) in all of above is:

[ ... ]
Right. Standard looking synthesis.





Prior to 2e886eef7f2b, cse1 could do its job: finding oldest equivalent 
registers for the fragments of const and reusing the reg.

Right.  That's what I would expect.

[ ... ]




With 2e886eef7f2b, define_insn_and_split "*mvconst_internal" recog() 
kicks in during cse1, eliding insns for a const_int.


    (insn 7 6 8 2 (set (reg:DI 137)
     (const_int [0x1010101])) {*mvconst_internal}
     (expr_list:REG_EQUAL (const_int [0x1010101])))
    [...]

    (insn 11 10 12 2 (set (reg:DI 140)
     (const_int [0x1010101_])) {*mvconst_internal}
     (expr_list:REG_EQUAL (const_int  [0x1010101_]) ))
Understood.  Not ideal, but we generally don't have good ways to limit 
patterns to being available at different times during the optimization 
phase.  One thing you might want to try (which I thought we used at one 
point) is to make the pattern conditional on cse_not_expected.  The goal 
would be to avoid exposing the pattern until a later point in the 
optimizer pipeline.  It may have been the case that we dropped that over 
time during development.  It's all getting fuzzy at this point.




Eventually split1 breaks it up using same mvconst_internal splitter, but 
the cse opportunity has been lost.
Right.  I'd have to look at the pass definitions, but I suspect the 
splitting pass where this happens is after the last standard CSE pass. 
So we don't get a chance to CSE the constant synthesis.



*This is now a baseline for large consts handling for RV backend which 
we all need to be aware of*.
Understood.  Though it's not as bad as you might think :-)  You can 
spend an inordinate amount of time improving constant synthesis, 
generate code that looks really good, but in the end it may not make a 
bit of difference in real performance.  Been there, done that.  I'm not 
saying we give up, but we need to keep in mind that we're often better 
off trading a bit on the constant synthesis if doing so helps code where 
those constants get used.






(2) Now on to the nuances as to why things get progressively worse after 
commit 0530254413f8.


It all seems to get down to register allocation passes:

sched1 before 0530254413f8

    ;; 0--> b  0: i  22 r140=0x101    :alu
    ;; 1--> b  0: i  20 r137=0x101    :alu
    ;; 2--> b  0: i  23 r140=r140+0x101   :alu
    ;; 3--> b  0: i  21 r137=r137+0x101   :alu
    ;; 4--> b  0: i  24 r140=r140<<0x20   :alu
    ;; 5--> b  0: i  25 r136=r137 :alu
    ;; 6--> b  0: i   8 r136=asm_operands :nothing
    ;; 7--> b  0: i  17 a0=r136|r140  :alu
    ;; 8--> b  0: i  18 use a0    :nothing

sched1 with 0530254413f8

    ;; 0--> b  0: i  22 r144=0x101    :alu
    ;; 1--> b  0: i  20 r143=0x101    :alu
    ;; 2--> b  0: i  23 r145=r144+0x101   :alu
    ;; 3--> b  0: i  21 

Re: [PATCH 2/3] Change the `zero_one ==/!= 0) ? y : z y` patterns to use multiply rather than `(-zero_one) & z`

2023-06-07 Thread Andrew Pinski via Gcc-patches
On Wed, Jun 7, 2023 at 4:11 PM Jeff Law  wrote:
>
>
>
> On 6/7/23 17:05, Andrew Pinski wrote:
> > On Wed, Jun 7, 2023 at 3:57 PM Jeff Law via Gcc-patches
> >  wrote:
> >>
> >>
> >>
> >> On 6/7/23 15:32, Andrew Pinski via Gcc-patches wrote:
> >>> Since there is a pattern to convert `(-zero_one) & z` into `zero_one * z` 
> >>> already,
> >>> it is better if we don't do a secondary transformation. This reduces the 
> >>> extra
> >>> statements produced by match-and-simplify on the gimple level too.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>* match.pd (`zero_one ==/!= 0) ? y : z  y`): Use
> >>>multiply rather than negation/bit_and.
> >> Don't you need to check the types in a manner similar to what the A & -Y
> >> -> X * Y pattern does before you make this transformation?
> >
> > No, because the convert is in a different order than in that
> > transformation; a very subtle difference which makes it work.
> >
> > In A & -Y it was matching:
> > (bit_and  (convert? (negate
> > But here we have:
> > (bit_and (negate (convert
> > Notice the convert is in a different location, in the `A & -Y` case,
> > the convert needs to be a sign extending (or a truncation) of the
> > negative value. Here we are converting the one_zero_value to the new
> > type so we get zero_one in the new type and then doing the negation
> > getting us 0 or -1 value.
> Thanks for the clarification.  OK for the trunk.

So my transformation is consistent with what match.pd already did, but
that existing pattern was already broken for signed one-bit integers:
```
struct s
{
  int t : 1;
};
int f(struct s t, int a, int b)
{
int bd = t.t;
if (bd) a|=b;
return a;
}
```
I am going to withdraw this patch and fix that up first.
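
A hand-written model of the problem may help here.  It is not GCC output, only
a sketch of what goes wrong if a signed one-bit value is treated as
"zero_one": such a bit-field holds 0 or -1, so multiplying by it negates the
operand instead of selecting it.  The bit-field is declared `signed int`
explicitly as an assumption of this example, since the signedness of a plain
`int` bit-field is implementation-defined, and the function names are invented.

```c
struct s
{
  signed int t : 1;   /* Can only hold 0 or -1.  */
};

/* The source-level semantics: OR in B when the bit is set.  */
static int
reference (struct s t, int a, int b)
{
  int bd = t.t;
  if (bd)
    a |= b;
  return a;
}

/* What a 'zero_one * z' style rewrite would compute if BD were wrongly
   assumed to be 0 or 1.  */
static int
via_multiply (struct s t, int a, int b)
{
  int bd = t.t;
  return a | (bd * b);
}
```

With the bit set (t.t == -1), a == 0 and b == 4, the reference version returns
4 while the multiply form returns -4.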

Thanks,
Andrew

>
> jeff


Re: [PATCH 3/3] Add Plus to the op list of `(zero_one == 0) ? y : z y` pattern

2023-06-07 Thread Jeff Law via Gcc-patches




On 6/7/23 15:32, Andrew Pinski via Gcc-patches wrote:

This adds plus to the op list of the `(zero_one == 0) ? y : z <op> y` patterns,
which currently have bit_ior and bit_xor.
This shows up now in GCC after the boolization work that Uroš has been doing.
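
As an illustration of the source-level shape being matched, the two functions
below compute the same value for plus when zero_one is 0 or 1.  This is a
sketch with invented names, written with unsigned operands to sidestep
overflow questions; it shows the `(-zero_one) & z` form of the branchless
result, not the exact GIMPLE that match.pd produces.

```c
/* Branchy form: (zero_one == 0) ? y : z + y.  */
static unsigned
cond_add_branchy (unsigned zero_one, unsigned y, unsigned z)
{
  return zero_one == 0 ? y : z + y;
}

/* Branchless form: -zero_one is either 0 or all ones, so the AND
   selects 0 or Z before the addition.  */
static unsigned
cond_add_branchless (unsigned zero_one, unsigned y, unsigned z)
{
  return y + ((-zero_one) & z);
}
```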

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/97711
PR tree-optimization/110155

gcc/ChangeLog:

* match.pd ((zero_one == 0) ? y : z <op> y): Add plus to the op.
((zero_one != 0) ? z <op> y : y): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/branchless-cond-add-2.c: New test.
* gcc.dg/tree-ssa/branchless-cond-add.c: New test.

OK
jeff


Re: [PATCH 2/3] Change the `zero_one ==/!= 0) ? y : z y` patterns to use multiply rather than `(-zero_one) & z`

2023-06-07 Thread Jeff Law via Gcc-patches




On 6/7/23 17:05, Andrew Pinski wrote:

On Wed, Jun 7, 2023 at 3:57 PM Jeff Law via Gcc-patches
 wrote:




On 6/7/23 15:32, Andrew Pinski via Gcc-patches wrote:

Since there is a pattern to convert `(-zero_one) & z` into `zero_one * z` 
already,
it is better if we don't do a secondary transformation. This reduces the extra
statements produced by match-and-simplify on the gimple level too.

gcc/ChangeLog:

   * match.pd (`zero_one ==/!= 0) ? y : z  y`): Use
   multiply rather than negation/bit_and.

Don't you need to check the types in a manner similar to what the A & -Y
-> X * Y pattern does before you make this transformation?


No, because the convert is in a different order than in that
transformation; a very subtle difference which makes it work.

In A & -Y it was matching:
(bit_and  (convert? (negate
But here we have:
(bit_and (negate (convert
Notice the convert is in a different location, in the `A & -Y` case,
the convert needs to be a sign extending (or a truncation) of the
negative value. Here we are converting the one_zero_value to the new
type so we get zero_one in the new type and then doing the negation
getting us 0 or -1 value.

Thanks for the clarification.  OK for the trunk.

jeff


[nvptx PATCH] Update nvptx's bitrev2 pattern to use BITREVERSE rtx.

2023-06-07 Thread Roger Sayle

This minor tweak to the nvptx backend switches the representation
of the brev instruction from an UNSPEC to instead use the new BITREVERSE
rtx.  This allows various RTL optimizations including evaluation (constant
folding) of integer constant arguments at compile-time.

This patch has been tested on nvptx-none with make and make -k check
with no new failures.  Ok for mainline?


2023-06-07  Roger Sayle  

gcc/ChangeLog
* config/nvptx/nvptx.md (UNSPEC_BITREV): Delete.
(bitrev2): Represent using bitreverse.


Thanks in advance,
Roger
--

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 1bb9304..7a7c994 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -34,8 +34,6 @@
UNSPEC_FPINT_CEIL
UNSPEC_FPINT_NEARBYINT
 
-   UNSPEC_BITREV
-
UNSPEC_ALLOCA
 
UNSPEC_SET_SOFTSTACK
@@ -636,8 +634,7 @@
 
 (define_insn "bitrev2"
   [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
-   (unspec:SDIM [(match_operand:SDIM 1 "nvptx_register_operand" "R")]
-UNSPEC_BITREV))]
+   (bitreverse:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "R")))]
   ""
   "%.\\tbrev.b%T0\\t%0, %1;")
 


Re: [PATCH 2/3] Change the `zero_one ==/!= 0) ? y : z y` patterns to use multiply rather than `(-zero_one) & z`

2023-06-07 Thread Andrew Pinski via Gcc-patches
On Wed, Jun 7, 2023 at 3:57 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 6/7/23 15:32, Andrew Pinski via Gcc-patches wrote:
> > Since there is a pattern to convert `(-zero_one) & z` into `zero_one * z` 
> > already,
> > it is better if we don't do a secondary transformation. This reduces the 
> > extra
> > statements produced by match-and-simplify on the gimple level too.
> >
> > gcc/ChangeLog:
> >
> >   * match.pd (`zero_one ==/!= 0) ? y : z  y`): Use
> >   multiply rather than negation/bit_and.
> Don't you need to check the types in a manner similar to what the A & -Y
> -> X * Y pattern does before you make this transformation?

No, because the convert is in a different order than in that
transformation; a very subtle difference which makes it work.

In A & -Y it was matching:
(bit_and  (convert? (negate
But here we have:
(bit_and (negate (convert
Notice the convert is in a different location, in the `A & -Y` case,
the convert needs to be a sign extending (or a truncation) of the
negative value. Here we are converting the one_zero_value to the new
type so we get zero_one in the new type and then doing the negation
getting us 0 or -1 value.

Thanks,
Andrew

>
> jeff
>


Re: [PATCH 2/3] Change the `zero_one ==/!= 0) ? y : z y` patterns to use multiply rather than `(-zero_one) & z`

2023-06-07 Thread Jeff Law via Gcc-patches




On 6/7/23 15:32, Andrew Pinski via Gcc-patches wrote:

Since there is a pattern to convert `(-zero_one) & z` into `zero_one * z` 
already,
it is better if we don't do a secondary transformation. This reduces the extra
statements produced by match-and-simplify on the gimple level too.

gcc/ChangeLog:

* match.pd (`zero_one ==/!= 0) ? y : z  y`): Use
multiply rather than negation/bit_and.
Don't you need to check the types in a manner similar to what the A & -Y 
-> X * Y pattern does before you make this transformation?


jeff



[Committed] Bug fix to new wi::bitreverse_large function.

2023-06-07 Thread Roger Sayle

Richard Sandiford was, of course, right to be wary of new code without
much test coverage.  Converting the nvptx backend to use the BITREVERSE
rtx infrastructure has resulted in far more exhaustive testing and
revealed a subtle bug in the new wi::bitreverse implementation.  The
code needs to use HOST_WIDE_INT_1U (instead of 1) to avoid unintended
sign extension.
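
For illustration only, here is a single-word analogue of the loop being fixed.
It is not wi::bitreverse_large itself (which works on arrays of HOST_WIDE_INT
blocks); bitreverse64 is a made-up name, and UINT64_C (1) plays the role that
HOST_WIDE_INT_1U plays in the patch.

```c
#include <stdint.h>

/* Reverse the low PRECISION bits of X (1 <= PRECISION <= 64).  */
static uint64_t
bitreverse64 (uint64_t x, unsigned precision)
{
  uint64_t r = 0;
  for (unsigned s = 0; s < precision; s++)
    if ((x >> s) & 1)
      {
        unsigned d = (precision - 1) - s;
        /* A plain '1 << d' has type int: it is undefined for d >= 32 and
           goes negative at d == 31, which sign-extends when widened.  */
        r |= UINT64_C (1) << d;
      }
  return r;
}
```

With the 64-bit one, bitreverse64 (1, 32) gives 0x80000000 and
bitreverse64 (1, 64) gives 0x8000000000000000, the two cases a narrow
constant would get wrong.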

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(with a minor tweak to use BITREVERSE), where it fixes regressions of
the 32-bit test vectors in gcc.target/nvptx/brev-2.c and the 64-bit
test vectors in gcc.target/nvptx/brevll-2.c.  Committed as obvious.


2023-06-07  Roger Sayle  

gcc/ChangeLog
* wide-int.cc (wi::bitreverse_large): Use HOST_WIDE_INT_1U to
avoid sign extension/undefined behaviour when setting each bit.


Thanks,
Roger
--

diff --git a/gcc/wide-int.cc b/gcc/wide-int.cc
index 24bdce2..ab92ee6 100644
--- a/gcc/wide-int.cc
+++ b/gcc/wide-int.cc
@@ -786,7 +786,7 @@ wi::bitreverse_large (HOST_WIDE_INT *val, const 
HOST_WIDE_INT *xval,
  unsigned int d = (precision - 1) - s;
  block = d / HOST_BITS_PER_WIDE_INT;
  offset = d & (HOST_BITS_PER_WIDE_INT - 1);
-  val[block] |= 1 << offset;
+  val[block] |= HOST_WIDE_INT_1U << offset;
}
 }
 


Re: [PATCH 1/3] MATCH: Allow unsigned types for `X & -Y -> X * Y` pattern

2023-06-07 Thread Jeff Law via Gcc-patches




On 6/7/23 15:32, Andrew Pinski via Gcc-patches wrote:

This allows unsigned types if the inner type where the negation is
located has precision greater than or equal to that of the outer type.
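
The identity behind the pattern can be checked by hand for the same-type
unsigned case.  The sketch below uses invented function names and only covers
that simple case, not the truncating conversions the patch also allows; the
two forms agree when y is 0 or 1 and differ otherwise.

```c
/* X & -Y: -Y is 0 when Y == 0 and all ones when Y == 1.  */
static unsigned
and_neg (unsigned x, unsigned y)
{
  return x & -y;
}

/* X * Y: the replacement form.  */
static unsigned
times (unsigned x, unsigned y)
{
  return x * y;
}
```

For y == 2 the forms diverge (6 & -2u is 6, while 6 * 2 is 12), which is why
the pattern is restricted to operands known to be zero or one.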

branchless-cond.c needs to be updated since now we change it to
use a multiply rather than still having (-a) in there.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd (`X & -Y -> X * Y`): Allow for truncation
and the same type for unsigned types.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/branchless-cond.c: Update testcase.

OK.
jeff


Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-07 Thread Jeff Law via Gcc-patches




On 5/25/23 06:35, Manolis Tsamis wrote:

Propagation of the stack pointer in cprop_hardreg is currently forbidden
in all cases, due to maybe_mode_change returning NULL. Relax this
restriction and allow propagation when no mode change is requested.

gcc/ChangeLog:

 * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.
Thanks for the clarification.  This is OK for the trunk.  It looks 
generic enough to have value going forward now rather than waiting.


jeff


Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-07 Thread Jeff Law via Gcc-patches




On 5/31/23 06:15, Manolis Tsamis wrote:

On Thu, May 25, 2023 at 4:38 PM Jeff Law  wrote:




On 5/25/23 06:35, Manolis Tsamis wrote:

Propagation of the stack pointer in cprop_hardreg is currently forbidden
in all cases, due to maybe_mode_change returning NULL. Relax this
restriction and allow propagation when no mode change is requested.

gcc/ChangeLog:

  * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.

I can't see how this can be correct given the stack pointer equality
tests elsewhere in the compiler, particularly the various targets.

The problem is if you change the mode then you end up with multiple REG
expressions that reference the stack pointer.

See rev: d1446456c3fcaa7be628726c9de4a877729490ca and the thread around
the change which introduced this code.



Hi Jeff,

Isn't this fine for this case since:

   1) stack_pointer_rtx is used which won't cause issues with pointer
equalities (If I understand correctly).
   2) Propagation is guarded with `if (orig_mode == new_mode)` so only
when there is no mode change.
I must have missed #2 -- is that something that changed since the first 
iteration for Ventana many months ago?


Anyway, hoping to make meaningful progress on these two patches over the 
next couple days.


jeff


[pushed] c++: allow NRV and non-NRV returns [PR58487]

2023-06-07 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Now that we support NRV from an inner block, we can also support non-NRV
returns from other blocks, since once the NRV is out of scope a later return
expression can't possibly alias it.

This fixes 58487 and half-fixes 53637: now one of the returns is elided, but
not the other.

Fixing the remaining xfails in these testcases will require a very different
approach, probably involving a full tree/block walk from finalize_nrv, and
check_return_expr only adding to a list of potential return variables.

PR c++/58487
PR c++/53637

gcc/cp/ChangeLog:

* cp-tree.h (INIT_EXPR_NRV_P): New.
* semantics.cc (finalize_nrv_r): Check it.
* name-lookup.h (decl_in_scope_p): Declare.
* name-lookup.cc (decl_in_scope_p): New.
* typeck.cc (check_return_expr): Allow non-NRV
returns if the NRV is no longer in scope.

gcc/testsuite/ChangeLog:

* g++.dg/opt/nrv26.C: New test.
* g++.dg/opt/nrv26a.C: New test.
* g++.dg/opt/nrv27.C: New test.
---
 gcc/cp/cp-tree.h  |  5 +
 gcc/cp/name-lookup.h  |  1 +
 gcc/cp/name-lookup.cc | 22 ++
 gcc/cp/semantics.cc   |  8 +++
 gcc/cp/typeck.cc  | 37 ---
 gcc/testsuite/g++.dg/opt/nrv26.C  | 19 
 gcc/testsuite/g++.dg/opt/nrv26a.C | 18 +++
 gcc/testsuite/g++.dg/opt/nrv27.C  | 23 +++
 8 files changed, 121 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/opt/nrv26.C
 create mode 100644 gcc/testsuite/g++.dg/opt/nrv26a.C
 create mode 100644 gcc/testsuite/g++.dg/opt/nrv27.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 87572e3574d..83982233111 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -444,6 +444,7 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
   REINTERPRET_CAST_P (in NOP_EXPR)
   ALIGNOF_EXPR_STD_P (in ALIGNOF_EXPR)
   OVL_DEDUP_P (in OVERLOAD)
+  INIT_EXPR_NRV_P (in INIT_EXPR)
   ATOMIC_CONSTR_MAP_INSTANTIATED_P (in ATOMIC_CONSTR)
   contract_semantic (in ASSERTION_, PRECONDITION_, POSTCONDITION_STMT)
1: IDENTIFIER_KIND_BIT_1 (in IDENTIFIER_NODE)
@@ -4078,6 +4079,10 @@ struct GTY(()) lang_decl {
 #define DELETE_EXPR_USE_VEC(NODE) \
   TREE_LANG_FLAG_1 (DELETE_EXPR_CHECK (NODE))
 
+/* True iff this represents returning a potential named return value.  */
+#define INIT_EXPR_NRV_P(NODE) \
+  TREE_LANG_FLAG_0 (INIT_EXPR_CHECK (NODE))
+
 #define CALL_OR_AGGR_INIT_CHECK(NODE) \
   TREE_CHECK2 ((NODE), CALL_EXPR, AGGR_INIT_EXPR)
 
diff --git a/gcc/cp/name-lookup.h b/gcc/cp/name-lookup.h
index b3e708561d8..613745ba501 100644
--- a/gcc/cp/name-lookup.h
+++ b/gcc/cp/name-lookup.h
@@ -449,6 +449,7 @@ extern void resort_type_member_vec (void *, void *,
 extern vec *set_class_bindings (tree, int extra = 0);
 extern void insert_late_enum_def_bindings (tree, tree);
 extern tree innermost_non_namespace_value (tree);
+extern bool decl_in_scope_p (tree);
 extern cxx_binding *outer_binding (tree, cxx_binding *, bool);
 extern void cp_emit_debug_info_for_using (tree, tree);
 
diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index eb5c333b5ea..b8ca7306a28 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -7451,6 +7451,28 @@ innermost_non_namespace_value (tree name)
   return binding ? binding->value : NULL_TREE;
 }
 
+/* True iff current_binding_level is within the potential scope of local
+   variable DECL. */
+
+bool
+decl_in_scope_p (tree decl)
+{
+  gcc_checking_assert (DECL_FUNCTION_SCOPE_P (decl));
+
+  tree name = DECL_NAME (decl);
+
+  for (cxx_binding *iter = NULL;
+   (iter = outer_binding (name, iter, /*class_p=*/false)); )
+{
+  if (!LOCAL_BINDING_P (iter))
+   return false;
+  if (iter->value == decl)
+   return true;
+}
+
+  return false;
+}
+
 /* Look up NAME in the current binding level and its superiors in the
namespace of variables, functions and typedefs.  Return a ..._DECL
node of some kind representing its definition if there is only one
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 1d397b6f257..a2e74a5d2c7 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -4956,7 +4956,7 @@ finalize_nrv_r (tree* tp, int* walk_subtrees, void* data)
   /* If there's a label, we might need to destroy the NRV on goto (92407).  */
   else if (TREE_CODE (*tp) == LABEL_EXPR)
 dp->simple = false;
-  /* Change all returns to just refer to the RESULT_DECL; this is a nop,
+  /* Change NRV returns to just refer to the RESULT_DECL; this is a nop,
  but differs from using NULL_TREE in that it indicates that we care
  about the value of the RESULT_DECL.  But preserve anything appended
  by check_return_expr.  */
@@ -4965,9 +4965,9 @@ finalize_nrv_r (tree* tp, int* walk_subtrees, void* data)
   tree *p = &TREE_OPERAND (*tp, 0);
   while 

Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-07 Thread Joseph Myers
On Wed, 7 Jun 2023, Qing Zhao via Gcc-patches wrote:

> Are you suggesting to use identifier directly as the argument of the 
> attribute?
> I tried this in the beginning, however, the current parser for the attribute 
> argument can not identify that this identifier is a field identifier inside 
> the same structure. 
> 
> For example:
> 
> int count;
> struct trailing_array_7 {
>   int count;
>   int array_7[] __attribute ((element_count (count))); 
> };
> 
> The identifier “count” inside the attribute will refer to the variable 
> “int count” outside of the structure.

c_parser_attribute_arguments is supposed to allow an identifier as an 
attribute argument - and not look it up (the user of the attribute would 
later need to look it up in the context of the containing structure).  
Callers use attribute_takes_identifier_p to determine which attributes 
take identifiers (versus expressions) as arguments, which would need 
updating to cover the new attribute.

There is a ??? comment about the case where the identifier is declared as 
a type name.  That would simply be one of the cases carried over from the 
old Bison parser, and it would seem reasonable to remove that 
special-casing so that the attribute works even when the identifier is 
declared as a typedef name as an ordinary identifier, since it's fine for 
structure members to have the same name as a typedef name.

Certainly taking an identifier directly seems like cleaner syntax than 
taking a string that then needs reinterpreting as an identifier.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH 3/3] Add Plus to the op list of `(zero_one == 0) ? y : z <op> y` pattern

2023-06-07 Thread Andrew Pinski via Gcc-patches
This adds plus to the op list of the `(zero_one == 0) ? y : z <op> y` patterns,
which currently have bit_ior and bit_xor.
This shows up now in GCC after the boolization work that Uroš has been doing.
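
As a sanity check on the identity being added, here is a small sketch of the two forms for the plus case. The function names are illustrative only; the first mirrors f1 in the new testcase and the second spells out `(zero_one * z) + y` by hand.

```cpp
// Source form: add z only when the low bit of x is set.
constexpr unsigned branchy(unsigned x, unsigned y, unsigned z) {
  return ((x & 1) == 0) ? y : z + y;
}

// Branchless form the pattern is meant to produce: (zero_one * z) + y.
constexpr unsigned branchless(unsigned x, unsigned y, unsigned z) {
  return (x & 1) * z + y;
}
```

Since `x & 1` is either 0 or 1, the multiply contributes either 0 or z, so the two functions agree for all inputs, including when the unsigned addition wraps.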

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/97711
PR tree-optimization/110155

gcc/ChangeLog:

* match.pd ((zero_one == 0) ? y : z <op> y): Add plus to the op.
((zero_one != 0) ? z <op> y : y): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/branchless-cond-add-2.c: New test.
* gcc.dg/tree-ssa/branchless-cond-add.c: New test.
---
 gcc/match.pd   |  4 ++--
 .../gcc.dg/tree-ssa/branchless-cond-add-2.c|  8 
 .../gcc.dg/tree-ssa/branchless-cond-add.c  | 18 ++
 3 files changed, 28 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond-add-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond-add.c

diff --git a/gcc/match.pd b/gcc/match.pd
index c38b39fb45c..f633271f76c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3689,7 +3689,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (max @2 @1))
 
 /* (zero_one == 0) ? y : z <op> y -> ((typeof(y))zero_one * z) <op> y */
-(for op (bit_xor bit_ior)
+(for op (bit_xor bit_ior plus)
  (simplify
   (cond (eq zero_one_valued_p@0
 integer_zerop)
@@ -3701,7 +3701,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(op (mult (convert:type @0) @2) @1
 
 /* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */
-(for op (bit_xor bit_ior)
+(for op (bit_xor bit_ior plus)
  (simplify
   (cond (ne zero_one_valued_p@0
 integer_zerop)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond-add-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond-add-2.c
new file mode 100644
index 000..27607e10f88
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond-add-2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* PR tree-optimization/97711 */
+
+int f (int x) { return x & 1 ? x - 1 : x; }
+
+/* { dg-final { scan-tree-dump-times " & -2" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond-add.c 
b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond-add.c
new file mode 100644
index 000..0d81c07b03a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond-add.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* PR tree-optimization/110155 */
+
+int f1(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z + y;
+}
+
+int f2(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z + y : y;
+}
+
+/* { dg-final { scan-tree-dump-times " \\\*" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\\+ " 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & " 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
-- 
2.31.1



[PATCH 2/3] Change the `(zero_one ==/!= 0) ? y : z <op> y` patterns to use multiply rather than `(-zero_one) & z`

2023-06-07 Thread Andrew Pinski via Gcc-patches
Since there is a pattern to convert `(-zero_one) & z` into `zero_one * z` 
already,
it is better if we don't do a secondary transformation. This reduces the extra
statements produced by match-and-simplify on the gimple level too.
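
To make the equivalence concrete, here is a sketch comparing the two spellings for a value known to be 0 or 1. The names are made up for illustration.

```cpp
// Mask form: -b is all-zeros for b == 0 and all-ones for b == 1.
constexpr unsigned via_mask(unsigned b, unsigned z) {
  return (0u - b) & z;
}

// Multiply form: b * z is 0 for b == 0 and z for b == 1.
constexpr unsigned via_mult(unsigned b, unsigned z) {
  return b * z;
}
```

The two only agree when b is restricted to 0 or 1, which is what zero_one_valued_p guarantees in the pattern.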

gcc/ChangeLog:

* match.pd (`(zero_one ==/!= 0) ? y : z <op> y`): Use
multiply rather than negation/bit_and.
---
 gcc/match.pd | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 7b95b63cee4..c38b39fb45c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3688,7 +3688,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
   (max @2 @1))
 
-/* (zero_one == 0) ? y : z <op> y -> (-(typeof(y))zero_one & z) <op> y */
+/* (zero_one == 0) ? y : z <op> y -> ((typeof(y))zero_one * z) <op> y */
 (for op (bit_xor bit_ior)
  (simplify
   (cond (eq zero_one_valued_p@0
@@ -3698,9 +3698,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (INTEGRAL_TYPE_P (type)
&& TYPE_PRECISION (type) > 1
&& (INTEGRAL_TYPE_P (TREE_TYPE (@0
-   (op (bit_and (negate (convert:type @0)) @2) @1
+   (op (mult (convert:type @0) @2) @1
 
-/* (zero_one != 0) ? z <op> y : y -> (-(typeof(y))zero_one & z) <op> y */
+/* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */
 (for op (bit_xor bit_ior)
  (simplify
   (cond (ne zero_one_valued_p@0
@@ -3710,7 +3710,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (INTEGRAL_TYPE_P (type)
&& TYPE_PRECISION (type) > 1
&& (INTEGRAL_TYPE_P (TREE_TYPE (@0
-   (op (bit_and (negate (convert:type @0)) @2) @1
+   (op (mult (convert:type @0) @2) @1
 
 /* Simplifications of shift and rotates.  */
 
-- 
2.31.1



[PATCH 1/3] MATCH: Allow unsigned types for `X & -Y -> X * Y` pattern

2023-06-07 Thread Andrew Pinski via Gcc-patches
This allows unsigned types if the inner type, where the negation is
located, has precision greater than or equal to that of the outer type.

branchless-cond.c needs to be updated since now we change it to
use a multiply rather than still having (-a) in there.
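
The precision condition can be illustrated with fixed-width types. This is only a sketch with made-up function names, using uint8_t/uint32_t to stand in for the inner and outer types: negating in a narrow unsigned type and then widening zero-extends the mask, so it is no longer all-ones, while negating in a wide type and truncating keeps it all-ones.

```cpp
#include <cstdint>

// Negate in a narrow unsigned type, then widen: zero extension gives the
// mask 0x000000FF for b == 1, so the result is NOT x * b in general.
constexpr std::uint32_t widen_after_neg(std::uint8_t b, std::uint32_t x) {
  return x & static_cast<std::uint32_t>(static_cast<std::uint8_t>(-b));
}

// Negate in a wide unsigned type, then truncate: the mask is 0xFF in the
// narrow type for b == 1, so the result equals x * b.
constexpr std::uint8_t truncate_after_neg(std::uint32_t b, std::uint8_t x) {
  return static_cast<std::uint8_t>(x & static_cast<std::uint8_t>(-b));
}
```

With b == 1 and x == 0x1234, the first function yields 0x34 rather than 0x1234, which is why the widening unsigned case stays excluded.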

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd (`X & -Y -> X * Y`): Allow for truncation
and the same type for unsigned types.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/branchless-cond.c: Update testcase.
---
 gcc/match.pd| 5 -
 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c | 6 +++---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 4ad037d641a..7b95b63cee4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2058,7 +2058,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type)
   && INTEGRAL_TYPE_P (TREE_TYPE (@0))
   && TREE_CODE (TREE_TYPE (@0)) != BOOLEAN_TYPE
-  && !TYPE_UNSIGNED (TREE_TYPE (@0)))
+  /* Sign extending of the neg or a truncation of the neg
+ is needed. */
+  && (!TYPE_UNSIGNED (TREE_TYPE (@0))
+ || TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0
   (mult (convert @0) @1)))
 
 /* Narrow integer multiplication by a zero_one_valued_p operand.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c 
b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
index 68087ae6568..e063dc4bb5f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -21,6 +21,6 @@ int f4(unsigned int x, unsigned int y, unsigned int z)
   return ((x & 1) != 0) ? z | y : y;
 }
 
-/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */
-/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */
-/* { dg-final { scan-tree-dump-not "if" "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & " 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
-- 
2.31.1



Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-07 Thread Qing Zhao via Gcc-patches


> On Jun 7, 2023, at 4:53 PM, Joseph Myers  wrote:
> 
> On Wed, 7 Jun 2023, Qing Zhao via Gcc-patches wrote:
> 
>> Hi, Joseph,
>> 
>> A question here:  can an identifier in C be a wide char string? 
> 
> Identifiers and strings are different kinds of tokens; an identifier can't 
> be a string of any kind, wide or narrow.  It just so happens that the 
> proposed interface here involves interpreting the contents of a string as 
> referring to an identifier (presumably for parsing convenience compared to 
> using an identifier directly in an attribute).

Are you suggesting to use identifier directly as the argument of the attribute?
I tried this in the beginning, however, the current parser for the attribute 
argument can not identify that this identifier is a field identifier inside the 
same structure. 

For example:

int count;
struct trailing_array_7 {
  int count;
  int array_7[] __attribute ((element_count (count))); 
};

The identifier “count” inside the attribute will refer to the variable “int 
count” outside of the structure.

We need to introduce new syntax for this and also need to update the parser of 
the attribute.
Not sure at this moment whether the extra effort is necessary or not?
Any suggestions?

thanks.

Qing

> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com



[PATCH] MATCH: Fix comment for `(zero_one ==/!= 0) ? y : z <op> y` patterns

2023-06-07 Thread Andrew Pinski via Gcc-patches
The patterns match more than just `a & 1` so change the comment
for these two patterns to say that.

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd: Fix comment for the
`(zero_one ==/!= 0) ? y : z <op> y` patterns.
---
 gcc/match.pd | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index dc36927cd0f..8f3d99239ce 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3688,7 +3688,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
   (max @2 @1))
 
-/* ((x & 0x1) == 0) ? y : z <op> y -> (-(typeof(y))(x & 0x1) & z) <op> y */
+/* (zero_one == 0) ? y : z <op> y -> (-(typeof(y))zero_one & z) <op> y */
 (for op (bit_xor bit_ior)
  (simplify
   (cond (eq zero_one_valued_p@0
@@ -3700,7 +3700,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& (INTEGRAL_TYPE_P (TREE_TYPE (@0
(op (bit_and (negate (convert:type @0)) @2) @1
 
-/* ((x & 0x1) == 0) ? z <op> y : y -> (-(typeof(y))(x & 0x1) & z) <op> y */
+/* (zero_one != 0) ? z <op> y : y -> (-(typeof(y))zero_one & z) <op> y */
 (for op (bit_xor bit_ior)
  (simplify
   (cond (ne zero_one_valued_p@0
-- 
2.31.1



Re: [committed] Convert H8 port to LRA

2023-06-07 Thread Jeff Law via Gcc-patches




On 6/7/23 08:06, Andrew Pinski wrote:

On Sun, Jun 4, 2023 at 10:43 AM Jeff Law via Gcc-patches
 wrote:


With Vlad's recent LRA fix to the elimination code, the H8 can be
converted to LRA.


Could you update the h8300 entry on https://gcc.gnu.org/backends.html
for this change?

Thanks for the reminder.  I also updated the state for the ports I 
converted several weeks back.


jeff


Re: [PATCH] RISC-V: Add Veyron V1 pipeline description

2023-06-07 Thread Jeff Law via Gcc-patches




On 6/7/23 08:43, Jeff Law wrote:



On 6/7/23 08:13, Kito Cheng wrote:
I would like the vendor CPU name to start with the vendor name, like 
ventana-veyron-v1, which is consistent with all other vendor CPUs; LLVM 
uses the same convention too.

Fair enough.  Better to get it right now than have this stuff be 
inconsistent.  It'll be a little more pain for our internal folks, but 
we'll deal with that :-)

I should have also noted that this seems to get a pretty consistent 1-2% 
improvement across spec2017.  Not surprisingly it reduces stalls at the 
retirement unit due to instructions not being completed.  We can see 
impacts elsewhere like fewer stalls due to conflicting resources at the 
dispatch stage.


It does make it more likely that we'll blow out the register file on 
x264's key SATD routine which shows up as a single digit regression for 
input #1.  The fix there is pretty simple, use register pressure 
scheduling, which we'll have some hard data on relatively soon.


jeff


Re: [PATCH] c++: unsynthesized defaulted constexpr fn [PR110122]

2023-06-07 Thread Jason Merrill via Gcc-patches

On 6/6/23 14:29, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

In the second testcase of PR110122, during regeneration of the generic
lambda with V=Bar{}, substitution followed by coerce_template_parms for
A's template argument naturally yields a copy of V in terms of Bar's
(implicitly) defaulted copy constructor.

This however happens inside a template context so although we introduced
a use of the copy constructor, mark_used didn't actually synthesize it,
which causes subsequent constant evaluation of the template argument to
fail with:

   nontype-class58.C: In instantiation of ‘void f() [with Bar V = Bar{Foo()}]’:
   nontype-class58.C:22:11:   required from here
   nontype-class58.C:18:18: error: ‘constexpr Bar::Bar(const Bar&)’ used before 
its definition

Conveniently we already make sure to instantiate eligible constexpr
functions before such (manifestly) constant evaluation, as per P0859R0.
So this patch fixes this by making sure to synthesize eligible defaulted
constexpr functions beforehand as well.


We probably also want to do this in cxx_eval_call_expression, under


  /* We can't defer instantiating the function any longer.  */


Jason



Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-07 Thread Joseph Myers
On Wed, 7 Jun 2023, Qing Zhao via Gcc-patches wrote:

> Hi, Joseph,
> 
> A question here:  can an identifier in C be a wide char string? 

Identifiers and strings are different kinds of tokens; an identifier can't 
be a string of any kind, wide or narrow.  It just so happens that the 
proposed interface here involves interpreting the contents of a string as 
referring to an identifier (presumably for parsing convenience compared to 
using an identifier directly in an attribute).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] riscv: Fix scope for memory model calculation

2023-06-07 Thread Jeff Law via Gcc-patches




On 6/7/23 13:15, Dimitar Dimitrov wrote:

On Tue, Jun 06, 2023 at 08:38:14PM -0600, Jeff Law wrote:




Regression tested for riscv32-none-elf. No changes in gcc.sum and
g++.sum.  I don't have setup to test riscv64.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Calculate
memmodel only when it is valid.

Good to see you poking around in the RISC-V world Dimitar!  Are you still
poking at the PRU as well?


Hi Jeff,

Yes, I'm still maintaining the PRU backend.

For this patch I was actually poking at the middle end, trying to
implement a small optimization for PRU (PR 106562).  And I wanted
to test if other targets would also benefit from it.

Ah!  Too bad, I'd love to have another engineer poking at RV stuff on a 
regular basis, but I'll take any cleanups/fixes/improvements you may 
have, of course!


RV32 isn't a bad test target though.  Certainly more modern than some of 
the ports you could have tested against.


Jeff


Re: [RFC] RISC-V: Eliminate extension after for *w instructions

2023-06-07 Thread Jeff Law via Gcc-patches



On 5/24/23 17:14, Jivan Hakobyan via Gcc-patches wrote:

Subject:
[RFC] RISC-V: Eliminate extension after for *w instructions
From:
Jivan Hakobyan via Gcc-patches 
Date:
5/24/23, 17:14

To:
gcc-patches@gcc.gnu.org


This patch tries to prevent generating unnecessary sign extension
after *w instructions like "addiw" or "divw".

The main idea is to add SUBREG_PROMOTED fields during expansion.

I have tested on SPEC2017 and there is no regression.
Only the gcc.dg/pr30957-1.c test failed.
To solve that I made some changes in loop-iv.cc, but I am not sure that
they are suitable.


gcc/ChangeLog:
 * config/riscv/bitmanip.md (rotrdi3): New pattern.
 (rotrsi3): Likewise.
 (rotlsi3): Likewise.
 * config/riscv/riscv-protos.h (riscv_emit_binary): New function
 declaration
 * config/riscv/riscv.cc (riscv_emit_binary): Removed static
 * config/riscv/riscv.md (addsi3): New pattern
 (subsi3): Likewise.
 (negsi2): Likewise.
 (mulsi3): Likewise.
 (si3): New pattern for any_div.
 (si3): New pattern for any_shift.
 * loop-iv.cc (get_biv_step_1):  Process src of extension when it
PLUS

gcc/testsuite/ChangeLog:
 * testsuite/gcc.target/riscv/shift-and-2.c: New test
 * testsuite/gcc.target/riscv/shift-shift-2.c: New test
 * testsuite/gcc.target/riscv/sign-extend.c: New test
 * testsuite/gcc.target/riscv/zbb-rol-ror-03.c: New test


-- With the best regards Jivan Hakobyan


extend.diff

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 
96d31d92670b27d495dc5a9fbfc07e8767f40976..0430af7c95b1590308648dc4d5aaea78ada71760
 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -304,9 +304,9 @@
[(set_attr "type" "bitmanip,load")
 (set_attr "mode" "HI")])
  
-(define_expand "rotr3"

-  [(set (match_operand:GPR 0 "register_operand")
-   (rotatert:GPR (match_operand:GPR 1 "register_operand")
+(define_expand "rotrdi3"
+  [(set (match_operand:DI 0 "register_operand")
+   (rotatert:DI (match_operand:DI 1 "register_operand")
 (match_operand:QI 2 "arith_operand")))]
"TARGET_ZBB || TARGET_XTHEADBB || TARGET_ZBKB"

The condition for this expander needs to be adjusted.

Previously it used the GPR iterator.  The GPR iterator is defined like this:


(define_mode_iterator GPR [SI (DI "TARGET_64BIT")])

Note how the DI case is conditional on TARGET_64BIT.

This impacts the HAVE_* macros that are generated from the MD file in 
insn-flags.h:


#define HAVE_rotrsi3 (TARGET_ZBB || TARGET_XTHEADBB || TARGET_ZBKB)
#define HAVE_rotrdi3 ((TARGET_ZBB || TARGET_XTHEADBB || TARGET_ZBKB) && (TARGET_64BIT))


Note how the rotrdi3 has the && (TARGET_64BIT) on the end.

With your change we would expose rotrdi3 independent of TARGET_64BIT 
which is not what we want.
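
To spell the difference out, here is a toy model of the two conditions. It is not GCC code: `TargetFlags` and the function names are made up, and the booleans merely stand in for the TARGET_* macros.

```cpp
// Hypothetical stand-in for the target flags the conditions consult.
struct TargetFlags {
  bool zbb;
  bool xtheadbb;
  bool zbkb;
  bool is_64bit;
};

constexpr bool has_rotate_ext(TargetFlags t) {
  return t.zbb || t.xtheadbb || t.zbkb;
}

// With the GPR iterator, the DI variant picks up "&& TARGET_64BIT".
constexpr bool have_rotrdi3_with_iterator(TargetFlags t) {
  return has_rotate_ext(t) && t.is_64bit;
}

// With an explicit DI expander and the condition left unchanged, the
// 64-bit check is lost.
constexpr bool have_rotrdi3_unadjusted(TargetFlags t) {
  return has_rotate_ext(t);
}
```

On a 32-bit target with Zbb enabled, the first predicate is false and the second is true, which is the unwanted exposure of rotrdi3 described above.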



Sorry I didn't catch that earlier.  I'll fix this minor problem.




@@ -544,7 +562,7 @@
rtx t5 = gen_reg_rtx (DImode);
rtx t6 = gen_reg_rtx (DImode);
  
-  emit_insn (gen_addsi3 (operands[0], operands[1], operands[2]));

+  riscv_emit_binary(PLUS, operands[0], operands[1], operands[2]);

Just a note.  In GCC we always emit a space between the function name 
and the open parenthesis for its argument list.  I fixed a few of these.



@@ -867,8 +938,8 @@
  
emit_insn (gen_smul3_highpart (hp, operands[1], operands[2]));

emit_insn (gen_mul3 (operands[0], operands[1], operands[2]));
-  emit_insn (gen_ashr3 (lp, operands[0],
- GEN_INT (BITS_PER_WORD - 1)));
+  riscv_emit_binary(ASHIFTRT, lp, operands[0],
+ GEN_INT (BITS_PER_WORD - 1));

Another formatting nit.  When we wrap lines for an argument list, we 
line up the arguments.  So something like this


frobit (a, b, c,
        d, e, f);



Obviously that's not a great example as it doesn't need wrapping, but it 
should clearly show how we indent things in this case.  I've fixed up 
this nit.




diff --git a/gcc/loop-iv.cc b/gcc/loop-iv.cc
index 
6c40db947f7f549303f8bb4d4f38aa98b6561bcc..bec1ea7e4ccf7291bb3dba91161f948e66c7bea9
 100644
--- a/gcc/loop-iv.cc
+++ b/gcc/loop-iv.cc
@@ -637,7 +637,7 @@ get_biv_step_1 (df_ref def, scalar_int_mode outer_mode, rtx 
reg,
  {
rtx set, rhs, op0 = NULL_RTX, op1 = NULL_RTX;
rtx next, nextr;
-  enum rtx_code code;
+  enum rtx_code code, prev_code;

So as I mentioned earlier, PREV_CODE might be used without being 
initialized.  I've initialized it to "UNKNOWN" which is a special RTX 
code which can be used for this purpose.


If we are changing a target independent file the standard is that we 
bootstrap and regression test on at least one primary platform such as 
x86_64 linux.  This would have been caught by that bootstrap process as 
it's a pretty simple uninitialized object use to analyze.




rtx_insn *insn = DF_REF_INSN (def);
df_ref next_def;
enum iv_grd_result res;

Re: [PATCH] libstdc++: Fix up 20_util/to_chars/double.cc test for excess precision [PR110145]

2023-06-07 Thread Jonathan Wakely via Gcc-patches
On Wed, 7 Jun 2023 at 18:26, Jonathan Wakely  wrote:

>
>
> On Wed, 7 Jun 2023, 18:17 Jakub Jelinek via Libstdc++, <
> libstd...@gcc.gnu.org> wrote:
>
>> Hi!
>>
>> This test apparently contains 3 problematic floating point constants,
>> 1e126, 4.91e-6 and 5.547e-6.  These constants suffer from double rounding
>> when -fexcess-precision=standard evaluates double constants in the
>> precision
>> of Intel extended 80-bit long double.
>> As written in the PR, e.g. the first one is
>> 0x1.7a2ecc414a03f7ff6ca1cb527787b130a97d51e51202365p+418
>> in the precision of GCC's internal format, 80-bit long double has
>> 63-bit precision, so the above constant rounded to long double is
>> 0x1.7a2ecc414a03f800p+418L
>> (the least significant bit in the 0 before p isn't there already).
>> 0x1.7a2ecc414a03f800p+418L rounded to IEEE double is
>> 0x1.7a2ecc414a040p+418.
>> Now, if excess precision doesn't happen and we round the GCC's internal
>> format number directly to double, it is
>> 0x1.7a2ecc414a03fp+418 and that is the number the test expects.
>> One can see it on x86-64 (where excess precision to long double doesn't
>> happen) where double(1e126L) != 1e126.
>> The other two constants suffer from the same problem.
>>
>> The following patch tweaks the testcase, such that those problematic
>> constants are used only if FLT_EVAL_METHOD is 0 or 1 (i.e. when we have
>> guarantee the constants will be evaluated in double precision),
>> plus adds corresponding tests with hexadecimal constants which don't
>> suffer from this excess precision problem, they are exact in double
>> and long double can hold all double values.
>>
>> Bootstrapped/regtested on x86_64-linux and i686-linux, additionally
>> tested on the latter with
>> make check RUNTESTFLAGS='--target_board=unix/-fexcess-precision=standard
>> conformance.exp=to_chars/double.cc'
>> Ok for trunk?
>>
>
> Yes, OK.
>
> Thanks for solving this puzzle!
>

I think this would be good for gcc-13, as that has the new
-fexcess-precision semantics for -std=c++NN too, right?
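
As a small illustration of the approach in the quoted patch description, here is a sketch that keeps the decimal constant only when FLT_EVAL_METHOD says double constants are evaluated in double precision, and otherwise relies on the exact hexadecimal spelling. The helper names are made up; the hex value is the one given in the patch.

```cpp
#include <cfloat>

// Exact hexadecimal spelling of the double the test expects for 1e126.
constexpr double k1e126_hex = 0x1.7a2ecc414a03fp+418;

// True when decimal double constants are known not to be evaluated in a
// wider format, i.e. the case in which the patch keeps the decimal tests.
constexpr bool decimal_constants_are_safe() {
#if defined(FLT_EVAL_METHOD) && FLT_EVAL_METHOD >= 0 && FLT_EVAL_METHOD <= 1
  return true;
#else
  return false;
#endif
}

// Only claims agreement when the decimal constant is safe to use.
constexpr bool decimal_matches_hex() {
  return !decimal_constants_are_safe() || 1e126 == k1e126_hex;
}
```

The hexadecimal constant has at most 53 significant bits, so it is exact in double and survives any detour through long double unchanged.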


>
>
>
>> 2023-06-07  Jakub Jelinek  
>>
>> PR libstdc++/110145
>> * testsuite/20_util/to_chars/double.cc: Include <cfloat>.
>> (double_to_chars_test_cases,
>> double_scientific_precision_to_chars_test_cases_2,
>> double_fixed_precision_to_chars_test_cases_2): #if out 1e126,
>> 4.91e-6
>> and 5.547e-6 tests if FLT_EVAL_METHOD is negative or larger than
>> 1.
>> Add unconditional tests with corresponding double constants
>> 0x1.7a2ecc414a03fp+418, 0x1.4981285e98e79p-18 and
>> 0x1.7440bbff418b9p-18.
>>
>> --- libstdc++-v3/testsuite/20_util/to_chars/double.cc.jj
>> 2022-11-03 22:16:08.542329555 +0100
>> +++ libstdc++-v3/testsuite/20_util/to_chars/double.cc   2023-06-07
>> 15:41:44.275604870 +0200
>> @@ -40,6 +40,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include <cfloat>
>>
>>  #include 
>>
>> @@ -1968,9 +1969,19 @@ inline constexpr double_to_chars_testcas
>>  {1e125, chars_format::fixed,
>>
>>  
>> "248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
>>  "5841365553228283904"},
>> +#if FLT_EVAL_METHOD >= 0 && FLT_EVAL_METHOD <= 1
>> +// When long double is Intel extended and double constants are
>> evaluated in precision of
>> +// long double, this value is initialized to double(1e126L), which
>> is 0x1.7a2ecc414a040p+418 due to
>> +// double rounding of 0x1.7a2ecc414a03f7ff6p+418L first to
>> 0x1.7a2ecc414a03f800p+418L and
>> +// then to 0x1.7a2ecc414a040p+418, while when double constants are
>> evaluated in precision of
>> +// IEEE double, this is 0x1.7a2ecc414a03fp+418 which the test
>> expects.  See PR110145.
>>  {1e126, chars_format::fixed,
>>
>>  
>> "248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
>>  "58413655532282839040"},
>> +#endif
>> +{0x1.7a2ecc414a03fp+418, chars_format::fixed,
>> +
>>  
>> "248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
>> +   "58413655532282839040"},
>>  {1e127, chars_format::fixed,
>>
>>  
>> "549291066784979473595300225087383524118479625982517885450291174622154390152298057300868772"
>>  "377386949310916067328"},
>> @@ -2816,8 +2827,12 @@ inline constexpr double_to_chars_testcas
>>  {0x1.a6c767640cd71p+879, chars_format::scientific,
>> "6.6564021122018745e+264"},
>>
>>  // Incorrectly handled by dtoa_milo() (Grisu2), which doesn't
>> achieve shortest round-trip.
>> +#if FLT_EVAL_METHOD >= 0 && FLT_EVAL_METHOD <= 1
>>  {4.91e-6, chars_format::scientific, "4.91e-06"},
>>  {5.547e-6, chars_format::scientific, "5.547e-06"},
>> +#endif
>> +{0x1.4981285e98e79p-18, chars_format::scientific, "4.91e-06"},
>> +{0x1.7440bbff418b9p-18, chars_format::scientific, "5.547e-06"},
>>
>>  // Test hexfloat corner cases.
>>  {0x1.728p+0, 

Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-07 Thread Qing Zhao via Gcc-patches
Hi, Joseph,

A question here:  can an identifier in C be a wide char string? 

Qing

> On May 26, 2023, at 2:15 PM, Joseph Myers  wrote:
> 
> On Fri, 26 May 2023, Qing Zhao via Gcc-patches wrote:
> 
>>> What if the string is a wide string?  I don't expect that to work (either 
>>> as a matter of interface design, or in the present code), but I think that 
>>> case should have a specific check and error.
>> 
>> Dumb question: how to check whether the string is a wide string? -:)
> 
> By examining the element type; the only valid case for the attribute would 
> be an element type of (const) char.  (I think it's reasonable to reject 
> all of char8_t, char16_t, char32_t, wchar_t strings in this context.)
> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com
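
As a language-level analogy only (this is not how the check would be written inside GCC, where the element type of the string's tree type would be examined), the idea of accepting a string by its element type can be sketched in C++. `is_narrow_literal` is a made-up helper.

```cpp
#include <cstddef>
#include <type_traits>

// A string literal has type "const CharT[N]"; deduce CharT and accept
// only plain char, rejecting wchar_t, char16_t and char32_t strings.
template <typename CharT, std::size_t N>
constexpr bool is_narrow_literal(const CharT (&)[N]) {
  return std::is_same<CharT, char>::value;
}
```

With this helper, `is_narrow_literal("count")` is true while `is_narrow_literal(L"count")` is false, mirroring the suggestion to reject wide strings with a specific error.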



Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-07 Thread Richard Sandiford via Gcc-patches
Andrew Stubbs  writes:
> On 30/05/2023 07:26, Richard Biener wrote:
>> On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs  wrote:
>>>
>>> Hi all,
>>>
>>> I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
>>> do it because the GCC middle-end models DIVMOD's return value as
>>> "complex int" type, and there are no vector equivalents of that type.
>>>
>>> Therefore, this patch adds minimal support for "complex vector int"
>>> modes.  I have not attempted to provide any means to use these modes
>>> from C, so they're really only useful for DIVMOD.  The actual libfunc
>>> implementation will pack the data into wider vector modes manually.
>>>
>>> A knock-on effect of this is that I needed to increase the range of
>>> "mode_unit_size" (several of the vector modes supported by amdgcn exceed
>>> the previous 255-byte limit).
>>>
>>> Since this change would add a large number of new, unused modes to many
>>> architectures, I have elected to *not* enable them, by default, in
>>> machmode.def (where the other complex modes are created).  The new modes
>>> are therefore inactive on all architectures but amdgcn, for now.
>>>
>>> OK for mainline?  (I've not done a full test yet, but I will.)
>> 
>> I think it makes more sense to map vector CSImode to vector SImode with
>> the double number of lanes.  In fact since divmod is a libgcc function
>> I wonder where your vector variant would reside and how GCC decides to
>> emit calls to it?  That is, there's no way to OMP simd declare this function?
>
> The divmod implementation lives in libgcc. It's not too difficult to 
> write using vector extensions and some asm tricks. I did try an OMP simd 
> declare implementation, but it didn't vectorize well, and that's a yak 
> I don't wish to shave right now.
>
> In any case, the OMP simd declare will not help us here, directly, 
> because the DIVMOD transformation happens too late in the pass pipeline, 
> long after ifcvt and vect. My implementation (not yet posted), uses a 
> libfunc and the TARGET_EXPAND_DIVMOD_LIBFUNC hook in the standard way. 
> It just needs the complex vector modes to exist.
>
> Using vectors twice the length is problematic also. If I create a new 
> V128SImode that spans across two 64-lane vector registers then that will 
> probably have the desired effect ("real" quotient in v8, "imaginary" 
> remainder in v9), but if I use V64SImode to represent two V32SImode 
> vectors then that's a one-register mode, and I'll have to use a 
> permutation (a memory operation) to extract lanes 32-63 into lanes 0-31, 
> and if we ever want to implement instructions that operate on these 
> modes (as opposed to the odd/even add/sub complex patterns we have now) 
> then the masking will be all broken and we'd need to constantly 
> disassemble the double length vectors to operate on them.

I don't know if this helps (probably not), but we have a similar
situation on AArch64: a 64-bit mode like V8QI can be doubled to a
128-bit vector or to a pair of 64-bit vectors.  We used V16QI for
the former and "V2x8QI" for the latter.  V2x8QI is forced to come
after V16QI in the mode list, and so it is only ever used through
explicit choice.  But both modes are functionally vectors of 16 QIs.

Thanks,
Richard


Re: [PATCH] riscv: Fix scope for memory model calculation

2023-06-07 Thread Dimitar Dimitrov
On Tue, Jun 06, 2023 at 08:38:14PM -0600, Jeff Law wrote:
> 
> 
> > Regression tested for riscv32-none-elf. No changes in gcc.sum and
> > g++.sum.  I don't have setup to test riscv64.
> > 
> > gcc/ChangeLog:
> > 
> > * config/riscv/riscv.cc (riscv_print_operand): Calculate
> > memmodel only when it is valid.
> Good to see you poking around in the RISC-V world Dimitar!  Are you still
> poking at the PRU as well?

Hi Jeff,

Yes, I'm still maintaining the PRU backend.

For this patch I was actually poking at the middle end, trying to
implement a small optimization for PRU (PR 106562).  And I wanted
to test if other targets would also benefit from it.

Thanks,
Dimitar

> 
> Anyway, this is fine for the trunk and for backporting to gcc-13 if the
> problem exists there as well.
> 
> jeff


Re: [PATCH] Fortran: add Fortran 2018 IEEE_{MIN,MAX} functions

2023-06-07 Thread Steve Kargl via Gcc-patches
On Wed, Jun 07, 2023 at 08:31:35PM +0200, Harald Anlauf via Fortran wrote:
> Hi FX,
> 
> On 6/6/23 21:11, FX Coudert via Gcc-patches wrote:
> > Hi,
> > 
> > > I cannot see if there is proper support for kind=17 in your patch;
> > > at least the libgfortran/ieee/ieee_arithmetic.F90 part does not
> > > seem to have any related code.
> > 
> > Can real(kind=17) ever be an IEEE mode? If so, something seriously wrong 
> > happened, because the IEEE modules have no kind=17 mention in them anywhere.
> > 
> > Actually, where is the kind=17 documented?
> > 
> > FX
> 
> I was hoping for Thomas to come forward with some comment, as
> he was quite involved in related work.
> 
> There are several threads on IEEE128 for Power on the fortran ML
> e.g. around November/December 2021, January 2022.
> 
> I wasn't meaning to block your work, just wondering if the Power
> platform needs more attention here.
> 

% cd gcc/gccx/libgfortran
% grep HAVE_GFC_REAL_17 ieee/*
% troutmask:sgk[219] ls ieee
% ieee_arithmetic.F90 ieee_features.F90
% ieee_exceptions.F90 ieee_helper.c

There are zero hits for REAL(17) in the IEEE code.  If REAL(17)
is intended to be an IEEE-754 type, then it seems gfortran's
support was never added for it.  If anyone has access to a
power system, it's easy to test

program foo
   use ieee_arithmetic
   print *, ieee_support_datatype(1.0_17)
end program foo
-- 
Steve


Re: [Patch, fortran] PR87477 - (associate) - [meta-bug] [F03] issues concerning the ASSOCIATE statement

2023-06-07 Thread Harald Anlauf via Gcc-patches

Hi Paul!

On 6/7/23 18:10, Paul Richard Thomas via Gcc-patches wrote:

Hi All,

Three more fixes for PR87477. Please note that PR99350 was a blocker
but, as pointed out in comment #5 of the PR, this has nothing to do
with the associate construct.

All three fixes are straight forward and the .diff + ChangeLog suffice
to explain them. 'rankguessed' was made redundant by the last PR87477
fix.

Regtests on x86_64 - good for mainline?

Paul

Fortran: Fix some more blockers in associate meta-bug [PR87477]

2023-06-07  Paul Thomas  

gcc/fortran
PR fortran/99350
* decl.cc (char_len_param_value): Simplify a copy of the expr
and replace the original if there is no error.


This seems to lack a gfc_free_expr (p) in case the gfc_replace_expr
is not executed, leading to a possible memleak.  Can you check?

@@ -1081,10 +1082,10 @@ char_len_param_value (gfc_expr **expr, bool
*deferred)
   if (!gfc_expr_check_typed (*expr, gfc_current_ns, false))
 return MATCH_ERROR;

-  /* If gfortran gets an EXPR_OP, try to simplify it.  This catches things
- like CHARACTER(([1])).   */
-  if ((*expr)->expr_type == EXPR_OP)
-gfc_simplify_expr (*expr, 1);
+  /* Try to simplify the expression to catch things like
CHARACTER(([1])).   */
+  p = gfc_copy_expr (*expr);
+  if (gfc_is_constant_expr (p) && gfc_simplify_expr (p, 1))
+gfc_replace_expr (*expr, p);
   else
 gfc_free_expr (p);
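
A minimal sketch of the ownership pattern under discussion: simplify a copy, adopt it on success, free it otherwise. Expr, copy_expr, free_expr, replace_expr and simplify_expr are hypothetical stand-ins for gfc_expr and its helpers, and the sketch assumes the replace helper releases the copy once its contents have been moved into the original.

```cpp
#include <cassert>

// Hypothetical stand-in for gfc_expr; only the ownership flow matters here.
struct Expr {
  int value;
  bool simplifiable;
};

static int live_exprs = 0;  // heap copies that have not been freed yet

Expr *copy_expr(const Expr *e) {
  ++live_exprs;
  return new Expr(*e);
}

void free_expr(Expr *e) {
  if (e) {
    --live_exprs;
    delete e;
  }
}

// Assumed behaviour: dest takes over src's contents and src is released.
void replace_expr(Expr *dest, Expr *src) {
  *dest = *src;
  free_expr(src);
}

// Pretend simplifier: on failure it leaves its argument in a damaged state,
// which is why it is run on a copy rather than on the original.
bool simplify_expr(Expr *e) {
  if (!e->simplifiable) {
    e->value = -1;
    return false;
  }
  e->value *= 2;
  return true;
}

void simplify_via_copy(Expr *expr) {
  Expr *p = copy_expr(expr);
  if (simplify_expr(p))
    replace_expr(expr, p);  // success: the copy is consumed
  else
    free_expr(p);           // failure: without this branch the copy leaks
}
```

Dropping the else branch leaves live_exprs at 1 after a failed simplification, which is the leak being asked about.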


* gfortran.h : Remove the redundant field 'rankguessed' from
'gfc_association_list'.
* resolve.cc (resolve_assoc_var): Remove refs to 'rankguessed'.

PR fortran/107281
* resolve.cc (resolve_variable): Associate names with constant
or structure constructor targets cannot have array refs.

PR fortran/109451
* trans-array.cc (gfc_conv_expr_descriptor): Guard expression
character length backend decl before using it. Suppress the
assignment if lhs equals rhs.
* trans-io.cc (gfc_trans_transfer): Scalarize transfer of
associate variables pointing to a variable. Add comment.
* trans-stmt.cc (trans_associate_var): Remove requirement that
the character length be deferred before assigning the value
returned by gfc_conv_expr_descriptor. Also, guard the backend
decl before testing with VAR_P.

gcc/testsuite/
PR fortran/99350
* gfortran.dg/pr99350.f90 : New test.

PR fortran/107281
* gfortran.dg/associate_5.f03 : Changed error message.
* gfortran.dg/pr107281.f90 : New test.

PR fortran/109451
* gfortran.dg/associate_61.f90 : New test


Otherwise LGTM.

Thanks for the patch!

Harald




Re: [PATCH] Fortran: add Fortran 2018 IEEE_{MIN,MAX} functions

2023-06-07 Thread Harald Anlauf via Gcc-patches

Hi FX,

On 6/6/23 21:11, FX Coudert via Gcc-patches wrote:

Hi,


I cannot see if there is proper support for kind=17 in your patch;
at least the libgfortran/ieee/ieee_arithmetic.F90 part does not
seem to have any related code.


Can real(kind=17) ever be an IEEE mode? If so, something seriously wrong 
happened, because the IEEE modules have no kind=17 mention in them anywhere.

Actually, where is the kind=17 documented?

FX


I was hoping for Thomas to come forward with some comment, as
he was quite involved in related work.

There are several threads on IEEE128 for Power on the fortran ML
e.g. around November/December 2021, January 2022.

I wasn't meaning to block your work, just wondering if the Power
platform needs more attention here.

Harald




Re: [PATCH] libstdc++: Fix up 20_util/to_chars/double.cc test for excess precision [PR110145]

2023-06-07 Thread Jonathan Wakely via Gcc-patches
On Wed, 7 Jun 2023, 18:17 Jakub Jelinek via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> Hi!
>
> This test apparently contains 3 problematic floating point constants,
> 1e126, 4.91e-6 and 5.547e-6.  These constants suffer from double rounding
> when -fexcess-precision=standard evaluates double constants in the
> precision
> of Intel extended 80-bit long double.
> As written in the PR, e.g. the first one is
> 0x1.7a2ecc414a03f7ff6ca1cb527787b130a97d51e51202365p+418
> in the precision of GCC's internal format, 80-bit long double has
> 63-bit precision, so the above constant rounded to long double is
> 0x1.7a2ecc414a03f800p+418L
> (the least significant bit in the 0 before p isn't there already).
> 0x1.7a2ecc414a03f800p+418L rounded to IEEE double is
> 0x1.7a2ecc414a040p+418.
> Now, if excess precision doesn't happen and we round the GCC's internal
> format number directly to double, it is
> 0x1.7a2ecc414a03fp+418 and that is the number the test expects.
> One can see it on x86-64 (where excess precision to long double doesn't
> happen) where double(1e126L) != 1e126.
> The other two constants suffer from the same problem.
>
> The following patch tweaks the testcase, such that those problematic
> constants are used only if FLT_EVAL_METHOD is 0 or 1 (i.e. when we have
> guarantee the constants will be evaluated in double precision),
> plus adds corresponding tests with hexadecimal constants which don't
> suffer from this excess precision problem, they are exact in double
> and long double can hold all double values.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, additionally
> tested on the latter with
> make check RUNTESTFLAGS='--target_board=unix/-fexcess-precision=standard
> conformance.exp=to_chars/double.cc'
> Ok for trunk?
>

Yes, OK.

Thanks for solving this puzzle!



> 2023-06-07  Jakub Jelinek  
>
> PR libstdc++/110145
> * testsuite/20_util/to_chars/double.cc: Include .
> (double_to_chars_test_cases,
> double_scientific_precision_to_chars_test_cases_2,
> double_fixed_precision_to_chars_test_cases_2): #if out 1e126,
> 4.91e-6
> and 5.547e-6 tests if FLT_EVAL_METHOD is negative or larger than 1.
> Add unconditional tests with corresponding double constants
> 0x1.7a2ecc414a03fp+418, 0x1.4981285e98e79p-18 and
> 0x1.7440bbff418b9p-18.
>
> --- libstdc++-v3/testsuite/20_util/to_chars/double.cc.jj2022-11-03
> 22:16:08.542329555 +0100
> +++ libstdc++-v3/testsuite/20_util/to_chars/double.cc   2023-06-07
> 15:41:44.275604870 +0200
> @@ -40,6 +40,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>
> @@ -1968,9 +1969,19 @@ inline constexpr double_to_chars_testcas
>  {1e125, chars_format::fixed,
>
>  
> "248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
>  "5841365553228283904"},
> +#if FLT_EVAL_METHOD >= 0 && FLT_EVAL_METHOD <= 1
> +// When long double is Intel extended and double constants are
> evaluated in precision of
> +// long double, this value is initialized to double(1e126L), which is
> 0x1.7a2ecc414a040p+418 due to
> +// double rounding of 0x1.7a2ecc414a03f7ff6p+418L first to
> 0x1.7a2ecc414a03f800p+418L and
> +// then to 0x1.7a2ecc414a040p+418, while when double constants are
> evaluated in precision of
> +// IEEE double, this is 0x1.7a2ecc414a03fp+418 which the test
> expects.  See PR110145.
>  {1e126, chars_format::fixed,
>
>  
> "248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
>  "58413655532282839040"},
> +#endif
> +{0x1.7a2ecc414a03fp+418, chars_format::fixed,
> +
>  
> "248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
> +   "58413655532282839040"},
>  {1e127, chars_format::fixed,
>
>  
> "549291066784979473595300225087383524118479625982517885450291174622154390152298057300868772"
>  "377386949310916067328"},
> @@ -2816,8 +2827,12 @@ inline constexpr double_to_chars_testcas
>  {0x1.a6c767640cd71p+879, chars_format::scientific,
> "6.6564021122018745e+264"},
>
>  // Incorrectly handled by dtoa_milo() (Grisu2), which doesn't achieve
> shortest round-trip.
> +#if FLT_EVAL_METHOD >= 0 && FLT_EVAL_METHOD <= 1
>  {4.91e-6, chars_format::scientific, "4.91e-06"},
>  {5.547e-6, chars_format::scientific, "5.547e-06"},
> +#endif
> +{0x1.4981285e98e79p-18, chars_format::scientific, "4.91e-06"},
> +{0x1.7440bbff418b9p-18, chars_format::scientific, "5.547e-06"},
>
>  // Test hexfloat corner cases.
>  {0x1.728p+0, chars_format::hex, "1.728p+0"}, // instead of "2.e5p-1"
> @@ -5537,10 +5552,16 @@ inline constexpr double_to_chars_testcas
>  "9."
>
>  
> "9992486776161899288204254467086983483846143922597222529419997579302660316349376281765375153005"
>  "841365553228283904e+124"},
> +#if 

[PATCH] libstdc++: Fix up 20_util/to_chars/double.cc test for excess precision [PR110145]

2023-06-07 Thread Jakub Jelinek via Gcc-patches
Hi!

This test apparently contains 3 problematic floating point constants,
1e126, 4.91e-6 and 5.547e-6.  These constants suffer from double rounding
when -fexcess-precision=standard evaluates double constants in the precision
of Intel extended 80-bit long double.
As written in the PR, e.g. the first one is
0x1.7a2ecc414a03f7ff6ca1cb527787b130a97d51e51202365p+418
in the precision of GCC's internal format, 80-bit long double has
63-bit precision, so the above constant rounded to long double is
0x1.7a2ecc414a03f800p+418L
(the least significant bit in the 0 before p isn't there already).
0x1.7a2ecc414a03f800p+418L rounded to IEEE double is
0x1.7a2ecc414a040p+418.
Now, if excess precision doesn't happen and we round the GCC's internal
format number directly to double, it is
0x1.7a2ecc414a03fp+418 and that is the number the test expects.
One can see it on x86-64 (where excess precision to long double doesn't
happen) where double(1e126L) != 1e126.
The other two constants suffer from the same problem.

The following patch tweaks the testcase, such that those problematic
constants are used only if FLT_EVAL_METHOD is 0 or 1 (i.e. when we have
a guarantee the constants will be evaluated in double precision),
plus adds corresponding tests with hexadecimal constants which don't
suffer from this excess precision problem: they are exact in double,
and long double can hold all double values.
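
To make the double rounding concrete, here is a small C++17 sketch built only from the values quoted above. The function names are illustrative, and what via_long_double() returns depends on the target's long double format, so the sketch makes no claim about it beyond what the message says for Intel extended.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Values taken from the message above (hex float literals need C++17).
constexpr double rounded_once = 0x1.7a2ecc414a03fp+418;   // 1e126 rounded directly to double
constexpr double rounded_twice = 0x1.7a2ecc414a040p+418;  // rounded to 80-bit first, then to double

// The two results are neighbouring doubles: they differ by one unit in the
// last place, which is exactly what double rounding can cost.
bool results_are_adjacent() {
  return std::nextafter(rounded_once, HUGE_VAL) == rounded_twice;
}

// Only meaningful when double constants are evaluated in double precision;
// this is the same guard the patch uses for the decimal literals.
bool decimal_literal_matches_rounded_once() {
#if FLT_EVAL_METHOD >= 0 && FLT_EVAL_METHOD <= 1
  return 1e126 == rounded_once;
#else
  return true;  // excess precision in effect: nothing is claimed here
#endif
}

// Rounds via long double explicitly.  With Intel extended long double this
// is the rounded_twice value; other formats may give rounded_once.
double via_long_double() { return static_cast<double>(1e126L); }
```

The hexadecimal constants avoid the problem because they are already exactly representable in double, so no rounding step can change them.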

Bootstrapped/regtested on x86_64-linux and i686-linux, additionally
tested on the latter with
make check RUNTESTFLAGS='--target_board=unix/-fexcess-precision=standard 
conformance.exp=to_chars/double.cc'
Ok for trunk?

2023-06-07  Jakub Jelinek  

PR libstdc++/110145
* testsuite/20_util/to_chars/double.cc: Include .
(double_to_chars_test_cases,
double_scientific_precision_to_chars_test_cases_2,
double_fixed_precision_to_chars_test_cases_2): #if out 1e126, 4.91e-6
and 5.547e-6 tests if FLT_EVAL_METHOD is negative or larger than 1.
Add unconditional tests with corresponding double constants
0x1.7a2ecc414a03fp+418, 0x1.4981285e98e79p-18 and
0x1.7440bbff418b9p-18.

--- libstdc++-v3/testsuite/20_util/to_chars/double.cc.jj2022-11-03 
22:16:08.542329555 +0100
+++ libstdc++-v3/testsuite/20_util/to_chars/double.cc   2023-06-07 
15:41:44.275604870 +0200
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -1968,9 +1969,19 @@ inline constexpr double_to_chars_testcas
 {1e125, chars_format::fixed,
 
"248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
 "5841365553228283904"},
+#if FLT_EVAL_METHOD >= 0 && FLT_EVAL_METHOD <= 1
+// When long double is Intel extended and double constants are evaluated 
in precision of
+// long double, this value is initialized to double(1e126L), which is 
0x1.7a2ecc414a040p+418 due to
+// double rounding of 0x1.7a2ecc414a03f7ff6p+418L first to 
0x1.7a2ecc414a03f800p+418L and
+// then to 0x1.7a2ecc414a040p+418, while when double constants are 
evaluated in precision of
+// IEEE double, this is 0x1.7a2ecc414a03fp+418 which the test expects.  
See PR110145.
 {1e126, chars_format::fixed,
 
"248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
 "58413655532282839040"},
+#endif
+{0x1.7a2ecc414a03fp+418, chars_format::fixed,
+   
"248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
+   "58413655532282839040"},
 {1e127, chars_format::fixed,
 
"549291066784979473595300225087383524118479625982517885450291174622154390152298057300868772"
 "377386949310916067328"},
@@ -2816,8 +2827,12 @@ inline constexpr double_to_chars_testcas
 {0x1.a6c767640cd71p+879, chars_format::scientific, 
"6.6564021122018745e+264"},
 
 // Incorrectly handled by dtoa_milo() (Grisu2), which doesn't achieve 
shortest round-trip.
+#if FLT_EVAL_METHOD >= 0 && FLT_EVAL_METHOD <= 1
 {4.91e-6, chars_format::scientific, "4.91e-06"},
 {5.547e-6, chars_format::scientific, "5.547e-06"},
+#endif
+{0x1.4981285e98e79p-18, chars_format::scientific, "4.91e-06"},
+{0x1.7440bbff418b9p-18, chars_format::scientific, "5.547e-06"},
 
 // Test hexfloat corner cases.
 {0x1.728p+0, chars_format::hex, "1.728p+0"}, // instead of "2.e5p-1"
@@ -5537,10 +5552,16 @@ inline constexpr double_to_chars_testcas
 "9."
 
"9992486776161899288204254467086983483846143922597222529419997579302660316349376281765375153005"
 "841365553228283904e+124"},
+#if FLT_EVAL_METHOD >= 0 && FLT_EVAL_METHOD <= 1
 {1e+126, chars_format::scientific, 124,
 "9."
 
"9992486776161899288204254467086983483846143922597222529419997579302660316349376281765375153005"
 "841365553228283904e+125"},
+#endif
+{0x1.7a2ecc414a03fp+418, 

Re: Tighten 'dg-warning' alternatives in 'c-c++-common/Wfree-nonheap-object{,-2,-3}.c' (was: [PATCH] correct -Wmismatched-new-delete (PR 98160, 98166))

2023-06-07 Thread Mike Stump via Gcc-patches
On Jun 7, 2023, at 8:01 AM, Thomas Schwinge  wrote:
> On 2020-12-08T13:46:32-0700, Martin Sebor via Gcc-patches 
>  wrote:
>> The attached changes [...]
> 
> ... eventually became commit fe7f75cf16783589eedbab597e6d0b8d35d7e470
> "Correct/improve maybe_emit_free_warning (PR middle-end/98166, PR c++/57111, 
> PR middle-end/98160)".
> 
>>  * c-c++-common/Wfree-nonheap-object-2.c: New test.
>>  * c-c++-common/Wfree-nonheap-object-3.c: New test.
>>  * c-c++-common/Wfree-nonheap-object.c: New test.
> 
> OK to push the attached
> "Tighten 'dg-warning' alternatives in 
> 'c-c++-common/Wfree-nonheap-object{,-2,-3}.c'"?

Ok.

Re: Remove 'gcc/testsuite/g++.dg/warn/Wfree-nonheap-object.s' (was: [PATCH] add -Wmismatched-new-delete to middle end (PR 90629))

2023-06-07 Thread Mike Stump via Gcc-patches
On Jun 7, 2023, at 7:54 AM, Thomas Schwinge  wrote:
> 
> On 2020-11-03T16:56:48-0700, Martin Sebor via Gcc-patches 
>  wrote:
>> Attached is a simple middle end implementation of detection of
>> mismatched pairs of calls to C++ new and delete, along with
>> a substantially enhanced implementation of -Wfree-nonheap-object.
> 
> This eventually became commit dce6c58db87ebf7f4477bd3126228e73e497
> "Add support for detecting mismatched allocation/deallocation calls".
> Already in this original patch submission:
> 
>> diff --git a/gcc/testsuite/g++.dg/warn/Wfree-nonheap-object.s 
>> b/gcc/testsuite/g++.dg/warn/Wfree-nonheap-object.s
>> new file mode 100644
>> index 000..e69de29bb2d
> 
> OK to push the attached
> "Remove 'gcc/testsuite/g++.dg/warn/Wfree-nonheap-object.s'"?

Ok.

Re: [testsuite] bump some tsvc timeouts

2023-06-07 Thread Mike Stump via Gcc-patches
On Jun 7, 2023, at 1:12 AM, Alexandre Oliva  wrote:
> 
> Several tests are timing out when targeting x86-*-vxworks with qemu.
> 
> Bump their timeout factor.

Ok.  I think these are obvious to people that have to work with simulators and 
the testsuite so if you want to self approve you can.



[PATCH] optabs: Implement double-word ctz and ffs expansion

2023-06-07 Thread Jakub Jelinek via Gcc-patches
Hi!

We have had expand_doubleword_clz for a couple of years, where we emit
double-word CLZ as if (high_word == 0) return CLZ (low_word) + word_size;
else return CLZ (high_word);
We can do something similar for CTZ and FFS IMHO, just with the 2
words swapped.  So if (low_word == 0) return CTZ (high_word) + word_size;
else return CTZ (low_word); for CTZ and
if (low_word == 0) return high_word ? FFS (high_word) + word_size : 0;
else return FFS (low_word); for FFS.

The following patch implements that.
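
For readers who want the three decompositions as plain code rather than RTL expansion, here is a minimal C++ sketch on a 64-bit value split into two 32-bit words. It is a source-level illustration, not the optabs.cc implementation; the helper names are made up, and the word-sized helpers return 32 for a zero input purely so the sketch is well defined everywhere.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kWordBits = 32;

// Word-sized primitives, written as loops to stay portable.
inline int clz_word(std::uint32_t w) {
  int n = 0;
  for (std::uint32_t bit = 0x80000000u; bit != 0 && (w & bit) == 0; bit >>= 1)
    ++n;
  return n;  // 32 when w == 0
}

inline int ctz_word(std::uint32_t w) {
  int n = 0;
  for (std::uint32_t bit = 1u; bit != 0 && (w & bit) == 0; bit <<= 1)
    ++n;
  return n;  // 32 when w == 0
}

inline int ffs_word(std::uint32_t w) { return w ? ctz_word(w) + 1 : 0; }

inline std::uint32_t high_word(std::uint64_t x) {
  return static_cast<std::uint32_t>(x >> kWordBits);
}
inline std::uint32_t low_word(std::uint64_t x) {
  return static_cast<std::uint32_t>(x);
}

// CLZ: decide on the high word.
inline int clz_double(std::uint64_t x) {
  return high_word(x) != 0 ? clz_word(high_word(x))
                           : clz_word(low_word(x)) + kWordBits;
}

// CTZ: same shape with the two words swapped.
inline int ctz_double(std::uint64_t x) {
  return low_word(x) != 0 ? ctz_word(low_word(x))
                          : ctz_word(high_word(x)) + kWordBits;
}

// FFS: like CTZ, but the result must be 0 when both words are zero.
inline int ffs_double(std::uint64_t x) {
  if (low_word(x) != 0)
    return ffs_word(low_word(x));
  return high_word(x) != 0 ? ffs_word(high_word(x)) + kWordBits : 0;
}
```

For example, ctz_double of 1 shifted left by 40 takes the high-word path and yields 8 + 32 = 40, and ffs_double(0) is 0 thanks to the extra zero test.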

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Note, on some targets which implement both word_mode ctz and ffs patterns,
it might be better to incrementally implement those double-word ffs expansion
patterns in md files, because we aren't able to optimize it correctly;
nothing can detect we have just made sure that argument is not 0 and so
don't need to bother with handling that case.  So, on ia32 just using
CTZ patterns would be better there, but I think we can even do better and
instead of doing the comparisons of the operands against 0 do the CTZ
expansion followed by testing of flags.

2023-06-07  Jakub Jelinek  

* optabs.cc (expand_ffs): Add forward declaration.
(expand_doubleword_clz): Rename to ...
(expand_doubleword_clz_ctz_ffs): ... this.  Add UNOPTAB argument,
handle also doubleword CTZ and FFS in addition to CLZ.
(expand_unop): Adjust caller.  Also call it for doubleword
ctz_optab and ffs_optab.

* gcc.target/i386/ctzll-1.c: New test.
* gcc.target/i386/ffsll-1.c: New test.

--- gcc/optabs.cc.jj2023-06-07 09:42:14.701130305 +0200
+++ gcc/optabs.cc   2023-06-07 14:35:04.909879272 +0200
@@ -2697,10 +2697,14 @@ expand_clrsb_using_clz (scalar_int_mode
   return temp;
 }
 
-/* Try calculating clz of a double-word quantity as two clz's of word-sized
-   quantities, choosing which based on whether the high word is nonzero.  */
+static rtx expand_ffs (scalar_int_mode, rtx, rtx);
+
+/* Try calculating clz, ctz or ffs of a double-word quantity as two clz, ctz or
+   ffs operations on word-sized quantities, choosing which based on whether the
+   high (for clz) or low (for ctz and ffs) word is nonzero.  */
 static rtx
-expand_doubleword_clz (scalar_int_mode mode, rtx op0, rtx target)
+expand_doubleword_clz_ctz_ffs (scalar_int_mode mode, rtx op0, rtx target,
+  optab unoptab)
 {
   rtx xop0 = force_reg (mode, op0);
   rtx subhi = gen_highpart (word_mode, xop0);
@@ -2709,6 +2713,7 @@ expand_doubleword_clz (scalar_int_mode m
   rtx_code_label *after_label = gen_label_rtx ();
   rtx_insn *seq;
   rtx temp, result;
+  int addend = 0;
 
   /* If we were not given a target, use a word_mode register, not a
  'mode' register.  The result will fit, and nobody is expecting
@@ -2721,6 +2726,9 @@ expand_doubleword_clz (scalar_int_mode m
  'target' to tag a REG_EQUAL note on.  */
   result = gen_reg_rtx (word_mode);
 
+  if (unoptab != clz_optab)
+std::swap (subhi, sublo);
+
   start_sequence ();
 
   /* If the high word is not equal to zero,
@@ -2728,7 +2736,13 @@ expand_doubleword_clz (scalar_int_mode m
   emit_cmp_and_jump_insns (subhi, CONST0_RTX (word_mode), EQ, 0,
   word_mode, true, hi0_label);
 
-  temp = expand_unop_direct (word_mode, clz_optab, subhi, result, true);
+  if (optab_handler (unoptab, word_mode) != CODE_FOR_nothing)
+temp = expand_unop_direct (word_mode, unoptab, subhi, result, true);
+  else
+{
+  gcc_assert (unoptab == ffs_optab);
+  temp = expand_ffs (word_mode, subhi, result);
+}
   if (!temp)
 goto fail;
 
@@ -2739,14 +2753,32 @@ expand_doubleword_clz (scalar_int_mode m
   emit_barrier ();
 
   /* Else clz of the full value is clz of the low word plus the number
- of bits in the high word.  */
+ of bits in the high word.  Similarly for ctz/ffs of the high word,
+ except that ffs should be 0 when both words are zero.  */
   emit_label (hi0_label);
 
-  temp = expand_unop_direct (word_mode, clz_optab, sublo, 0, true);
+  if (unoptab == ffs_optab)
+{
+  convert_move (result, const0_rtx, true);
+  emit_cmp_and_jump_insns (sublo, CONST0_RTX (word_mode), EQ, 0,
+  word_mode, true, after_label);
+}
+
+  if (optab_handler (unoptab, word_mode) != CODE_FOR_nothing)
+temp = expand_unop_direct (word_mode, unoptab, sublo, NULL_RTX, true);
+  else
+{
+  gcc_assert (unoptab == ffs_optab);
+  temp = expand_unop_direct (word_mode, ctz_optab, sublo, NULL_RTX, true);
+  addend = 1;
+}
+
   if (!temp)
 goto fail;
+
   temp = expand_binop (word_mode, add_optab, temp,
-  gen_int_mode (GET_MODE_BITSIZE (word_mode), word_mode),
+  gen_int_mode (GET_MODE_BITSIZE (word_mode) + addend,
+word_mode),
   result, true, OPTAB_DIRECT);
   if (!temp)
 goto fail;
@@ -2759,7 

[PATCH] i386: Fix endless recursion in ix86_expand_vector_init_general with MMX [PR110152]

2023-06-07 Thread Jakub Jelinek via Gcc-patches
Hi!

I'm getting
+FAIL: gcc.target/i386/3dnow-1.c (internal compiler error: Segmentation fault 
signal terminated program cc1)
+FAIL: gcc.target/i386/3dnow-1.c (test for excess errors)
+FAIL: gcc.target/i386/3dnow-2.c (internal compiler error: Segmentation fault 
signal terminated program cc1)
+FAIL: gcc.target/i386/3dnow-2.c (test for excess errors)
+FAIL: gcc.target/i386/mmx-1.c (internal compiler error: Segmentation fault 
signal terminated program cc1)
+FAIL: gcc.target/i386/mmx-1.c (test for excess errors)
+FAIL: gcc.target/i386/mmx-2.c (internal compiler error: Segmentation fault 
signal terminated program cc1)
+FAIL: gcc.target/i386/mmx-2.c (test for excess errors)
regressions on i686-linux since r14-1166.  The problem is when
ix86_expand_vector_init_general is called with mmx_ok = true and
mode = V4HImode, it newly recurses with mmx_ok = false and mode = V2SImode,
but as mmx_ok is false and !TARGET_SSE, we recurse again with the same
arguments (ok, fresh new tmp and vals) infinitely.
The following patch fixes that by passing mmx_ok to that recursive call.
For n_words == 4 it isn't needed, because we only care about mmx_ok for
V2SImode or V2SFmode and no other modes.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-06-07  Jakub Jelinek  

PR target/110152
* config/i386/i386-expand.cc (ix86_expand_vector_init_general): For
n_words == 2 recurse with mmx_ok as first argument rather than false.

--- gcc/config/i386/i386-expand.cc.jj   2023-06-03 15:32:04.489410367 +0200
+++ gcc/config/i386/i386-expand.cc  2023-06-07 10:31:34.715981752 +0200
@@ -16371,7 +16371,7 @@ quarter:
  machine_mode concat_mode = tmp_mode == DImode ? V2DImode : V2SImode;
  rtx tmp = gen_reg_rtx (concat_mode);
  vals = gen_rtx_PARALLEL (concat_mode, gen_rtvec_v (2, words));
- ix86_expand_vector_init_general (false, concat_mode, tmp, vals);
+ ix86_expand_vector_init_general (mmx_ok, concat_mode, tmp, vals);
  emit_move_insn (target, gen_lowpart (mode, tmp));
}
   else if (n_words == 4)

Jakub



Re: [pushed] [PR109541] RA: Constrain class of pic offset table pseudo to general regs

2023-06-07 Thread Vladimir Makarov via Gcc-patches


On 6/7/23 12:20, Jeff Law wrote:



On 6/7/23 09:35, Vladimir Makarov via Gcc-patches wrote:

The following patch fixes



-ENOPATCH


Sorry, here is the patch.

commit 08ca31fb27841cb7f3bff7086be6f139136be1a7
Author: Vladimir N. Makarov 
Date:   Wed Jun 7 09:51:54 2023 -0400

RA: Constrain class of pic offset table pseudo to general regs

On some targets an integer pseudo can be assigned to a FP reg.  For
pic offset table pseudo it means we will reload the pseudo in this
case and, as a consequence, memory containing the pseudo might be
recognized as a wrong one.  The patch fixes this problem.

PR target/109541

gcc/ChangeLog:

* ira-costs.cc: (find_costs_and_classes): Constrain classes of pic
  offset table pseudo to a general reg subset.

gcc/testsuite/ChangeLog:

* gcc.target/sparc/pr109541.c: New.

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index ae8304ff938..d9e700e8947 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -2016,6 +2016,16 @@ find_costs_and_classes (FILE *dump_file)
 	  ira_assert (regno_aclass[i] != NO_REGS
 			  && ira_reg_allocno_class_p[regno_aclass[i]]);
 	}
+	  if (pic_offset_table_rtx != NULL
+	  && i == (int) REGNO (pic_offset_table_rtx))
+	{
+	  /* For some targets, integer pseudos can be assigned to fp
+		 regs.  As we don't want reload pic offset table pseudo, we
+		 should avoid using non-integer regs.  */
+	  regno_aclass[i]
+		= ira_reg_class_intersect[regno_aclass[i]][GENERAL_REGS];
+	  alt_class = ira_reg_class_intersect[alt_class][GENERAL_REGS];
+	}
 	  if ((new_class
 	   = (reg_class) (targetm.ira_change_pseudo_allocno_class
 			  (i, regno_aclass[i], best))) != regno_aclass[i])
diff --git a/gcc/testsuite/gcc.target/sparc/pr109541.c b/gcc/testsuite/gcc.target/sparc/pr109541.c
new file mode 100644
index 000..1360f101930
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/pr109541.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -mcpu=niagara4 -fpic -w" } */
+
+int rhash_sha512_process_block_A, rhash_sha512_process_block_i,
+rhash_sha512_process_block_block, rhash_sha512_process_block_W_0;
+
+unsigned rhash_sha512_process_block_W_2;
+
+void rhash_sha512_process_block (void)
+{
+  unsigned C, E, F, G, H, W_0, W_4, W_9, W_5, W_3, T1;
+
+  for (; rhash_sha512_process_block_i; rhash_sha512_process_block_i += 6) {
+T1 = F + (rhash_sha512_process_block_W_2 += 6);
+rhash_sha512_process_block_A += H & G + (W_5 += rhash_sha512_process_block_W_0);
+H = C & T1 & E ^ F + (W_9 += rhash_sha512_process_block_W_0);
+G = T1 ^ 6 + (W_0 += rhash_sha512_process_block_block);
+F = (unsigned) 
+T1 = (unsigned) ( + (W_3 += rhash_sha512_process_block_block > 9 > W_4));
+C = (unsigned) (T1 + );
+W_4 += W_5 += rhash_sha512_process_block_W_0;
+  }
+}


Re: [pushed] [PR109541] RA: Constrain class of pic offset table pseudo to general regs

2023-06-07 Thread Jeff Law via Gcc-patches




On 6/7/23 09:35, Vladimir Makarov via Gcc-patches wrote:

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109541

The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.



-ENOPATCH


[Patch, fortran] PR87477 - (associate) - [meta-bug] [F03] issues concerning the ASSOCIATE statement

2023-06-07 Thread Paul Richard Thomas via Gcc-patches
Hi All,

Three more fixes for PR87477. Please note that PR99350 was a blocker
but, as pointed out in comment #5 of the PR, this has nothing to do
with the associate construct.

All three fixes are straight forward and the .diff + ChangeLog suffice
to explain them. 'rankguessed' was made redundant by the last PR87477
fix.

Regtests on x86_64 - good for mainline?

Paul

Fortran: Fix some more blockers in associate meta-bug [PR87477]

2023-06-07  Paul Thomas  

gcc/fortran
PR fortran/99350
* decl.cc (char_len_param_value): Simplify a copy of the expr
and replace the original if there is no error.
* gfortran.h : Remove the redundant field 'rankguessed' from
'gfc_association_list'.
* resolve.cc (resolve_assoc_var): Remove refs to 'rankguessed'.

PR fortran/107281
* resolve.cc (resolve_variable): Associate names with constant
or structure constructor targets cannot have array refs.

PR fortran/109451
* trans-array.cc (gfc_conv_expr_descriptor): Guard expression
character length backend decl before using it. Suppress the
assignment if lhs equals rhs.
* trans-io.cc (gfc_trans_transfer): Scalarize transfer of
associate variables pointing to a variable. Add comment.
* trans-stmt.cc (trans_associate_var): Remove requirement that
the character length be deferred before assigning the value
returned by gfc_conv_expr_descriptor. Also, guard the backend
decl before testing with VAR_P.

gcc/testsuite/
PR fortran/99350
* gfortran.dg/pr99350.f90 : New test.

PR fortran/107281
* gfortran.dg/associate_5.f03 : Changed error message.
* gfortran.dg/pr107281.f90 : New test.

PR fortran/109451
* gfortran.dg/associate_61.f90 : New test
diff --git a/gcc/fortran/decl.cc b/gcc/fortran/decl.cc
index f5d39e2a3d8..d09c8bc97d9 100644
--- a/gcc/fortran/decl.cc
+++ b/gcc/fortran/decl.cc
@@ -1056,6 +1056,7 @@ static match
 char_len_param_value (gfc_expr **expr, bool *deferred)
 {
   match m;
+  gfc_expr *p;
 
   *expr = NULL;
   *deferred = false;
@@ -1081,10 +1082,10 @@ char_len_param_value (gfc_expr **expr, bool *deferred)
   if (!gfc_expr_check_typed (*expr, gfc_current_ns, false))
 return MATCH_ERROR;
 
-  /* If gfortran gets an EXPR_OP, try to simplify it.  This catches things
- like CHARACTER(([1])).   */
-  if ((*expr)->expr_type == EXPR_OP)
-gfc_simplify_expr (*expr, 1);
+  /* Try to simplify the expression to catch things like CHARACTER(([1])).   */
+  p = gfc_copy_expr (*expr);
+  if (gfc_is_constant_expr (p) && gfc_simplify_expr (p, 1))
+gfc_replace_expr (*expr, p);
 
   if ((*expr)->expr_type == EXPR_FUNCTION)
 {
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 3e5f942d7fd..a65dd571591 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2914,9 +2914,6 @@ typedef struct gfc_association_list
  for memory handling.  */
   unsigned dangling:1;
 
-  /* True when the rank of the target expression is guessed during parsing.  */
-  unsigned rankguessed:1;
-
   char name[GFC_MAX_SYMBOL_LEN + 1];
   gfc_symtree *st; /* Symtree corresponding to name.  */
   locus where;
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 2ba3101f1fe..f2604314570 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -5872,7 +5872,15 @@ resolve_variable (gfc_expr *e)
   if (sym->ts.type == BT_CLASS)
 	gfc_fix_class_refs (e);
   if (!sym->attr.dimension && e->ref && e->ref->type == REF_ARRAY)
-	return false;
+	{
+	  /* Unambiguously scalar!  */
+	  if (sym->assoc->target
+	  && (sym->assoc->target->expr_type == EXPR_CONSTANT
+		  || sym->assoc->target->expr_type == EXPR_STRUCTURE))
+	gfc_error ("Scalar variable %qs has an array reference at %L",
+		   sym->name, &e->where);
+	  return false;
+	}
   else if (sym->attr.dimension && (!e->ref || e->ref->type != REF_ARRAY))
 	{
 	  /* This can happen because the parser did not detect that the
@@ -9279,7 +9287,7 @@ resolve_assoc_var (gfc_symbol* sym, bool resolve_target)
   gfc_array_spec *as;
   /* The rank may be incorrectly guessed at parsing, therefore make sure
 	 it is corrected now.  */
-  if (sym->ts.type != BT_CLASS && (!sym->as || sym->assoc->rankguessed))
+  if (sym->ts.type != BT_CLASS && !sym->as)
 	{
 	  if (!sym->as)
 	sym->as = gfc_get_array_spec ();
@@ -9292,8 +9300,7 @@ resolve_assoc_var (gfc_symbol* sym, bool resolve_target)
 	sym->attr.codimension = 1;
 	}
   else if (sym->ts.type == BT_CLASS
-	   && CLASS_DATA (sym)
-	   && (!CLASS_DATA (sym)->as || sym->assoc->rankguessed))
+	   && CLASS_DATA (sym) && !CLASS_DATA (sym)->as)
 	{
 	  if (!CLASS_DATA (sym)->as)
 	CLASS_DATA (sym)->as = gfc_get_array_spec ();
diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 1c7ea900ea1..e1c75e9fe02 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -7934,7 +7934,8 @@ gfc_conv_expr_descriptor (gfc_se *se, gfc_expr *expr)
 	  else
 	tmp = se->string_length;
 
-	  if (expr->ts.deferred && VAR_P 

Re: vect: Don't pass subtype to vect_widened_op_tree where not needed [PR 110142]

2023-06-07 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> Hi,
>
> This patch fixes an issue introduced by
> g:2f482a07365d9f4a94a56edd13b7f01b8f78b5a0, where a subtype was being
> passed to vect_widened_op_tree, when no subtype was to be used. This
> led to an erroneous use of IFN_VEC_WIDEN_MINUS.
>
> gcc/ChangeLog:
>
>  * tree-vect-patterns.cc (vect_recog_widen_op_pattern): Don't 
> pass subtype to
>  vect_widened_op_tree and remove subtype parameter.
>  (vect_recog_widen_plus_pattern): Remove subtype parameter and 
> don't pass to call to
>  vect_recog_widen_op_pattern.
>  (vect_recog_widen_minus_pattern): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>  * gcc.dg/vect/pr110142.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr110142.c 
> b/gcc/testsuite/gcc.dg/vect/pr110142.c
> new file mode 100644
> index 
> ..a88dbe400f46a33a53649298345c24c569e2f567
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr110142.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3" } */
> +void test(short *x, unsigned short *y, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +  x[i] = (y[i] - x[i]) >> 1;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "widen_minus" "vect" } } */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 
> dc102c919352a0328cf86eabceb3a38c41a7e4fd..599a027f9b2feb8971c1ee017b6457bc297c86c2
>  100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -1405,15 +1405,14 @@ static gimple *
>  vect_recog_widen_op_pattern (vec_info *vinfo,
>stmt_vec_info last_stmt_info, tree *type_out,
>tree_code orig_code, code_helper wide_code,
> -  bool shift_p, const char *name,
> -  optab_subtype *subtype = NULL)
> +  bool shift_p, const char *name)
>  {
>gimple *last_stmt = last_stmt_info->stmt;
>  
>vect_unpromoted_value unprom[2];
>tree half_type;
>if (!vect_widened_op_tree (vinfo, last_stmt_info, orig_code, orig_code,
> -  shift_p, 2, unprom, &half_type, subtype))
> +  shift_p, 2, unprom, &half_type))
>  
>  return NULL;
>  
> @@ -1484,13 +1483,11 @@ static gimple *
>  vect_recog_widen_op_pattern (vec_info *vinfo,
>stmt_vec_info last_stmt_info, tree *type_out,
>tree_code orig_code, internal_fn wide_ifn,
> -  bool shift_p, const char *name,
> -  optab_subtype *subtype = NULL)
> +  bool shift_p, const char *name)
>  {
>combined_fn ifn = as_combined_fn (wide_ifn);
>return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
> -   orig_code, ifn, shift_p, name,
> -   subtype);
> +   orig_code, ifn, shift_p, name);
>  }

I think this overload can be deleted.  An internal_fn will then
be implicitly converted to code_helper and use the overload above.

OK with that change, thanks.

Richard
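
To illustrate the implicit-conversion point above, here is a minimal
standalone sketch.  The names internal_fn_t, code_helper_t and recog are
hypothetical stand-ins, not the real GCC declarations; the only assumption
is that the helper class has a non-explicit constructor taking the enum,
so a single overload is enough.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not GCC's real types.
enum internal_fn_t { FN_WIDEN_PLUS = 1, FN_WIDEN_MINUS = 2 };

struct code_helper_t
{
  // Non-explicit constructor: permits implicit conversion from the enum.
  // Storing the negated value is an arbitrary choice made for this sketch.
  code_helper_t (internal_fn_t fn) : rep (-static_cast<int> (fn)) {}
  int rep;
};

// The only overload: it takes the helper type.
int
recog (code_helper_t code)
{
  return code.rep;
}

// Calling with the enum works without a forwarding overload, because the
// argument is converted through the constructor above.
void
example ()
{
  assert (recog (FN_WIDEN_PLUS) == -1);
  assert (recog (FN_WIDEN_MINUS) == -2);
}
```

A forwarding overload taking the enum would only repeat this conversion by
hand, which is why it can go.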

>  
>  
> @@ -1513,11 +1510,9 @@ static gimple *
>  vect_recog_widen_plus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
>  tree *type_out)
>  {
> -  optab_subtype subtype;
>return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
> PLUS_EXPR, IFN_VEC_WIDEN_PLUS,
> -   false, "vect_recog_widen_plus_pattern",
> -   &subtype);
> +   false, "vect_recog_widen_plus_pattern");
>  }
>  
>  /* Try to detect subtraction on widened inputs, converting MINUS_EXPR
> @@ -1526,11 +1521,9 @@ static gimple *
>  vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info 
> last_stmt_info,
>  tree *type_out)
>  {
> -  optab_subtype subtype;
>return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
> MINUS_EXPR, IFN_VEC_WIDEN_MINUS,
> -   false, "vect_recog_widen_minus_pattern",
> -   &subtype);
> +   false, "vect_recog_widen_minus_pattern");
>  }
>  
>  /* Function vect_recog_ctz_ffs_pattern


Re: [PATCH] libstdc++: Use AS_IF in configure.ac

2023-06-07 Thread Jonathan Wakely via Gcc-patches
On Wed, 7 Jun 2023 at 16:19, Andreas Schwab wrote:

> On Jun 07 2023, Jonathan Wakely via Gcc-patches wrote:
>
> > Let's just revert it then. The manual says we should use AS_IF, but what
> > we had previously was working well enough. I'll figure out what happened
> > here later.
>
> I think AS_IF is doing its job here: moving the expansion of
> AC_REQUIRE'd macros out of the bodies.  But many of those expansions
> actually need to remain under the $GLIBCXX_IS_NATIVE conditional, so it
> is not appropriate at this place.
>
>
Ah yes, that makes sense. Thanks.


Re: Support 'UNSUPPORTED: [...]: exception handling disabled' for libstdc++ testing (was: Support in the GCC(/C++) test suites for '-fno-exceptions')

2023-06-07 Thread Jonathan Wakely via Gcc-patches
On Wed, 7 Jun 2023 at 12:51, Jonathan Wakely  wrote:

>
>
> On Wed, 7 Jun 2023 at 10:08, Thomas Schwinge 
> wrote:
>
>> Hi!
>>
>> On 2023-06-07T09:12:31+0100, Jonathan Wakely  wrote:
>> > On Wed, 7 Jun 2023 at 08:13, Thomas Schwinge wrote:
>> >> On 2023-06-06T20:31:21+0100, Jonathan Wakely 
>> wrote:
>> >> > On Tue, 6 Jun 2023 at 20:14, Thomas Schwinge <
>> tho...@codesourcery.com>
>> >> > wrote:
>> >> >> This issue comes up in context of me working on C++ support for GCN
>> and
>> >> >> nvptx target.  Those targets shall default to '-fno-exceptions' --
>> or,
>> >> >> "in other words", '-fexceptions' is not supported.  (Details omitted
>> >> >> here.)
>> >> >>
>> >> >> It did seem clear to me that with such a configuration it'll be
>> hard to
>> >> >> get clean test results.  Then I found code in
>> >> >> 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':
>> >> >>
>> >> >> # If exceptions are disabled, mark tests expecting exceptions
>> to be
>> >> >> enabled
>> >> >> # as unsupported.
>> >> >> if { ![check_effective_target_exceptions_enabled] } {
>> >> >> if [regexp "(^|\n)\[^\n\]*: error: exception handling
>> disabled"
>> >> >> $text] {
>> >> >> return "::unsupported::exception handling disabled"
>> >> >> }
>> >> >>
>> >> >> ..., which, in a way, sounds as if the test suite generally is
>> meant to
>> >> >> produce useful results for '-fno-exceptions', nice surprise!
>> >> >>
>> >> >> Running x86_64-pc-linux-gnu (not yet GCN, nvptx) 'make check' with:
>> >> >>
>> >> >> RUNTESTFLAGS='--target_board=unix/-fno-exceptions\{,-m32\}'
>> >> >>
>> >> >> ..., I find that indeed this does work for a lot of test cases,
>> where we
>> >> >> then get (random example):
>> >> >>
>> >> >>  PASS: g++.dg/coroutines/pr99710.C  (test for errors, line 23)
>> >> >> -PASS: g++.dg/coroutines/pr99710.C (test for excess errors)
>> >> >> +UNSUPPORTED: g++.dg/coroutines/pr99710.C: exception handling
>> >> disabled
>> >> >>
>> >> >> ..., due to:
>> >> >>
>> >> >>  [...]/g++.dg/coroutines/pr99710.C: In function 'task
>> my_coro()':
>> >> >> +[...]/g++.dg/coroutines/pr99710.C:18:10: error: exception
>> handling
>> >> >> disabled, use '-fexceptions' to enable
>> >> >>  [...]/g++.dg/coroutines/pr99710.C:23:7: error: await
>> expressions
>> >> are
>> >> >> not permitted in handlers
>> >> >>  compiler exited with status 1
>> >> >>
>> >> >> But, we're nowhere near clean test results: PASS -> FAIL as well as
>> >> >> XFAIL -> XPASS regressions, due to 'error: exception handling
>> disabled'
>> >> >> precluding other diagnostics seems to be one major issue.
>> >> >>
>> >> >> Is there interest in me producing the obvious (?) changes to those
>> test
>> >> >> cases, such that compiler g++ as well as target library libstdc++
>> test
>> >> >> results are reasonably clean?  (If you think that's all "wasted
>> effort",
>> >> >> then I suppose I'll just locally ignore any
>> FAILs/XPASSes/UNRESOLVEDs
>> >> >> that appear in combination with
>> >> >> 'UNSUPPORTED: [...]: exception handling disabled'.)
>> >> >
>> >> > I would welcome that for libstdc++.
>> >>
>> >> Assuming no issues found in testing, OK to push the attached
>> >> "Support 'UNSUPPORTED: [...]: exception handling disabled' for
>> libstdc++
>> >> testing"?
>> >> (Thanks, Jozef!)
>> >
>> > Yes please.
>>
>> Pushed commit r14-1604-g5faaabef3819434d13fcbf749bd07bfc98ca7c3c
>> "Support 'UNSUPPORTED: [...]: exception handling disabled' for libstdc++
>> testing"
>> to master branch, as posted.
>>
>> For one-week-old GCC commit 2720bbd597f56742a17119dfe80edc2ba86af255,
>> x86_64-pc-linux-gnu, I see no changes without '-fno-exceptions' (as
>> expected), and otherwise:
>>
>> === libstdc++ Summary for
>> [-unix-]{+unix/-fno-exceptions+} ===
>>
>> # of expected passes[-15044-]{+12877+}
>> # of unexpected failures[-5-]{+10+}
>> # of expected failures  [-106-]{+77+}
>> {+# of unresolved testcases 6+}
>> # of unsupported tests  [-747-]{+1846+}
>>
>> As expected, there's a good number of (random example):
>>
>> -PASS: 18_support/105387.cc (test for excess errors)
>> -PASS: 18_support/105387.cc execution test
>> +UNSUPPORTED: 18_support/105387.cc: exception handling disabled
>>
>> ..., plus the following:
>>
>> [-PASS:-]{+FAIL:+} 23_containers/vector/capacity/constexpr.cc (test
>> for excess errors)
>>
>>
>> [...]/libstdc++-v3/testsuite/23_containers/vector/capacity/constexpr.cc:101:
>> error: non-constant condition for static assertion
>> In file included from
>> [...]/libstdc++-v3/testsuite/23_containers/vector/capacity/constexpr.cc:6:
>>
>> [...]/libstdc++-v3/testsuite/23_containers/vector/capacity/constexpr.cc:101:
>>  in 'constexpr' expansion of 'test_shrink_to_fit()'
>> [...]/libstdc++-v3/testsuite/util/testsuite_hooks.h:56: error:
>> '__builtin_fprintf(stderr, ((const char*)"%s:%d: %s: Assertion \'%s\'
>> failed.\012"), 

Re: [committed] libstdc++: Update list of known symbol versions for abi-check

2023-06-07 Thread Jonathan Wakely via Gcc-patches
On Wed, 7 Jun 2023 at 09:06, Jonathan Wakely  wrote:

> On Wed, 7 Jun 2023 at 05:43, François Dumont wrote:
>
>>
>> On 06/06/2023 17:59, Jonathan Wakely via Libstdc++ wrote:
>> > Tested x86_64-linux and powerpc64le-linux. Pushed to trunk.
>> >
>> > -- >8 --
>> >
>> > Add the recently added CXXABI_1.3.15 version. Also remove two "frozen"
>> > versions from the latestp list, as no more symbols should be added to
>> > those now.
>> >
>> > libstdc++-v3/ChangeLog:
>> >
>> >   * testsuite/util/testsuite_abi.cc (check_version): Add
>> >   CXXABI_1.3.15 symver and make it the latestp. Remove
>> >   GLIBCXX_IEEE128_3.4.31 and GLIBCXX_LDBL_3.4.31 from latestp.
>> > ---
>> >   libstdc++-v3/testsuite/util/testsuite_abi.cc | 7 ++-
>> >   1 file changed, 2 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/libstdc++-v3/testsuite/util/testsuite_abi.cc
>> b/libstdc++-v3/testsuite/util/testsuite_abi.cc
>> > index cea6c217433..59615dd701e 100644
>> > --- a/libstdc++-v3/testsuite/util/testsuite_abi.cc
>> > +++ b/libstdc++-v3/testsuite/util/testsuite_abi.cc
>> > @@ -233,7 +233,7 @@ check_version(symbol& test, bool added)
>> > known_versions.push_back("CXXABI_1.3.11");
>> > known_versions.push_back("CXXABI_1.3.12");
>> > known_versions.push_back("CXXABI_1.3.13");
>> > -  known_versions.push_back("CXXABI_1.3.14");
>> > +  known_versions.push_back("CXXABI_1.3.15");
>>
>> Did you really want to remove CXXABI_1.3.14 here ? ChangeLog says you
>> just add CXXABI_1.3.15.
>>
>
> Oops, yes! Thanks for spotting that. I'll fix it today.
>
>
Fixed at  r14-1613-gb6235dbcfc3143, thanks again.

The abi-check test didn't fail because all symbols in the CXXABI_1.3.14
version are already in the baseline_symbols.txt file, so none of them is
compared to the list of known_versions. That meant it didn't matter that
CXXABI_1.3.14 wasn't in the known versions.
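
A simplified sketch of that logic, in case it helps.  This is a hypothetical
model and not the real testsuite_abi.cc code: the names sym and
unknown_version_symbols are made up, and the only behaviour assumed is the
one described above, i.e. that symbols already in the baseline are never
compared against the list of known versions.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified symbol record (not the real 'symbol' type).
struct sym
{
  std::string name;
  std::string version;
};

// Return the names of newly added symbols whose version is not known.
// Symbols already present in the baseline are skipped entirely.
std::vector<std::string>
unknown_version_symbols (const std::vector<sym> &baseline,
                         const std::vector<sym> &current,
                         const std::vector<std::string> &known_versions)
{
  std::vector<std::string> bad;
  for (const sym &s : current)
    {
      bool in_baseline
        = std::any_of (baseline.begin (), baseline.end (),
                       [&] (const sym &b) { return b.name == s.name; });
      if (in_baseline)
        continue;  // Never compared against known_versions.
      if (std::find (known_versions.begin (), known_versions.end (),
                     s.version) == known_versions.end ())
        bad.push_back (s.name);
    }
  return bad;
}

void
example ()
{
  // CXXABI_1.3.14 is missing from the known versions, as in the bad commit.
  std::vector<std::string> known = { "CXXABI_1.3.13", "CXXABI_1.3.15" };
  std::vector<sym> baseline = { { "old_sym", "CXXABI_1.3.14" } };
  std::vector<sym> current = { { "old_sym", "CXXABI_1.3.14" },
                               { "new_sym", "CXXABI_1.3.15" } };

  // Nothing is flagged: the only 1.3.14 symbol is already in the baseline.
  assert (unknown_version_symbols (baseline, current, known).empty ());

  // A symbol newly added with the 1.3.14 version would be flagged.
  current.push_back ({ "late_sym", "CXXABI_1.3.14" });
  assert (unknown_version_symbols (baseline, current, known).size () == 1);
}
```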


[committed] libstdc++: Restore accidentally removed version in abi-check

2023-06-07 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux (-m32/-m64) and powerpc64le-linux. Pushed to trunk.

-- >8 --

In r14-1583-g192665feef7129 I meant to add CXXABI_1.3.15 but instead I
replaced CXXABI_1.3.14 with it. This restores the CXXABI_1.3.14 version.

libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_abi.cc (check_version): Re-add
CXXABI_1.3.14.
---
 libstdc++-v3/testsuite/util/testsuite_abi.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libstdc++-v3/testsuite/util/testsuite_abi.cc 
b/libstdc++-v3/testsuite/util/testsuite_abi.cc
index 59615dd701e..b0a628a1553 100644
--- a/libstdc++-v3/testsuite/util/testsuite_abi.cc
+++ b/libstdc++-v3/testsuite/util/testsuite_abi.cc
@@ -233,6 +233,7 @@ check_version(symbol& test, bool added)
   known_versions.push_back("CXXABI_1.3.11");
   known_versions.push_back("CXXABI_1.3.12");
   known_versions.push_back("CXXABI_1.3.13");
+  known_versions.push_back("CXXABI_1.3.14");
   known_versions.push_back("CXXABI_1.3.15");
   known_versions.push_back("CXXABI_IEEE128_1.3.13");
   known_versions.push_back("CXXABI_TM_1");
-- 
2.40.1



[committed] libstdc++: Fix some tests that fail with -fno-exceptions

2023-06-07 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux (-m32/-m64) and powerpc64le-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* testsuite/18_support/nested_exception/rethrow_if_nested-term.cc:
Require effective target exceptions_enabled instead of using
dg-skip-if.
* testsuite/23_containers/vector/capacity/constexpr.cc: Expect
shrink_to_fit() to be a no-op without exceptions enabled.
* testsuite/23_containers/vector/capacity/shrink_to_fit.cc:
Likewise.
* testsuite/ext/bitmap_allocator/check_allocate_max_size.cc:
Require effective target exceptions_enabled.
* testsuite/ext/malloc_allocator/check_allocate_max_size.cc:
Likewise.
* testsuite/ext/mt_allocator/check_allocate_max_size.cc:
Likewise.
* testsuite/ext/new_allocator/check_allocate_max_size.cc:
Likewise.
* testsuite/ext/pool_allocator/check_allocate_max_size.cc:
Likewise.
* testsuite/ext/throw_allocator/check_allocate_max_size.cc:
Likewise.
---
 .../18_support/nested_exception/rethrow_if_nested-term.cc | 2 +-
 .../testsuite/23_containers/vector/capacity/constexpr.cc  | 8 
 .../23_containers/vector/capacity/shrink_to_fit.cc| 4 
 .../ext/bitmap_allocator/check_allocate_max_size.cc   | 2 ++
 .../ext/malloc_allocator/check_allocate_max_size.cc   | 2 ++
 .../testsuite/ext/mt_allocator/check_allocate_max_size.cc | 2 ++
 .../ext/new_allocator/check_allocate_max_size.cc  | 2 ++
 .../ext/pool_allocator/check_allocate_max_size.cc | 2 ++
 .../ext/throw_allocator/check_allocate_max_size.cc| 1 +
 9 files changed, 24 insertions(+), 1 deletion(-)

diff --git 
a/libstdc++-v3/testsuite/18_support/nested_exception/rethrow_if_nested-term.cc 
b/libstdc++-v3/testsuite/18_support/nested_exception/rethrow_if_nested-term.cc
index 5913392bd46..3bfc7ab9943 100644
--- 
a/libstdc++-v3/testsuite/18_support/nested_exception/rethrow_if_nested-term.cc
+++ 
b/libstdc++-v3/testsuite/18_support/nested_exception/rethrow_if_nested-term.cc
@@ -1,5 +1,5 @@
 // { dg-do run { target c++11 } }
-// { dg-skip-if "" { *-*-* } { "-fno-exceptions" } }
+// { dg-require-effective-target exceptions_enabled }
 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/23_containers/vector/capacity/constexpr.cc 
b/libstdc++-v3/testsuite/23_containers/vector/capacity/constexpr.cc
index 92c23035e4f..f102e78425b 100644
--- a/libstdc++-v3/testsuite/23_containers/vector/capacity/constexpr.cc
+++ b/libstdc++-v3/testsuite/23_containers/vector/capacity/constexpr.cc
@@ -89,11 +89,19 @@ test_shrink_to_fit()
   std::vector v;
   v.reserve(9);
   v.shrink_to_fit();
+#if __cpp_exceptions
   VERIFY( v.capacity() == 0 );
+#else
+  VERIFY( v.capacity() == 9 );
+#endif
   v.reserve(9);
   v.resize(5);
   v.shrink_to_fit();
+#if __cpp_exceptions
   VERIFY( v.capacity() == v.size() );
+#else
+  VERIFY( v.capacity() == 9 );
+#endif
 
   return true;
 }
diff --git 
a/libstdc++-v3/testsuite/23_containers/vector/capacity/shrink_to_fit.cc 
b/libstdc++-v3/testsuite/23_containers/vector/capacity/shrink_to_fit.cc
index a8cede2278d..6542b5fd39f 100644
--- a/libstdc++-v3/testsuite/23_containers/vector/capacity/shrink_to_fit.cc
+++ b/libstdc++-v3/testsuite/23_containers/vector/capacity/shrink_to_fit.cc
@@ -30,7 +30,11 @@ void test01()
   v.push_back(1);
   VERIFY( v.size() < v.capacity() );
   v.shrink_to_fit();
+#if __cpp_exceptions
   VERIFY( v.size() == v.capacity() );
+#else
+  VERIFY( v.size() < v.capacity() );
+#endif
 }
 
 int main()
diff --git 
a/libstdc++-v3/testsuite/ext/bitmap_allocator/check_allocate_max_size.cc 
b/libstdc++-v3/testsuite/ext/bitmap_allocator/check_allocate_max_size.cc
index 712489f26a9..e523bb8f6c2 100644
--- a/libstdc++-v3/testsuite/ext/bitmap_allocator/check_allocate_max_size.cc
+++ b/libstdc++-v3/testsuite/ext/bitmap_allocator/check_allocate_max_size.cc
@@ -16,6 +16,8 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
+// { dg-require-effective-target exceptions_enabled }
+
 // 20.4.1.1 allocator members
 
 #include 
diff --git 
a/libstdc++-v3/testsuite/ext/malloc_allocator/check_allocate_max_size.cc 
b/libstdc++-v3/testsuite/ext/malloc_allocator/check_allocate_max_size.cc
index 53fb8d4ab31..e59f6ad99b9 100644
--- a/libstdc++-v3/testsuite/ext/malloc_allocator/check_allocate_max_size.cc
+++ b/libstdc++-v3/testsuite/ext/malloc_allocator/check_allocate_max_size.cc
@@ -16,6 +16,8 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
+// { dg-require-effective-target exceptions_enabled }
+
 // 20.4.1.1 allocator members
 
 #include 
diff --git a/libstdc++-v3/testsuite/ext/mt_allocator/check_allocate_max_size.cc 
b/libstdc++-v3/testsuite/ext/mt_allocator/check_allocate_max_size.cc
index cc6f94bb2d0..b636098b5c9 100644
--- a/libstdc++-v3/testsuite/ext/mt_allocator/check_allocate_max_size.cc
+++ 

[committed] libstdc++: Fix some tests that fail with -fexcess-precision=standard

2023-06-07 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux (-m32/-m64) and powerpc64le-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* testsuite/20_util/duration/cons/2.cc: Use values that aren't
affected by rounding.
* testsuite/20_util/from_chars/5.cc: Cast arithmetic result to
double before comparing for equality.
* testsuite/20_util/from_chars/6.cc: Likewise.
* testsuite/20_util/variant/86874.cc: Use values that aren't
affected by rounding.
* testsuite/25_algorithms/lower_bound/partitioned.cc: Compare to
original value instead of to floating-point-literal.
* testsuite/26_numerics/random/discrete_distribution/cons/range.cc:
Cast arithmetic result to double before comparing for equality.
* 
testsuite/26_numerics/random/piecewise_constant_distribution/cons/range.cc:
Likewise.
* 
testsuite/26_numerics/random/piecewise_linear_distribution/cons/range.cc:
Likewise.
* testsuite/26_numerics/valarray/transcend.cc (eq): Check that
the absolute difference is less than 0.01 instead of comparing
to two decimal places.
* testsuite/27_io/basic_istream/extractors_arithmetic/char/01.cc:
Cast arithmetic result to double before comparing for equality.
* testsuite/27_io/basic_istream/extractors_arithmetic/char/09.cc:
Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/char/10.cc:
Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/01.cc:
Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/09.cc:
Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/10.cc:
Likewise.
* testsuite/ext/random/hoyt_distribution/cons/parms.cc: Likewise.
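
For reference, the kind of tolerance check described for valarray/transcend.cc
can be sketched as below.  approx_eq is a hypothetical helper written for this
note and is not the actual eq function from the test; only the 0.01 tolerance
comes from the ChangeLog entry above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: treat two values as equal when their absolute
// difference is below a tolerance, rather than comparing rounded values.
bool
approx_eq (double a, double b, double tol = 0.01)
{
  return std::fabs (a - b) < tol;
}

void
example ()
{
  assert (approx_eq (1.0, 1.005));   // Difference is well below 0.01.
  assert (!approx_eq (1.0, 1.02));   // Difference is well above 0.01.
}
```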
---
 libstdc++-v3/testsuite/20_util/duration/cons/2.cc | 4 ++--
 libstdc++-v3/testsuite/20_util/from_chars/5.cc| 8 
 libstdc++-v3/testsuite/20_util/from_chars/6.cc| 2 +-
 libstdc++-v3/testsuite/20_util/variant/86874.cc   | 4 ++--
 .../testsuite/25_algorithms/lower_bound/partitioned.cc| 4 ++--
 .../random/discrete_distribution/cons/range.cc| 4 ++--
 .../random/piecewise_constant_distribution/cons/range.cc  | 4 ++--
 .../random/piecewise_linear_distribution/cons/range.cc| 2 +-
 libstdc++-v3/testsuite/26_numerics/valarray/transcend.cc  | 2 +-
 .../27_io/basic_istream/extractors_arithmetic/char/01.cc  | 2 +-
 .../27_io/basic_istream/extractors_arithmetic/char/09.cc  | 2 +-
 .../27_io/basic_istream/extractors_arithmetic/char/10.cc  | 2 +-
 .../basic_istream/extractors_arithmetic/wchar_t/01.cc | 2 +-
 .../basic_istream/extractors_arithmetic/wchar_t/09.cc | 2 +-
 .../basic_istream/extractors_arithmetic/wchar_t/10.cc | 2 +-
 .../testsuite/ext/random/hoyt_distribution/cons/parms.cc  | 2 +-
 16 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/libstdc++-v3/testsuite/20_util/duration/cons/2.cc 
b/libstdc++-v3/testsuite/20_util/duration/cons/2.cc
index 95ec17be76e..4ee07a9a954 100644
--- a/libstdc++-v3/testsuite/20_util/duration/cons/2.cc
+++ b/libstdc++-v3/testsuite/20_util/duration/cons/2.cc
@@ -97,7 +97,7 @@ test01()
   duration d1_copy(d1);
   VERIFY(d1.count() * 1000 == d1_copy.count());
   
-  duration d2(8.0);
+  duration d2(85000);
   duration d2_copy(d2);
   VERIFY(d2.count() == d2_copy.count() * 1000.0);
   
@@ -105,7 +105,7 @@ test01()
   duration d3_copy(d3);
   VERIFY(d3.count() * 1000 == d3_copy.count());
   
-  duration d4(5.0);
+  duration d4(5);
   duration d4_copy(d4);
   VERIFY(d4.count() == d4_copy.count() * dbl_emulator(1000.0));
 }
diff --git a/libstdc++-v3/testsuite/20_util/from_chars/5.cc 
b/libstdc++-v3/testsuite/20_util/from_chars/5.cc
index db0976c33d3..291ebf90fa0 100644
--- a/libstdc++-v3/testsuite/20_util/from_chars/5.cc
+++ b/libstdc++-v3/testsuite/20_util/from_chars/5.cc
@@ -44,7 +44,7 @@ test01()
   r = std::from_chars(s.data(), s.data() + s.length(), d, fmt);
   VERIFY( r.ec == std::errc::invalid_argument );
   VERIFY( r.ptr == s.data() );
-  VERIFY( d == 3.2 );
+  VERIFY( d == (double) 3.2 );
 }
   }
 
@@ -57,7 +57,7 @@ test01()
   r = std::from_chars(s.data(), s.data() + s.length(), d, fmt);
   VERIFY( r.ec == std::errc::invalid_argument );
   VERIFY( r.ptr == s.data() );
-  VERIFY( d == 3.2 );
+  VERIFY( d == (double) 3.2 );
 }
   }
 
@@ -69,7 +69,7 @@ test01()
std::chars_format::scientific);
 VERIFY( r.ec == std::errc::invalid_argument );
 VERIFY( r.ptr == s.data() );
-VERIFY( d == 3.2 );
+VERIFY( d == (double) 3.2 );
   }
 
   // patterns that are invalid without the final character
@@ -83,7 +83,7 @@ test01()
   r = std::from_chars(s.data(), s.data() + s.length() - 1, d, fmt);
   VERIFY( r.ec == std::errc::invalid_argument );
   VERIFY( r.ptr == s.data() );
-  VERIFY( d == 3.2 );
+ 

Re: [PATCH 0/3] aarch64: ls64 builtin fixes [PR110100,PR110132]

2023-06-07 Thread Richard Sandiford via Gcc-patches
Alex Coplan  writes:
> Hi,
>
> This patch series fixes various defects with the FEAT_LS64 ACLE
> implementation in the AArch64 backend.
>
> The series is organised as follows:
>
>  - Patch 1/3 fixes whitespace errors in the existing code.
>  - Patch 2/3 fixes PR110100 where we generate wrong code for the st64b
>builtin.
>  - Patch 3/3 fixes PR110132, allowing the compiler to define the ACLE builtins
>directly, and also makes the builtins work under LTO.
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu. OK for trunk
> and backports back to GCC 12?

Yeah, OK for trunk and branches.  Thanks for fixing this.

Richard


Re: [PATCH] analyzer: Standalone OOB-warning [PR109437, PR109439]

2023-06-07 Thread David Malcolm via Gcc-patches
On Wed, 2023-06-07 at 13:38 +0200, Benjamin Priour wrote:
> On Tue, Jun 6, 2023 at 8:37 PM David Malcolm 
> wrote:
> > 
> > On Tue, 2023-06-06 at 18:05 +0200, Benjamin Priour wrote:
> 
> [...]
> 
> > [Looks like you dropped the mailing list from the recipients; was
> > that
> > intentional?]
> > 
> 
> Not at all, just me missing the reply all button.
> 
> > > 
> > > I indeed bootstrapped and regtested on linux-x86_64, but it was
> > > last
> > > week, since I'm still using my laptop, which is painfully slow 
> > > (1
> > > night per step), my tests are always a few days old.
> > 
> > Thanks.  The patch is OK for trunk once the minor formatting nits
> > are
> > fixed (you don't have to bother with a full test run for that).  We
> > might want to backport it to gcc 13 as well, but let's let it
> > "soak" in
> > trunk for some time first.
> > 
> > > We discussed it already but yes, in the end I believe an account
> > > on
> > > the compile farm will be necessary for me.
> > 
> > Let me know if you need any help with that.
> 
> I'm not certain about what to put under "Contributions" in the
> account
> creation form.
> I'm still green behind the ears, and wouldn't count my current count
> of 2 patches
> *not yet pushed to trunk* as anything remarkable.

That's OK.  Let them know that you are a GSoC student working on GCC
this summer, and that I am sponsoring your request for usage of these
machines, as your GSoC mentor (you can link to this email in the
mailing list archives if need be).

> 
> > > I'll correct the formatting of the comments and resend it, and
> > > double
> > > check the indentation.
> > 
> > Thanks.
> 
> I said that but actually I am unsure about the indentation format.
> Is it spaces up to 6 characters, then morph them into tabs?
> It was looking like that in the code, although some portions were
> breaking this rule.
> I went with the same indentation rules as already shown within each
> function.

I believe it's 2 spaces for each indentation level, but every 8 spaces
becomes a tab, though from looking through
https://www.gnu.org/prep/standards/standards.html I couldn't find where
it specifies that.
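
As a rough sketch of that rule (2 columns per level, each full run of 8
columns written as a tab), assuming I have the convention right.  The
function name leading_whitespace is made up for this example.

```cpp
#include <cassert>
#include <string>

// Build the leading whitespace for a given nesting level, assuming
// 2 columns per level and a tab for every complete 8 columns.
std::string
leading_whitespace (int level)
{
  int columns = 2 * level;
  return std::string (columns / 8, '\t') + std::string (columns % 8, ' ');
}

void
example ()
{
  assert (leading_whitespace (0) == "");
  assert (leading_whitespace (3) == "      ");  // 6 columns: spaces only.
  assert (leading_whitespace (4) == "\t");      // 8 columns: one tab.
  assert (leading_whitespace (5) == "\t  ");    // 10 columns: tab + 2 spaces.
}
```

That would match what you saw: spaces up to 6 columns, then a tab at 8.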

> 
> > 
> > >  I'm still writing custom formatting rules for
> > > my gcc subfolders,
> > > but the formatter is sometimes switching back to my default rules
> > > instead of the workspace's.
> > 
> > Which formatter are these rules for, BTW?
> > 
> 
> I'm using vscode default C/Cpp extension's formatter.

(nods)

Thanks
Dave



RE: [PATCH] Handle FMA friendly in reassoc pass

2023-06-07 Thread Cui, Lili via Gcc-patches
Hi Di,

The compile options I use are: "-march=native -Ofast -funroll-loops -flto"
I re-ran 503, 507, and 527 on two neoverse-n1 machines and found that one
machine fluctuated greatly, with a score only 70% of the other machine's. I
also couldn't reproduce the gain on the stable machine. As for the 527
regression, I can't reproduce it and the data seems stable.

Regards,
Lili.

> -Original Message-
> From: Di Zhao OS 
> Sent: Wednesday, June 7, 2023 11:48 AM
> To: Cui, Lili ; gcc-patches@gcc.gnu.org
> Cc: richard.guent...@gmail.com; li...@linux.ibm.com
> Subject: RE: [PATCH] Handle FMA friendly in reassoc pass
> 
> Hello Lili Cui,
> 
> Since I'm also trying to improve this lately, I've tested your patch on 
> several
> aarch64 machines we have, including neoverse-n1 and ampere1
> architectures. However, I haven't reproduced the 6.00% improvement of
> 503.bwaves_r single copy run you mentioned. Could you share more
> information about the aarch64 CPU and compile options you tested? The
> option I'm using is "-Ofast", with or without "--param avoid-fma-max-
> bits=512".
> 
> Additionally, we found some spec2017 cases with regressions, including 4%
> on 527.cam4_r (neoverse-n1).
> 
> > -Original Message-
> > From: Gcc-patches <bounces+dizhao=os.amperecomputing@gcc.gnu.org> On Behalf Of
> > Cui, Lili via Gcc-patches
> > Sent: Thursday, May 25, 2023 7:30 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: richard.guent...@gmail.com; li...@linux.ibm.com; Lili Cui
> > 
> > Subject: [PATCH] Handle FMA friendly in reassoc pass
> >
> > From: Lili Cui 
> >
> > Make some changes in reassoc pass to make it more friendly to fma pass
> later.
> > Using FMA instead of mult + add reduces register pressure and
> > insruction retired.
> >
> > There are mainly two changes
> > 1. Put no-mult ops and mult ops alternately at the end of the queue,
> > which is conducive to generating more fma and reducing the loss of FMA
> > when breaking the chain.
> > 2. Rewrite the rewrite_expr_tree_parallel function to try to build
> > parallel chains according to the given correlation width, keeping the
> > FMA chance as much as possible.
> >
> > With the patch applied
> >
> > On ICX:
> > 507.cactuBSSN_r: Improved by 1.7% for multi-copy.
> > 503.bwaves_r   : Improved by 0.60% for single copy.
> > 507.cactuBSSN_r: Improved by 1.10% for single copy.
> > 519.lbm_r      : Improved by 2.21% for single copy.
> > no measurable changes for other benchmarks.
> >
> > On aarch64
> > 507.cactuBSSN_r: Improved by 1.7% for multi-copy.
> > 503.bwaves_r   : Improved by 6.00% for single-copy.
> > no measurable changes for other benchmarks.
> >
> > TEST1:
> >
> > float
> > foo (float a, float b, float c, float d, float *e) {
> >return  *e  + a * b + c * d ;
> > }
> >
> > For "-Ofast -mfpmath=sse -mfma" GCC generates:
> > vmulss  %xmm3, %xmm2, %xmm2
> > vfmadd132ss %xmm1, %xmm2, %xmm0
> > vaddss  (%rdi), %xmm0, %xmm0
> > ret
> >
> > With this patch GCC generates:
> > vfmadd213ss   (%rdi), %xmm1, %xmm0
> > vfmadd231ss   %xmm2, %xmm3, %xmm0
> > ret
> >
> > TEST2:
> >
> > for (int i = 0; i < N; i++)
> > {
> >   a[i] += b[i]* c[i] + d[i] * e[i] + f[i] * g[i] + h[i] * j[i] + k[i]
> > * l[i]
> > + m[i]* o[i] + p[i];
> > }
> >
> > For "-Ofast -mfpmath=sse -mfma"  GCC generates:
> > vmovapd e(%rax), %ymm4
> > vmulpd  d(%rax), %ymm4, %ymm3
> > addq$32, %rax
> > vmovapd c-32(%rax), %ymm5
> > vmovapd j-32(%rax), %ymm6
> > vmulpd  h-32(%rax), %ymm6, %ymm2
> > vmovapd a-32(%rax), %ymm6
> > vaddpd  p-32(%rax), %ymm6, %ymm0
> > vmovapd g-32(%rax), %ymm7
> > vfmadd231pd b-32(%rax), %ymm5, %ymm3
> > vmovapd o-32(%rax), %ymm4
> > vmulpd  m-32(%rax), %ymm4, %ymm1
> > vmovapd l-32(%rax), %ymm5
> > vfmadd231pd f-32(%rax), %ymm7, %ymm2
> > vfmadd231pd k-32(%rax), %ymm5, %ymm1
> > vaddpd  %ymm3, %ymm0, %ymm0
> > vaddpd  %ymm2, %ymm0, %ymm0
> > vaddpd  %ymm1, %ymm0, %ymm0
> > vmovapd %ymm0, a-32(%rax)
> > cmpq    $8192, %rax
> > jne .L4
> > vzeroupper
> > ret
> >
> > With this patch applied, GCC breaks the chain with width = 2 and
> > generates 6 FMAs:
> >
> > vmovapd a(%rax), %ymm2
> > vmovapd c(%rax), %ymm0
> > addq    $32, %rax
> > vmovapd e-32(%rax), %ymm1
> > vmovapd p-32(%rax), %ymm5
> > vmovapd g-32(%rax), %ymm3
> > vmovapd j-32(%rax), %ymm6
> > vmovapd l-32(%rax), %ymm4
> > vmovapd o-32(%rax), %ymm7
> > vfmadd132pd b-32(%rax), %ymm2, %ymm0
> > vfmadd132pd d-32(%rax), %ymm5, %ymm1
> > vfmadd231pd f-32(%rax), %ymm3, %ymm0
> > vfmadd231pd h-32(%rax), %ymm6, %ymm1
> > vfmadd231pd k-32(%rax), %ymm4, %ymm0
> > vfmadd231pd m-32(%rax), %ymm7, %ymm1
> > vaddpd  %ymm1, %ymm0, %ymm0
> > vmovapd %ymm0, a-32(%rax)
> > cmpq    $8192, %rax
> > jne .L2
> > vzeroupper
> >   

[pushed] [PR109541] RA: Constrain class of pic offset table pseudo to general regs

2023-06-07 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109541

The patch was successfully bootstrapped and tested on x86-64, aarch64,
and ppc64le.




[PATCH][committed] aarch64: Represent SQXTUN with RTL operations

2023-06-07 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch removes UNSPEC_SQXTUN and uses organic RTL codes to represent the 
operation.
SQXTUN is an odd one. It's described in the architecture as "Signed saturating 
extract Unsigned Narrow".
It's not a straightforward ss_truncate nor a us_truncate.
It is a sort of truncating signed clamp operation with limits derived from the 
unsigned extrema of the narrow mode:
(truncate:N
  (smin:M
(smax:M (reg:M) (const_int 0))
(const_int )))

This patch implements these semantics. I've checked that the vqmovun tests in 
advsimd-intrinsics.exp
now get constant-folded and still pass validation, so I'm pretty confident in 
the semantics.
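
For readers who want the RTL above spelled out per lane, here is a minimal scalar sketch in C. It assumes a 16-bit signed input lane narrowed to 8 bits; the helper name sqxtun_s16_model is hypothetical and is not part of the patch or of arm_neon.h.

```c
#include <stdint.h>

/* Hypothetical scalar model of one SQXTUN lane, int16_t -> uint8_t.
   It mirrors the RTL shape quoted above: truncate (smin (smax (x, 0), UMAX)),
   where UMAX is the unsigned maximum of the narrow mode.  */
static uint8_t
sqxtun_s16_model (int16_t x)
{
  int32_t clamped = x;

  if (clamped < 0)              /* smax (x, 0): negative inputs saturate to 0.  */
    clamped = 0;
  if (clamped > UINT8_MAX)      /* smin (..., 255): large inputs saturate to 255.  */
    clamped = UINT8_MAX;

  return (uint8_t) clamped;     /* truncate: now lossless, the value fits.  */
}
```

Under this model -5 maps to 0, 200 stays 200 and 1000 maps to 255, which is why neither a plain ss_truncate nor a us_truncate describes the operation.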

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_sqmovun):
Rename to...
(*aarch64_sqmovun_insn): ... This.  Reimplement
with RTL codes.
(aarch64_sqmovun [SD_HSDI]): Reimplement with RTL codes.
(aarch64_sqxtun2_le): Likewise.
(aarch64_sqxtun2_be): Likewise.
(aarch64_sqxtun2): Adjust for the above.
(aarch64_sqmovun): New define_expand.
* config/aarch64/iterators.md (UNSPEC_SQXTUN): Delete.
(half_mask): New mode attribute.
* config/aarch64/predicates.md (aarch64_simd_umax_half_mode):
New predicate.


sqxtun.patch
Description: sqxtun.patch


[PATCH][committed] aarch64: Improve RTL representation of ADDP instructions

2023-06-07 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

Similar to the ADDLP instructions the non-widening ADDP ones can be
represented by adding the odd lanes with the even lanes of a vector.
These instructions take two vector inputs and the architecture spec
describes the operation as concatenating them together before going
through it with pairwise additions.
This patch chooses to represent ADDP on 64-bit and 128-bit input
vectors slightly differently, reasons explained in the comments
in aarch64-simd.md.
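
As a rough illustration of the architectural description (concatenate the two inputs, then add adjacent pairs), here is a scalar sketch in C for four 32-bit lanes. The function name addp_s32x4_model is hypothetical, and this models only the instruction's result, not the RTL form the patch actually chooses.

```c
#include <stdint.h>

/* Hypothetical scalar model of ADDP on two vectors of four int32_t lanes.
   The inputs are concatenated (A in the low half, B in the high half) and
   each result lane is the sum of one even/odd pair of the concatenation.  */
static void
addp_s32x4_model (const int32_t a[4], const int32_t b[4], int32_t out[4])
{
  int32_t cat[8];

  for (int i = 0; i < 4; i++)
    {
      cat[i] = a[i];
      cat[i + 4] = b[i];
    }

  /* Add in unsigned arithmetic so that overflow wraps, as the vector
     instruction does, instead of being undefined behaviour in C.  */
  for (int i = 0; i < 4; i++)
    out[i] = (int32_t) ((uint32_t) cat[2 * i] + (uint32_t) cat[2 * i + 1]);
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40} the model yields {3, 7, 30, 70}: the low half of the result comes from pairs of the first input and the high half from pairs of the second.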

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_addp):
Reimplement as...
(aarch64_addp_insn): ... This...
(aarch64_addp_insn): ... And this.
(aarch64_addp): New define_expand.


addp-r.patch
Description: addp-r.patch


Re: [PATCH] libstdc++: Use AS_IF in configure.ac

2023-06-07 Thread Andreas Schwab via Gcc-patches
On Jun 07 2023, Jonathan Wakely via Gcc-patches wrote:

> Let's just revert it then. The manual says we should use AS_IF, but what we
> had previously was working well enough. I'll figure out what happened here
> later.

I think AS_IF is doing its job here: moving the expansion of
AC_REQUIRE'd macros out of the bodies.  But many of those expansions
actually need to remain under the $GLIBCXX_IS_NATIVE conditional, so it
is not appropriate at this place.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Tighten 'dg-warning' alternatives in 'c-c++-common/Wfree-nonheap-object{,-2,-3}.c' (was: [PATCH] correct -Wmismatched-new-delete (PR 98160, 98166))

2023-06-07 Thread Thomas Schwinge
Hi!

On 2020-12-08T13:46:32-0700, Martin Sebor via Gcc-patches wrote:
> The attached changes [...]

... eventually became commit fe7f75cf16783589eedbab597e6d0b8d35d7e470
"Correct/improve maybe_emit_free_warning (PR middle-end/98166, PR c++/57111, PR 
middle-end/98160)".

>   * c-c++-common/Wfree-nonheap-object-2.c: New test.
>   * c-c++-common/Wfree-nonheap-object-3.c: New test.
>   * c-c++-common/Wfree-nonheap-object.c: New test.

OK to push the attached
"Tighten 'dg-warning' alternatives in 
'c-c++-common/Wfree-nonheap-object{,-2,-3}.c'"?


Grüße
 Thomas


> diff --git a/gcc/testsuite/c-c++-common/Wfree-nonheap-object-2.c 
> b/gcc/testsuite/c-c++-common/Wfree-nonheap-object-2.c
> new file mode 100644
> index 000..0aedf1babbc
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/Wfree-nonheap-object-2.c
> @@ -0,0 +1,52 @@
> +/* PR middle-end/98166: bogus -Wmismatched-dealloc on user-defined allocator
> +   and inlining
> +   Verify that the allocator can be declared inline without a warning when
> +   it's associated with a standard deallocator.  Associating an inline
> +   deallocator with an allocator would cause false positives when the former
> +   calls a deallocation function the allocator isn't associated with, so
> +   that triggers a warning on declaration.
> +   { dg-do compile }
> +   { dg-options "-O2 -Wall" } */
> +
> +__attribute__ ((malloc (__builtin_free)))
> +inline int*
> +alloc_int (int n)
> +{
> +  return (int*)__builtin_malloc (n + sizeof (int));
> +}
> +
> +void test_nowarn_int (int n)
> +{
> +  {
> +int *p = alloc_int (n);
> +__builtin_free (p);
> +  }
> +
> +  {
> +int *p = alloc_int (n);
> +__builtin_free (p + 1);   // { dg-warning "\\\[-Wfree-nonheap-object" }
> +  }
> +}
> +
> +
> +inline void
> +dealloc_long (long *p)
> +{
> +  __builtin_free (p); // { dg-warning "'__builtin_free|void 
> __builtin_free\\(void\\*\\)' called on pointer 'p|' with nonzero 
> offset" }
> +}
> +
> +__attribute__ ((malloc (dealloc_long)))
> +long* alloc_long (int);   // { dg-warning "'malloc \\\(dealloc_long\\\)' 
> attribute ignored with deallocation functions declared 'inline'" }
> +
> +void test_nowarn_long (int n)
> +{
> +  {
> +long *p = alloc_long (n);
> +dealloc_long (p);
> +  }
> +
> +  {
> +long *p = alloc_long (n);
> +dealloc_long (p + 1);
> +  }
> +}
> diff --git a/gcc/testsuite/c-c++-common/Wfree-nonheap-object-3.c 
> b/gcc/testsuite/c-c++-common/Wfree-nonheap-object-3.c
> new file mode 100644
> index 000..41a5b50362e
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/Wfree-nonheap-object-3.c
> @@ -0,0 +1,70 @@
> +/* PR middle-end/98166: bogus -Wmismatched-dealloc on user-defined allocator
> +   and inlining
> +   Verify that without inlining, both the allocator and the deallocator
> +   can be declared inline without a warning and that mismatched calls are
> +   detected, but that declaring them always_inline does trigger a warning.
> +   { dg-do compile }
> +   { dg-options "-Wall" } */
> +
> +__attribute__ ((malloc (__builtin_free)))
> +inline int*
> +alloc_int (int n)
> +{
> +  return (int*)__builtin_malloc (n + sizeof (int));
> +}
> +
> +void test_nowarn_int (int n)
> +{
> +  {
> +int *p = alloc_int (n);
> +__builtin_free (p);
> +  }
> +
> +  {
> +int *p = alloc_int (n);
> +__builtin_free (p + 1);   // { dg-warning "'__builtin_free|void 
> __builtin_free\\(void\\*\\)' called on pointer 'p|' with nonzero 
> offset" }
> +  }
> +}
> +
> +
> +inline void
> +dealloc_long (long *p) { __builtin_free (p); }
> +
> +__attribute__ ((malloc (dealloc_long)))
> +long* alloc_long (int);
> +
> +void test_nowarn_long (int n)
> +{
> +  {
> +long *p = alloc_long (n);
> +dealloc_long (p);
> +  }
> +
> +  {
> +long *p = alloc_long (n);
> +dealloc_long (p + 1); // { dg-warning "'dealloc_long' called on 
> pointer 'p|' with nonzero offset" }
> +  }
> +}
> +
> +
> +inline __attribute__ ((always_inline)) void
> +dealloc_float (float *p)  // { dg-message "deallocation function 
> declared here" }
> +{
> +  __builtin_free (p); // { dg-warning "'__builtin_free|void 
> __builtin_free\\(void\\*\\)' called on pointer 'p|' with nonzero 
> offset" }
> +}
> +
> +__attribute__ ((malloc (dealloc_float)))
> +float* alloc_float (int); // { dg-warning "'malloc \\(dealloc_float\\)' 
> attribute ignored with deallocation functions declared 'inline'" }
> +
> +void test_nowarn_float (int n)
> +{
> +  {
> +float *p = alloc_float (n);
> +dealloc_float (p);
> +  }
> +
> +  {
> +float *p = alloc_float (n);
> +dealloc_float (p + 2);
> +  }
> +}
> diff --git a/gcc/testsuite/c-c++-common/Wfree-nonheap-object.c 
> b/gcc/testsuite/c-c++-common/Wfree-nonheap-object.c
> new file mode 100644
> index 000..dfbb296e9a7
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/Wfree-nonheap-object.c
> @@ -0,0 +1,50 @@
> +/* Verify that built-in forms of functions can be used interchangeably
> +   with 

Re: [PATCH] libstdc++: Use AS_IF in configure.ac

2023-06-07 Thread Jonathan Wakely via Gcc-patches
On Wed, 7 Jun 2023 at 15:54, Jonathan Wakely  wrote:

>
>
> On Wed, 7 Jun 2023 at 15:42, Hans-Peter Nilsson  wrote:
>
>> > Date: Tue, 6 Jun 2023 16:30:12 +0100
>> > From: Jonathan Wakely via Gcc-patches 
>>
>> > On Thu, 1 Jun 2023 at 16:59, Jonathan Wakely via Libstdc++ <
>> > libstd...@gcc.gnu.org> wrote:
>> >
>> > > Tested x86_64-linux. I'd appreciate a second set of eyeballs on this
>> > > before I push it.
>> > >
>> >
>> > Pushed to trunk now.
>>
>> ...as r14-1581-g97a5e8a2a48d16, after which (apparently)
>> *all* linking libstdc++ tests for cris-elf (a "newlib
>> target") get (for example):
>>
>> FAIL: 17_intro/freestanding.cc (test for excess errors)
>> Excess errors:
>> /x/cris-elf/pre/cris-elf/bin/ld: cannot find -liconv: No such file or
>> directory
>>
>> (deduced from libstdc++.log and the commits in the range
>> ce2188e4320c..585c660f041c where 4144 regressions in
>> libstdc++ were introduced for cris-elf)
>>
>
> Gah. I tested building cris-elf but didn't run any tests.
>
> I *thought* I compared the configure results before and after the patch
> too, but I guess I missed something, or it didn't show up where I looked.
>
>
>
>> From the generated configure and a brief RTFM for AS_IF, it
>> looks almost like AS_IF was "miscompiled" and behaving
>> literally AS_IF (!) in that the condition TEST1 (here
>> [$GLIBCXX_IS_NATIVE] seems to be emitted *after* the
>> RUN-IF-TRUE1 clause (the next 31 lines).  Not obvious what
>> went wrong.  I even tried regenerating configure.  HTH.
>>
>>
> Let's just revert it then. The manual says we should use AS_IF, but what
> we had previously was working well enough. I'll figure out what happened
> here later.
>

Reverted as  r14-1607-g000f8b9a6a0ec7


Re: [PATCH] libstdc++: Use AS_IF in configure.ac

2023-06-07 Thread Jonathan Wakely via Gcc-patches
On Wed, 7 Jun 2023 at 15:42, Hans-Peter Nilsson  wrote:

> > Date: Tue, 6 Jun 2023 16:30:12 +0100
> > From: Jonathan Wakely via Gcc-patches 
>
> > On Thu, 1 Jun 2023 at 16:59, Jonathan Wakely via Libstdc++ <
> > libstd...@gcc.gnu.org> wrote:
> >
> > > Tested x86_64-linux. I'd appreciate a second set of eyeballs on this
> > > before I push it.
> > >
> >
> > Pushed to trunk now.
>
> ...as r14-1581-g97a5e8a2a48d16, after which (apparently)
> *all* linking libstdc++ tests for cris-elf (a "newlib
> target") get (for example):
>
> FAIL: 17_intro/freestanding.cc (test for excess errors)
> Excess errors:
> /x/cris-elf/pre/cris-elf/bin/ld: cannot find -liconv: No such file or
> directory
>
> (deduced from libstdc++.log and the commits in the range
> ce2188e4320c..585c660f041c where 4144 regressions in
> libstdc++ were introduced for cris-elf)
>

Gah. I tested building cris-elf but didn't run any tests.

I *thought* I compared the configure results before and after the patch
too, but I guess I missed something, or it didn't show up where I looked.



> From the generated configure and a brief RTFM for AS_IF, it
> looks almost like AS_IF was "miscompiled" and behaving
> literally AS_IF (!) in that the condition TEST1 (here
> [$GLIBCXX_IS_NATIVE] seems to be emitted *after* the
> RUN-IF-TRUE1 clause (the next 31 lines).  Not obvious what
> went wrong.  I even tried regenerating configure.  HTH.
>
>
Let's just revert it then. The manual says we should use AS_IF, but what we
had previously was working well enough. I'll figure out what happened here
later.


Remove 'gcc/testsuite/g++.dg/warn/Wfree-nonheap-object.s' (was: [PATCH] add -Wmismatched-new-delete to middle end (PR 90629))

2023-06-07 Thread Thomas Schwinge
Hi!

On 2020-11-03T16:56:48-0700, Martin Sebor via Gcc-patches wrote:
> Attached is a simple middle end implementation of detection of
> mismatched pairs of calls to C++ new and delete, along with
> a substantially enhanced implementation of -Wfree-nonheap-object.

This eventually became commit dce6c58db87ebf7f4477bd3126228e73e497
"Add support for detecting mismatched allocation/deallocation calls".
Already in this original patch submission:

> diff --git a/gcc/testsuite/g++.dg/warn/Wfree-nonheap-object.s 
> b/gcc/testsuite/g++.dg/warn/Wfree-nonheap-object.s
> new file mode 100644
> index 000..e69de29bb2d

OK to push the attached
"Remove 'gcc/testsuite/g++.dg/warn/Wfree-nonheap-object.s'"?


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From d04c97b40a07bd2a3205d9de8577024f5d26aba0 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 7 Jun 2023 16:01:39 +0200
Subject: [PATCH] Remove 'gcc/testsuite/g++.dg/warn/Wfree-nonheap-object.s'

..., which, presumably, was added by mistake in
commit dce6c58db87ebf7f4477bd3126228e73e497
"Add support for detecting mismatched allocation/deallocation calls".

	gcc/testsuite/
	* g++.dg/warn/Wfree-nonheap-object.s: Remove.
---
 gcc/testsuite/g++.dg/warn/Wfree-nonheap-object.s | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 gcc/testsuite/g++.dg/warn/Wfree-nonheap-object.s

diff --git a/gcc/testsuite/g++.dg/warn/Wfree-nonheap-object.s b/gcc/testsuite/g++.dg/warn/Wfree-nonheap-object.s
deleted file mode 100644
index e69de29bb2d..000
-- 
2.34.1



[committed] Fix expected test output on hppa

2023-06-07 Thread Jeff Law via Gcc-patches
Recent changes in the hoisting code change the optimized gimple for the 
shadd-3 testcase on the PA.  That in turn changes the number of expected 
shadd instructions.


I'm not entirely sure the test is actually testing what we want anymore 
since I don't see a CSE for postreload to discover.  But I did verify 
that the number of shadd instructions is sane, so I just changed the 
count in the obvious way.


Pushed to the trunk.

Jeff

commit c0b88e9e8bbe15f0c2167371b49521b748c6da19
Author: Jeff Law 
Date:   Wed Jun 7 07:55:32 2023 -0600

Fix expected test output on hppa

Recent changes in the hoisting code change the optimized gimple for the
shadd-3 testcase on the PA.  That in turn changes the number of expected
 shadd instructions.

I'm not entirely sure the test is actually testing what we want anymore
since I don't see a CSE for postreload to discover.  But I did verify
that the number of shadd instructions is sane, so I just changed the
count in the obvious way.

gcc/testsuite
* gcc.target/hppa/shadd-3.c: Update expected output.

diff --git a/gcc/testsuite/gcc.target/hppa/shadd-3.c 
b/gcc/testsuite/gcc.target/hppa/shadd-3.c
index a0c1f663d56..2d0b648f384 100644
--- a/gcc/testsuite/gcc.target/hppa/shadd-3.c
+++ b/gcc/testsuite/gcc.target/hppa/shadd-3.c
@@ -10,7 +10,7 @@
over time we'll have to revisit the combine and/or postreload
dumps.  Note we have disabled delay slot filling to improve
test stability.  */
-/* { dg-final { scan-assembler-times "sh.add" 3 } }  */
+/* { dg-final { scan-assembler-times "sh.add" 4 } }  */
 
 extern void oof (void);
 typedef struct simple_bitmap_def *sbitmap;


Re: [PATCH] RISC-V: Add Veyron V1 pipeline description

2023-06-07 Thread Jeff Law via Gcc-patches




On 6/7/23 08:13, Kito Cheng wrote:
> I would like the vendor CPU name to start with the vendor name, like
> ventana-veyron-v1, which is consistent with all other vendor CPUs;
> LLVM uses the same convention too.
Fair enough.  Better to get it right now than have this stuff be 
inconsistent.  It'll be a little more pain for our internal folks, but 
we'll deal with that :-)


Jeff


Re: [PATCH] libstdc++: Use AS_IF in configure.ac

2023-06-07 Thread Hans-Peter Nilsson via Gcc-patches
> Date: Tue, 6 Jun 2023 16:30:12 +0100
> From: Jonathan Wakely via Gcc-patches 

> On Thu, 1 Jun 2023 at 16:59, Jonathan Wakely via Libstdc++ <
> libstd...@gcc.gnu.org> wrote:
> 
> > Tested x86_64-linux. I'd appreciate a second set of eyeballs on this
> > before I push it.
> >
> 
> Pushed to trunk now.

...as r14-1581-g97a5e8a2a48d16, after which (apparently)
*all* linking libstdc++ tests for cris-elf (a "newlib
target") get (for example):

FAIL: 17_intro/freestanding.cc (test for excess errors)
Excess errors:
/x/cris-elf/pre/cris-elf/bin/ld: cannot find -liconv: No such file or directory

(deduced from libstdc++.log and the commits in the range
ce2188e4320c..585c660f041c where 4144 regressions in
libstdc++ were introduced for cris-elf)

From the generated configure and a brief RTFM for AS_IF, it
looks almost like AS_IF was "miscompiled" and behaving
literally AS_IF (!) in that the condition TEST1 (here
[$GLIBCXX_IS_NATIVE] seems to be emitted *after* the
RUN-IF-TRUE1 clause (the next 31 lines).  Not obvious what
went wrong.  I even tried regenerating configure.  HTH.

brgds, H-P


Re: [PATCH v2 0/3] RISC-V: Support ZC* extensions.

2023-06-07 Thread Kito Cheng via Gcc-patches
Thanks Jiawei, the v2 patch set LGTM, but I would like to defer this until
the binutils part has been merged. I know you guys have had that implemented
for a while, so I think it's almost there :)

On Wed, 7 Jun 2023 at 20:57, Jiawei wrote:

> RISC-V Code Size Reduction(ZC*) extensions is a group of extensions
> which define subsets of the existing C extension (Zca, Zcd, Zcf) and new
> extensions(Zcb, Zcmp, Zcmt) which only contain 16-bit encodings.[1]
>
> The implementation of the RISC-V Code Size Reduction extension in GCC is
> an important step towards making the RISC-V architecture more efficient.
>
> The cooperation with the OpenHW group has played a crucial role in this
> effort, facilitating the implementation, testing and validation. The current
> work can also be found in the OpenHW group's GitHub repo.[2]
>
> Thanks to Tariq Kurd and Ibrahim Abu Kharmeh for help with explaining the
> specification, and to Jeremy Bennett for his patient guidance throughout the
> whole development process.
>
> V2 changes:
> Fix Kito's comments in first version, Eswin assisted in optimizing the
> implementation of Zcmp extension:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617440.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617442.html
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620869.html
>
>
> [1] github.com/riscv/riscv-code-size-reduction/tree/main/Zc-specification
>
> [2] github.com/openhwgroup/corev-gcc
>
> Co-Authored by: Charlie Keaney 
> Co-Authored by: Mary Bennett 
> Co-Authored by: Nandni Jamnadas 
> Co-Authored by: Sinan Lin 
> Co-Authored by: Simon Cook 
> Co-Authored by: Shihua Liao 
> Co-Authored by: Yulong Shi 
>
>   RISC-V: Minimal support for ZC extensions.
>   RISC-V: Enable compressible features when use ZC* extensions.
>   RISC-V: Add ZC* test for march args being passed.
>
>
> Jiawei (3):
>   RISC-V: Minimal support for ZC* extensions.
>   RISC-V: Enable compressible features when use ZC* extensions.
>   RISC-V: Add ZC* test for failed march args being passed.
>
>  gcc/common/config/riscv/riscv-common.cc   | 38 +++
>  gcc/config/riscv/riscv-c.cc   |  2 +-
>  gcc/config/riscv/riscv-opts.h | 16 ++
>  gcc/config/riscv/riscv-shorten-memrefs.cc |  3 +-
>  gcc/config/riscv/riscv.cc | 11 ---
>  gcc/config/riscv/riscv.h  |  2 +-
>  gcc/config/riscv/riscv.opt|  3 ++
>  gcc/testsuite/gcc.target/riscv/arch-22.c  |  5 +++
>  gcc/testsuite/gcc.target/riscv/arch-23.c  |  5 +++
>  9 files changed, 78 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/arch-22.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/arch-23.c
>
> --
> 2.25.1
>
>


Re: [PATCH] libgcc: Fix eh_frame fast path in find_fde_tail

2023-06-07 Thread Richard Biener via Gcc-patches
On Tue, Jun 6, 2023 at 11:53 AM Florian Weimer via Gcc-patches wrote:
>
> The eh_frame value is only used by linear_search_fdes, not the binary
> search directly in find_fde_tail, so the bug is not immediately
> apparent with most programs.
>
> Fixes commit e724b0480bfa5ec04f39be8c7290330b495c59de ("libgcc:
> Special-case BFD ld unwind table encodings in find_fde_tail").

OK.

> [I'd appreciate suggestions how I could add a test for this.  BFD ld
> does not seem to allow omitting the binary search table.]
>
> libgcc/
>
> PR libgcc/109712
> * unwind-dw2-fde-dip.c (find_fde_tail): Correct fast path for
> parsing eh_frame.
>
> ---
>  libgcc/unwind-dw2-fde-dip.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
> index 6223f5f18a2..4e0b880513f 100644
> --- a/libgcc/unwind-dw2-fde-dip.c
> +++ b/libgcc/unwind-dw2-fde-dip.c
> @@ -403,8 +403,8 @@ find_fde_tail (_Unwind_Ptr pc,
>  BFD ld generates.  */
>signed value __attribute__ ((mode (SI)));
>memcpy (&value, p, sizeof (value));
> +  eh_frame = p + value;
>p += sizeof (value);
> -  dbase = value;   /* No adjustment because pcrel has base 0.  */
>  }
>else
>  p = read_encoded_value_with_base (hdr->eh_frame_ptr_enc,
>
> base-commit: b327cbe8f4eefc91ee2bea49a1da7128adf30281
>
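
To make the one-line fix easier to follow, here is a minimal sketch in C of decoding a PC-relative signed 32-bit pointer, assuming that this is the encoding the fast path handles (DW_EH_PE_pcrel combined with DW_EH_PE_sdata4). The helper name decode_pcrel_sdata4 is hypothetical and does not exist in libgcc.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode a PC-relative, signed 32-bit encoded
   pointer stored at P.  The stored value is an offset relative to the
   address of the field itself, so the target is P + value, computed
   before P is advanced past the field.  */
static const unsigned char *
decode_pcrel_sdata4 (const unsigned char *p)
{
  int32_t value;

  /* memcpy avoids an unaligned load; the field has no alignment guarantee.  */
  memcpy (&value, p, sizeof (value));
  return p + value;
}
```

This matches the shape of the added line in the diff, where eh_frame is computed as p + value before p is incremented by sizeof (value).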


vect: Don't pass subtype to vect_widened_op_tree where not needed [PR 110142]

2023-06-07 Thread Andre Vieira (lists) via Gcc-patches

Hi,

This patch fixes an issue introduced by
g:2f482a07365d9f4a94a56edd13b7f01b8f78b5a0, where a subtype was being
passed to vect_widened_op_tree when no subtype was to be used. This
led to an erroneous use of IFN_VEC_WIDEN_MINUS.
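
A small scalar sketch in C may help show why a widening subtract is not safe for the new testcase. It assumes the problem is the mixed signedness of the two narrow operands (short and unsigned short), which the testcase suggests but this message does not spell out; both function names are hypothetical.

```c
#include <stdint.h>

/* What C requires for y - x in the testcase: both operands are promoted
   to int, so Y is zero-extended and X is sign-extended.  */
static int32_t
diff_mixed_sign (int16_t x, uint16_t y)
{
  return (int32_t) y - (int32_t) x;
}

/* Hypothetical wrong lowering: a single widening subtract that treats
   both 16-bit inputs as unsigned before widening.  */
static int32_t
diff_both_unsigned (int16_t x, uint16_t y)
{
  return (int32_t) y - (int32_t) (uint16_t) x;
}
```

For x = -1 and y = 65535 the first function gives 65536 while the second gives 0, so an operation that extends both inputs the same way cannot stand in for the mixed-sign subtraction.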


gcc/ChangeLog:

* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Don't pass
subtype to vect_widened_op_tree and remove subtype parameter.
(vect_recog_widen_plus_pattern): Remove subtype parameter and don't
pass it to the call to vect_recog_widen_op_pattern.
(vect_recog_widen_minus_pattern): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr110142.c: New test.

diff --git a/gcc/testsuite/gcc.dg/vect/pr110142.c b/gcc/testsuite/gcc.dg/vect/pr110142.c
new file mode 100644
index 
..a88dbe400f46a33a53649298345c24c569e2f567
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr110142.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+void test(short *x, unsigned short *y, int n)
+{
+  for (int i = 0; i < n; i++)
+  x[i] = (y[i] - x[i]) >> 1;
+}
+
+/* { dg-final { scan-tree-dump-not "widen_minus" "vect" } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dc102c919352a0328cf86eabceb3a38c41a7e4fd..599a027f9b2feb8971c1ee017b6457bc297c86c2 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -1405,15 +1405,14 @@ static gimple *
 vect_recog_widen_op_pattern (vec_info *vinfo,
 stmt_vec_info last_stmt_info, tree *type_out,
 tree_code orig_code, code_helper wide_code,
-bool shift_p, const char *name,
-optab_subtype *subtype = NULL)
+bool shift_p, const char *name)
 {
   gimple *last_stmt = last_stmt_info->stmt;
 
   vect_unpromoted_value unprom[2];
   tree half_type;
   if (!vect_widened_op_tree (vinfo, last_stmt_info, orig_code, orig_code,
-shift_p, 2, unprom, &half_type, subtype))
+shift_p, 2, unprom, &half_type))
 
 return NULL;
 
@@ -1484,13 +1483,11 @@ static gimple *
 vect_recog_widen_op_pattern (vec_info *vinfo,
 stmt_vec_info last_stmt_info, tree *type_out,
 tree_code orig_code, internal_fn wide_ifn,
-bool shift_p, const char *name,
-optab_subtype *subtype = NULL)
+bool shift_p, const char *name)
 {
   combined_fn ifn = as_combined_fn (wide_ifn);
   return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
- orig_code, ifn, shift_p, name,
- subtype);
+ orig_code, ifn, shift_p, name);
 }
 
 
@@ -1513,11 +1510,9 @@ static gimple *
 vect_recog_widen_plus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
   tree *type_out)
 {
-  optab_subtype subtype;
   return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
  PLUS_EXPR, IFN_VEC_WIDEN_PLUS,
- false, "vect_recog_widen_plus_pattern",
- &subtype);
+ false, "vect_recog_widen_plus_pattern");
 }
 
 /* Try to detect subtraction on widened inputs, converting MINUS_EXPR
@@ -1526,11 +1521,9 @@ static gimple *
 vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
   tree *type_out)
 {
-  optab_subtype subtype;
   return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
  MINUS_EXPR, IFN_VEC_WIDEN_MINUS,
- false, "vect_recog_widen_minus_pattern",
- &subtype);
+ false, "vect_recog_widen_minus_pattern");
 }
 
 /* Function vect_recog_ctz_ffs_pattern


Re: [PATCH] RISC-V: Add Veyron V1 pipeline description

2023-06-07 Thread Kito Cheng via Gcc-patches
I would like the vendor CPU name to start with the vendor name, like
ventana-veyron-v1, which is consistent with all other vendor CPUs;
LLVM uses the same convention too.

On Wed, 7 Jun 2023 at 21:18, Raphael Moreira Zinsly wrote:

> gcc/ChangeLog:
>
> * config/riscv/riscv-cores.def: Add veyron-v1
> core and tune info.
> * config/riscv/riscv-opts.h
> (riscv_microarchitecture_type): Add veyron-v1.
> * config/riscv/riscv.cc (veyron_v1_tune_info): New.
> * config/riscv/riscv.md: Include veyron-v1.md.
> (tune): Add veyron-v1.
> * config/riscv/veyron-v1.md: New file.
> * doc/invoke.texi (mcpu): Add veyron-v1.
> (mtune): Add veyron-v1.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/divmod-2.c: Enable test for veyron-v1.
> ---
>  gcc/config/riscv/riscv-cores.def  |   4 +
>  gcc/config/riscv/riscv-opts.h |   3 +-
>  gcc/config/riscv/riscv.cc |  15 +++
>  gcc/config/riscv/riscv.md |   3 +-
>  gcc/config/riscv/veyron-v1.md | 121 ++
>  gcc/doc/invoke.texi   |   5 +-
>  gcc/testsuite/gcc.target/riscv/divmod-2.c |   5 +-
>  7 files changed, 149 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/config/riscv/veyron-v1.md
>
> diff --git a/gcc/config/riscv/riscv-cores.def
> b/gcc/config/riscv/riscv-cores.def
> index 7d87ab7ce28..4078439e562 100644
> --- a/gcc/config/riscv/riscv-cores.def
> +++ b/gcc/config/riscv/riscv-cores.def
> @@ -38,6 +38,7 @@ RISCV_TUNE("sifive-3-series", generic, rocket_tune_info)
>  RISCV_TUNE("sifive-5-series", generic, rocket_tune_info)
>  RISCV_TUNE("sifive-7-series", sifive_7, sifive_7_tune_info)
>  RISCV_TUNE("thead-c906", generic, thead_c906_tune_info)
> +RISCV_TUNE("veyron-v1", veyron_v1, veyron_v1_tune_info)
>  RISCV_TUNE("size", generic, optimize_size_tune_info)
>
>  #undef RISCV_TUNE
> @@ -77,4 +78,7 @@ RISCV_CORE("thead-c906",
> "rv64imafdc_xtheadba_xtheadbb_xtheadbs_xtheadcmo_"
>   "xtheadcondmov_xtheadfmemidx_xtheadmac_"
>   "xtheadmemidx_xtheadmempair_xtheadsync",
>   "thead-c906")
> +
> +RISCV_CORE("veyron-v1",
>  "rv64imafdc_zba_zbb_zbc_zbs_zifencei_xventanacondops",
> + "veyron-v1")
>  #undef RISCV_CORE
> diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
> index f34ca993689..8f7dd84115f 100644
> --- a/gcc/config/riscv/riscv-opts.h
> +++ b/gcc/config/riscv/riscv-opts.h
> @@ -52,7 +52,8 @@ extern enum riscv_isa_spec_class riscv_isa_spec;
>  /* Keep this list in sync with define_attr "tune" in riscv.md.  */
>  enum riscv_microarchitecture_type {
>generic,
> -  sifive_7
> +  sifive_7,
> +  veyron_v1
>  };
>  extern enum riscv_microarchitecture_type riscv_microarchitecture;
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 3954fc07a8b..6a5e89b4813 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -366,6 +366,21 @@ static const struct riscv_tune_param
> thead_c906_tune_info = {
>false/* use_divmod_expansion */
>  };
>
> +/* Costs to use when optimizing for Ventana Micro Veyron V1.  */
> +static const struct riscv_tune_param veyron_v1_tune_info = {
> +  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},  /* fp_add */
> +  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},  /* fp_mul */
> +  {COSTS_N_INSNS (9), COSTS_N_INSNS (17)}, /* fp_div */
> +  {COSTS_N_INSNS (4), COSTS_N_INSNS (4)},  /* int_mul */
> +  {COSTS_N_INSNS (12), COSTS_N_INSNS (20)},/* int_div */
> +  4,   /* issue_rate */
> +  4,   /* branch_cost */
> +  5,   /* memory_cost */
> +  8,   /* fmv_cost */
> +  false,   /* slow_unaligned_access */
> +  true,/* use_divmod_expansion */
> +};
> +
>  /* Costs to use when optimizing for size.  */
>  static const struct riscv_tune_param optimize_size_tune_info = {
>{COSTS_N_INSNS (1), COSTS_N_INSNS (1)},  /* fp_add */
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index 124d8c95804..90f0c1b1cf1 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -482,7 +482,7 @@
>  ;; Microarchitectures we know how to tune for.
>  ;; Keep this in sync with enum riscv_microarchitecture.
>  (define_attr "tune"
> -  "generic,sifive_7"
> +  "generic,sifive_7,veyron_v1"
>(const (symbol_ref "((enum attr_tune) riscv_microarchitecture)")))
>
>  ;; Describe a user's asm statement.
> @@ -3123,3 +3123,4 @@
>  (include "sifive-7.md")
>  (include "thead.md")
>  (include "vector.md")
> +(include "veyron-v1.md")
> diff --git a/gcc/config/riscv/veyron-v1.md b/gcc/config/riscv/veyron-v1.md
> new file mode 100644
> index 

Re: [committed] Convert H8 port to LRA

2023-06-07 Thread Andrew Pinski via Gcc-patches
On Sun, Jun 4, 2023 at 10:43 AM Jeff Law via Gcc-patches wrote:
>
> With Vlad's recent LRA fix to the elimination code, the H8 can be
> converted to LRA.

Could you update the h8300 entry on https://gcc.gnu.org/backends.html
for this change?

Thanks,
Andrew

>
> This patch has two changes of note.
>
> First, this turns Zz into a standard constraint.  This helps reloading
> for the H8/SX movqi pattern.
>
> Second, this drops the whole pattern for the SX bit memory operations.
> I can't see why those exist to begin with.  They should be handled by
> the standard bit manipulation patterns.   If someone wants to try and
> improve SX bit support, that'd be great and they can do so within the
> LRA framework :-)
>
> Pushed to the trunk...
>
> Jeff


[PATCH 3/3] aarch64: Allow compiler to define ls64 builtins [PR110132]

2023-06-07 Thread Alex Coplan via Gcc-patches
This patch refactors the ls64 builtins to allow the compiler to define them
directly instead of having wrapper functions in arm_acle.h. This should not
only be easier to maintain, but it also makes two important correctness fixes:
 - It fixes PR110132, where the builtins ended up getting declared with
   invisible bindings in the C FE, so the FE ended up synthesizing
   incompatible implicit definitions for these builtins.
 - It allows the builtins to be used with LTO, which didn't work previously.

We also take the opportunity to add test coverage from C++ for these
builtins.

gcc/ChangeLog:

PR target/110132
* config/aarch64/aarch64-builtins.cc (aarch64_general_simulate_builtin):
New. Use it ...
(aarch64_init_ls64_builtins): ... here. Switch to declaring public ACLE
names for builtins.
(aarch64_general_init_builtins): Ensure we invoke the arm_acle.h
setup if in_lto_p, just like we do for SVE.
* config/aarch64/arm_acle.h: (__arm_ld64b): Delete.
(__arm_st64b): Delete.
(__arm_st64bv): Delete.
(__arm_st64bv0): Delete.

gcc/testsuite/ChangeLog:

PR target/110132
* lib/target-supports.exp (check_effective_target_aarch64_asm_FUNC_ok):
Extend to ls64.
* g++.target/aarch64/acle/acle.exp: New.
* g++.target/aarch64/acle/ls64.C: New test.
* g++.target/aarch64/acle/ls64_lto.C: New test.
* gcc.target/aarch64/acle/ls64_lto.c: New test.
* gcc.target/aarch64/acle/pr110132.c: New test.
---
 gcc/config/aarch64/aarch64-builtins.cc| 24 ++---
 gcc/config/aarch64/arm_acle.h | 33 -
 .../g++.target/aarch64/acle/acle.exp  | 35 +++
 gcc/testsuite/g++.target/aarch64/acle/ls64.C  | 10 ++
 .../g++.target/aarch64/acle/ls64_lto.C| 10 ++
 .../gcc.target/aarch64/acle/ls64_lto.c| 10 ++
 .../gcc.target/aarch64/acle/pr110132.c| 15 
 gcc/testsuite/lib/target-supports.exp |  2 +-
 8 files changed, 100 insertions(+), 39 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/acle/acle.exp
 create mode 100644 gcc/testsuite/g++.target/aarch64/acle/ls64.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/acle/ls64_lto.C
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/ls64_lto.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/pr110132.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index cb5828a70f4..fce95c34a7c 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -956,6 +956,16 @@ aarch64_general_add_builtin (const char *name, tree type, unsigned int code,
 			   NULL, attrs);
 }
 
+static tree
+aarch64_general_simulate_builtin (const char *name, tree fntype,
+  unsigned int code,
+  tree attrs = NULL_TREE)
+{
+  code = (code << AARCH64_BUILTIN_SHIFT) | AARCH64_BUILTIN_GENERAL;
+  return simulate_builtin_function_decl (input_location, name, fntype,
+	 code, NULL, attrs);
+}
+
 static const char *
 aarch64_mangle_builtin_scalar_type (const_tree type)
 {
@@ -1879,23 +1889,24 @@ aarch64_init_ls64_builtins (void)
   aarch64_init_ls64_builtins_types ();
 
   ls64_builtins_data data[4] = {
-{"__builtin_aarch64_ld64b", AARCH64_LS64_BUILTIN_LD64B,
+{"__arm_ld64b", AARCH64_LS64_BUILTIN_LD64B,
  build_function_type_list (ls64_arm_data_t,
 			   const_ptr_type_node, NULL_TREE)},
-{"__builtin_aarch64_st64b", AARCH64_LS64_BUILTIN_ST64B,
+{"__arm_st64b", AARCH64_LS64_BUILTIN_ST64B,
  build_function_type_list (void_type_node, ptr_type_node,
 			   ls64_arm_data_t, NULL_TREE)},
-{"__builtin_aarch64_st64bv", AARCH64_LS64_BUILTIN_ST64BV,
+{"__arm_st64bv", AARCH64_LS64_BUILTIN_ST64BV,
  build_function_type_list (uint64_type_node, ptr_type_node,
 			   ls64_arm_data_t, NULL_TREE)},
-{"__builtin_aarch64_st64bv0", AARCH64_LS64_BUILTIN_ST64BV0,
+{"__arm_st64bv0", AARCH64_LS64_BUILTIN_ST64BV0,
  build_function_type_list (uint64_type_node, ptr_type_node,
 			   ls64_arm_data_t, NULL_TREE)},
   };
 
   for (size_t i = 0; i < ARRAY_SIZE (data); ++i)
 aarch64_builtin_decls[data[i].code]
-  = aarch64_general_add_builtin (data[i].name, data[i].type, data[i].code);
+  = aarch64_general_simulate_builtin (data[i].name, data[i].type,
+	  data[i].code);
 }
 
 static void
@@ -2028,6 +2039,9 @@ aarch64_general_init_builtins (void)
 
   if (TARGET_MEMTAG)
 aarch64_init_memtag_builtins ();
+
+  if (in_lto_p)
+handle_arm_acle_h ();
 }
 
 /* Implement TARGET_BUILTIN_DECL for the AARCH64_BUILTIN_GENERAL group.  */
diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
index e0ac591d2c8..3b6b63e6805 100644
--- a/gcc/config/aarch64/arm_acle.h
+++ b/gcc/config/aarch64/arm_acle.h
@@ -270,40 +270,7 @@ __ttest (void)
 #endif
 
 #ifdef __ARM_FEATURE_LS64
-#pragma GCC push_options

[PATCH 2/3] aarch64: Fix wrong code with st64b builtin [PR110100]

2023-06-07 Thread Alex Coplan via Gcc-patches
The st64b pattern incorrectly had an output constraint on the register
operand containing the destination address for the store, leading to
wrong code. This patch fixes that.

gcc/ChangeLog:

PR target/110100
* config/aarch64/aarch64-builtins.cc (aarch64_expand_builtin_ls64):
Use input operand for the destination address.
* config/aarch64/aarch64.md: Fix constraint on address operand.

gcc/testsuite/ChangeLog:

PR target/110100
* gcc.target/aarch64/acle/pr110100.c: New test.
---
 gcc/config/aarch64/aarch64-builtins.cc   | 2 +-
 gcc/config/aarch64/aarch64.md| 2 +-
 gcc/testsuite/gcc.target/aarch64/acle/pr110100.c | 7 +++
 3 files changed, 9 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/pr110100.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index a3ae1a8e99e..cb5828a70f4 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2519,7 +2519,7 @@ aarch64_expand_builtin_ls64 (int fcode, tree exp, rtx target)
   {
 	rtx op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
 	rtx op1 = expand_normal (CALL_EXPR_ARG (exp, 1));
-	create_output_operand (&ops[0], op0, DImode);
+	create_input_operand (&ops[0], op0, DImode);
 	create_input_operand (&ops[1], op1, V8DImode);
 	expand_insn (CODE_FOR_st64b, 2, ops);
 	return const0_rtx;
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 11d0d9c8eb6..ac39a4d683e 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7928,7 +7928,7 @@ (define_insn "ld64b"
 )
 
 (define_insn "st64b"
-  [(set (mem:V8DI (match_operand:DI 0 "register_operand" "=r"))
+  [(set (mem:V8DI (match_operand:DI 0 "register_operand" "r"))
 	(unspec_volatile:V8DI [(match_operand:V8DI 1 "register_operand" "r")]
 	UNSPEC_ST64B)
   )]
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/pr110100.c b/gcc/testsuite/gcc.target/aarch64/acle/pr110100.c
new file mode 100644
index 000..f56d5e619e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/pr110100.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8.7-a -O2" } */
+#include <arm_acle.h>
+void do_st64b(data512_t data) {
+  __arm_st64b((void*)0x10000000, data);
+}
+/* { dg-final { scan-assembler {mov\tx([123])?[0-9], 268435456} } } */


[PATCH 0/3] aarch64: ls64 builtin fixes [PR110100,PR110132]

2023-06-07 Thread Alex Coplan via Gcc-patches
Hi,

This patch series fixes various defects with the FEAT_LS64 ACLE
implementation in the AArch64 backend.

The series is organised as follows:

 - Patch 1/3 fixes whitespace errors in the existing code.
 - Patch 2/3 fixes PR110100 where we generate wrong code for the st64b
   builtin.
 - Patch 3/3 fixes PR110132, allowing the compiler to define the ACLE builtins
   directly, and also makes the builtins work under LTO.

Bootstrapped/regtested as a series on aarch64-linux-gnu. OK for trunk
and backports back to GCC 12?

Thanks,
Alex

Alex Coplan (3):
  aarch64: Fix whitespace in ls64 builtin implementation [PR110100]
  aarch64: Fix wrong code with st64b builtin [PR110100]
  aarch64: Allow compiler to define ls64 builtins [PR110132]

 gcc/config/aarch64/aarch64-builtins.cc| 88 +++
 gcc/config/aarch64/aarch64.md | 24 ++---
 gcc/config/aarch64/arm_acle.h | 33 ---
 .../g++.target/aarch64/acle/acle.exp  | 35 
 gcc/testsuite/g++.target/aarch64/acle/ls64.C  | 10 +++
 .../g++.target/aarch64/acle/ls64_lto.C| 10 +++
 .../gcc.target/aarch64/acle/ls64_lto.c| 10 +++
 .../gcc.target/aarch64/acle/pr110100.c|  7 ++
 .../gcc.target/aarch64/acle/pr110132.c| 15 
 gcc/testsuite/lib/target-supports.exp |  2 +-
 10 files changed, 151 insertions(+), 83 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/acle/acle.exp
 create mode 100644 gcc/testsuite/g++.target/aarch64/acle/ls64.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/acle/ls64_lto.C
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/ls64_lto.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/pr110100.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/pr110132.c


[PATCH] RISC-V: Add Veyron V1 pipeline description

2023-06-07 Thread Raphael Moreira Zinsly
gcc/ChangeLog:

* config/riscv/riscv-cores.def: Add veyron-v1
core and tune info.
* config/riscv/riscv-opts.h
(riscv_microarchitecture_type): Add veyron-v1.
* config/riscv/riscv.cc (veyron_v1_tune_info): New.
* config/riscv/riscv.md: Include veyron-v1.md.
(tune): Add veyron-v1.
* config/riscv/veyron-v1.md: New file.
* doc/invoke.texi (mcpu): Add veyron-v1.
(mtune): Add veyron-v1.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/divmod-2.c: Enable test for veyron-v1.
---
 gcc/config/riscv/riscv-cores.def  |   4 +
 gcc/config/riscv/riscv-opts.h |   3 +-
 gcc/config/riscv/riscv.cc |  15 +++
 gcc/config/riscv/riscv.md |   3 +-
 gcc/config/riscv/veyron-v1.md | 121 ++
 gcc/doc/invoke.texi   |   5 +-
 gcc/testsuite/gcc.target/riscv/divmod-2.c |   5 +-
 7 files changed, 149 insertions(+), 7 deletions(-)
 create mode 100644 gcc/config/riscv/veyron-v1.md

diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index 7d87ab7ce28..4078439e562 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -38,6 +38,7 @@ RISCV_TUNE("sifive-3-series", generic, rocket_tune_info)
 RISCV_TUNE("sifive-5-series", generic, rocket_tune_info)
 RISCV_TUNE("sifive-7-series", sifive_7, sifive_7_tune_info)
 RISCV_TUNE("thead-c906", generic, thead_c906_tune_info)
+RISCV_TUNE("veyron-v1", veyron_v1, veyron_v1_tune_info)
 RISCV_TUNE("size", generic, optimize_size_tune_info)
 
 #undef RISCV_TUNE
@@ -77,4 +78,7 @@ RISCV_CORE("thead-c906",  
"rv64imafdc_xtheadba_xtheadbb_xtheadbs_xtheadcmo_"
  "xtheadcondmov_xtheadfmemidx_xtheadmac_"
  "xtheadmemidx_xtheadmempair_xtheadsync",
  "thead-c906")
+
+RISCV_CORE("veyron-v1",   
"rv64imafdc_zba_zbb_zbc_zbs_zifencei_xventanacondops",
+ "veyron-v1")
 #undef RISCV_CORE
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index f34ca993689..8f7dd84115f 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -52,7 +52,8 @@ extern enum riscv_isa_spec_class riscv_isa_spec;
 /* Keep this list in sync with define_attr "tune" in riscv.md.  */
 enum riscv_microarchitecture_type {
   generic,
-  sifive_7
+  sifive_7,
+  veyron_v1
 };
 extern enum riscv_microarchitecture_type riscv_microarchitecture;
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3954fc07a8b..6a5e89b4813 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -366,6 +366,21 @@ static const struct riscv_tune_param thead_c906_tune_info 
= {
   false/* use_divmod_expansion */
 };
 
+/* Costs to use when optimizing for Ventana Micro Veyron V1.  */
+static const struct riscv_tune_param veyron_v1_tune_info = {
+  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},  /* fp_add */
+  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},  /* fp_mul */
+  {COSTS_N_INSNS (9), COSTS_N_INSNS (17)}, /* fp_div */
+  {COSTS_N_INSNS (4), COSTS_N_INSNS (4)},  /* int_mul */
+  {COSTS_N_INSNS (12), COSTS_N_INSNS (20)},/* int_div */
+  4,   /* issue_rate */
+  4,   /* branch_cost */
+  5,   /* memory_cost */
+  8,   /* fmv_cost */
+  false,   /* slow_unaligned_access */
+  true,/* use_divmod_expansion */
+};
+
 /* Costs to use when optimizing for size.  */
 static const struct riscv_tune_param optimize_size_tune_info = {
   {COSTS_N_INSNS (1), COSTS_N_INSNS (1)},  /* fp_add */
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 124d8c95804..90f0c1b1cf1 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -482,7 +482,7 @@
 ;; Microarchitectures we know how to tune for.
 ;; Keep this in sync with enum riscv_microarchitecture.
 (define_attr "tune"
-  "generic,sifive_7"
+  "generic,sifive_7,veyron_v1"
   (const (symbol_ref "((enum attr_tune) riscv_microarchitecture)")))
 
 ;; Describe a user's asm statement.
@@ -3123,3 +3123,4 @@
 (include "sifive-7.md")
 (include "thead.md")
 (include "vector.md")
+(include "veyron-v1.md")
diff --git a/gcc/config/riscv/veyron-v1.md b/gcc/config/riscv/veyron-v1.md
new file mode 100644
index 000..3eeff76d9b0
--- /dev/null
+++ b/gcc/config/riscv/veyron-v1.md
@@ -0,0 +1,121 @@
+;; Scheduling pipeline description for Veyron V1 RISC-V.
+;; Copyright (C) 2023 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either 

Re: [PATCH] libiberty: pex-unix.c: Make pex_unix_cleanup signature always match body.

2023-06-07 Thread Costas Argyris via Gcc-patches
Oh OK, thanks for the clarification.

Costas

On Wed, 7 Jun 2023 at 13:59, Jeff Law  wrote:

>
>
> On 6/7/23 04:21, Costas Argyris via Gcc-patches wrote:
> > I saw this while working on something else:
> >
> > pex_unix_cleanup signature doesn't always match the
> > body of the function in terms of ATTRIBUTE_UNUSED.
> > If the conditional code in the body is compiled, then
> > ATTRIBUTE_UNUSED isn't correct.
> >
> > This change makes it always match, thereby making it
> > a bit cleaner IMO.
> ATTRIBUTE_UNUSED is meant to be a "maybe unused" decoration.   I'd just
> leave it as-is.
>
> jeff
>


Re: [PATCH 1/2] Match: zero_one_valued_p should match 0 constants too

2023-06-07 Thread Jeff Law via Gcc-patches




On 6/7/23 01:12, Jakub Jelinek via Gcc-patches wrote:

> > +/* zero_one_valued_p will match when a value is known to be either
> > +   0 or 1 including the constant 0. */
> >  (match zero_one_valued_p
> >   @0
> >   (if (INTEGRAL_TYPE_P (type) && tree_nonzero_bits (@0) == 1)))
>
> So perhaps instead change this to
> && wi::leu_p (tree_nonzero_bits (@0), 1)

I guess that would cover both cases without the extra conditional.  I'm
fine with that approach too.  Consider it pre-approved if someone wants
to make that change.
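
To see why the unsigned "less than or equal to 1" form covers both cases,
here is a minimal standalone sketch. It models the nonzero-bits mask as a
plain uint64_t rather than a wide_int, and the helper names
matches_eq_one and matches_leu_one are made up for illustration; they are
not GCC functions.

```cpp
#include <cstdint>

// Assumption: `nonzero_bits` stands in for the result of tree_nonzero_bits,
// i.e. the mask of bits that may be set.  The constant 0 has mask 0, and a
// value known to be 0 or 1 has mask 1.

// Models the existing check: tree_nonzero_bits (@0) == 1.
bool matches_eq_one (uint64_t nonzero_bits)
{
  return nonzero_bits == 1;
}

// Models the suggested check: wi::leu_p (tree_nonzero_bits (@0), 1).
bool matches_leu_one (uint64_t nonzero_bits)
{
  return nonzero_bits <= 1;
}
```

With a mask of 0 (the constant 0) only the second predicate matches, while
a mask of 1 matches both and any wider mask matches neither.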


jeff


Re: [PATCH] libiberty: pex-unix.c: Make pex_unix_cleanup signature always match body.

2023-06-07 Thread Jeff Law via Gcc-patches




On 6/7/23 04:21, Costas Argyris via Gcc-patches wrote:
> I saw this while working on something else:
>
> pex_unix_cleanup signature doesn't always match the
> body of the function in terms of ATTRIBUTE_UNUSED.
> If the conditional code in the body is compiled, then
> ATTRIBUTE_UNUSED isn't correct.
>
> This change makes it always match, thereby making it
> a bit cleaner IMO.

ATTRIBUTE_UNUSED is meant to be a "maybe unused" decoration.  I'd just
leave it as-is.


jeff
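
A minimal sketch of the "maybe unused" reading, assuming GCC/Clang attribute
syntax. SKETCH_ATTRIBUTE_UNUSED, SKETCH_HAVE_CLEANUP and cleanup_sketch are
hypothetical names for illustration only; this is not the pex-unix.c code.

```cpp
// Stand-in for libiberty's ATTRIBUTE_UNUSED (assumes GCC or Clang).
#define SKETCH_ATTRIBUTE_UNUSED __attribute__ ((unused))

// SKETCH_HAVE_CLEANUP is a hypothetical configuration macro.  The parameter
// is used in one configuration and unused in the other, so a single
// "maybe unused" decoration on the signature is valid for both.
int cleanup_sketch (int *counter SKETCH_ATTRIBUTE_UNUSED)
{
#ifdef SKETCH_HAVE_CLEANUP
  ++*counter;   /* The parameter is used in this configuration.  */
  return 1;
#else
  return 0;     /* The parameter is unused here; no warning is issued.  */
#endif
}
```

The attribute only suppresses the unused-parameter warning; it does not
forbid using the parameter, which is why the signature does not have to
track the conditional code in the body.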


[PATCH v2 3/3] RISC-V: Add ZC* test for failed march args being passed.

2023-06-07 Thread Jiawei
Add ZC* extensions march args tests for error input cases.

Co-Authored by: Nandni Jamnadas 
Co-Authored by: Jiawei 
Co-Authored by: Mary Bennett 
Co-Authored by: Simon Cook 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-22.c: New test.
* gcc.target/riscv/arch-23.c: New test.

---
 gcc/testsuite/gcc.target/riscv/arch-22.c | 5 +
 gcc/testsuite/gcc.target/riscv/arch-23.c | 5 +
 2 files changed, 10 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-22.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-23.c

diff --git a/gcc/testsuite/gcc.target/riscv/arch-22.c 
b/gcc/testsuite/gcc.target/riscv/arch-22.c
new file mode 100644
index 000..3be4ade65a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-22.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64i_zcf -mabi=lp64" } */
+int foo() {}
+/* { dg-error "'-march=rv64i_zcf': zcf extension supports in rv32 only" "" { 
target *-*-* } 0 } */
+/* { dg-error "'-march=rv64i_zca_zcf': zcf extension supports in rv32 only" "" 
{ target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/arch-23.c 
b/gcc/testsuite/gcc.target/riscv/arch-23.c
new file mode 100644
index 000..cecce06e474
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-23.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64if_zce -mabi=lp64" } */
+int foo() {}
+/* { dg-error "'-march=rv64if_zce': zcf extension supports in rv32 only" "" { 
target *-*-* } 0 } */
+/* { dg-error "'-march=rv64if_zca_zcb_zce_zcf_zcmp_zcmt': zcf extension 
supports in rv32 only" "" { target *-*-* } 0 } */
-- 
2.25.1



[PATCH v2 2/3] RISC-V: Enable compressible features when use ZC* extensions.

2023-06-07 Thread Jiawei
This patch enables the compressible features with ZC* extensions.

Since all ZC* extensions depend on the Zca extension, it is sufficient to
add only a Zca target check alongside each existing RVC target check.
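
As a rough illustration of that pattern, the sketch below models the
"RVC or Zca" test on a small struct. The names target_flags_sketch,
compressed_enabled and function_boundary_bits are hypothetical; only the
shape of the condition and the 16/32 bit boundary values come from the
patch.

```cpp
// Hypothetical mask bit mirroring MASK_ZCA (1 << 0) from patch 1/3.
constexpr unsigned MASK_ZCA_SKETCH = 1u << 0;

// Hypothetical stand-in for the relevant target state.
struct target_flags_sketch
{
  bool rvc;             // models TARGET_RVC
  unsigned zc_subext;   // models riscv_zc_subext
};

// Models the (TARGET_RVC || TARGET_ZCA) condition used throughout the patch.
bool compressed_enabled (const target_flags_sketch &t)
{
  return t.rvc || (t.zc_subext & MASK_ZCA_SKETCH) != 0;
}

// Models FUNCTION_BOUNDARY: 16 bits when compressed code is possible.
int function_boundary_bits (const target_flags_sketch &t)
{
  return compressed_enabled (t) ? 16 : 32;
}
```

Because every other ZC* extension implies Zca, testing the Zca bit alone is
enough once the implied extensions have been expanded.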

Co-Authored by: Mary Bennett 
Co-Authored by: Nandni Jamnadas 
Co-Authored by: Simon Cook 

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
Enable compressed builtins when ZC* extensions enabled.
* config/riscv/riscv-shorten-memrefs.cc:
Enable shorten_memrefs pass when ZC* extensions enabled.
* config/riscv/riscv.cc (riscv_compressed_reg_p):
Enable compressible registers when ZC* extensions enabled.
(riscv_rtx_costs): Allow adjusting rtx costs when ZC* extensions 
enabled.
(riscv_address_cost): Allow adjusting address cost when ZC* extensions 
enabled.
(riscv_first_stack_step): Allow compression of the register saves
without adding extra instructions.
* config/riscv/riscv.h (FUNCTION_BOUNDARY): Adjusts function boundary
 to 16 bits when ZC* extensions enabled.

---
 gcc/config/riscv/riscv-c.cc   |  2 +-
 gcc/config/riscv/riscv-shorten-memrefs.cc |  3 ++-
 gcc/config/riscv/riscv.cc | 11 +++
 gcc/config/riscv/riscv.h  |  2 +-
 4 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 6ad562dcb8b..2937c160071 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -47,7 +47,7 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
 {
   builtin_define ("__riscv");
 
-  if (TARGET_RVC)
+  if (TARGET_RVC || TARGET_ZCA)
 builtin_define ("__riscv_compressed");
 
   if (TARGET_RVE)
diff --git a/gcc/config/riscv/riscv-shorten-memrefs.cc 
b/gcc/config/riscv/riscv-shorten-memrefs.cc
index 8f10d24ec39..6f2b973278e 100644
--- a/gcc/config/riscv/riscv-shorten-memrefs.cc
+++ b/gcc/config/riscv/riscv-shorten-memrefs.cc
@@ -65,7 +65,8 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *)
 {
-  return TARGET_RVC && riscv_mshorten_memrefs && optimize > 0;
+  return (TARGET_RVC || TARGET_ZCA)
+   && riscv_mshorten_memrefs && optimize > 0;
 }
   virtual unsigned int execute (function *);
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 21e7d3b3caa..3a07122bf6a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1176,7 +1176,8 @@ static bool
 riscv_compressed_reg_p (int regno)
 {
   /* x8-x15/f8-f15 are compressible registers.  */
-  return (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
+  return ((TARGET_RVC  || TARGET_ZCA)
+ && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
  || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)));
 }
 
@@ -2416,7 +2417,8 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
  /* When optimizing for size, make uncompressible 32-bit addresses
 more expensive so that compressible 32-bit addresses are
 preferred.  */
- if (TARGET_RVC && !speed && riscv_mshorten_memrefs && mode == SImode
+ if ((TARGET_RVC || TARGET_ZCA)
+ && !speed && riscv_mshorten_memrefs && mode == SImode
  && !riscv_compressed_lw_address_p (XEXP (x, 0)))
cost++;
 
@@ -2828,7 +2830,8 @@ riscv_address_cost (rtx addr, machine_mode mode,
 {
   /* When optimizing for size, make uncompressible 32-bit addresses more
* expensive so that compressible 32-bit addresses are preferred.  */
-  if (TARGET_RVC && !speed && riscv_mshorten_memrefs && mode == SImode
+  if ((TARGET_RVC || TARGET_ZCA)
+  && !speed && riscv_mshorten_memrefs && mode == SImode
   && !riscv_compressed_lw_address_p (addr))
 return riscv_address_insns (addr, mode, false) + 1;
   return riscv_address_insns (addr, mode, false);
@@ -5331,7 +5334,7 @@ riscv_first_stack_step (struct riscv_frame_info *frame, 
poly_int64 remaining_siz
   && remaining_const_size % IMM_REACH >= min_first_step)
 return remaining_const_size % IMM_REACH;
 
-  if (TARGET_RVC)
+  if (TARGET_RVC || TARGET_ZCA)
 {
   /* If we need two subtracts, and one is small enough to allow compressed
 loads and stores, then put that one first.  */
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 4541255a8ae..a507db61900 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -186,7 +186,7 @@ ASM_MISA_SPEC
 #define PARM_BOUNDARY BITS_PER_WORD
 
 /* Allocation boundary (in *bits*) for the code of a function.  */
-#define FUNCTION_BOUNDARY (TARGET_RVC ? 16 : 32)
+#define FUNCTION_BOUNDARY ((TARGET_RVC || TARGET_ZCA) ? 16 : 32)
 
 /* The smallest supported stack boundary the calling convention supports.  */
 #define STACK_BOUNDARY \
-- 
2.25.1



[PATCH v2 1/3] RISC-V: Minimal support for ZC* extensions.

2023-06-07 Thread Jiawei
This patch adds minimal support for the ZC* extensions, including the extension
names, masks and target definitions. It also defines the dependencies of the
Zca and Zce extensions. Note that all ZC* extensions depend on the Zca extension.
Zce includes all ZC* extensions relevant to microcontrollers. Zce implies
Zcf when the 'f' extension is enabled on rv32.
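
The dependency rules above can be sketched as a small fixed-point expansion.
This is only a standalone model using std::set; implied_sketch and
expand_zc_sketch are hypothetical names and do not reflect how
riscv_subset_list is actually implemented.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// The implication pairs added by this patch (extension -> implied extension).
const std::vector<std::pair<std::string, std::string>> implied_sketch = {
  {"zce", "zca"}, {"zce", "zcb"}, {"zce", "zcmp"}, {"zce", "zcmt"},
  {"zcf", "zca"}, {"zcd", "zca"}, {"zcb", "zca"},
  {"zcmp", "zca"}, {"zcmt", "zca"},
};

// Expand EXTS until nothing new is implied, then apply the Zce/Zcf rule.
// Returns false if zcf ends up enabled with XLEN == 64, which the patch
// reports as an error.
bool expand_zc_sketch (std::set<std::string> &exts, int xlen)
{
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (const auto &p : implied_sketch)
        if (exts.count (p.first) && exts.insert (p.second).second)
          changed = true;
    }

  // Zce only implies zcf when RV32 and the 'f' extension are both present.
  if (exts.count ("zce") && xlen == 32 && exts.count ("f"))
    exts.insert ("zcf");

  return !(exts.count ("zcf") && xlen == 64);
}
```

For example, starting from {"f", "zce"} with xlen 32 the set grows to also
contain zca, zcb, zcmp, zcmt and zcf, while an explicit zcf with xlen 64 is
rejected.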

Co-Authored by: Charlie Keaney 
Co-Authored by: Mary Bennett 
Co-Authored by: Nandni Jamnadas 
Co-Authored by: Simon Cook 
Co-Authored by: Sinan Lin 
Co-Authored by: Shihua Liao 
Co-Authored by: Yulong Shi 

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_subset_list::parse): New 
extensions.
* config/riscv/riscv-opts.h (MASK_ZCA): New mask.
(MASK_ZCB): Ditto.
(MASK_ZCE): Ditto.
(MASK_ZCF): Ditto.
(MASK_ZCD): Ditto.
(MASK_ZCMP): Ditto.
(MASK_ZCMT): Ditto.
(TARGET_ZCA): New target.
(TARGET_ZCB): Ditto.
(TARGET_ZCE): Ditto.
(TARGET_ZCF): Ditto.
(TARGET_ZCD): Ditto.
(TARGET_ZCMP): Ditto.
(TARGET_ZCMT): Ditto.
* config/riscv/riscv.opt: New target variable.

---
 gcc/common/config/riscv/riscv-common.cc | 38 +
 gcc/config/riscv/riscv-opts.h   | 16 +++
 gcc/config/riscv/riscv.opt  |  3 ++
 3 files changed, 57 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 3247d526c0a..89bdbef43a5 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -111,6 +111,16 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zhinx", "zhinxmin"},
   {"zhinxmin", "zfinx"},
 
+  {"zce",  "zca"},
+  {"zce",  "zcb"},
+  {"zce",  "zcmp"},
+  {"zce",  "zcmt"},
+  {"zcf",  "zca"},
+  {"zcd",  "zca"},
+  {"zcb",  "zca"},
+  {"zcmp", "zca"},
+  {"zcmt", "zca"},
+
   {NULL, NULL}
 };
 
@@ -224,6 +234,14 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
 
   {"zmmul", ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"zca",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zcb",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zce",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zcf",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zcd",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zcmp", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zcmt", ISA_SPEC_CLASS_NONE, 1, 0},
+
   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
 
@@ -1156,8 +1174,19 @@ riscv_subset_list::parse (const char *arch, location_t 
loc)
   subset_list->handle_implied_ext (itr);
 }
 
+  /* Zce only implies zcf when RV32 and 'f' extension exist.  */
+  if (subset_list->lookup ("zce") != NULL
+   && subset_list->m_xlen == 32
+   && subset_list->lookup ("f") != NULL
+   && subset_list->lookup ("zcf") == NULL)
+subset_list->add ("zcf", false);
+
   subset_list->handle_combine_ext ();
 
+  if (subset_list->lookup ("zcf") && subset_list->m_xlen == 64)
+error_at (loc, "%<-march=%s%>: zcf extension supports in rv32 only"
+ , arch);
+
   if (subset_list->lookup ("zfinx") && subset_list->lookup ("f"))
 error_at (loc, "%<-march=%s%>: z*inx conflicts with floating-point "
   "extensions", arch);
@@ -1271,6 +1300,15 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
 
   {"zmmul", &gcc_options::x_riscv_zm_subext, MASK_ZMMUL},
 
+  /* Code-size reduction extensions.  */
+  {"zca", &gcc_options::x_riscv_zc_subext, MASK_ZCA},
+  {"zcb", &gcc_options::x_riscv_zc_subext, MASK_ZCB},
+  {"zce", &gcc_options::x_riscv_zc_subext, MASK_ZCE},
+  {"zcf", &gcc_options::x_riscv_zc_subext, MASK_ZCF},
+  {"zcd", &gcc_options::x_riscv_zc_subext, MASK_ZCD},
+  {"zcmp", &gcc_options::x_riscv_zc_subext, MASK_ZCMP},
+  {"zcmt", &gcc_options::x_riscv_zc_subext, MASK_ZCMT},
+
   {"svinval", &gcc_options::x_riscv_sv_subext, MASK_SVINVAL},
   {"svnapot", &gcc_options::x_riscv_sv_subext, MASK_SVNAPOT},
 
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 208a557b8ff..3429fc1218e 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -215,6 +215,22 @@ enum riscv_entity
 #define MASK_ZMMUL  (1 << 0)
 #define TARGET_ZMMUL ((riscv_zm_subext & MASK_ZMMUL) != 0)
 
+#define MASK_ZCA  (1 << 0)
+#define MASK_ZCB  (1 << 1)
+#define MASK_ZCE  (1 << 2)
+#define MASK_ZCF  (1 << 3)
+#define MASK_ZCD  (1 << 4)
+#define MASK_ZCMP (1 << 5)
+#define MASK_ZCMT (1 << 6)
+
+#define TARGET_ZCA    ((riscv_zc_subext & MASK_ZCA) != 0)
+#define TARGET_ZCB    ((riscv_zc_subext & MASK_ZCB) != 0)
+#define TARGET_ZCE    ((riscv_zc_subext & MASK_ZCE) != 0)
+#define TARGET_ZCF    ((riscv_zc_subext & MASK_ZCF) != 0)
+#define TARGET_ZCD    ((riscv_zc_subext & MASK_ZCD) != 0)
+#define TARGET_ZCMP   ((riscv_zc_subext & MASK_ZCMP) != 0)
+#define TARGET_ZCMT   ((riscv_zc_subext & MASK_ZCMT) != 0)
+
 #define MASK_SVINVAL (1 << 0)
 #define MASK_SVNAPOT (1 << 1)
 
diff --git 

[PATCH v2 0/3] RISC-V: Support ZC* extensions.

2023-06-07 Thread Jiawei
The RISC-V Code Size Reduction (ZC*) extensions are a group of extensions
that define subsets of the existing C extension (Zca, Zcd, Zcf) and new
extensions (Zcb, Zcmp, Zcmt) that contain only 16-bit encodings.[1]

The implementation of the RISC-V Code Size Reduction extensions in GCC is
an important step towards making the RISC-V architecture more efficient.

The cooperation with the OpenHW Group has played a crucial role in this effort,
facilitating the implementation, testing and validation. The current
work can also be found in the OpenHW Group's GitHub repo.[2]

Thanks to Tariq Kurd and Ibrahim Abu Kharmeh for their help in explaining the
specification, and to Jeremy Bennett for his patient guidance throughout the
whole development process.

V2 changes:
Address Kito's comments on the first version. Eswin assisted in optimizing
the implementation of the Zcmp extension:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617440.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617442.html

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620869.html


[1] github.com/riscv/riscv-code-size-reduction/tree/main/Zc-specification

[2] github.com/openhwgroup/corev-gcc

Co-Authored by: Charlie Keaney 
Co-Authored by: Mary Bennett 
Co-Authored by: Nandni Jamnadas 
Co-Authored by: Sinan Lin 
Co-Authored by: Simon Cook 
Co-Authored by: Shihua Liao 
Co-Authored by: Yulong Shi 

  RISC-V: Minimal support for ZC extensions.
  RISC-V: Enable compressible features when use ZC* extensions.
  RISC-V: Add ZC* test for march args being passed.


Jiawei (3):
  RISC-V: Minimal support for ZC* extensions.
  RISC-V: Enable compressible features when use ZC* extensions.
  RISC-V: Add ZC* test for failed march args being passed.

 gcc/common/config/riscv/riscv-common.cc   | 38 +++
 gcc/config/riscv/riscv-c.cc   |  2 +-
 gcc/config/riscv/riscv-opts.h | 16 ++
 gcc/config/riscv/riscv-shorten-memrefs.cc |  3 +-
 gcc/config/riscv/riscv.cc | 11 ---
 gcc/config/riscv/riscv.h  |  2 +-
 gcc/config/riscv/riscv.opt|  3 ++
 gcc/testsuite/gcc.target/riscv/arch-22.c  |  5 +++
 gcc/testsuite/gcc.target/riscv/arch-23.c  |  5 +++
 9 files changed, 78 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-22.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-23.c

-- 
2.25.1



Re: Re: [PATCH V3] VECT: Add SELECT_VL support

2023-06-07 Thread 钟居哲
Hi, Richi. I have fixed the data reference pointer part following your comments.
Could you take a look at it?
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620916.html 
Thanks.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-07 19:04
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH V3] VECT: Add SELECT_VL support
On Wed, 7 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi. Since SELECT_VL only applies to single-rgroup (ncopies == 1 &&
> vec_num == 1), should I move the SELECT_VL code outside the loop?
> 
> for (i = 0; i < vec_num; i++)
>   for (j = 0; j < ncopies; j++)
> 
 
No, but please put assertions into the iteration so it's obvious
the SELECT_VL doesn't reach there.
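
One way to read that suggestion, as a standalone sketch only: the function
name count_copies_sketch and its parameters are hypothetical, and the real
vectorizer would use gcc_assert on the loop_vinfo state rather than plain
assert.

```cpp
#include <cassert>

// Hypothetical model of the nested copy loop.  When SELECT_VL is in use the
// loop must describe a single rgroup, so only the (0, 0) iteration may run.
int count_copies_sketch (bool using_select_vl, int vec_num, int ncopies)
{
  int emitted = 0;
  for (int i = 0; i < vec_num; i++)
    for (int j = 0; j < ncopies; j++)
      {
        // Make the single-rgroup restriction explicit inside the iteration.
        if (using_select_vl)
          assert (i == 0 && j == 0);
        ++emitted;
      }
  return emitted;
}
```

Keeping the check inside the iteration documents the restriction at the
point where a second copy would otherwise be generated.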
 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-06-07 15:41
> To: Ju-Zhe Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH V3] VECT: Add SELECT_VL support
> On Mon, 5 Jun 2023, juzhe.zh...@rivai.ai wrote:
>  
> > From: Ju-Zhe Zhong 
> > 
> > Co-authored-by: Richard Sandiford
> > 
> > This patch addresses comments from Richard and rebases to trunk.
> > 
> > This patch adds SELECT_VL middle-end support, which allows a target
> > to apply target-dependent optimizations to the length calculation.
> > 
> > This patch is inspired by RVV ISA and LLVM:
> > https://reviews.llvm.org/D99750
> > 
> > SELECT_VL has the same behavior as LLVM's "get_vector_length", with
> > the following properties:
> > 
> > 1. It applies only to single-rgroup loops.
> > 2. It applies only to non-SLP loops.
> > 3. It adjusts the loop control IV.
> > 4. It adjusts the data reference IVs.
> > 5. It allows a non-final iteration to process fewer than VF elements.
> > 
> > Code:
> >    # void vvaddint32(size_t n, const int*x, const int*y, int*z)
> >    # { for (size_t i=0; i<n; i++) { z[i]=x[i]+y[i]; } }
> > 
> > Take RVV codegen for example:
> > 
> > Before this patch:
> > vvaddint32:
> > ble a0,zero,.L6
> > csrr a4,vlenb
> > srli a6,a4,2
> > .L4:
> > mv  a5,a0
> > bleu a0,a6,.L3
> > mv  a5,a6
> > .L3:
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a2)
> > vsetvli a7,zero,e32,m1,ta,ma
> > sub a0,a0,a5
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a3)
> > add a2,a2,a4
> > add a3,a3,a4
> > add a1,a1,a4
> > bne a0,zero,.L4
> > .L6:
> > ret
> > 
> > After this patch:
> > 
> > vvaddint32:
> > vsetvli t0, a0, e32, ta, ma  # Set vector length based on 32-bit vectors
> > vle32.v v0, (a1) # Get first vector
> >   sub a0, a0, t0 # Decrement number done
> >   slli t0, t0, 2 # Multiply number done by 4 bytes
> >   add a1, a1, t0 # Bump pointer
> > vle32.v v1, (a2) # Get second vector
> >   add a2, a2, t0 # Bump pointer
> > vadd.vv v2, v0, v1   # Sum vectors
> > vse32.v v2, (a3) # Store result
> >   add a3, a3, t0 # Bump pointer
> >   bnez a0, vvaddint32# Loop back
> >   ret# Finished
> > 
> > gcc/ChangeLog:
> > 
> > * doc/md.texi: Add SELECT_VL support.
> > * internal-fn.def (SELECT_VL): Ditto.
> > * optabs.def (OPTAB_D): Ditto.
> > * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Ditto.
> > * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Ditto.
> > * tree-vect-stmts.cc (get_select_vl_data_ref_ptr): Ditto.
> > (vectorizable_store): Ditto.
> > (vectorizable_load): Ditto.
> > * tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): Ditto.
> > 
> > Co-authored-by: Richard Sandiford
> > 
> > ---
> >  gcc/doc/md.texi | 22 
> >  gcc/internal-fn.def |  1 +
> >  gcc/optabs.def  |  1 +
> >  gcc/tree-vect-loop-manip.cc | 32 -
> >  gcc/tree-vect-loop.cc   | 72 +
> >  gcc/tree-vect-stmts.cc  | 66 ++
> >  gcc/tree-vectorizer.h   |  6 
> >  7 files changed, 191 insertions(+), 9 deletions(-)
> > 
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 6a435eb4461..95f7fe1f802 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
> >operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
> >  @end smallexample
> >  
> > +@cindex @code{select_vl@var{m}} instruction pattern
> > +@item @code{select_vl@var{m}}
> > +Set operand 0 to the number of scalar iterations that should be handled
> > +by one iteration of a vector loop.  Operand 1 is the total number of
> > +scalar iterations that the loop needs to process and operand 2 is a
> > +maximum bound on the result (also known as the maximum ``vectorization
> > +factor'').
> > +
> > +The maximum value of operand 0 is given by:
> > +@smallexample
> > +operand0 = MIN (operand1, 

[PATCH V4] VECT: Add SELECT_VL support

2023-06-07 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Co-authored-by: Richard Sandiford
Co-authored-by: Richard Biener 

This patch addresses comments from Richard and Richi, and is rebased to trunk.

This patch adds middle-end support for SELECT_VL, which allows a target
to apply target-dependent optimizations to the length calculation.

This patch is inspired by the RVV ISA and LLVM:
https://reviews.llvm.org/D99750

SELECT_VL behaves like LLVM's "get_vector_length" and has the
following properties:

1. It only applies to a single rgroup.
2. It only applies to non-SLP loops.
3. It adjusts the loop control IV.
4. It adjusts the data reference IVs.
5. It allows a non-final iteration to process fewer than VF elements.
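
The loop shape these properties describe can be sketched in plain C. This
is only an illustration, not the code GCC generates: select_vl () here is a
hypothetical stand-in for the target pattern, the tail-splitting policy is
an assumed example of a target choosing less than MIN (remaining, vf), and
vf = 4 is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a target's SELECT_VL.  The result never
   exceeds MIN (remaining, vf), but a target may choose less.  As an
   assumed policy, split a tail between vf and 2*vf roughly in half, so
   that a non-final iteration can handle fewer than vf elements.  */
static size_t
select_vl (size_t remaining, size_t vf)
{
  size_t max = remaining < vf ? remaining : vf;
  if (remaining > vf && remaining < 2 * vf)
    return (remaining + 1) / 2;
  return max;
}

void
vvaddint32 (size_t n, const int *x, const int *y, int *z)
{
  const size_t vf = 4;		/* Assumed maximum vectorization factor.  */
  size_t remaining = n;		/* Loop control IV, decremented by len.  */

  while (remaining > 0)
    {
      size_t len = select_vl (remaining, vf);
      assert (len > 0 && len <= remaining && len <= vf);

      /* Models one length-controlled vector load/add/store.  */
      for (size_t i = 0; i < len; i++)
	z[i] = x[i] + y[i];

      /* Data reference IVs advance by the variable step len.  */
      x += len;
      y += len;
      z += len;
      remaining -= len;
    }
}
```

With n = 6 and vf = 4, this sketch processes 3 elements in each of two
iterations rather than 4 and then 2, which is the variable-step behavior
that properties 3 to 5 refer to.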

Code:
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i
Co-authored-by: Richard Biener 

---
 gcc/doc/md.texi | 22 ++
 gcc/internal-fn.def |  1 +
 gcc/optabs.def  |  1 +
 gcc/tree-vect-loop-manip.cc | 32 ++
 gcc/tree-vect-loop.cc   | 72 ++
 gcc/tree-vect-stmts.cc  | 87 -
 gcc/tree-vectorizer.h   |  6 +++
 7 files changed, 202 insertions(+), 19 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 6a435eb4461..95f7fe1f802 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
 @end smallexample
 
+@cindex @code{select_vl@var{m}} instruction pattern
+@item @code{select_vl@var{m}}
+Set operand 0 to the number of scalar iterations that should be handled
+by one iteration of a vector loop.  Operand 1 is the total number of
+scalar iterations that the loop needs to process and operand 2 is a
+maximum bound on the result (also known as the maximum ``vectorization
+factor'').
+
+The maximum value of operand 0 is given by:
+@smallexample
+operand0 = MIN (operand1, operand2)
+@end smallexample
+However, targets might choose a lower value than this, based on
+target-specific criteria.  Each iteration of the vector loop might
+therefore process a different number of scalar iterations, which in turn
+means that induction variables will have a variable step.  Because of
+this, it is generally not useful to define this instruction if it will
+always calculate the maximum value.
+
+This optab is only useful on targets that implement @samp{len_load_@var{m}}
+and/or @samp{len_store_@var{m}}.
+
 @cindex @code{check_raw_ptrs@var{m}} instruction pattern
 @item @samp{check_raw_ptrs@var{m}}
 Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 3ac9d82aace..5d638de6d06 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -177,6 +177,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
 DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
 
 DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
+DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary)
 DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
   check_raw_ptrs, check_ptrs)
 DEF_INTERNAL_OPTAB_FN (CHECK_WAR_PTRS, ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 6c064ff4993..f31b69c5d85 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -488,3 +488,4 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
 OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
 OPTAB_D (len_load_optab, "len_load_$a")
 OPTAB_D (len_store_optab, "len_store_$a")
+OPTAB_D (select_vl_optab, "select_vl$a")
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 3f735945e67..1c8100c1a1c 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -534,7 +534,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   _10 = (unsigned long) count_12(D);
   ...
   # ivtmp_9 = PHI 
-  _36 = MIN_EXPR ;
+  _36 = (MIN_EXPR | SELECT_VL) ;
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
@@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
-&incr_gsi, insert_after, &index_before_incr,
-&index_after_incr);
-  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
-   index_before_incr,
-   nitems_step));
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+   {
+ create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
+insert_after, &index_before_incr, &index_after_incr);
+ tree len = 

Re: Support 'UNSUPPORTED: [...]: exception handling disabled' for libstdc++ testing (was: Support in the GCC(/C++) test suites for '-fno-exceptions')

2023-06-07 Thread Jonathan Wakely via Gcc-patches
On Wed, 7 Jun 2023 at 10:08, Thomas Schwinge wrote:

> Hi!
>
> On 2023-06-07T09:12:31+0100, Jonathan Wakely wrote:
> > On Wed, 7 Jun 2023 at 08:13, Thomas Schwinge wrote:
> >> On 2023-06-06T20:31:21+0100, Jonathan Wakely wrote:
> >> > On Tue, 6 Jun 2023 at 20:14, Thomas Schwinge wrote:
> >> >> This issue comes up in context of me working on C++ support for GCN and
> >> >> nvptx target.  Those targets shall default to '-fno-exceptions' -- or,
> >> >> "in other words", '-fexceptions' is not supported.  (Details omitted
> >> >> here.)
> >> >>
> >> >> It did seem clear to me that with such a configuration it'll be hard to
> >> >> get clean test results.  Then I found code in
> >> >> 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':
> >> >>
> >> >>     # If exceptions are disabled, mark tests expecting exceptions to be enabled
> >> >>     # as unsupported.
> >> >>     if { ![check_effective_target_exceptions_enabled] } {
> >> >>         if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" $text] {
> >> >>             return "::unsupported::exception handling disabled"
> >> >>         }
> >> >>
> >> >> ..., which, in a way, sounds as if the test suite generally is meant to
> >> >> produce useful results for '-fno-exceptions', nice surprise!
> >> >>
> >> >> Running x86_64-pc-linux-gnu (not yet GCN, nvptx) 'make check' with:
> >> >>
> >> >> RUNTESTFLAGS='--target_board=unix/-fno-exceptions\{,-m32\}'
> >> >>
> >> >> ..., I find that indeed this does work for a lot of test cases, where we
> >> >> then get (random example):
> >> >>
> >> >>  PASS: g++.dg/coroutines/pr99710.C  (test for errors, line 23)
> >> >> -PASS: g++.dg/coroutines/pr99710.C (test for excess errors)
> >> >> +UNSUPPORTED: g++.dg/coroutines/pr99710.C: exception handling disabled
> >> >>
> >> >> ..., due to:
> >> >>
> >> >>  [...]/g++.dg/coroutines/pr99710.C: In function 'task my_coro()':
> >> >> +[...]/g++.dg/coroutines/pr99710.C:18:10: error: exception handling disabled, use '-fexceptions' to enable
> >> >>  [...]/g++.dg/coroutines/pr99710.C:23:7: error: await expressions are not permitted in handlers
> >> >>  compiler exited with status 1
> >> >>
> >> >> But, we're nowhere near clean test results: PASS -> FAIL as well as
> >> >> XFAIL -> XPASS regressions, due to 'error: exception handling disabled'
> >> >> precluding other diagnostics seems to be one major issue.
> >> >>
> >> >> Is there interest in me producing the obvious (?) changes to those test
> >> >> cases, such that compiler g++ as well as target library libstdc++ test
> >> >> results are reasonably clean?  (If you think that's all "wasted effort",
> >> >> then I suppose I'll just locally ignore any FAILs/XPASSes/UNRESOLVEDs
> >> >> that appear in combination with
> >> >> 'UNSUPPORTED: [...]: exception handling disabled'.)
> >> >
> >> > I would welcome that for libstdc++.
> >>
> >> Assuming no issues found in testing, OK to push the attached
> >> "Support 'UNSUPPORTED: [...]: exception handling disabled' for libstdc++
> >> testing"?
> >> (Thanks, Jozef!)
> >
> > Yes please.
>
> Pushed commit r14-1604-g5faaabef3819434d13fcbf749bd07bfc98ca7c3c
> "Support 'UNSUPPORTED: [...]: exception handling disabled' for libstdc++
> testing"
> to master branch, as posted.
>
> For one-week-old GCC commit 2720bbd597f56742a17119dfe80edc2ba86af255,
> x86_64-pc-linux-gnu, I see no changes without '-fno-exceptions' (as
> expected), and otherwise:
>
> === libstdc++ Summary for [-unix-]{+unix/-fno-exceptions+} ===
>
> # of expected passes[-15044-]{+12877+}
> # of unexpected failures[-5-]{+10+}
> # of expected failures  [-106-]{+77+}
> {+# of unresolved testcases 6+}
> # of unsupported tests  [-747-]{+1846+}
>
> As expected, there's a good number of (random example):
>
> -PASS: 18_support/105387.cc (test for excess errors)
> -PASS: 18_support/105387.cc execution test
> +UNSUPPORTED: 18_support/105387.cc: exception handling disabled
>
> ..., plus the following:
>
> [-PASS:-]{+FAIL:+} 23_containers/vector/capacity/constexpr.cc (test for excess errors)
>
> [...]/libstdc++-v3/testsuite/23_containers/vector/capacity/constexpr.cc:101: error: non-constant condition for static assertion
> In file included from [...]/libstdc++-v3/testsuite/23_containers/vector/capacity/constexpr.cc:6:
> [...]/libstdc++-v3/testsuite/23_containers/vector/capacity/constexpr.cc:101:  in 'constexpr' expansion of 'test_shrink_to_fit()'
> [...]/libstdc++-v3/testsuite/util/testsuite_hooks.h:56: error: '__builtin_fprintf(stderr, ((const char*)"%s:%d: %s: Assertion \'%s\' failed.\012"), ((const char*)"[...]/libstdc++-v3/testsuite/23_containers/vector/capacity/constexpr.cc"), 92, ((const char*)"constexpr bool test_shrink_to_fit()"), ((const char*)"v.capacity() == 0"))' is not a constant expression
> 

Re: [PATCH] analyzer: Standalone OOB-warning [PR109437, PR109439]

2023-06-07 Thread Benjamin Priour via Gcc-patches
On Tue, Jun 6, 2023 at 8:37 PM David Malcolm  wrote:
>
> On Tue, 2023-06-06 at 18:05 +0200, Benjamin Priour wrote:

[...]

> [Looks like you dropped the mailing list from the recipients; was that
> intentional?]
>

Not at all, just me missing the reply all button.

> >
> > I indeed bootstrapped and regtested on linux-x86_64, but it was last
> > week, since I'm still using my laptop, which is painfully slow  (1
> > night per step), my tests are always a few days old.
>
> Thanks.  The patch is OK for trunk once the minor formatting nits are
> fixed (you don't have to bother with a full test run for that).  We
> might want to backport it to gcc 13 as well, but let's let it "soak" in
> trunk for some time first.
>
> > We discussed it already but yes, in the end I believe an account on
> > the compile farm will be necessary for me.
>
> Let me know if you need any help with that.

I'm not certain about what to put under "Contributions" in the account
creation form.
I'm still green behind the ears, and wouldn't count my current count
of 2 patches
*not yet pushed to trunk* as anything remarkable.

> > I'll correct the formatting of the comments and resend it, and double
> > check the indentation.
>
> Thanks.

I said that, but actually I am unsure about the indentation format.
Is it spaces up to 6 characters, then morph them into tabs?
It looked like that in the code, although some portions were
breaking this rule.
I went with the same indentation rules as already shown within each function.

>
> >  I'm still writing custom formatting rules for
> > my gcc subfolders,
> > but the formatter is sometimes switching back to my default rules
> > instead of the workspace's.
>
> Which formatter are these rules for, BTW?
>

I'm using vscode default C/Cpp extension's formatter.

[...]

Thanks,
Benjamin


[committed] testsuite/libgomp.*/target-present-*.{c, f90}: Improve and fix (was: Re: [og12] Fix 'libgomp.{c-c++-common, fortran}/target-present-*' test cases)

2023-06-07 Thread Tobias Burnus

This patch fixes a corner case issue (missing list items in a map clause)
and ensures that such an issue is caught.

Committed to mainline as https://gcc.gnu.org/r14-1605-gdd958667821e38

It is a forward port of Thomas' OG12 then OG13 commit which fixed
a run-time issue which the mainline version does not have; still fixing
the map issue (and doing the check-point check) is a good idea and,
hence, a likewise patch has now been applied to mainline as well.

OG13 commit: https://gcc.gnu.org/g:f719ab9a3ac51d798b012a5ab7757af2b81b4ae2
OG12 commit: see Thomas' email earlier in this thread.

Tobias

On 15.02.23 20:02, Thomas Schwinge wrote:

On 2023-02-09T21:17:44+, Kwok Cheung Yeung  wrote:

[...]

I've pushed to devel/omp/gcc-12 branch
commit bbda035ee62ba4db21356136c97e9d83a97ba7d1
"Fix 'libgomp.{c-c++-common,fortran}/target-present-*' test cases",
see attached. [...]

commit dd958667821e38b7d6b8efe448044901b4762b3a
Author: Tobias Burnus 
Date:   Wed Jun 7 13:22:13 2023 +0200

testsuite/libgomp.*/target-present-*.{c,f90}: Improve and fix

One of the testcases lacked variables in a map clause such that
the fail occurred too early. Additionally, it would have failed
for all those non-host devices where 'present' is always true, i.e.
non-host devices which can access all of the host memory
(shared-memory devices). [There are currently none.]

The commit now runs the code on all devices, which should succeed
for host fallback and for shared-memory devices, finding potential issues
that way. Additionally, a checkpoint (required stdout output) is used
to ensure that the execution won't fail (with the same error) before
reaching the expected fail location.

2023-06-07  Thomas Schwinge  
Tobias Burnus  

libgomp/
* testsuite/libgomp.c-c++-common/target-present-1.c: Run code
also for non-offload_device targets; check that it runs
successfully for those and for all until a checkpoint for all
* testsuite/libgomp.c-c++-common/target-present-2.c: Likewise.
* testsuite/libgomp.c-c++-common/target-present-3.c: Likewise.
* testsuite/libgomp.fortran/target-present-1.f90: Likewise.
* testsuite/libgomp.fortran/target-present-3.f90: Likewise.
* testsuite/libgomp.fortran/target-present-2.f90: Likewise;
add missing vars to map clause.
---
 libgomp/testsuite/libgomp.c-c++-common/target-present-1.c |  9 ++---
 libgomp/testsuite/libgomp.c-c++-common/target-present-2.c | 11 +++
 libgomp/testsuite/libgomp.c-c++-common/target-present-3.c |  9 +
 libgomp/testsuite/libgomp.fortran/target-present-1.f90|  9 +
 libgomp/testsuite/libgomp.fortran/target-present-2.f90| 13 +++--
 libgomp/testsuite/libgomp.fortran/target-present-3.f90|  9 +
 6 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-present-1.c b/libgomp/testsuite/libgomp.c-c++-common/target-present-1.c
index 12f154c91a8..aa343197e35 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/target-present-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-present-1.c
@@ -1,5 +1,4 @@
-/* { dg-do run { target offload_device } } */
-/* { dg-shouldfail "present error triggered" } */
+#include <stdio.h>
 
 #define N 100
 
@@ -18,8 +17,12 @@ int main (void)
   for (int i = 0; i < N; i++)
 	c[i] = a[i];
 
+fprintf (stderr, "CheCKpOInT\n");
+/* { dg-output "CheCKpOInT(\n|\r\n|\r).*" } */
+
 /* b has not been allocated, so this should result in an error.  */
-/* { dg-output "libgomp: present clause: not present on the device \\\(0x\[0-9a-f\]+, \[0-9\]+\\\)" } */
+/* { dg-output "libgomp: present clause: not present on the device \\\(0x\[0-9a-f\]+, \[0-9\]+\\\)" { target offload_device_nonshared_as } } */
+/* { dg-shouldfail "present error triggered" { offload_device_nonshared_as } } */
 #pragma omp target map (present, to: b)
   for (int i = 0; i < N; i++)
 	c[i] += b[i];
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-present-2.c b/libgomp/testsuite/libgomp.c-c++-common/target-present-2.c
index d4debbab10b..ad11023b2d6 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/target-present-2.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-present-2.c
@@ -1,5 +1,4 @@
-/* { dg-do run { target offload_device } } */
-/* { dg-shouldfail "present error triggered" } */
+#include <stdio.h>
 
 #define N 100
 
@@ -13,13 +12,17 @@ int main (void)
   }
 
   #pragma omp target enter data map (alloc: a, c)
-/* a has already been allocated, so this should be okay.  */
+/* a and c have already been allocated, 

Re: Re: [PATCH V3] VECT: Add SELECT_VL support

2023-06-07 Thread Richard Biener via Gcc-patches
On Wed, 7 Jun 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi. Since SELECT_VL only applies to a single rgroup (ncopies == 1 &&
> vec_num == 1), should I move the SELECT_VL handling outside the loop?
> 
> for (i = 0; i < vec_num; i++)
>   for (j = 0; j < ncopies; j++)
> 

No, but please put assertions into the iteration so it's obvious
the SELECT_VL doesn't reach there.
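
A minimal sketch of that suggestion, in plain C rather than GCC's own
internals: the function name, its parameters and the use of assert () are
hypothetical (the vectorizer itself would use gcc_assert and its own
state), and only the loop nest is taken from the question above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the vec_num/ncopies iteration.  Returns the
   number of copies visited.  When SELECT_VL is in use, the assertion
   documents that only the single (0, 0) iteration can be reached.  */
int
visit_copies (unsigned vec_num, unsigned ncopies, bool using_select_vl)
{
  int visited = 0;

  for (unsigned i = 0; i < vec_num; i++)
    for (unsigned j = 0; j < ncopies; j++)
      {
	/* SELECT_VL is restricted to a single rgroup, so any later
	   iteration would indicate a bug elsewhere.  */
	assert (!using_select_vl || (i == 0 && j == 0));
	visited++;
      }

  return visited;
}
```

Keeping the check inside the iteration leaves the loop structure unchanged
while making the single-rgroup restriction visible at the point of use.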

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-06-07 15:41
> To: Ju-Zhe Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH V3] VECT: Add SELECT_VL support
> On Mon, 5 Jun 2023, juzhe.zh...@rivai.ai wrote:
>  
> > From: Ju-Zhe Zhong 
> > 
> > Co-authored-by: Richard Sandiford
> > 
> > This patch addresses comments from Richard and is rebased to trunk.
> > 
> > This patch adds middle-end support for SELECT_VL, which allows a target
> > to apply target-dependent optimizations to the length calculation.
> > 
> > This patch is inspired by the RVV ISA and LLVM:
> > https://reviews.llvm.org/D99750
> > 
> > SELECT_VL behaves like LLVM's "get_vector_length" and has the
> > following properties:
> > 
> > 1. It only applies to a single rgroup.
> > 2. It only applies to non-SLP loops.
> > 3. It adjusts the loop control IV.
> > 4. It adjusts the data reference IVs.
> > 5. It allows a non-final iteration to process fewer than VF elements.
> > 
> > Code:
> ># void vvaddint32(size_t n, const int*x, const int*y, int*z)
> > # { for (size_t i=0; i > 
> > Take RVV codegen for example:
> > 
> > Before this patch:
> > vvaddint32:
> > ble a0,zero,.L6
> > csrr a4,vlenb
> > srli a6,a4,2
> > .L4:
> > mv  a5,a0
> > bleu a0,a6,.L3
> > mv  a5,a6
> > .L3:
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a2)
> > vsetvli a7,zero,e32,m1,ta,ma
> > sub a0,a0,a5
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a3)
> > add a2,a2,a4
> > add a3,a3,a4
> > add a1,a1,a4
> > bne a0,zero,.L4
> > .L6:
> > ret
> > 
> > After this patch:
> > 
> > vvaddint32:
> > vsetvli t0, a0, e32, ta, ma  # Set vector length based on 32-bit vectors
> > vle32.v v0, (a1) # Get first vector
> >   sub a0, a0, t0 # Decrement number done
> >   slli t0, t0, 2 # Multiply number done by 4 bytes
> >   add a1, a1, t0 # Bump pointer
> > vle32.v v1, (a2) # Get second vector
> >   add a2, a2, t0 # Bump pointer
> > vadd.vv v2, v0, v1   # Sum vectors
> > vse32.v v2, (a3) # Store result
> >   add a3, a3, t0 # Bump pointer
> >   bnez a0, vvaddint32 # Loop back
> >   ret # Finished
> > 
> > gcc/ChangeLog:
> > 
> > * doc/md.texi: Add SELECT_VL support.
> > * internal-fn.def (SELECT_VL): Ditto.
> > * optabs.def (OPTAB_D): Ditto.
> > * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Ditto.
> > * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Ditto.
> > * tree-vect-stmts.cc (get_select_vl_data_ref_ptr): Ditto.
> > (vectorizable_store): Ditto.
> > (vectorizable_load): Ditto.
> > * tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): Ditto.
> > 
> > Co-authored-by: Richard Sandiford
> > 
> > ---
> >  gcc/doc/md.texi | 22 
> >  gcc/internal-fn.def |  1 +
> >  gcc/optabs.def  |  1 +
> >  gcc/tree-vect-loop-manip.cc | 32 -
> >  gcc/tree-vect-loop.cc   | 72 +
> >  gcc/tree-vect-stmts.cc  | 66 ++
> >  gcc/tree-vectorizer.h   |  6 
> >  7 files changed, 191 insertions(+), 9 deletions(-)
> > 
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 6a435eb4461..95f7fe1f802 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
> >operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
> >  @end smallexample
> >  
> > +@cindex @code{select_vl@var{m}} instruction pattern
> > +@item @code{select_vl@var{m}}
> > +Set operand 0 to the number of scalar iterations that should be handled
> > +by one iteration of a vector loop.  Operand 1 is the total number of
> > +scalar iterations that the loop needs to process and operand 2 is a
> > +maximum bound on the result (also known as the maximum ``vectorization
> > +factor'').
> > +
> > +The maximum value of operand 0 is given by:
> > +@smallexample
> > +operand0 = MIN (operand1, operand2)
> > +@end smallexample
> > +However, targets might choose a lower value than this, based on
> > +target-specific criteria.  Each iteration of the vector loop might
> > +therefore process a different number of scalar iterations, which in turn
> > +means that induction variables will have a variable step.  Because of
> > +this, it is generally not useful to define 

Re: Re: [PATCH V3] VECT: Add SELECT_VL support

2023-06-07 Thread juzhe.zh...@rivai.ai
Hi, Richi. Since SELECT_VL only applies to a single rgroup (ncopies == 1 &&
vec_num == 1), should I move the SELECT_VL handling outside the loop?

for (i = 0; i < vec_num; i++)
  for (j = 0; j < ncopies; j++)


Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-07 15:41
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V3] VECT: Add SELECT_VL support
On Mon, 5 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> Co-authored-by: Richard Sandiford
> 
> This patch addresses comments from Richard and is rebased to trunk.
> 
> This patch adds middle-end support for SELECT_VL, which allows a target
> to apply target-dependent optimizations to the length calculation.
> 
> This patch is inspired by the RVV ISA and LLVM:
> https://reviews.llvm.org/D99750
> 
> SELECT_VL behaves like LLVM's "get_vector_length" and has the
> following properties:
> 
> 1. It only applies to a single rgroup.
> 2. It only applies to non-SLP loops.
> 3. It adjusts the loop control IV.
> 4. It adjusts the data reference IVs.
> 5. It allows a non-final iteration to process fewer than VF elements.
> 
> Code:
># void vvaddint32(size_t n, const int*x, const int*y, int*z)
> # { for (size_t i=0; i 
> Take RVV codegen for example:
> 
> Before this patch:
> vvaddint32:
> ble a0,zero,.L6
> csrr a4,vlenb
> srli a6,a4,2
> .L4:
> mv  a5,a0
> bleu a0,a6,.L3
> mv  a5,a6
> .L3:
> vsetvli zero,a5,e32,m1,ta,ma
> vle32.v v2,0(a1)
> vle32.v v1,0(a2)
> vsetvli a7,zero,e32,m1,ta,ma
> sub a0,a0,a5
> vadd.vv v1,v1,v2
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v1,0(a3)
> add a2,a2,a4
> add a3,a3,a4
> add a1,a1,a4
> bne a0,zero,.L4
> .L6:
> ret
> 
> After this patch:
> 
> vvaddint32:
> vsetvli t0, a0, e32, ta, ma  # Set vector length based on 32-bit vectors
> vle32.v v0, (a1) # Get first vector
>   sub a0, a0, t0 # Decrement number done
>   slli t0, t0, 2 # Multiply number done by 4 bytes
>   add a1, a1, t0 # Bump pointer
> vle32.v v1, (a2) # Get second vector
>   add a2, a2, t0 # Bump pointer
> vadd.vv v2, v0, v1   # Sum vectors
> vse32.v v2, (a3) # Store result
>   add a3, a3, t0 # Bump pointer
>   bnez a0, vvaddint32 # Loop back
>   ret # Finished
> 
> gcc/ChangeLog:
> 
> * doc/md.texi: Add SELECT_VL support.
> * internal-fn.def (SELECT_VL): Ditto.
> * optabs.def (OPTAB_D): Ditto.
> * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Ditto.
> * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Ditto.
> * tree-vect-stmts.cc (get_select_vl_data_ref_ptr): Ditto.
> (vectorizable_store): Ditto.
> (vectorizable_load): Ditto.
> * tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): Ditto.
> 
> Co-authored-by: Richard Sandiford
> 
> ---
>  gcc/doc/md.texi | 22 
>  gcc/internal-fn.def |  1 +
>  gcc/optabs.def  |  1 +
>  gcc/tree-vect-loop-manip.cc | 32 -
>  gcc/tree-vect-loop.cc   | 72 +
>  gcc/tree-vect-stmts.cc  | 66 ++
>  gcc/tree-vectorizer.h   |  6 
>  7 files changed, 191 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 6a435eb4461..95f7fe1f802 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
>operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>  @end smallexample
>  
> +@cindex @code{select_vl@var{m}} instruction pattern
> +@item @code{select_vl@var{m}}
> +Set operand 0 to the number of scalar iterations that should be handled
> +by one iteration of a vector loop.  Operand 1 is the total number of
> +scalar iterations that the loop needs to process and operand 2 is a
> +maximum bound on the result (also known as the maximum ``vectorization
> +factor'').
> +
> +The maximum value of operand 0 is given by:
> +@smallexample
> +operand0 = MIN (operand1, operand2)
> +@end smallexample
> +However, targets might choose a lower value than this, based on
> +target-specific criteria.  Each iteration of the vector loop might
> +therefore process a different number of scalar iterations, which in turn
> +means that induction variables will have a variable step.  Because of
> +this, it is generally not useful to define this instruction if it will
> +always calculate the maximum value.
> +
> +This optab is only useful on targets that implement @samp{len_load_@var{m}}
> +and/or @samp{len_store_@var{m}}.
> +
>  @cindex @code{check_raw_ptrs@var{m}} instruction pattern
>  @item @samp{check_raw_ptrs@var{m}}
>  Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
> diff --git a/gcc/internal-fn.def 

Re: [PATCH] mips: Fix overaligned function arguments [PR109435]

2023-06-07 Thread Jovan Dmitrovic
I see what you mean now, so I've adjusted the testcase so that it checks the
generated assembly. The updated patch follows.
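
For context, the mismatch that the patch below deals with can be seen with
a small C sketch. It assumes a GCC-compatible compiler (for the aligned
attribute) and reuses the type names from the new testcase; the two helper
functions are hypothetical and exist only to expose the values.

```c
#include <assert.h>
#include <stddef.h>

/* The aliased (original) type: naturally aligned like 'unsigned'.  */
struct ui8
{
  unsigned v[8];
};

/* A typedef that raises the alignment above that of the aliased type.  */
typedef struct ui8 uint8 __attribute__ ((aligned (64)));

static_assert (_Alignof (uint8) == 64,
	       "the typedef carries the over-alignment");
static_assert (_Alignof (struct ui8) == _Alignof (unsigned),
	       "the aliased type keeps its natural alignment");

/* Hypothetical helpers returning both alignments in bytes.  */
size_t
typedef_alignment (void)
{
  return _Alignof (uint8);
}

size_t
aliased_alignment (void)
{
  return _Alignof (struct ui8);
}
```

The patch makes argument passing use the second value (the aliased type's
alignment) rather than the first.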

Regards,
Jovan

From 2744357b5232c61bf1f780c4915d47b19d71f993 Mon Sep 17 00:00:00 2001
From: Jovan Dmitrovic 
Date: Fri, 19 May 2023 12:36:55 +0200
Subject: [PATCH] mips: Fix overaligned function arguments [PR109435]

This patch changes alignment for typedef types when passed as
arguments, making the alignment equal to the alignment of
original (aliased) types.

This change makes it impossible for a typedef type to have
alignment that is less than its size.

Signed-off-by: Jovan Dmitrovic 

gcc/ChangeLog:
PR target/109435
* config/mips/mips.cc (mips_function_arg_alignment): Returns
the alignment of function argument. In case of typedef type,
it returns the alignment of the aliased type.
(mips_function_arg_boundary): Relocated calculation of the
alignment of function arguments.

gcc/testsuite/ChangeLog:
PR target/109435
* gcc.target/mips/align-1.c: New test.
---
 gcc/config/mips/mips.cc | 19 -
 gcc/testsuite/gcc.target/mips/align-1.c | 38 +
 2 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/align-1.c

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index c1d1691306e..20ba35f754c 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -6190,6 +6190,23 @@ mips_arg_partial_bytes (cumulative_args_t cum, const function_arg_info &arg)
   return info.stack_words > 0 ? info.reg_words * UNITS_PER_WORD : 0;
 }
 
+/* Given MODE and TYPE of a function argument, return the alignment in
+   bits.
+   In case of typedef, alignment of its original type is
+   used.  */
+
+static unsigned int
+mips_function_arg_alignment (machine_mode mode, const_tree type)
+{
+  if (!type)
+return GET_MODE_ALIGNMENT (mode);
+
+  if (is_typedef_decl (TYPE_NAME (type)))
+type = DECL_ORIGINAL_TYPE (TYPE_NAME (type));
+
+  return TYPE_ALIGN (type);
+}
+
 /* Implement TARGET_FUNCTION_ARG_BOUNDARY.  Every parameter gets at
least PARM_BOUNDARY bits of alignment, but will be given anything up
to STACK_BOUNDARY bits if the type requires it.  */
@@ -6198,8 +6215,8 @@ static unsigned int
 mips_function_arg_boundary (machine_mode mode, const_tree type)
 {
   unsigned int alignment;
+  alignment = mips_function_arg_alignment (mode, type);
 
-  alignment = type ? TYPE_ALIGN (type) : GET_MODE_ALIGNMENT (mode);
   if (alignment < PARM_BOUNDARY)
 alignment = PARM_BOUNDARY;
   if (alignment > STACK_BOUNDARY)
diff --git a/gcc/testsuite/gcc.target/mips/align-1.c b/gcc/testsuite/gcc.target/mips/align-1.c
new file mode 100644
index 000..5c639bee274
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/align-1.c
@@ -0,0 +1,38 @@
+/* Check that typedef alignment does not affect passing of function
+   parameters. */
+/* { dg-do compile { target { "mips*-*-linux*" } } } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
+
+#include <assert.h>
+
+typedef struct ui8
+{
+  unsigned v[8];
+} uint8 __attribute__ ((aligned(64)));
+
+unsigned
+callee (int x, uint8 a)
+{
+  return a.v[0];
+}
+
+uint8
+identity (uint8 in)
+{
+  return in;
+}
+
+int
+main (void)
+{
+  uint8 vec = {{1, 2, 3, 4, 5, 6, 7, 8}};
+  uint8 temp = identity (vec);
+  unsigned temp2 = callee (1, identity (vec));
+  assert (callee (1, temp) == 1);
+  assert (temp2 == 1);
+  return 0;
+}
+
+/* { dg-final { scan-assembler "\tsd\t\\\$5,0\\(\\\$\[0-9\]\\)" } } */
+/* { dg-final { scan-assembler "\tsd\t\\\$6,8\\(\\\$\[0-9\]\\)" } } */
+/* { dg-final { scan-assembler "\tsd\t\\\$7,16\\(\\\$\[0-9\]\\)" } } */
-- 
2.34.1




--
YunQiang Su

