RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-28 Thread Li, Pan2 via Gcc-patches
Thanks Jeff for comments.

It makes sense to me. For the EQ operator we should have CONSTM1. Does this 
mean the s390 parts have a similar issue here? Then for instructions like 
VMSEQ, we need to adjust simplify_rtx up to a point.

Please correct me if I have made any mistake. Thank you again.

Pan

-----Original Message-----
From: Jeff Law  
Sent: Saturday, April 29, 2023 5:48 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, Yanzhang 

Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET



On 4/28/23 09:21, Pan Li via Gcc-patches wrote:
> From: Pan Li 
> 
> When some RVV integer compare operators act on the same vector 
> registers without a mask, they can be simplified to VMSET.
> 
> This PATCH allows eq, le, leu, ge and geu to perform this kind of 
> simplification by adding one macro in riscv for simplify rtx.
> 
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) 
> {
>   return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
> }
> 
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v8,0(a1)
> vmseq.vv v8,v8,v8
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.v    v8,0(a0)
> ret
> 
> After this patch:
> vsetvli zero,a2,e8,m8,ta,ma
> vmset.m v1  <- optimized to vmset.m
> vsetvli a5,zero,e8,m8,ta,ma
> vsm.v   v1,0(a0)
> ret
> 
> As above, we may have one instruction eliminated and require fewer 
> vector registers.
> 
> Signed-off-by: Pan Li 
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
> consumed by simplify_rtx.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
> Adjust test check condition.
I'm not sure this is 100% correct.

What happens to the high bits in the resultant mask register?  My understanding 
is we have one output bit per input element in the comparison.  So unless the 
number of elements matches the bit width of the mask register, this isn't going 
to work.

Am I missing something?

Jeff
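
To make the concern above concrete: the RVV specification gives VLMAX = LMUL * VLEN / SEW, and a mask register holds VLEN bits with one bit per element. The following is a minimal sketch, not part of the patch, that counts the mask bits lying beyond VLMAX for a given configuration. The helper names `rvv_vlmax` and `rvv_mask_tail_bits` are hypothetical, and LMUL is passed as a fraction so that fractional LMUL can be expressed.

```c
#include <assert.h>

/* VLMAX = LMUL * VLEN / SEW, with LMUL given as lmul_num / lmul_den.
   Hypothetical helper for illustration only.  */
unsigned
rvv_vlmax (unsigned vlen, unsigned sew, unsigned lmul_num, unsigned lmul_den)
{
  assert (sew != 0 && lmul_den != 0);
  return (vlen * lmul_num) / (sew * lmul_den);
}

/* Number of bits in a VLEN-bit mask register that do not correspond
   to any element of the comparison.  */
unsigned
rvv_mask_tail_bits (unsigned vlen, unsigned sew,
                    unsigned lmul_num, unsigned lmul_den)
{
  return vlen - rvv_vlmax (vlen, sew, lmul_num, lmul_den);
}
```

With VLEN = 128, the e8/m8 case from the patch description (vbool1_t) has 128 elements and no leftover bits, while e32/m1 has 4 elements and 124 leftover bits, which is the situation the question about the high bits is probing.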




Re: [PATCH] tree-ssa-sink: Improve code sinking pass.

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/16/23 07:20, Ajit Agarwal wrote:

Hello All:

This patch improves the code sinking pass to sink the blocks before calls
in the use blocks or immediate dominator blocks, which reduces register pressure.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

tree-ssa-sink: Improve code sinking pass.

Code Sinking sinks the blocks after call. This increases
register pressure for callee-saved registers. Improves
code sinking before call in the use blocks or immediate
dominator of use blocks.

2023-04-16  Ajit Kumar Agarwal  

gcc/ChangeLog:

* tree-ssa-sink.cc (statement_sink_location): Modified to
move statements before calls.
(block_call_p): New function.
(def_use_same_block): New function.
(select_best_block): Add heuristics to select the best
blocks in the immediate post dominator.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
* gcc.dg/tree-ssa/ssa-sink-21.c: New testcase.
---


  
+/* Check def and use stmts are in same block.  */

A better function comment would be
/* Return TRUE if all the immediate uses of the defs in
   USE occur in the same block as USE, FALSE otherwise.  */

I would also strongly suggest you change "use" to something else.  This 
function is walking over uses and defs, so calling the incoming argument 
"use" is going to make it excessively hard to write clean comments for 
this function.  Something as simple as "stmt" would be considerably better.





+
+bool
+def_use_same_block (gimple *use)
+{
+  use_operand_p use_p;
+  def_operand_p def_p;
+  imm_use_iterator imm_iter;
+  ssa_op_iter iter;
+
+  FOR_EACH_SSA_DEF_OPERAND (def_p, use, iter, SSA_OP_DEF)
+{
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, DEF_FROM_PTR (def_p))
+   {
+ if (is_gimple_debug (USE_STMT (use_p)))
+   continue;
+
+ if (use_p
+ && (gimple_bb(USE_STMT (use_p)) == gimple_bb (use)))
Minor whitespace problems.  Make sure to include a space between the 
function name you are calling and the open parenthesis for the 
arguments.  ie gimple_bb (USE_STMT (use_p)).


It also seems like you're not checking all the uses, just one of them? 
Is that what you intended?  If so, then my suggested function comment is 
wrong and needs further adjustment.  This highlights how important it is 
to have a good function comment.
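
The difference between "all uses" and "one use" can be shown outside of GCC with a toy model. This is only a sketch under stated assumptions: `struct toy_stmt` and both helpers are hypothetical stand-ins, not GIMPLE or the immediate-use API. The function comment would have to say which of the two predicates is actually intended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a statement: only the index of its basic block.  */
struct toy_stmt
{
  int bb;
};

/* Return TRUE if every use in USES[0..N) is in block BB
   (vacuously TRUE when N is 0), FALSE otherwise.  */
bool
all_uses_in_block (const struct toy_stmt *uses, size_t n, int bb)
{
  assert (uses != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (uses[i].bb != bb)
      return false;
  return true;
}

/* Return TRUE if at least one use in USES[0..N) is in block BB,
   FALSE otherwise.  */
bool
any_use_in_block (const struct toy_stmt *uses, size_t n, int bb)
{
  assert (uses != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (uses[i].bb == bb)
      return true;
  return false;
}
```

For uses located in blocks 1 and 2, the "any" predicate holds for block 1 while the "all" predicate does not, so the two readings lead to different sinking decisions.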




+/* Check if the block has only calls.  */
This comment doesn't match the code.  It appears that you can have both 
calls and conditional branches.  Please update the function comment 
appropriately.  You should also describe the arguments and return value 
in the function comment (see my suggestion above as an example for how 
to describe the function arguments and return value).


Based on the code it looks like you're requiring the block to contain 
only two real statements.  A call followed by a conditional.




+
+bool
+block_call_p (basic_block bb)
+{
+  int i = 0;
+  bool is_call = false;
+  gimple_stmt_iterator gsi = gsi_last_bb (bb);
+  gimple *last_stmt = gsi_stmt (gsi);
ISTM there is likely a function that will give you the last statement in 
the function.



+
+  if (last_stmt && gimple_code (last_stmt) == GIMPLE_COND)
+{
+  if (!gsi_end_p (gsi))
+   gsi_prev ();
+
+   for (; !gsi_end_p (gsi);)
+{
+  gimple *stmt = gsi_stmt (gsi);
+
+  if (is_gimple_debug (stmt))
+return false;
Definitely incorrect as this can cause the decisions we make for 
optimization to change based on the existence of debug statements.




+
+  if (is_gimple_call (stmt))
+is_call = true;
+  else
+return false;
ISTM that this might be better/clearer.  Once you've seen a call, if you 
see another, you can just return immediately.  It also seems like if i 
ever has a value other than 0/1, then you can return false immediately.


  if (is_gimple_call (stmt))
    {
      /* We have already seen a call.  */
      if (is_call)
        return false;
      is_call = true;
      continue;
    }


+
+  if (!gsi_end_p (gsi))
+gsi_prev ();
+
+   ++i;
Isn't this going to cause this routine to return false if it has (for 
example) one or more labels followed by a CALL, then a conditional?



Overall I think the logic in here needs a bit of work.
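
The points raised in this review (skip debug statements rather than bail out, tolerate labels, reject a second call) can be sketched on a toy statement list. This is an illustration only, assuming the intended shape really is "exactly one call followed by a conditional"; `enum toy_kind` and `toy_block_call_p` are hypothetical and do not use the GIMPLE iterator API. The sketch also assumes that a conditional, if present, is the final entry of the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_kind { TOY_LABEL, TOY_DEBUG, TOY_CALL, TOY_COND, TOY_OTHER };

/* Return TRUE if STMTS[0..N) ends in a conditional that is preceded by
   exactly one call, ignoring labels and debug statements; return FALSE
   otherwise.  */
bool
toy_block_call_p (const enum toy_kind *stmts, size_t n)
{
  assert (stmts != NULL || n == 0);
  if (n == 0 || stmts[n - 1] != TOY_COND)
    return false;

  bool seen_call = false;
  /* Walk backwards, starting at the statement before the conditional.  */
  for (size_t i = n - 1; i-- > 0;)
    {
      switch (stmts[i])
        {
        case TOY_DEBUG:
        case TOY_LABEL:
          /* Must not influence the decision.  */
          continue;
        case TOY_CALL:
          /* A second call disqualifies the block.  */
          if (seen_call)
            return false;
          seen_call = true;
          continue;
        default:
          return false;
        }
    }
  return seen_call;
}
```

Because debug statements and labels are skipped, a block made of a label, a call and a conditional gives the same answer with or without debug statements in between.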



@@ -190,7 +254,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
  static basic_block
  select_best_block (basic_block early_bb,
   basic_block late_bb,
-  gimple *stmt)
+  gimple *stmt,
+  gimple *use = 0)
Rather than use a default value, just fix the callers.  There's only 3 
and you already fixed one :-)  And if you're going to initialize a 
pointer, use NULL rather than 0.






  {
basic_block best_bb = late_bb;
basic_block temp_bb = late_bb;
@@ -230,7 +295,28 @@ 

Re: [PATCH v4 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.

2023-04-28 Thread Hans-Peter Nilsson
On Fri, 28 Apr 2023, Jeff Law wrote:
> On 4/28/23 16:42, Hans-Peter Nilsson wrote:
> > On Sat, 22 Apr 2023, Ajit Agarwal via Gcc-patches wrote:
> > I don't see anything in those functions that checks if
> > ZERO_EXTEND is actually a feature of the ABI, e.g. as opposed to
> > no extension or SIGN_EXTEND.  Do I miss something?
> I don't think you missed anything.  That was one of the points I was making
> last week.  Somewhere, somehow we need to describe what the ABI mandates and
> guarantees.

Right, I thought this was the new version.

> So while what Ajit has done is a step forward, at some point the actual
> details of the ABI need to be described in a way that can be checked and
> consumed by REE.

IIRC I also commented and suggested a few target macros that 
*should* have helped to that effect.  Ajit, I suggest you see my 
previous reply in this or a related conversation.

brgds, H-P


[PATCH] target: [PR109657] (a ? -1 : 0) | b could be optimized better for aarch64

2023-04-28 Thread Andrew Pinski via Gcc-patches
There is no canonical form for this case defined. So the aarch64 backend needs
a pattern to match both of these forms.

The forms are:
(set (reg/i:SI 0 x0)
(if_then_else:SI (eq (reg:CC 66 cc)
(const_int 0 [0]))
(reg:SI 97)
(const_int -1 [0xffffffffffffffff])))
and
(set (reg/i:SI 0 x0)
(ior:SI (neg:SI (ne:SI (reg:CC 66 cc)
(const_int 0 [0])))
(reg:SI 102)))

Currently the aarch64 backend matches the first form so this
patch adds an insn_and_split to match the second form and
convert it to the first form.
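
At the source level the two forms compute the same value, which is why one can be rewritten into the other. The following is a small sketch of that equivalence in plain C; the function names are hypothetical and the code is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* First form: a conditional select, cond ? -1 : a.  */
uint32_t
select_form (int cond, uint32_t a)
{
  return cond ? UINT32_MAX : a;
}

/* Second form: -(cond != 0) | a.  The comparison yields 0 or 1, so the
   unsigned negation yields 0 or all ones.  */
uint32_t
ior_neg_form (int cond, uint32_t a)
{
  return (0u - (uint32_t) (cond != 0)) | a;
}

/* Check that both forms agree for a false and a true condition.  */
void
check_forms_agree (uint32_t a)
{
  assert (select_form (0, a) == ior_neg_form (0, a));
  assert (select_form (1, a) == ior_neg_form (1, a));
}
```

When the condition is false the OR leaves `a` unchanged, and when it is true the all-ones operand forces the result to -1, matching the select.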

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions

PR target/109657

gcc/ChangeLog:

* config/aarch64/aarch64.md (*cmov_insn_m1): New
insn_and_split pattern.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/csinv-2.c: New test.
---
 gcc/config/aarch64/aarch64.md  | 20 +
 gcc/testsuite/gcc.target/aarch64/csinv-2.c | 26 ++
 2 files changed, 46 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/csinv-2.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e1a2b265b20..57fe5601350 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4194,6 +4194,26 @@ (define_insn "*cmovsi_insn_uxtw"
   [(set_attr "type" "csel, csel, csel, csel, csel, mov_imm, mov_imm")]
 )
 
+;; There are two canonical forms for `cmp ? -1 : a`.
+;; This is the second form and is here to help combine.
+;; Support `-(cmp) | a` into `cmp ? -1 : a` to be canonical in the backend.
+(define_insn_and_split "*cmov_insn_m1"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+(ior:GPI
+(neg:GPI
+ (match_operator:GPI 1 "aarch64_comparison_operator"
+  [(match_operand 2 "cc_register" "") (const_int 0)]))
+(match_operand 3 "register_operand" "r")))]
+  ""
+  "#"
+  "&& true"
+  [(set (match_dup 0)
+   (if_then_else:GPI (match_dup 1)
+(const_int -1) (match_dup 3)))]
+  {}
+  [(set_attr "type" "csel")]
+)
+
 (define_insn "*cmovdi_insn_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
(if_then_else:DI
diff --git a/gcc/testsuite/gcc.target/aarch64/csinv-2.c b/gcc/testsuite/gcc.target/aarch64/csinv-2.c
new file mode 100644
index 00000000000..89132acb713
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csinv-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* PR target/109657: (a ? -1 : 0) | b could be better */
+
+/* Both functions should have the same assembly of:
+   cmp w1, 0
+   csinv   w0, w0, wzr, eq
+
+   We should not get:
+   cmp w1, 0
+   csetm   w1, ne
+   orr w0, w1, w0
+ */
+/* { dg-final { scan-assembler-times "csinv\tw\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-not "csetm\tw\[0-9\]" } } */
+unsigned b(unsigned a, unsigned b)
+{
+  if(b)
+return -1;
+  return a;
+}
+unsigned b1(unsigned a, unsigned b)
+{
+unsigned t = b ? -1 : 0;
+return a | t;
+}
-- 
2.31.1



Re: [PATCH v4 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/28/23 16:42, Hans-Peter Nilsson wrote:

On Sat, 22 Apr 2023, Ajit Agarwal via Gcc-patches wrote:


Hello All:

This new version of patch 4 improves the ree pass for the rs6000 target using 
defined ABI interfaces.
Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


ree: Improve ree pass for rs6000 target using defined abi interfaces

 For the rs6000 target we see redundant zero and sign
 extensions.  This patch improves the ree pass to eliminate
 such redundant zero and sign extensions using defined
 ABI interfaces.

 2023-04-22  Ajit Kumar Agarwal  

gcc/ChangeLog:

 * ree.cc (combine_reaching_defs): Add zero_extend
 using defined ABI interfaces.
 (add_removable_extension): Use defined ABI interfaces
 for no reaching defs.
 (abi_extension_candidate_return_reg_p): New defined ABI function.
 (abi_extension_candidate_p): New defined ABI function.
 (abi_extension_candidate_argno_p): New defined ABI function.
 (abi_handle_regs_without_defs_p): New defined ABI function.

gcc/testsuite/ChangeLog:

 * g++.target/powerpc/zext-elim-3.C
---
  gcc/ree.cc| 176 +++---
  .../g++.target/powerpc/zext-elim-3.C  |  16 ++
  2 files changed, 162 insertions(+), 30 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index 413aec7c8eb..0de96b1ece1 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -473,7 +473,8 @@ get_defs (rtx_insn *insn, rtx reg, vec<rtx_insn *> *dest)
break;
  }
  
-  gcc_assert (use != NULL);
+  if (use == NULL)
+return NULL;
  
ref_chain = DF_REF_CHAIN (use);
  
@@ -514,7 +515,8 @@ get_uses (rtx_insn *insn, rtx reg)

  if (REGNO (DF_REF_REG (def)) == REGNO (reg))
break;
  
-  gcc_assert (def != NULL);
+  if (def == NULL)
+return NULL;
  
ref_chain = DF_REF_CHAIN (def);
  
@@ -750,6 +752,103 @@ get_extended_src_reg (rtx src)

return src;
  }
  
+/* Return TRUE if the candidate insn is zero extend and regno is

+   an return  registers.  */
+
+static bool
+abi_extension_candidate_return_reg_p (rtx_insn *insn, int regno)
+{
+  rtx set = single_set (insn);
+
+  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
+return false;
+
+  if (FUNCTION_VALUE_REGNO_P (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if reg source operand of zero_extend is argument registers
+   and not return registers and source and destination operand are same
+   and mode of source and destination operand are not same.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+
+  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
+return false;
+
+  machine_mode ext_dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set),0);
+
+  bool copy_needed
+= (REGNO (SET_DEST (set)) != REGNO (XEXP (SET_SRC (set), 0)));
+
+  if (!copy_needed && ext_dst_mode != GET_MODE (orig_src)
+  && FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  && !abi_extension_candidate_return_reg_p (insn, REGNO (orig_src)))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the candidate insn is zero extend and regno is
+   an argument registers.  */
+
+static bool
+abi_extension_candidate_argno_p (rtx_code code, int regno)
+{
+  if (code !=  ZERO_EXTEND)
+return false;
+
+  if (FUNCTION_ARG_REGNO_P (regno))
+return true;
+
+  return false;
+}


I don't see anything in those functions that checks if
ZERO_EXTEND is actually a feature of the ABI, e.g. as opposed to
no extension or SIGN_EXTEND.  Do I miss something?

I don't think you missed anything.  That was one of the points I was 
making last week.  Somewhere, somehow we need to describe what the ABI 
mandates and guarantees.


So while what Ajit has done is a step forward, at some point the actual 
details of the ABI need to be described in a way that can be checked and 
consumed by REE.


Jeff
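
One way to picture the missing piece: before treating an extension of an argument or return register as redundant, REE would need to ask what extension the ABI guarantees for that register and compare it with the extension the insn performs. The sketch below is purely illustrative. `enum toy_ext`, `toy_abi_guarantee`, `toy_extension_redundant_p` and the register numbers are hypothetical; they do not correspond to any existing GCC hook or to the actual rs6000 ABI.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_ext { TOY_EXT_NONE, TOY_EXT_ZERO, TOY_EXT_SIGN };

/* Hypothetical target description: the extension the ABI guarantees
   for the value held in REGNO.  Toy rule: registers 3..10 are
   guaranteed zero-extended, all others carry no guarantee.  */
enum toy_ext
toy_abi_guarantee (int regno)
{
  if (regno >= 3 && regno <= 10)
    return TOY_EXT_ZERO;
  return TOY_EXT_NONE;
}

/* An extension of kind INSN_EXT applied to REGNO is redundant only if
   the ABI already guarantees that same kind of extension.  */
bool
toy_extension_redundant_p (enum toy_ext insn_ext, int regno)
{
  assert (insn_ext != TOY_EXT_NONE);
  return toy_abi_guarantee (regno) == insn_ext;
}
```

The point of the sketch is the comparison in the last function: checking only that the insn is a ZERO_EXTEND of an argument register, without consulting what the ABI mandates, is not enough to conclude that the extension can be removed.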


[PATCH 3/3] OpenMP: Fortran support for imperfectly-nested loops

2023-04-28 Thread Sandra Loosemore
OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop.  In GCC this code is moved
into the inner loop body by the respective front ends.

In the Fortran front end, most of the semantic processing happens during
the translation phase, so the parse phase just collects the intervening
statements, checks them for errors, and splices them around the loop body.

gcc/fortran/ChangeLog
* openmp.cc: Include omp-api.h.
(resolve_omp_clauses): Consolidate inscan reduction clause conflict
checking here.
(find_nested_loop_in_chain): New.
(find_nested_loop_in_block): New.
(gfc_resolve_omp_do_blocks): Set omp_current_do_collapse properly.
Handle imperfectly-nested loops when looking for nested omp scan.
Refactor to move inscan reduction clause conflict checking to
resolve_omp_clauses.
(gfc_resolve_do_iterator): Handle imperfectly-nested loops.
(struct icode_error_state): New.
(icode_code_error_callback): New.
(icode_expr_error_callback): New.
(diagnose_intervening_code_errors_1): New.
(diagnose_intervening_code_errors): New.
(restructure_intervening_code): New.
(is_outer_iteration_variable): Do not assume loops are perfectly
nested.
(expr_is_invariant): Likewise.
(resolve_omp_do): Handle imperfectly-nested loops.

gcc/testsuite/ChangeLog
* gfortran.dg/gomp/collapse1.f90: Adjust expected errors.
* gfortran.dg/gomp/collapse2.f90: Likewise.
* gfortran.dg/gomp/imperfect1.f90: New.
* gfortran.dg/gomp/imperfect2.f90: New.
* gfortran.dg/gomp/imperfect3.f90: New.
* gfortran.dg/gomp/imperfect4.f90: New.
* gfortran.dg/gomp/imperfect5.f90: New.

libgomp/ChangeLog
* testsuite/libgomp.fortran/imperfect-destructor.f90: New.
* testsuite/libgomp.fortran/imperfect1.f90: New.
* testsuite/libgomp.fortran/imperfect2.f90: New.
* testsuite/libgomp.fortran/imperfect3.f90: New.
* testsuite/libgomp.fortran/imperfect4.f90: New.
* testsuite/libgomp.fortran/offload-imperfect1.f90: New.
* testsuite/libgomp.fortran/offload-imperfect2.f90: New.
* testsuite/libgomp.fortran/offload-imperfect3.f90: New.
* testsuite/libgomp.fortran/offload-imperfect4.f90: New.
---
 gcc/fortran/openmp.cc | 590 +++---
 gcc/testsuite/gfortran.dg/gomp/collapse1.f90  |   6 +-
 gcc/testsuite/gfortran.dg/gomp/collapse2.f90  |  10 +-
 gcc/testsuite/gfortran.dg/gomp/imperfect1.f90 |  39 ++
 gcc/testsuite/gfortran.dg/gomp/imperfect2.f90 |  56 ++
 gcc/testsuite/gfortran.dg/gomp/imperfect3.f90 |  29 +
 gcc/testsuite/gfortran.dg/gomp/imperfect4.f90 |  36 ++
 gcc/testsuite/gfortran.dg/gomp/imperfect5.f90 |  67 ++
 .../libgomp.fortran/imperfect-destructor.f90  | 142 +
 .../testsuite/libgomp.fortran/imperfect1.f90  |  67 ++
 .../testsuite/libgomp.fortran/imperfect2.f90  | 102 +++
 .../testsuite/libgomp.fortran/imperfect3.f90  | 110 
 .../testsuite/libgomp.fortran/imperfect4.f90  | 121 
 .../libgomp.fortran/offload-imperfect1.f90|  72 +++
 .../libgomp.fortran/offload-imperfect2.f90| 110 
 .../libgomp.fortran/offload-imperfect3.f90| 116 
 .../libgomp.fortran/offload-imperfect4.f90| 126 
 17 files changed, 1697 insertions(+), 102 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/imperfect1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/imperfect2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/imperfect3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/imperfect4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/imperfect5.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/imperfect-destructor.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/imperfect1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/imperfect2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/imperfect3.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/imperfect4.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/offload-imperfect1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/offload-imperfect2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/offload-imperfect3.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/offload-imperfect4.f90

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 86e451531a6..25c4c07138d 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gomp-constants.h"
 #include "target-memory.h"  /* For gfc_encode_character.  */
 #include "bitmap.h"
+#include "omp-api.h"  /* For omp_runtime_api_procname.  */
 
 
 static gfc_statement omp_code_to_statement (gfc_code *);
@@ -7209,15 +7210,24 @@ resolve_omp_clauses 

[PATCH 2/3] OpenMP: C++ support for imperfectly-nested loops

2023-04-28 Thread Sandra Loosemore
OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop.  In GCC this code is moved
into the inner loop body by the respective front ends.

This patch changes the C++ front end to use recursive descent parsing
on nested loops within an "omp for" construct, rather than an
iterative approach, in order to preserve proper nesting of compound
statements.  Preserving cleanups (destructors) for class objects
declared in intervening code and loop initializers complicates moving
the former into the body of the loop; this is handled by parsing the
entire construct before reassembling any of it.

gcc/cp/ChangeLog
* cp-tree.h (cp_convert_omp_range_for): Adjust declaration.
* parser.cc (struct omp_for_parse_data): New.
(cp_parser_postfix_expression): Diagnose calls to OpenMP runtime
in intervening code.
(cp_parser_statement_seq_opt): Special-case nested OMP loops and
blocks in intervening code.
(cp_parser_iteration_statement): Reject loops in intervening code.
(cp_parser_omp_for_loop_init): Expand comments and tweak the
interface slightly to better distinguish input/output parameters.
(cp_parser_omp_range_for): Likewise.
(cp_convert_omp_range_for): Likewise.
(cp_parser_omp_loop_nest): New, split from cp_parser_omp_for_loop
and largely rewritten.  Add more comments.
(struct sit_data, substitute_in_tree_walker, substitute_in_tree):
New.
(fixup_blocks_walker): New.
(cp_parser_omp_for_loop): Rewrite to use recursive descent instead
of a loop.  Add logic to reshuffle the bits of code collected
during parsing so intervening code gets moved to the loop body.
(cp_parser_omp_loop): Remove call to finish_omp_for_block, which
is now redundant.
(cp_parser_omp_simd): Likewise.
(cp_parser_omp_for): Likewise.
(cp_parser_omp_distribute): Likewise.
(cp_parser_oacc_loop): Likewise.
(cp_parser_omp_taskloop): Likewise.
(cp_parser_pragma): Reject OpenMP pragmas in intervening code.
* parser.h (struct cp_parser): Add omp_for_parse_state field.
* pt.cc (tsubst_omp_for_iterator): Adjust call to
cp_convert_omp_range_for.
* semantics.cc (struct fofb_data, finish_omp_for_block_walker): New.
(finish_omp_for_block): Allow variables to be bound in a BIND_EXPR
nested inside BIND instead of directly in BIND itself.

gcc/testsuite/ChangeLog
* g++.dg/gomp/pr41967.C: Adjust expected error messages.

libgomp/ChangeLog
* testsuite/libgomp.c++/imperfect-class-1.C : New.
* testsuite/libgomp.c++/imperfect-class-2.C : New.
* testsuite/libgomp.c++/imperfect-class-3.C : New.
* testsuite/libgomp.c++/imperfect-destructor.C : New.
* testsuite/libgomp.c++/imperfect-template-1.C : New.
* testsuite/libgomp.c++/imperfect-template-2.C : New.
* testsuite/libgomp.c++/imperfect-template-3.C : New.
---
 gcc/cp/cp-tree.h  |2 +-
 gcc/cp/parser.cc  | 1180 -
 gcc/cp/parser.h   |3 +
 gcc/cp/pt.cc  |3 +-
 gcc/cp/semantics.cc   |   80 +-
 gcc/testsuite/g++.dg/gomp/pr41967.C   |2 +-
 .../testsuite/libgomp.c++/imperfect-class-1.C |  169 +++
 .../testsuite/libgomp.c++/imperfect-class-2.C |  167 +++
 .../testsuite/libgomp.c++/imperfect-class-3.C |  167 +++
 .../libgomp.c++/imperfect-destructor.C|  135 ++
 .../libgomp.c++/imperfect-template-1.C|  172 +++
 .../libgomp.c++/imperfect-template-2.C|  170 +++
 .../libgomp.c++/imperfect-template-3.C|  170 +++
 13 files changed, 2021 insertions(+), 399 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c++/imperfect-class-1.C
 create mode 100644 libgomp/testsuite/libgomp.c++/imperfect-class-2.C
 create mode 100644 libgomp/testsuite/libgomp.c++/imperfect-class-3.C
 create mode 100644 libgomp/testsuite/libgomp.c++/imperfect-destructor.C
 create mode 100644 libgomp/testsuite/libgomp.c++/imperfect-template-1.C
 create mode 100644 libgomp/testsuite/libgomp.c++/imperfect-template-2.C
 create mode 100644 libgomp/testsuite/libgomp.c++/imperfect-template-3.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index c9c4cd6f32f..90d369e4f65 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7279,7 +7279,7 @@ extern bool maybe_clone_body  (tree);
 /* In parser.cc */
 extern tree cp_convert_range_for (tree, tree, tree, tree, unsigned int, bool,
  unsigned short);
-extern void cp_convert_omp_range_for (tree &, vec<tree, va_gc> *, tree &,
+extern void cp_convert_omp_range_for (tree &, tree &, tree &,
  tree &, tree &, 

[PATCH 1/3] OpenMP: C support for imperfectly-nested loops

2023-04-28 Thread Sandra Loosemore
OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop.  In GCC this code is moved
into the inner loop body by the respective front ends.

This patch changes the C front end to use recursive descent parsing
on nested loops within an "omp for" construct, rather than an iterative
approach, in order to preserve proper nesting of compound statements.

gcc/c/ChangeLog
* c-parser.cc (struct c_parser): Add omp_for_parse_state field.
(struct omp_for_parse_data): New.
(c_parser_compound_statement_nostart): Special-case nested
OMP loops and blocks in intervening code.
(c_parser_while_statement): Reject in intervening code.
(c_parser_do_statement): Likewise.
(c_parser_for_statement): Likewise.
(c_parser_postfix_expression_after_primary): Reject calls to OMP
runtime routines in intervening code.
(c_parser_pragma): Reject OMP pragmas in intervening code.
(c_parser_omp_loop_nest): New, split from c_parser_omp_for_loop.
(c_parser_omp_for_loop): Rewrite to use recursive descent and
generalize handling for intervening code.

gcc/ChangeLog
* omp-api.h: New file.
* omp-general.cc (omp_runtime_api_procname): New.
(omp_runtime_api_call): Moved here from omp-low.cc, and make
non-static.
* omp-general.h: Include omp-api.h.
* omp-low.cc (omp_runtime_api_call): Delete this copy.

gcc/testsuite/ChangeLog
* c-c++-common/goacc/collapse-1.c: Adjust expected error messages.
* c-c++-common/goacc/tile-2.c: Likewise.
* c-c++-common/gomp/imperfect1.c: New.
* c-c++-common/gomp/imperfect2.c: New.
* c-c++-common/gomp/imperfect3.c: New.
* c-c++-common/gomp/imperfect4.c: New.
* c-c++-common/gomp/imperfect5.c: New.
* gcc.dg/gomp/collapse-1.c: Adjust expected error messages.

libgomp/ChangeLog
* testsuite/libgomp.c-c++-common/imperfect1.c: New.
* testsuite/libgomp.c-c++-common/imperfect2.c: New.
* testsuite/libgomp.c-c++-common/imperfect3.c: New.
* testsuite/libgomp.c-c++-common/imperfect4.c: New.
* testsuite/libgomp.c-c++-common/imperfect5.c: New.
* testsuite/libgomp.c-c++-common/imperfect6.c: New.
* testsuite/libgomp.c-c++-common/offload-imperfect1.c: New.
* testsuite/libgomp.c-c++-common/offload-imperfect2.c: New.
* testsuite/libgomp.c-c++-common/offload-imperfect3.c: New.
* testsuite/libgomp.c-c++-common/offload-imperfect4.c: New.
---
 gcc/c/c-parser.cc | 692 +++---
 gcc/omp-api.h |  32 +
 gcc/omp-general.cc| 134 
 gcc/omp-general.h |   1 +
 gcc/omp-low.cc| 129 
 gcc/testsuite/c-c++-common/goacc/collapse-1.c |  14 +-
 gcc/testsuite/c-c++-common/goacc/tile-2.c |   4 +-
 gcc/testsuite/c-c++-common/gomp/imperfect1.c  |  40 +
 gcc/testsuite/c-c++-common/gomp/imperfect2.c  |  36 +
 gcc/testsuite/c-c++-common/gomp/imperfect3.c  |  35 +
 gcc/testsuite/c-c++-common/gomp/imperfect4.c  |  35 +
 gcc/testsuite/c-c++-common/gomp/imperfect5.c  |  59 ++
 gcc/testsuite/gcc.dg/gomp/collapse-1.c|  10 +-
 .../libgomp.c-c++-common/imperfect1.c |  76 ++
 .../libgomp.c-c++-common/imperfect2.c | 114 +++
 .../libgomp.c-c++-common/imperfect3.c | 119 +++
 .../libgomp.c-c++-common/imperfect4.c | 117 +++
 .../libgomp.c-c++-common/imperfect5.c |  49 ++
 .../libgomp.c-c++-common/imperfect6.c | 115 +++
 .../libgomp.c-c++-common/offload-imperfect1.c |  81 ++
 .../libgomp.c-c++-common/offload-imperfect2.c | 122 +++
 .../libgomp.c-c++-common/offload-imperfect3.c | 125 
 .../libgomp.c-c++-common/offload-imperfect4.c | 122 +++
 23 files changed, 1870 insertions(+), 391 deletions(-)
 create mode 100644 gcc/omp-api.h
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect3.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect4.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect5.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect3.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect4.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect5.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect6.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/offload-imperfect1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/offload-imperfect2.c
 create mode 100644 

[PATCH 0/3] OpenMP: Support imperfectly-nested loops

2023-04-28 Thread Sandra Loosemore
OpenMP 5.0 removed the restriction that collapsed loops in "omp
for/do" and related constructs must be perfectly nested; it now allows
intervening code to appear before/after each nested loop level.  The
spec allows implementations considerable freedom in how many times this
intervening code is executed, and it seemed to me that the simplest
solution was to push it all into the loop body so that it is executed
on every logical iteration.  Implementing this in the respective front
ends means that no changes are required in the OMP_FOR representation
or in subsequent gimplification and lowering passes.
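
The transformation described above can be illustrated in plain C. This is a minimal sketch under stated assumptions, not code from the patch series: the function names are hypothetical, and the OpenMP directive appears only in a comment so that the example builds without -fopenmp. The first function has intervening code between the two loops; the second shows the same nest with that code pushed into the inner loop body, so that it runs on every logical iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Imperfectly nested form.  With OpenMP 5.0 the outer loop could carry
     #pragma omp for collapse(2)
   even though "row" is computed between the two loops.  */
void
fill_imperfect (int *out, size_t n, size_t m)
{
  assert (out != NULL || n * m == 0);
  for (size_t i = 0; i < n; i++)
    {
      size_t row = i * m;       /* Intervening code.  */
      for (size_t j = 0; j < m; j++)
        out[row + j] = (int) (i + j);
    }
}

/* Equivalent perfectly nested form: the intervening code now sits in
   the inner loop body and is re-executed for every (i, j) pair.  */
void
fill_perfect (int *out, size_t n, size_t m)
{
  assert (out != NULL || n * m == 0);
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < m; j++)
      {
        size_t row = i * m;     /* Moved intervening code.  */
        out[row + j] = (int) (i + j);
      }
}
```

Both functions fill the array identically here because recomputing `row` has no side effects; intervening code whose effect depends on how often it runs would behave differently, which is the freedom the spec leaves to implementations.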

For C and C++, I refactored the OMP_FOR parsing code to use recursive
descent and parallel the revised syntax in the OpenMP spec for
canonical loop nest form.  In C this was relatively straightforward,
but in C++ the code that handles nested scopes for class iterators
gave me fits.  I ended up parsing the entire construct and then
reassembling the bits of code, and I tried to make this at least
better-documented than the code that was previously there.  In the
Fortran front end, I implemented the transformation during resolution.

This patch series has some overlap with Frederik Harwath's tile/unroll
patch set, but has been developed and tested independently of those changes.

https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614564.html

Depending on what order they are reviewed/committed in, we will of
course take care of merging and retesting them together.  Frederik has
already reviewed my patches and gave me some very helpful feedback,
and Tobias also has helped out, especially when I got stuck on some
unrelated bugs.

-Sandra


Sandra Loosemore (3):
  OpenMP: C support for imperfectly-nested loops
  OpenMP: C++ support for imperfectly-nested loops
  OpenMP: Fortran support for imperfectly-nested loops

 gcc/c/c-parser.cc |  692 ++
 gcc/cp/cp-tree.h  |2 +-
 gcc/cp/parser.cc  | 1180 -
 gcc/cp/parser.h   |3 +
 gcc/cp/pt.cc  |3 +-
 gcc/cp/semantics.cc   |   80 +-
 gcc/fortran/openmp.cc |  590 +++--
 gcc/omp-api.h |   32 +
 gcc/omp-general.cc|  134 ++
 gcc/omp-general.h |1 +
 gcc/omp-low.cc|  129 --
 gcc/testsuite/c-c++-common/goacc/collapse-1.c |   14 +-
 gcc/testsuite/c-c++-common/goacc/tile-2.c |4 +-
 gcc/testsuite/c-c++-common/gomp/imperfect1.c  |   40 +
 gcc/testsuite/c-c++-common/gomp/imperfect2.c  |   36 +
 gcc/testsuite/c-c++-common/gomp/imperfect3.c  |   35 +
 gcc/testsuite/c-c++-common/gomp/imperfect4.c  |   35 +
 gcc/testsuite/c-c++-common/gomp/imperfect5.c  |   59 +
 gcc/testsuite/g++.dg/gomp/pr41967.C   |2 +-
 gcc/testsuite/gcc.dg/gomp/collapse-1.c|   10 +-
 gcc/testsuite/gfortran.dg/gomp/collapse1.f90  |6 +-
 gcc/testsuite/gfortran.dg/gomp/collapse2.f90  |   10 +-
 gcc/testsuite/gfortran.dg/gomp/imperfect1.f90 |   39 +
 gcc/testsuite/gfortran.dg/gomp/imperfect2.f90 |   56 +
 gcc/testsuite/gfortran.dg/gomp/imperfect3.f90 |   29 +
 gcc/testsuite/gfortran.dg/gomp/imperfect4.f90 |   36 +
 gcc/testsuite/gfortran.dg/gomp/imperfect5.f90 |   67 +
 .../testsuite/libgomp.c++/imperfect-class-1.C |  169 +++
 .../testsuite/libgomp.c++/imperfect-class-2.C |  167 +++
 .../testsuite/libgomp.c++/imperfect-class-3.C |  167 +++
 .../libgomp.c++/imperfect-destructor.C|  135 ++
 .../libgomp.c++/imperfect-template-1.C|  172 +++
 .../libgomp.c++/imperfect-template-2.C|  170 +++
 .../libgomp.c++/imperfect-template-3.C|  170 +++
 .../libgomp.c-c++-common/imperfect1.c |   76 ++
 .../libgomp.c-c++-common/imperfect2.c |  114 ++
 .../libgomp.c-c++-common/imperfect3.c |  119 ++
 .../libgomp.c-c++-common/imperfect4.c |  117 ++
 .../libgomp.c-c++-common/imperfect5.c |   49 +
 .../libgomp.c-c++-common/imperfect6.c |  115 ++
 .../libgomp.c-c++-common/offload-imperfect1.c |   81 ++
 .../libgomp.c-c++-common/offload-imperfect2.c |  122 ++
 .../libgomp.c-c++-common/offload-imperfect3.c |  125 ++
 .../libgomp.c-c++-common/offload-imperfect4.c |  122 ++
 .../libgomp.fortran/imperfect-destructor.f90  |  142 ++
 .../testsuite/libgomp.fortran/imperfect1.f90  |   67 +
 .../testsuite/libgomp.fortran/imperfect2.f90  |  102 ++
 .../testsuite/libgomp.fortran/imperfect3.f90  |  110 ++
 .../testsuite/libgomp.fortran/imperfect4.f90  |  121 ++
 .../libgomp.fortran/offload-imperfect1.f90|   72 +
 .../libgomp.fortran/offload-imperfect2.f90|  110 ++
 .../libgomp.fortran/offload-imperfect3.f90|  116 ++
 .../libgomp.fortran/offload-imperfect4.f90|  126 ++
 53 files changed, 5588 insertions(+), 892 deletions(-)
 create mode 100644 gcc/omp-api.h
 create mode 100644 

Re: [PATCH] testsuite: Handle empty assembly lines in check-function-bodies

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/28/23 09:58, Hans-Peter Nilsson via Gcc-patches wrote:

Ok to commit?
-- >8 --
I tried to make use of check-function-bodies for cris-elf and was a
bit surprised to see it failing.  There's a deliberate empty line
after the filled delay slot of the return-function which was
mishandled.  I thought "aha" and tried to add an empty line
(containing just a "**" prefix) to the match, but that didn't help.
While it was added as input from the function's assembly output
to-be-matched like any other line, it couldn't be matched: I had to
use "...", which works but is...distracting.

Some digging shows that an empty assembly line can't be deliberately
matched because all matcher lines (lines starting with the prefix,
the ubiquitous "**") are canonicalized by trimming leading
whitespace (the "string trim" in check-function-bodies) and instead
adding a leading TAB character, thus empty lines end up containing
just a TAB.  For usability it's better to treat empty lines as fluff
than to uglify the test-case and the code to properly match them.
Double-checking, no test-case tries to match a line containing just
TAB (by providing a line containing just "**\s*", i.e. zero or
more whitespace characters).

* lib/scanasm.exp (parse_function_bodies): Set fluff to include
empty lines (besides optionally leading whitespace).

OK
jeff
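
The canonicalization described in the quoted message can be modelled outside Tcl. The C sketch below is not the scanasm.exp code: `canonicalize_matcher_line` is a hypothetical name, and the sketch assumes the "**" prefix has already been removed. It drops leading whitespace and prepends a TAB, as the message describes, which shows why an empty matcher line turns into a lone TAB and so cannot equal an empty assembly line.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the canonicalization described above: drop leading
   whitespace from a matcher line (prefix already removed) and put a single
   TAB in front.  OUT must have room for strlen (line) + 2 bytes.  */
void
canonicalize_matcher_line (const char *line, char *out, size_t out_size)
{
  while (*line != '\0' && isspace ((unsigned char) *line))
    line++;
  assert (strlen (line) + 2 <= out_size);
  out[0] = '\t';
  strcpy (out + 1, line);
}
```

With this model, both an empty line and a whitespace-only line canonicalize to a single TAB, so treating such lines as fluff loses nothing that could have matched before.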


Re: [PATCH v4 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.

2023-04-28 Thread Hans-Peter Nilsson
On Sat, 22 Apr 2023, Ajit Agarwal via Gcc-patches wrote:

> Hello All:
> 
> This new version of patch 4 improves the ree pass for the rs6000 target
> using defined ABI interfaces.
> Bootstrapped and regtested on power64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> 
>   ree: Improve ree pass for rs6000 target using defined abi interfaces
> 
> For the rs6000 target we see redundant zero and sign
> extensions.  This patch improves the ree pass to eliminate
> such redundant zero and sign extensions using the defined
> ABI interfaces.
> 
> 2023-04-22  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
> * ree.cc (combine_reaching_defs): Add zero_extend
> using defined abi interfaces.
> (add_removable_extension): use of defined abi interfaces
> for no reaching defs.
> (abi_extension_candidate_return_reg_p): New defined ABI function.
> (abi_extension_candidate_p): New defined ABI function.
> (abi_extension_candidate_argno_p): New defined ABI function.
> (abi_handle_regs_without_defs_p): New defined ABI function.
> 
> gcc/testsuite/ChangeLog:
> 
> * g++.target/powerpc/zext-elim-3.C
> ---
>  gcc/ree.cc| 176 +++---
>  .../g++.target/powerpc/zext-elim-3.C  |  16 ++
>  2 files changed, 162 insertions(+), 30 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C
> 
> diff --git a/gcc/ree.cc b/gcc/ree.cc
> index 413aec7c8eb..0de96b1ece1 100644
> --- a/gcc/ree.cc
> +++ b/gcc/ree.cc
> @@ -473,7 +473,8 @@ get_defs (rtx_insn *insn, rtx reg, vec *dest)
>   break;
>  }
>  
> -  gcc_assert (use != NULL);
> +  if (use == NULL)
> +return NULL;
>  
>ref_chain = DF_REF_CHAIN (use);
>  
> @@ -514,7 +515,8 @@ get_uses (rtx_insn *insn, rtx reg)
>  if (REGNO (DF_REF_REG (def)) == REGNO (reg))
>break;
>  
> -  gcc_assert (def != NULL);
> +  if (def == NULL)
> +return NULL;
>  
>ref_chain = DF_REF_CHAIN (def);
>  
> @@ -750,6 +752,103 @@ get_extended_src_reg (rtx src)
>return src;
>  }
>  
> +/* Return TRUE if the candidate insn is zero extend and regno is
> +   an return  registers.  */
> +
> +static bool
> +abi_extension_candidate_return_reg_p (rtx_insn *insn, int regno)
> +{
> +  rtx set = single_set (insn);
> +
> +  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
> +return false;
> +
> +  if (FUNCTION_VALUE_REGNO_P (regno))
> +return true;
> +
> +  return false;
> +}
> +
> +/* Return TRUE if reg source operand of zero_extend is argument registers
> +   and not return registers and source and destination operand are same
> +   and mode of source and destination operand are not same.  */
> +
> +static bool
> +abi_extension_candidate_p (rtx_insn *insn)
> +{
> +  rtx set = single_set (insn);
> +
> +  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
> +return false;
> +
> +  machine_mode ext_dst_mode = GET_MODE (SET_DEST (set));
> +  rtx orig_src = XEXP (SET_SRC (set),0);
> +
> +  bool copy_needed
> += (REGNO (SET_DEST (set)) != REGNO (XEXP (SET_SRC (set), 0)));
> +
> +  if (!copy_needed && ext_dst_mode != GET_MODE (orig_src)
> +  && FUNCTION_ARG_REGNO_P (REGNO (orig_src))
> +  && !abi_extension_candidate_return_reg_p (insn, REGNO (orig_src)))
> +return true;
> +
> +  return false;
> +}
> +
> +/* Return TRUE if the candidate insn is zero extend and regno is
> +   an argument registers.  */
> +
> +static bool
> +abi_extension_candidate_argno_p (rtx_code code, int regno)
> +{
> +  if (code !=  ZERO_EXTEND)
> +return false;
> +
> +  if (FUNCTION_ARG_REGNO_P (regno))
> +return true;
> +
> +  return false;
> +}

I don't see anything in those functions that checks if 
ZERO_EXTEND is actually a feature of the ABI, e.g. as opposed to 
no extension or SIGN_EXTEND.  Do I miss something?

Also, "!=  ZERO_EXTEND" has too many spaces, copy-pasted in 
several (all?) places.

Also, s/an return  registers/a return register/ (three errors).

brgds, H-P


Re: [PATCH v5 01/10] RISC-V: autovec: Add new predicates and function prototypes

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/26/23 15:45, Michael Collison wrote:

2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-protos.h
(riscv_vector_preferred_simd_mode): New.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
(emit_vlmax_vsetvl): Ditto.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(vlmul_field_enum): Ditto.
* config/riscv/riscv-v.cc (emit_vlmax_vsetvl):
Remove static scope.
* config/riscv/predicates.md (p_reg_or_const_csr_operand):
New predicate.
(vector_reg_or_const_dup_operand): Ditto.
* config/riscv/riscv-opts.h (riscv_vector_bits_enum): New enum.
(riscv_vector_lmul_enum): Ditto.
(vlmul_field_enum): Ditto.
---
  gcc/config/riscv/predicates.md  | 13 +
  gcc/config/riscv/riscv-opts.h   | 29 +
  gcc/config/riscv/riscv-protos.h |  9 +
  3 files changed, 51 insertions(+)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8654dbc5943..b3f2d622c7b 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -264,6 +264,14 @@
  })
  
  ;; Predicates for the V extension.

+(define_special_predicate "p_reg_or_const_csr_operand"
+  (match_code "reg, subreg, const_int")
+{
+  if (CONST_INT_P (op))
+return satisfies_constraint_K (op);
+  return GET_MODE (op) == Pmode;
+})

I don't see where this is used?  Perhaps defer?



  
+(define_predicate "vector_reg_or_const_dup_operand"

+  (ior (match_operand 0 "register_operand")
+   (match_test "const_vec_duplicate_p (op)
+   && !CONST_POLY_INT_P (CONST_VECTOR_ELT (op, 0))")))
+

Similarly.




diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index cf0cd669be4..af77df11430 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -67,6 +67,35 @@ enum stack_protector_guard {
SSP_GLOBAL  /* global canary */
  };
  
+/* RISC-V auto-vectorization preference.  */

+enum riscv_autovec_preference_enum {
+  NO_AUTOVEC,
+  RVV_SCALABLE,
+  RVV_FIXED_VLMAX
+};

I think these were included in one of Juzhe's patches.  So you can 
probably drop them.





+enum vlmul_field_enum
+{
+  VLMUL_FIELD_000, /* LMUL = 1.  */
+  VLMUL_FIELD_001, /* LMUL = 2.  */
+  VLMUL_FIELD_010, /* LMUL = 4.  */
+  VLMUL_FIELD_011, /* LMUL = 8.  */
+  VLMUL_FIELD_100, /* RESERVED.  */
+  VLMUL_FIELD_101, /* LMUL = 1/8.  */
+  VLMUL_FIELD_110, /* LMUL = 1/4.  */
+  VLMUL_FIELD_111, /* LMUL = 1/2.  */
+  MAX_VLMUL_FIELD
+};

AFAICT these are unused.  Perhaps defer this hunk?


So no real objections.  There's one hunk that clearly shouldn't be 
applied as it's in one of Juzhe's recently applied patches.  There's a 
few hunks that don't look like they're used -- if you have strong 
reasons to believe they will be needed, go ahead and include them. 
Otherwise drop them for now.


OK for the trunk after addressing the comments above.


jeff
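
For reference, the encoding listed in the comments of the quoted vlmul_field_enum can be decoded arithmetically. The sketch below is only an illustration of that table; `decode_vlmul` is a hypothetical helper and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Decode a 3-bit vlmul field into LMUL = num / den, following the comments
   in the quoted enum.  Returns false for the reserved encoding 100.  */
bool
decode_vlmul (unsigned field, unsigned *num, unsigned *den)
{
  assert (field < 8);
  if (field == 4)
    return false;		/* 100: RESERVED.  */
  if (field < 4)
    {
      *num = 1u << field;	/* 000..011: LMUL = 1, 2, 4, 8.  */
      *den = 1;
    }
  else
    {
      *num = 1;
      *den = 1u << (8 - field);	/* 101..111: LMUL = 1/8, 1/4, 1/2.  */
    }
  return true;
}
```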


Re: [PATCH v5 06/11] RISC-V: Strengthen atomic stores

2023-04-28 Thread Hans Boehm via Gcc-patches
The RISC-V psABI pull request is at
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/378 . Pointers to
Paul Kirth's corresponding LLVM patches are also there.

On Fri, Apr 28, 2023 at 2:42 PM Hans Boehm  wrote:

> The concern with making the new behavior non-default is of course that the
> generated code will eventually end up on an A.7-capable platform.
>
> An A.6-classic option for compiling code that will never run on a newer
> machine seems OK. But I'm not sure that seq_cst stores are dynamically
> frequent enough in C++ code for this to be worth the trouble. Unlike loads,
> they are also costly on x86, so programmers may also have been somewhat
> trained to avoid them where possible. (And probably where not possible, too
> :-( )
>
> Hans
>
> On Fri, Apr 28, 2023 at 10:43 AM Palmer Dabbelt 
> wrote:
>
>> On Fri, 28 Apr 2023 10:40:15 PDT (-0700), jeffreya...@gmail.com wrote:
>> >
>> >
>> > On 4/27/23 10:22, Patrick O'Neill wrote:
>> >> This change makes atomic stores strictly stronger than table A.6 of the
>> >> ISA manual. This mapping makes the overall patchset compatible with
>> >> table A.7 as well.
>> >>
>> >> 2023-04-27 Patrick O'Neill 
>> >>
>> >>  PR 89835
>> > Should be "PR target/89835"
>> >
>> >>
>> >> gcc/ChangeLog:
>> >>
>> >>  * config/riscv/sync.md:
>> > Needs some text here :-)
>> >
>> >
>> > I'm not objecting to this patch, but I think we've got an open
>> > question about whether or not this approach is too expensive for
>> > existing or soon arriving implementations.
>> >
>> > If the decision on that topic is to just pay the cost, then this patch
>> > is fine.  If we decide to make compatibility optional to avoid the
>> > additional cost, then this will need suitable adjustments.
>>
>> IMO the only hardware that's going to be here by gcc-14 and to have
>> enough concurrency for these to matter is the Ventana stuff.  I think
>> you're the only one who can figure out if these are slow, at least until
>> that stuff is available outside the lab.
>>
>> So are they too slow for you?
>>
>> >
>> > Jeff
>>
>


Re: [PATCH v5 03/10] RISC-V:autovec: Add auto-vectorization support functions

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/26/23 15:45, Michael Collison wrote:

2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-v.cc
(riscv_vector_preferred_simd_mode): New function.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
---



@@ -176,6 +178,46 @@ calculate_ratio (unsigned int sew, enum vlmul_type vlmul)
return ratio;
  }
  
+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE for RVV.  */

Doesn't really tell me much.

/* Return the preferred SIMD mode for MODE.  */



@@ -421,6 +463,43 @@ get_avl_type_rtx (enum avl_type type)
return gen_int_mode (type, Pmode);
  }
  
+rtx

+get_mask_policy_no_pred ()
+{
+  return get_mask_policy_for_pred (PRED_TYPE_none);
+}
+
+rtx
+get_tail_policy_no_pred ()
+{
+  return get_mask_policy_for_pred (PRED_TYPE_none);
+}

I'm guessing the call in get_tail_policy_no_pred should have been to 
get_tail_policy_for_pred rather than get_mask_policy_for_pred.

 


A short function comment for the two functions seems appropriate.




+
+/* Implement TARGET_VECTORIZE_GET_MASK_MODE for RVV.  */

How about
/* Return the appropriate mask mode for MODE.  */


OK with the trivial fixes noted above.

Jeff


Re: [PATCH v5 02/10] RISC-V: autovec: Export policy functions to global scope

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/26/23 15:45, Michael Collison wrote:

2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-vector-builtins.cc (get_tail_policy_for_pred):
Remove static declaration to make externally visible.
(get_mask_policy_for_pred): Ditto.
* config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred):
New external declaration.
(get_mask_policy_for_pred): Ditto.

OK.  No need for these to wait for anything IMHO.

jeff


Re: [PATCH v3 3/4] ree: Main functionality to Improve ree pass for rs6000 target

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/20/23 15:03, Ajit Agarwal wrote:



Currently I support AND with const1_rtx.  This is equivalent to the zero
extension instruction in the Power instruction set.  When you say many other
constants, could you please specify which other constants need to be
supported and how to determine the input and output modes?

x AND <constant> will result in a zero-extended representation for a
variety of constants, not just 1.  For example x AND 3, x AND 7,
x AND 15, etc.

If (const_int 1) is really that special here, then I've either 
completely misunderstood the intention of your patch or there's 
something quite special about the PPC port that I'm not aware of.


Jeff
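
The point about x AND 3, x AND 7 and x AND 15 can be illustrated with plain integer arithmetic. The sketch below only shows the arithmetic identity; it says nothing about how ree.cc should recognize such masks, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if MASK has the form 2^n - 1 with n >= 1 (1, 3, 7, 15, ...), i.e.
   "x & MASK" keeps the low n bits of x and clears everything above.  */
bool
low_bits_mask_p (uint64_t mask)
{
  return mask != 0 && (mask & (mask + 1)) == 0;
}

/* Zero-extend the low N bits of X, 1 <= N <= 64.  */
uint64_t
zero_extend_low_bits (uint64_t x, unsigned n)
{
  assert (n >= 1 && n <= 64);
  return n == 64 ? x : x & ((UINT64_C (1) << n) - 1);
}
```

For any mask accepted by low_bits_mask_p, `x & mask` equals the zero extension of the corresponding low bits of x, so (const_int 1) is just the n = 1 case.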


Re: [PATCH] libstdc++: Another attempt to ensure g++ 13+ compiled programs enforce gcc 13.2+ libstdc++.so.6 [PR108969]

2023-04-28 Thread Jakub Jelinek via Gcc-patches
On Fri, Apr 28, 2023 at 09:35:49AM +0100, Jonathan Wakely wrote:
> Yes, for both, thanks for the fix.
> 
> After it lands on the gcc-13 branch I'll also update the manual with:
> 
> --- a/libstdc++-v3/doc/xml/manual/abi.xml
> +++ b/libstdc++-v3/doc/xml/manual/abi.xml
> @@ -275,6 +275,7 @@ compatible.
> GCC 11.1.0: libstdc++.so.6.0.29
> GCC 12.1.0: libstdc++.so.6.0.30
> GCC 13.1.0: libstdc++.so.6.0.31
> +GCC 13.2.0: libstdc++.so.6.0.32
> 
> 
>   Note 1: Error should be libstdc++.so.3.0.3.

Don't you need to change later parts too?
I mean adding
  GCC 13.2.0: GLIBCXX_3.4.32, CXXABI_1.3.14
entry.

Jakub



Re: [PATCH] RISC-V: Add testcases for RVV auto-vectorization

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/6/23 19:37, juzhe.zh...@rivai.ai wrote:

From: Juzhe-Zhong 

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/rvv.exp: Add auto-vectorization testing.
 * gcc.target/riscv/rvv/vsetvl/vsetvl-17.c: Adapt testcase.
 * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c: New test.
 * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h: New test.
 * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.c: New test.
 * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.h: New test.
 * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c: New 
test.
 * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c: New 
test.
 * gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.c: New test.
 * gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h: New test.
 * gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-1.c: New test.
 * gcc.target/riscv/rvv/autovec/template-1.h: New test.
 * gcc.target/riscv/rvv/autovec/v-1.c: New test.
 * gcc.target/riscv/rvv/autovec/v-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32f-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32f-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32f-3.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32x-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32x-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32x-3.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64d-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64d-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64d-3.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64f-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64f-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64f-3.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64x-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64x-2.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64x-3.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: New test.
 * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-2.c: New test.

This is fine for the trunk once the prerequisite patches to enable 
autovect are installed (and thus the test passes).


jeff


Re: [PATCH V2] RISC-V: Modified validation information for contracts-tmpl-spec2.C

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/6/23 21:37, shiyul...@iscas.ac.cn wrote:

From: yulong 

This patch fixes the failure of the contracts-tmpl-spec2.C test.
When running the DejaGnu test, I found that the output is inconsistent with
the output checked in the testcase.  So I modified the testcase, and now it
passes.

gcc/testsuite/ChangeLog:

 * g++.dg/contracts/contracts-tmpl-spec2.C: Delete some output
 information.

I think you need to debug why you get different output from this test. 
Just removing the output because it didn't match on risc-v seems wrong.


jeff


Re: [PATCH V5] Testsuite: Fix a redefinition bug for the fd-4.c

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/12/23 07:18, shiyul...@iscas.ac.cn wrote:

From: yulong 

This patch fixes a redefinition bug.
fd-4.c contains a definition of mode_t, but it duplicates the
definition in types.h, which is included by stdio.h.
Thanks to Jeff Law for reviewing the previous version.

gcc/testsuite/ChangeLog:

 * gcc.dg/analyzer/fd-4.c: Delete the definition of mode_t.

This appears to be exactly the same as the prior version.  Ultimately 
David Malcolm needs to make a decision on whether or not to accept this 
patch as it's a patch for the analyzer's testsuite.


David -- comments?
jeff


Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/28/23 09:21, Pan Li via Gcc-patches wrote:

From: Pan Li 

When some RVV integer compare operators act on the same vector registers
without a mask, they can be simplified to VMSET.

This patch allows eq, le, leu, ge and geu to perform this kind of
simplification by adding one macro in riscv for simplify_rtx.

Given we have:
vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl)
{
   return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
}

Before this patch:
vsetvli  zero,a2,e8,m8,ta,ma
vl8re8.v v8,0(a1)
vmseq.vv v8,v8,v8
vsetvli  a5,zero,e8,m8,ta,ma
vsm.v    v8,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,m8,ta,ma
vmset.m v1  <- optimized to vmset.m
vsetvli a5,zero,e8,m8,ta,ma
vsm.v   v1,0(a0)
ret

As above, we may have one instruction eliminated and require fewer vector
registers.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
  consumed by simplify_rtx.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
  Adjust test check condition.

I'm not sure this is 100% correct.

What happens to the high bits in the resultant mask register?  My 
understanding is we have one output bit per input element in the 
comparison.  So unless the number of elements matches the bit width of 
the mask register, this isn't going to work.


Am I missing something?

Jeff
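
The concern about the high bits can be made concrete with a toy model. The sketch below is not RVV semantics (on real hardware the tail bits depend on the tail policy) and not GCC code; the 64-bit mask width and the name `model_vmseq` are assumptions made for illustration. It writes one result bit per compared element and leaves the remaining bits at whatever value the caller supplies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MASK_REG_BITS 64	/* Assumed mask register width for this model.  */

/* Compare the first VL elements of A and B for equality, writing one result
   bit per element into a mask whose remaining bits start out as TAIL.  */
uint64_t
model_vmseq (const int8_t *a, const int8_t *b, size_t vl, uint64_t tail)
{
  assert (vl <= MASK_REG_BITS);
  uint64_t mask = tail;
  for (size_t i = 0; i < vl; i++)
    {
      uint64_t bit = UINT64_C (1) << i;
      mask = (a[i] == b[i]) ? (mask | bit) : (mask & ~bit);
    }
  return mask;
}
```

In this model, comparing a vector with itself for vl = 8 and a zero tail yields 0xff, which differs from an all-ones constant; the two agree only when vl equals the mask width or the tail bits happen to be ones.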




Re: [PATCH v5 06/11] RISC-V: Strengthen atomic stores

2023-04-28 Thread Hans Boehm via Gcc-patches
The concern with making the new behavior non-default is of course that the
generated code will eventually end up on an A.7-capable platform.

An A.6-classic option for compiling code that will never run on a newer
machine seems OK. But I'm not sure that seq_cst stores are dynamically
frequent enough in C++ code for this to be worth the trouble. Unlike loads,
they are also costly on x86, so programmers may also have been somewhat
trained to avoid them where possible. (And probably where not possible, too
:-( )

Hans

On Fri, Apr 28, 2023 at 10:43 AM Palmer Dabbelt  wrote:

> On Fri, 28 Apr 2023 10:40:15 PDT (-0700), jeffreya...@gmail.com wrote:
> >
> >
> > On 4/27/23 10:22, Patrick O'Neill wrote:
> >> This change makes atomic stores strictly stronger than table A.6 of the
> >> ISA manual. This mapping makes the overall patchset compatible with
> >> table A.7 as well.
> >>
> >> 2023-04-27 Patrick O'Neill 
> >>
> >>  PR 89835
> > Should be "PR target/89835"
> >
> >>
> >> gcc/ChangeLog:
> >>
> >>  * config/riscv/sync.md:
> > Needs some text here :-)
> >
> >
> > I'm not objecting to this patch, but I think we've got an open
> > question about whether or not this approach is too expensive for
> > existing or soon arriving implementations.
> >
> > If the decision on that topic is to just pay the cost, then this patch
> > is fine.  If we decide to make compatibility optional to avoid the
> > additional cost, then this will need suitable adjustments.
>
> IMO the only hardware that's going to be here by gcc-14 and to have
> enough concurrency for these to matter is the Ventana stuff.  I think
> you're the only one who can figure out if these are slow, at least until
> that stuff is available outside the lab.
>
> So are they too slow for you?
>
> >
> > Jeff
>
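
For readers who want to see which source-level operations this thread is about, here is a minimal C11 sketch. The function names are hypothetical, and the instruction sequences are deliberately not shown because they depend on which mapping (table A.6 or the stronger A.7-compatible one) is finally chosen.

```c
#include <assert.h>
#include <stdatomic.h>

/* A sequentially consistent store: the kind of operation whose RISC-V
   instruction mapping is being discussed above.  */
void
publish_seq_cst (atomic_int *flag, int value)
{
  assert (flag != 0);
  atomic_store_explicit (flag, value, memory_order_seq_cst);
}

/* A release store, shown for contrast; the cost discussion above is about
   seq_cst stores.  */
void
publish_release (atomic_int *flag, int value)
{
  assert (flag != 0);
  atomic_store_explicit (flag, value, memory_order_release);
}
```

Code that only needs release ordering can say so explicitly, which is one way programmers avoid paying for seq_cst stores where they are not required.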


Re: [PATCH] riscv: generate builtin macro for compilation with strict alignment

2023-04-28 Thread Vineet Gupta




On 4/20/23 09:56, Jeff Law via Gcc-patches wrote:



On 1/17/23 15:59, Vineet Gupta wrote:

This could be useful for library writers who want to write code variants
for fast vs. slow unaligned accesses.

We distinguish explicit -mstrict-align (1) vs. slow_unaligned_access
cpu tune param (2) for even more code diversity.

gcc/ChangeLog:

* config/riscv-c.cc (riscv_cpu_cpp_builtins):
  Generate __riscv_strict_align with value 1 or 2.
* config/riscv/riscv.cc: Define riscv_user_wants_strict_align.
  (riscv_option_override) Set riscv_user_wants_strict_align to
  TARGET_STRICT_ALIGN.
* config/riscv/riscv.h: Declare riscv_user_wants_strict_align.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute.c: Check for
  __riscv_strict_align=1.
* gcc.target/riscv/predef-align-1.c: New test.
* gcc.target/riscv/predef-align-2.c: New test.
* gcc.target/riscv/predef-align-3.c: New test.
* gcc.target/riscv/predef-align-4.c: New test.
* gcc.target/riscv/predef-align-5.c: New test.

Signed-off-by: Vineet Gupta 
---
  gcc/config/riscv/riscv-c.cc | 11 +++
  gcc/config/riscv/riscv.cc   |  9 +
  gcc/config/riscv/riscv.h    |  1 +
  gcc/testsuite/gcc.target/riscv/attribute-4.c    |  9 +
  gcc/testsuite/gcc.target/riscv/predef-align-1.c | 12 
  gcc/testsuite/gcc.target/riscv/predef-align-2.c | 11 +++
  gcc/testsuite/gcc.target/riscv/predef-align-3.c | 15 +++
  gcc/testsuite/gcc.target/riscv/predef-align-4.c | 16 
  gcc/testsuite/gcc.target/riscv/predef-align-5.c | 16 
  9 files changed, 100 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-1.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-2.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-3.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-4.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-5.c

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 826ae0067bb8..47a396501d74 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -102,6 +102,17 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
    }
  +  /* TARGET_STRICT_ALIGN does not cover all cases.  */
+  if (riscv_slow_unaligned_access_p)
+    {
+  /* Explicit -mstruct-align preceedes cpu tune param
+ slow_unaligned_access=true.  */

Did you mean "-mstrict-align" above?


Doh sorry yes.





+  if (riscv_user_wants_strict_align)
+    builtin_define_with_int_value ("__riscv_strict_align", 1);
+  else
+    builtin_define_with_int_value ("__riscv_strict_align", 2);
So I don't understand why we're testing 
"riscv_user_wants_strict_align" instead of TARGET_STRICT_ALIGN here.  
AFAICT they're equivalent.  But maybe there's something subtle I'm 
missing.


The missing part is the slightly over-engineered unaligned access signaling 
in the RV gcc frontend IMHO.


The thing is, -mno-strict-align can be overruled by the cpu tune param
slow_unaligned_access=true (and then behaves as if -mstrict-align was passed).
I wanted the macro to reflect this (for future proofing) by being
defined but with different values.


There's some renewed discussion with Kito on [1] so I need to respin 
this after getting the agreed upon specification in there.


Thx,
-Vineet

[1] https://github.com/riscv-non-isa/riscv-c-api-doc/issues/32
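
As an illustration of the library use case mentioned in the patch description, the sketch below selects between two load strategies. It assumes the macro keeps the name __riscv_strict_align proposed in this patch; since the patch is to be respun, the name and values may change. The helper name is hypothetical, and the byte-wise path assumes a little-endian target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load a 32-bit value from a possibly unaligned address.  */
uint32_t
load_u32_unaligned (const unsigned char *p)
{
  assert (p != 0);
#if defined (__riscv_strict_align)
  /* Unaligned accesses are unavailable or slow: assemble the value from
     bytes (little-endian order assumed).  */
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
	 | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
#else
  /* Unaligned accesses are assumed cheap: memcpy lets the compiler use a
     single load where that is legal.  */
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
#endif
}
```

A library could additionally test the macro's value (1 versus 2 in this patch) to distinguish an explicit -mstrict-align from a slow_unaligned_access tuning.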


Re: [PATCH] RISC-V: decouple stack allocation for rv32e w/o save-restore.

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/21/23 04:07, Fei Gao wrote:

Currently in rv32e, stack allocation for GPR callee-saved registers is
always 12 bytes w/o save-restore. Actually, for the case without save-restore,
less stack memory can be reserved. This patch decouples stack allocation for
rv32e w/o save-restore and makes riscv_compute_frame_info more readable.

output of testcase rv32e_stack.c
before patch:
addi    sp,sp,-16
sw      ra,12(sp)
call    getInt
sw      a0,0(sp)
lw      a0,0(sp)
call    PrintInts
lw      a5,0(sp)
mv      a0,a5
lw      ra,12(sp)
addi    sp,sp,16
jr      ra

after patch:
addi    sp,sp,-8
sw      ra,4(sp)
call    getInt
sw      a0,0(sp)
lw      a0,0(sp)
call    PrintInts
lw      a5,0(sp)
mv      a0,a5
lw      ra,4(sp)
addi    sp,sp,8
jr      ra

gcc/ChangeLog:

 * config/riscv/riscv.cc (riscv_forbid_save_libcall): helper function 
for riscv_use_save_libcall.
 (riscv_use_save_libcall): call riscv_forbid_save_libcall.
 (riscv_compute_frame_info): restructure to decouple stack allocation 
for rv32e w/o save-restore.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rv32e_stack.c: New test.
---
  gcc/config/riscv/riscv.cc| 57 
  gcc/testsuite/gcc.target/riscv/rv32e_stack.c | 14 +
  2 files changed, 49 insertions(+), 22 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/rv32e_stack.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5d2550871c7..6ccdfe96fe7 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4772,12 +4772,26 @@ riscv_save_reg_p (unsigned int regno)
return false;
  }
  
+/* Determine whether to disable GPR save/restore routines.  */

+static bool
+riscv_forbid_save_libcall (void)

I would suggest something like this for the function comment:

/* Return TRUE if a libcall to save/restore GPRs should be
   avoided.  FALSE otherwise.  */

I would also change the name from "forbid" to "avoid".


With those changes I think this will be ready for the trunk.  So repost 
after those changes and I'll get it pushed into the trunk.


Thanks,
Jeff
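
The frame sizes in the two listings above follow from simple arithmetic. The sketch below is a simplified model, not riscv_compute_frame_info: it ignores alignment and every frame component other than saved GPRs and locals, and it assumes that the fixed 12 bytes correspond to at most three 4-byte callee-saved GPRs on rv32e. Both function names are hypothetical.

```c
#include <assert.h>

#define RV32E_MAX_SAVED_GPRS 3	/* Assumption: ra, s0 and s1.  */

/* Before the patch: 12 bytes are always reserved for saved GPRs.  */
unsigned
frame_bytes_fixed_reservation (unsigned saved_gprs, unsigned local_bytes)
{
  assert (saved_gprs <= RV32E_MAX_SAVED_GPRS);
  return 12 + local_bytes;
}

/* After the patch: only the GPRs actually saved take up space.  */
unsigned
frame_bytes_exact (unsigned saved_gprs, unsigned local_bytes)
{
  assert (saved_gprs <= RV32E_MAX_SAVED_GPRS);
  return 4 * saved_gprs + local_bytes;
}
```

With one saved register (ra) and one 4-byte local, the model gives 16 bytes before and 8 bytes after, matching the addi sp,sp,-16 and addi sp,sp,-8 in the listings.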


[PATCH] libstdc++: Mention recent libgcc_s symbol versions in manual

2023-04-28 Thread Florian Weimer via Gcc-patches
GCC_11.0 is an aarch64-specific outlier.

* doc/xml/manual/abi.xml (abi.versioning.history): Add
GCC_7.0.0, GCC_9.0.0, GCC_11.0, GCC_12.0.0, GCC_13.0.0 for
libgcc_s.

---
 libstdc++-v3/doc/xml/manual/abi.xml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libstdc++-v3/doc/xml/manual/abi.xml 
b/libstdc++-v3/doc/xml/manual/abi.xml
index 3a3cbd3346c..e0e241de3bd 100644
--- a/libstdc++-v3/doc/xml/manual/abi.xml
+++ b/libstdc++-v3/doc/xml/manual/abi.xml
@@ -203,6 +203,11 @@ compatible.
 GCC 4.6.0: GCC_4.6.0
 GCC 4.7.0: GCC_4.7.0
 GCC 4.8.0: GCC_4.8.0
+GCC 7.1.0: GCC_7.0.0
+GCC 9.1.0: GCC_9.0.0
+GCC 11.1.0: GCC_11.0
+GCC 12.1.0: GCC_12.0.0
+GCC 13.1.0: GCC_13.0.0
 
 
 

base-commit: 0c77a0909456034d34036aa22a8dfcf0258cfa2d



Re: [PATCH] riscv: Allow vector constants in riscv_const_insns.

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/28/23 10:10, Robin Dapp wrote:

Hi,

I figured I'm going to start sending some patches that build on top
of the upcoming RISC-V autovectorization.  This one is obviously
not supposed to be installed before the basic support lands but
it's small enough that it shouldn't hurt to send it now.

This patch allows vector constants in riscv_const_insns in order
for them to be properly recognized as immediate operands such that
we can emit vmv.v.i instructions via autovec.

Bootstrapped and regtested on riscv32gcv and riscv64gcv.

Regards
  Robin

--

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_const_insns): Add permissible
vector constants.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vmv-imm.c: New test.

OK for the trunk once the basic bits are in.
jeff
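
A loop of the following shape is the kind of input where a vector constant would need to be recognized as an immediate. This is a hypothetical example and not the contents of vmv-imm.c; whether GCC emits vmv.v.i for it depends on the compiler version and options. The constant 5 is chosen because vmv.v.i takes a 5-bit signed immediate, so values from -16 to 15 fit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the same small constant into every element.  When vectorized, the
   constant can be broadcast from an immediate rather than from a register.  */
void
fill_with_five (int32_t *dst, size_t n)
{
  assert (dst != 0 || n == 0);
  for (size_t i = 0; i < n; i++)
    dst[i] = 5;
}
```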


Re: RISC-V: Add divmod instruction support

2023-04-28 Thread Jeff Law via Gcc-patches



On 2/17/23 07:02, Matevos Mehrabyan via Gcc-patches wrote:

Hi all,
If we have division and remainder calculations with the same operands:

   a = b / c;
   d = b % c;

We can replace the calculation of remainder with multiplication +
subtraction, using the result from the previous division:

   a = b / c;
   d = a * c;
   d = b - d;

Which will be faster.
Currently, it isn't done for RISC-V.

I've added an expander for DIVMOD which replaces 'rem' with 'mul + sub'.

Best regards,
Matevos.

gcc/ChangeLog:

 * config/riscv/riscv.md: Added divmod expander.

gcc/testsuite/ChangeLog:
 * gcc.target/riscv/divmod.c: New testcase.
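
The two sequences described above can be written out in C to check that they agree. This is a source-level sketch of the identity only, with hypothetical function names; the patch itself works at the RTL expander level, and an optimizing compiler may transform either function.

```c
#include <assert.h>

/* Quotient and remainder computed independently: one div and one rem.  */
void
divmod_div_rem (long b, long c, long *quot, long *rem)
{
  assert (c != 0);
  *quot = b / c;
  *rem = b % c;
}

/* Remainder derived from the quotient: div + mul + sub.  */
void
divmod_div_mul_sub (long b, long c, long *quot, long *rem)
{
  assert (c != 0);
  long q = b / c;
  *quot = q;
  *rem = b - q * c;
}
```

Because C division truncates toward zero, b - (b / c) * c equals b % c for every c != 0 where the division itself does not overflow, so both functions return the same pair for signed and unsigned style inputs alike.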

So here's an update to the patch that I think addresses the key concerns.

Specifically use of the divmod expander is now conditional on the tuning 
info which allows for better control over using div+rem or div+mul+sub.


Given I don't know the right tunings for any implementation other than 
Veyron V1, I left them as-is.  So all the implementations will still use 
the div+rem sequence.  Obviously when we submit the Veyron V1 tunings, 
it will use div+mul+sub.  I expect other implementations will prefer 
div+mul+sub as well.


The testcase is split into two tests.  One to verify the old div+rem 
sequence, the other to test for div+mul+sub.  The latter test is 
disabled for now.  Once the first uarch has flipped tuning, we can add 
the proper tune parameter and enable the test.


Attached is the patch I committed.

Thanks and sorry for the long delay.


jeff

commit 065be0ffbcd676b635d492f4679e635b6ece4fe4
Author: Matevos Mehrabyan 
Date:   Fri Apr 28 14:01:30 2023 -0600

RISC-V: Add divmod expansion support

Hi all,
If we have division and remainder calculations with the same operands:

  a = b / c;
  d = b % c;

We can replace the calculation of remainder with multiplication +
subtraction, using the result from the previous division:

  a = b / c;
  d = a * c;
  d = b - d;

Which will be faster.
Currently, it isn't done for RISC-V.

I've added an expander for DIVMOD which replaces 'rem' with 'mul + sub'.

Best regards,
Matevos.

gcc/ChangeLog:

* config/riscv/iterators.md (only_div, paired_mod): New iterators.
(u): Add div/udiv cases.
* config/riscv/riscv-protos.h (riscv_use_divmod_expander): 
Prototype.
* config/riscv/riscv.cc (struct riscv_tune_param): Add field for
divmod expansion.
(rocket_tune_info, sifive_7_tune_info): Initialize new field.
(thead_c906_tune_info): Likewise.
(optimize_size_tune_info): Likewise.
(riscv_use_divmod_expander): New function.
* config/riscv/riscv.md (<u>divmod<mode>4): New expander.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/divmod-1.c: New testcase.
* gcc.target/riscv/divmod-2.c: New testcase.

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 9b767038452..1d56324df03 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -152,6 +152,11 @@ (define_code_iterator any_div [div udiv mod umod])
 ;; from the same template.
 (define_code_iterator any_mod [mod umod])
 
+;; These code iterators allow unsigned and signed divmod to be generated
+;; from the same template.
+(define_code_iterator only_div [div udiv])
+(define_code_attr paired_mod [(div "mod") (udiv "umod")])
+
 ;; These code iterators allow the signed and unsigned scc operations to use
 ;; the same template.
 (define_code_iterator any_gt [gt gtu])
@@ -181,6 +186,7 @@ (define_code_attr u [(sign_extend "") (zero_extend "u")
 (lt "") (ltu "u")
 (le "") (leu "u")
 (fix "") (unsigned_fix "u")
+(div "") (udiv "u")
 (float "") (unsigned_float "u")])
 
 ;; <su> is like <u>, but the signed form expands to "s" rather than "".
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index f87661bde2c..5a927bdf1b0 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -239,4 +239,5 @@ extern const char*
 th_mempair_output_move (rtx[4], bool, machine_mode, RTX_CODE);
 #endif
 
+extern bool riscv_use_divmod_expander (void);
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1529855a2b4..09a30dc260f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -236,6 +236,7 @@ struct riscv_tune_param
   unsigned short memory_cost;
   unsigned short fmv_cost;
   bool slow_unaligned_access;
+  bool use_divmod_expansion;
 };
 
 /* Information about one micro-arch we know about.  */
@@ -323,6 +324,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   5,   /* memory_cost */
   8,   /* fmv_cost 

[PATCH] WIP: All the -march documentation I got around to writing

2023-04-28 Thread Palmer Dabbelt
Kito and I were talking this morning, he's going to try and find the
time to actually write this.  Kind of odd to send to the mailing list,
but I figure that's the easiest way to get it out.  It's very much not
mergeable as is...
---
 gcc/doc/invoke.texi | 87 ++---
 1 file changed, 83 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2f40c58b21c..6155b3d5ce1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -28968,10 +28968,89 @@ The default is @option{-misa-spec=20191213} unless 
GCC has been configured
 with @option{--with-isa-spec=} specifying a different default version.
 
 @opindex march
-@item -march=@var{ISA-string}
-Generate code for given RISC-V ISA (e.g.@: @samp{rv64im}).  ISA strings must be
-lower-case.  Examples include @samp{rv64i}, @samp{rv32g}, @samp{rv32e}, and
-@samp{rv32imaf}.
+@item -march=@var{feature-string}
+Generate code for given feature string.  Feature strings are similar in spirit
+to RISC-V ISA strings, but are subtly different.  Users that just want to
+target a specific CPU should consider using the @option{-mcpu} argument, 
+
+Similar to ISA strings, feature strings consist of a single base feature set
+along with zero or more extensions.  The supported base feature sets are
+@samp{rv32i}, @samp{rv64i}, @samp{rv32e}, @samp{rv32g}, and @samp{rv64g}.
+These mostly match the corresponding base ISA for the current
+@var{ISA-spec-string} (see @option{-misa-spec}).
+
+The supported extensions are:
+
+@table @code
+@item m: Generate code for the M extension, 
+@item a
+@item f
+@item d
+@item c
+@item h
+@item v
+@item zicsr
+@item zifencei
+@item zawrs
+@item zba
+@item zbb
+@item zbc
+@item zbs
+@item zfinx
+@item zdinx
+@item zhinx
+@item zhinxmin
+@item zbkb
+@item zbkc
+@item zbkx
+@item zkne
+@item zknh
+@item zkr
+@item zksed
+@item zksh
+@item zkt
+@item zicboz
+@item zicbom
+@item zicbop
+@item zk
+@item zkn
+@item zks
+@item zve32x
+@item zve32f
+@item zve32d
+@item zve64x
+@item zve64f
+@item zve64d
+@item zvl32b
+@item zvl64b
+@item zvl128b
+@item zvl256b
+@item zvl512b
+@item zvl1024b
+@item zvl2048b
+@item zvl4096b
+@item zvl8192b
+@item zvl16384b
+@item zvl32768b
+@item zvl65536b
+@item zfh
+@item zfhmin
+@item zmmul
+@item svinval
+@item svnapot
+@item xtheadba
+@item xtheadbb
+@item xtheadbs
+@item xtheadcmo
+@item xtheadcondmov
+@item xtheadfmemidx
+@item xtheadfmv
+@item xtheadint
+@item xtheadmac
+@item xtheadmemidx
+@item xtheadmempair
+@item xtheadsync
+@end table
 
 When @option{-march=} is not specified, use the setting from @option{-mcpu}.
 
-- 
2.40.0



Re: [PATCH] c++: RESULT_DECL replacement in constexpr call result [PR105440]

2023-04-28 Thread Patrick Palka via Gcc-patches
On Fri, 28 Apr 2023, Patrick Palka wrote:

> On Fri, 28 Apr 2023, Patrick Palka wrote:
> 
> > After mechanically replacing RESULT_DECL within a constexpr call result
> > (for sake of RVO), we can in some cases simplify the call result
> > further.
> > 
> > In the below testcase the result of get() during evaluation of a's
> > initializer is the self-referential CONSTRUCTOR:
> > 
> >   {._M_p=(char *) &<retval>._M_local_buf}
> > 
> > which after replacing RESULT_DECL with ctx->object (aka *D.2603, where
> > the D.2603 temporary points to the current element of _M_elems under
> > construction) becomes:
> > 
> >   {._M_p=(char *) &D.2603->_M_local_buf}
> > 
> > but what we really want is:
> > 
> >   {._M_p=(char *) &a._M_elems[0]._M_local_buf}.
> > 
> > so that the value of _M_p is independent of the value of the mutable
> > D.2603 temporary.
> > 
> > So to that end, it seems we should constexpr evaluate the result again
> > after RESULT_DECL replacement, which is what this patch implements.
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
> > 
> > PR libstdc++/105440
> > 
> > gcc/cp/ChangeLog:
> > 
> > * constexpr.cc (cxx_eval_call_expression): If any RESULT_DECLs get
> > replaced in the call result, try further evaluating the result.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/constexpr-dtor16.C: New test.
> > ---
> >  gcc/cp/constexpr.cc   | 12 +-
> >  gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C | 39 +++
> >  2 files changed, 49 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C
> > 
> > diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> > index d1097764b10..22a1609e664 100644
> > --- a/gcc/cp/constexpr.cc
> > +++ b/gcc/cp/constexpr.cc
> > @@ -3213,7 +3213,12 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
> > tree t,
> > && CLASS_TYPE_P (TREE_TYPE (res))
> > && !is_empty_class (TREE_TYPE (res)))
> >   if (replace_decl (&result, res, ctx->object))
> > -   cacheable = false;
> > +   {
> > + cacheable = false;
> > + result = cxx_eval_constant_expression (ctx, result, lval,
> > +non_constant_p,
> > +overflow_p);
> > +   }
> > }
> >else
> > /* Couldn't get a function copy to evaluate.  */
> > @@ -5988,9 +5993,12 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, 
> > tree t,
> > object = probe;
> >   else
> > {
> > + tree orig_probe = probe;
> >   probe = cxx_eval_constant_expression (ctx, probe, vc_glvalue,
> > non_constant_p, overflow_p);
> >   evaluated = true;
> > + if (orig_probe == target)
> > +   target = probe;
> 
> Whoops, thanks to an accidental git commit --amend this patch contains
> an alternative approach that I considered: in cxx_eval_store_expression,
> ensure that we always set ctx->object to a fully reduced result (so
> _M_elems[0] instead of *D.2603 in this case), which means later
> RESULT_DECL replacement with ctx->object should yield an already reduced
> result as well.  But with this approach I ran into a bogus "modifying
> const object" error on cpp1y/constexpr-tracking-const23.C so I gave up
> on it :(

Ah, the problem was that later in cxx_eval_store_expression we were
suppressing a TREE_READONLY update via pattern matching on 'target',
but if we are now updating 'target' to its reduced value the pattern
matching needs to consider the shape of the original 'target' instead.
Here's an alternative fix for this PR that passes regression testing,
not sure which approach would be preferable.

PR c++/105440

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_store_expression): Save the original
target in 'orig_target'.  Update 'target' after evaluating it in
the 'probe' loop.  Use 'orig_target' instead of 'target' when
appropriate.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-dtor16.C: New test.
---
 gcc/cp/constexpr.cc | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 9dbbf6eec03..2939ac89a98 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -5902,8 +5902,10 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, 
tree t,
 /* Just ignore clobbers.  */
 return void_node;
 
+  const tree orig_target = TREE_OPERAND (t, 0);
+
   /* First we figure out where we're storing to.  */
-  tree target = TREE_OPERAND (t, 0);
+  tree target = orig_target;
 
   maybe_simplify_trivial_copy (target, init);
 
@@ -5993,9 +5995,12 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, 
tree t,
object = probe;
  else
{
+ bool is_target = (probe == 

Re: [PATCH] c++: RESULT_DECL replacement in constexpr call result [PR105440]

2023-04-28 Thread Patrick Palka via Gcc-patches
On Fri, 28 Apr 2023, Patrick Palka wrote:

> After mechanically replacing RESULT_DECL within a constexpr call result
> (for sake of RVO), we can in some cases simplify the call result
> further.
> 
> In the below testcase the result of get() during evaluation of a's
> initializer is the self-referential CONSTRUCTOR:
> 
>   {._M_p=(char *) &<retval>._M_local_buf}
> 
> which after replacing RESULT_DECL with ctx->object (aka *D.2603, where
> the D.2603 temporary points to the current element of _M_elems under
> construction) becomes:
> 
>   {._M_p=(char *) &D.2603->_M_local_buf}
> 
> but what we really want is:
> 
>   {._M_p=(char *) &a._M_elems[0]._M_local_buf}.
> 
> so that the value of _M_p is independent of the value of the mutable
> D.2603 temporary.
> 
> So to that end, it seems we should constexpr evaluate the result again
> after RESULT_DECL replacement, which is what this patch implements.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?
> 
>   PR libstdc++/105440
> 
> gcc/cp/ChangeLog:
> 
>   * constexpr.cc (cxx_eval_call_expression): If any RESULT_DECLs get
>   replaced in the call result, try further evaluating the result.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/constexpr-dtor16.C: New test.
> ---
>  gcc/cp/constexpr.cc   | 12 +-
>  gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C | 39 +++
>  2 files changed, 49 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C
> 
> diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> index d1097764b10..22a1609e664 100644
> --- a/gcc/cp/constexpr.cc
> +++ b/gcc/cp/constexpr.cc
> @@ -3213,7 +3213,12 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
> tree t,
>   && CLASS_TYPE_P (TREE_TYPE (res))
>   && !is_empty_class (TREE_TYPE (res)))
> if (replace_decl (&result, res, ctx->object))
> - cacheable = false;
> + {
> +   cacheable = false;
> +   result = cxx_eval_constant_expression (ctx, result, lval,
> +  non_constant_p,
> +  overflow_p);
> + }
>   }
>else
>   /* Couldn't get a function copy to evaluate.  */
> @@ -5988,9 +5993,12 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, 
> tree t,
>   object = probe;
> else
>   {
> +   tree orig_probe = probe;
> probe = cxx_eval_constant_expression (ctx, probe, vc_glvalue,
>   non_constant_p, overflow_p);
> evaluated = true;
> +   if (orig_probe == target)
> + target = probe;

Whoops, thanks to an accidental git commit --amend this patch contains
an alternative approach that I considered: in cxx_eval_store_expression,
ensure that we always set ctx->object to a fully reduced result (so
_M_elems[0] instead of *D.2603 in this case), which means later
RESULT_DECL replacement with ctx->object should yield an already reduced
result as well.  But with this approach I ran into a bogus "modifying
const object" error on cpp1y/constexpr-tracking-const23.C so I gave up
on it :(

Here's the correct patch:

PR libstdc++/105440

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): If any RESULT_DECLs get
replaced in the call result, try further evaluating the result.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-dtor16.C: New test.
---
 gcc/cp/constexpr.cc   |  7 +++-
 gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C | 39 +++
 2 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index d1097764b10..9dbbf6eec03 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -3213,7 +3213,12 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
&& CLASS_TYPE_P (TREE_TYPE (res))
&& !is_empty_class (TREE_TYPE (res)))
  if (replace_decl (&result, res, ctx->object))
-   cacheable = false;
+   {
+ cacheable = false;
+ result = cxx_eval_constant_expression (ctx, result, lval,
+non_constant_p,
+overflow_p);
+   }
}
   else
/* Couldn't get a function copy to evaluate.  */
diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C 
b/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C
new file mode 100644
index 000..707a3e025b1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C
@@ -0,0 +1,39 @@
+// PR c++/105440
+// { dg-do compile { target c++20 } }
+
+struct basic_string {
+  char _M_local_buf[32];
+  char* _M_p;

[PATCH] c++: RESULT_DECL replacement in constexpr call result [PR105440]

2023-04-28 Thread Patrick Palka via Gcc-patches
After mechanically replacing RESULT_DECL within a constexpr call result
(for sake of RVO), we can in some cases simplify the call result
further.

In the below testcase the result of get() during evaluation of a's
initializer is the self-referential CONSTRUCTOR:

  {._M_p=(char *) &<retval>._M_local_buf}

which after replacing RESULT_DECL with ctx->object (aka *D.2603, where
the D.2603 temporary points to the current element of _M_elems under
construction) becomes:

  {._M_p=(char *) &D.2603->_M_local_buf}

but what we really want is:

  {._M_p=(char *) &a._M_elems[0]._M_local_buf}.

so that the value of _M_p is independent of the value of the mutable
D.2603 temporary.

So to that end, it seems we should constexpr evaluate the result again
after RESULT_DECL replacement, which is what this patch implements.
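
As a runtime analogue of the situation described above, here is a minimal C++ sketch (assuming C++17 or later for guaranteed copy elision). SelfRef, Holder, make and pointers_are_self_referential are hypothetical names for illustration; this is not the testcase from the patch and it does not exercise the constexpr evaluator.

```cpp
#include <cassert>

// Hypothetical stand-in for the testcase's basic_string: the object stores a
// pointer to its own buffer, so its value depends on its final address.
struct SelfRef {
    char buf[32];
    char* p;
    SelfRef() : buf{}, p{buf} {}
};

// Returned by value. With C++17 guaranteed copy elision the object is
// constructed directly in its final location (here, an array element).
inline SelfRef make() { return {}; }

struct Holder {
    SelfRef elems[2];
};

// True when every element's pointer refers to that element's own buffer,
// i.e. the stored address names the final object and not a temporary.
inline bool pointers_are_self_referential() {
    Holder h{make(), make()};
    return h.elems[0].p == h.elems[0].buf
        && h.elems[1].p == h.elems[1].buf;
}
```

The constant evaluator has to reach the same conclusion symbolically: the pointer stored in each element must be expressed in terms of the final array element rather than an intermediate temporary.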

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR libstdc++/105440

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): If any RESULT_DECLs get
replaced in the call result, try further evaluating the result.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-dtor16.C: New test.
---
 gcc/cp/constexpr.cc   | 12 +-
 gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C | 39 +++
 2 files changed, 49 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index d1097764b10..22a1609e664 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -3213,7 +3213,12 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
&& CLASS_TYPE_P (TREE_TYPE (res))
&& !is_empty_class (TREE_TYPE (res)))
  if (replace_decl (&result, res, ctx->object))
-   cacheable = false;
+   {
+ cacheable = false;
+ result = cxx_eval_constant_expression (ctx, result, lval,
+non_constant_p,
+overflow_p);
+   }
}
   else
/* Couldn't get a function copy to evaluate.  */
@@ -5988,9 +5993,12 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, 
tree t,
object = probe;
  else
{
+ tree orig_probe = probe;
  probe = cxx_eval_constant_expression (ctx, probe, vc_glvalue,
non_constant_p, overflow_p);
  evaluated = true;
+ if (orig_probe == target)
+   target = probe;
  if (*non_constant_p)
return t;
}
@@ -6154,7 +6162,7 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, tree 
t,
   if (!empty_base && !(same_type_ignoring_top_level_qualifiers_p
   (initialized_type (init), type)))
 {
-  gcc_assert (is_empty_class (TREE_TYPE (target)));
+  gcc_assert (is_empty_class (TREE_TYPE (TREE_OPERAND (t, 0))));
   empty_base = true;
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C 
b/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C
new file mode 100644
index 000..707a3e025b1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor16.C
@@ -0,0 +1,39 @@
+// PR c++/105440
+// { dg-do compile { target c++20 } }
+
+struct basic_string {
+  char _M_local_buf[32];
+  char* _M_p;
+  constexpr basic_string() : _M_p{_M_local_buf} { }
+  constexpr void f() { if (_M_p) { } }
+  constexpr ~basic_string() { if (_M_p) { } }
+};
+
+template<int N>
+struct array {
+  basic_string _M_elems[N];
+};
+
+constexpr basic_string get() { return {}; }
+
+constexpr bool f1() {
+  array<1> a{get()};
+  a._M_elems[0].f();
+
+  return true;
+}
+
+constexpr bool f2() {
+  array<2> a2{get(), get()};
+  array<3> a3{get(), get(), get()};
+
+  for (basic_string& e : a2._M_elems)
+e.f();
+  for (basic_string& e : a3._M_elems)
+e.f();
+
+  return true;
+}
+
+static_assert(f1());
+static_assert(f2());
-- 
2.40.1.445.gf85cd430b1



Re: RISC-V: Added support clmul[r,h] instructions for Zbc extension.

2023-04-28 Thread Jeff Law via Gcc-patches



On 4/27/23 08:29, Karen Sargsyan via Gcc-patches wrote:

clmul[h] instructions were added only for the ZBKC extension.
This patch includes them in the ZBC extension too.
Besides, added support of 'clmulr' instructions for ZBC extension.

gcc/ChangeLog:

  * config/riscv/bitmanip.md: Added clmulr instruction.
  * config/riscv/riscv-builtins.cc (AVAIL): Add new.
  * config/riscv/riscv.md: (UNSPEC_CLMULR): Add new unspec type.
  * config/riscv/riscv-cmo.def: Added built-in function for clmulr.
  * config/riscv/crypto.md: Move clmul[h] instructions to bitmanip.md.
  * config/riscv/riscv-scalar-crypto.def: Move clmul[h] built-in
functions to riscv-cmo.def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbc32.c: New test.
* gcc.target/riscv/zbc64.c: New test.
Thanks.  I made a couple of minor changes.  Specifically rather than 
using the "bitmanip" type, I added a "clmul" type.  These instructions 
are typically not going to be single cycle and thus uarchs probably want 
to schedule them differently than a generic bitmanip instruction.  I 
also added the clmul type to the generic pipeline description, routing 
into the generic_imul unit which seemed like the best fit.
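
For readers unfamiliar with the three instructions, a bit-serial reference model in C++ may help. This is a sketch assuming XLEN = 64 and following the loop-style semantics given in the RISC-V bit-manipulation specification; the function names are hypothetical and are not the GCC built-ins touched by the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for this sketch: a 64-bit target (RV64).
constexpr unsigned XLEN = 64;

// clmul: low XLEN bits of the carry-less product of rs1 and rs2.
inline std::uint64_t clmul_ref(std::uint64_t rs1, std::uint64_t rs2) {
    std::uint64_t out = 0;
    for (unsigned i = 0; i < XLEN; ++i)
        if ((rs2 >> i) & 1)
            out ^= rs1 << i;
    return out;
}

// clmulh: high XLEN bits of the carry-less product.
inline std::uint64_t clmulh_ref(std::uint64_t rs1, std::uint64_t rs2) {
    std::uint64_t out = 0;
    for (unsigned i = 1; i < XLEN; ++i)
        if ((rs2 >> i) & 1)
            out ^= rs1 >> (XLEN - i);
    return out;
}

// clmulr: bits [2*XLEN-2 : XLEN-1] of the carry-less product.
inline std::uint64_t clmulr_ref(std::uint64_t rs1, std::uint64_t rs2) {
    std::uint64_t out = 0;
    for (unsigned i = 0; i < XLEN - 1; ++i)
        if ((rs2 >> i) & 1)
            out ^= rs1 >> (XLEN - i - 1);
    return out;
}
```

Each result depends on every bit of both operands, which is consistent with treating these instructions as multiply-like rather than as simple single-cycle bit operations.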


Attached is the actual patch I committed.

Jeff
commit d9df45a66b2c8f543106be0a2387bbe6195b00a6
Author: Karen Sargsyan 
Date:   Fri Apr 28 12:45:34 2023 -0600

RISC-V: Added support clmul[r,h] instructions for Zbc extension.

clmul[h] instructions were added only for the ZBKC extension.
This patch includes them in the ZBC extension too.
Besides, added support of 'clmulr' instructions for ZBC extension.

gcc/ChangeLog:

* config/riscv/bitmanip.md: Added clmulr instruction.
* config/riscv/riscv-builtins.cc (AVAIL): Add new.
* config/riscv/riscv.md: (UNSPEC_CLMULR): Add new unspec type.
(type): Add clmul
* config/riscv/riscv-cmo.def: Added built-in function for clmulr.
* config/riscv/crypto.md: Move clmul[h] instructions to bitmanip.md.
* config/riscv/riscv-scalar-crypto.def: Move clmul[h] built-in
functions to riscv-cmo.def.
* config/riscv/generic.md: Add clmul to list of instructions
using the generic_imul reservation.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbc32.c: New test.
* gcc.target/riscv/zbc64.c: New test.

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 6617876bb0b..a27fc3e34a1 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -719,3 +719,32 @@ (define_insn_and_split 
"*branch_mask_twobits_equals_singlebit"
operands[8] = GEN_INT (setbit);
operands[9] = GEN_INT (clearbit);
 })
+
+;; ZBKC or ZBC extension
+(define_insn "riscv_clmul_<mode>"
+  [(set (match_operand:X 0 "register_operand" "=r")
+(unspec:X [(match_operand:X 1 "register_operand" "r")
+  (match_operand:X 2 "register_operand" "r")]
+  UNSPEC_CLMUL))]
+  "TARGET_ZBKC || TARGET_ZBC"
+  "clmul\t%0,%1,%2"
+  [(set_attr "type" "clmul")])
+
+(define_insn "riscv_clmulh_<mode>"
+  [(set (match_operand:X 0 "register_operand" "=r")
+(unspec:X [(match_operand:X 1 "register_operand" "r")
+  (match_operand:X 2 "register_operand" "r")]
+  UNSPEC_CLMULH))]
+  "TARGET_ZBKC || TARGET_ZBC"
+  "clmulh\t%0,%1,%2"
+  [(set_attr "type" "clmul")])
+
+;; ZBC extension
+(define_insn "riscv_clmulr_<mode>"
+  [(set (match_operand:X 0 "register_operand" "=r")
+(unspec:X [(match_operand:X 1 "register_operand" "r")
+  (match_operand:X 2 "register_operand" "r")]
+  UNSPEC_CLMULR))]
+  "TARGET_ZBC"
+  "clmulr\t%0,%1,%2"
+  [(set_attr "type" "clmul")])
diff --git a/gcc/config/riscv/crypto.md b/gcc/config/riscv/crypto.md
index 777aa529005..e4b7f0190df 100644
--- a/gcc/config/riscv/crypto.md
+++ b/gcc/config/riscv/crypto.md
@@ -26,10 +26,6 @@ (define_c_enum "unspec" [
 UNSPEC_PACKH
 UNSPEC_PACKW
 
-;; Zbkc unspecs
-UNSPEC_CLMUL
-UNSPEC_CLMULH
-
 ;; Zbkx unspecs
 UNSPEC_XPERM8
 UNSPEC_XPERM4
@@ -126,26 +122,6 @@ (define_insn "riscv_packw"
   "packw\t%0,%1,%2"
   [(set_attr "type" "crypto")])
 
-;; ZBKC extension
-
-(define_insn "riscv_clmul_<mode>"
-  [(set (match_operand:X 0 "register_operand" "=r")
-(unspec:X [(match_operand:X 1 "register_operand" "r")
-  (match_operand:X 2 "register_operand" "r")]
-  UNSPEC_CLMUL))]
-  "TARGET_ZBKC"
-  "clmul\t%0,%1,%2"
-  [(set_attr "type" "crypto")])
-
-(define_insn "riscv_clmulh_<mode>"
-  [(set (match_operand:X 0 "register_operand" "=r")
-(unspec:X [(match_operand:X 1 "register_operand" "r")
-  (match_operand:X 2 "register_operand" "r")]
-  UNSPEC_CLMULH))]
-  "TARGET_ZBKC"
-  "clmulh\t%0,%1,%2"
-  [(set_attr "type" "crypto")])
-
 ;; ZBKX extension
 
 (define_insn "riscv_xperm4_<mode>"
diff 

Re: [PATCH v5 00/11] RISC-V: Implement ISA Manual Table A.6 Mappings

2023-04-28 Thread Hans Boehm via Gcc-patches
We're certainly pushing for the same ABI (A.6 + trailing fence on store) in
LLVM as well. I'm about to upload a pull request for the psABI document
that describes this version of the ABI, and a bit of the rationale for it.
I'll attach the current draft here.

I agree that compatibility is critical here, not just across llvm and gcc,
but also with other language implementations. That's part of the reason to
get this correct asap.

I believe that standardizing on A.6 + trailing fence on store, though
initially suboptimal, is by far the best bet to get us to an efficient ABI
in the long term. I expect the A.7 ABI to perform well. A.6, even without
the trailing store fence, has annoyingly expensive seq_cst loads, which I
would really like to get away from.
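
To make the mappings concrete, here is a minimal C++ sketch of the two operations in question. The instruction sequences in the comments are assumptions taken from the mappings as described in this thread (Table A.6, plus the trailing fence on seq_cst stores); they are not verified compiler output, and the helper names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// seq_cst store.
//   Table A.6 (assumed):             fence rw,w ; s{b|h|w|d}
//   A.6 + trailing fence (assumed):  fence rw,w ; s{b|h|w|d} ; fence rw,rw
inline void store_seq_cst(std::atomic<int>& x, int v) {
    x.store(v, std::memory_order_seq_cst);
}

// seq_cst load.
//   Table A.6 (assumed):  fence rw,rw ; l{b|h|w|d} ; fence r,rw
// The leading full fence is what makes these loads comparatively expensive.
inline int load_seq_cst(const std::atomic<int>& x) {
    return x.load(std::memory_order_seq_cst);
}
```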

Hans







On Fri, Apr 28, 2023 at 10:44 AM Patrick O'Neill 
wrote:

> On 4/28/23 09:29, Palmer Dabbelt wrote:
> > On Fri, 28 Apr 2023 09:14:00 PDT (-0700), jeffreya...@gmail.com wrote:
> >>
> >>
> >> On 4/27/23 10:22, Patrick O'Neill wrote:
> >>> This patchset aims to make the RISCV atomics implementation stronger
> >>> than the recommended mapping present in table A.6 of the ISA manual.
> >>>
> https://github.com/riscv/riscv-isa-manual/blob/c7cf84547b3aefacab5463add1734c1602b67a49/src/memory.tex#L1083-L1157
> >>>
> >>>
> >>> Context
> >>> -
> >>> GCC defined RISC-V mappings [1] before the Memory Model task group
> >>> finalized their work and provided the ISA Manual Table A.6/A.7
> >>> mappings[2].
> >>>
> >>> For at least a year now, we've known that the mappings were different,
> >>> but it wasn't clear if these unique mappings had correctness issues.
> >>>
> >>> Andrea Parri found an issue with the GCC mappings, showing that
> >>> atomic_compare_exchange_weak_explicit(-,-,-,release,relaxed)
> >>> mappings do
> >>> not enforce release ordering guarantees. (Meaning the GCC mappings have
> >>> a correctness issue).
> >>> https://inbox.sourceware.org/gcc-patches/Y1GbJuhcBFpPGJQ0@andrea/
> >> Right.  I recall this discussion, but thanks for the back reference.
> >
> > Yep, and it's an important one: that's why we're calling the change a
> > bug fix and dropping the current GCC mappings.  If we didn't have the
> > bug we'd be talking about an ABI break, and since the GCC mappings
> > predate the ISA mappings we'd likely need an additional compatibility
> > mode.
> >
> > So I guess we're lucky that we have a concurrency bug.  I think it's
> > the first time I've said that ;)
> >
> >>> Why not A.6?
> >>> -
> >>> We can update our mappings now, so the obvious choice would be to
> >>> implement Table A.6 (what LLVM implements/ISA manual recommends).
> >>>
> >>> The reason why that isn't the best path forward for GCC is due to a
> >>> proposal by Hans Boehm to add L{d|w|b|h}.aq/rl and S{d|w|b|h}.aq/rl.
> >>>
> >>> For context, there is discussion about fast-tracking the addition of
> >>> these instructions. The RISCV architectural review committee supports
> >>> adopting a "new and common atomics ABI for gcc and LLVM toolchains ...
> >>> that assumes the addition of the preceding instructions”. That common
> >>> ABI is likely to be A.7.
> >>>https://lists.riscv.org/g/tech-privileged/message/1284
> >>>
> >>> Transitioning from A.6 to A.7 will cause an ABI break. We can hedge
> >>> against that risk by emitting a conservative fence after SEQ_CST stores
> >>> to make the mapping compatible with both A.6 and A.7.
> >> So I like that we can have compatible sequences across A.6 and A.7.  Of
> >> course the concern is performance ;-)
> >>
> >>
> >>>
> >>> What does a mapping compatible with both A.6 & A.7 look like?
> >>> -
> >>> It is exactly the same as Table A.6, but SEQ_CST stores have a trailing
> >>> fence rw,rw. It's strictly stronger than Table A.6.
> >> Right.  So my worry here is silicon that is either already available or
> >> coming online shortly.   Those implementations simply aren't going to be
> >> able to use the A.7 mapping, so they pay a penalty.  Does it make sense
> >> to have the compatibility fences conditional?
> >
> > IIRC this was discussed somewhere in some thread, but I think there's
> > really three ABIs that could be implemented here (ignoring the current
> > GCC mappings as they're broken):
> >
> > * ABI compatible with the current mappings in the ISA manual (A.6).
> >  This will presumably perform best on extant hardware, given that it's
> >  what the words in the PDF say to do.
> > * ABI compatible with the proposed mappings for the ISA manual (A.7).
> >  This may perform better on new hardware.
> > * ABI compatible with both A.6 and A.7.  This is likely slow on both
> > new  and old hardware, but allows cross-linking.  If there's no
> > performance  issues this would be the only mode we need, but that
> > seems unlikely.
> >
> > IMO those should be encoded somewhere in the ELF.  I'd just do it as
> > two bits in the header, but last time I proposed header bits the psABI
> > folks wanted to do something 

Re: [committed] Fortran: Fix (mostly) comment typos

2023-04-28 Thread Bernhard Reutner-Fischer via Gcc-patches
On 28 April 2023 09:26:06 CEST, Tobias Burnus  wrote:
>Committed as r14-319-g7ebd4a1d61993c0a75e9ff3098aded21ef04a4da

 >  Only other changes are fixing the variable name a(b)breviated_modproc_decl

I think this is not good, I've mentioned it somewhere, i think, but I'll rename 
it.
thanks!


Re: libsanitizer: sync from master

2023-04-28 Thread Bernhard Reutner-Fischer via Gcc-patches
On 28 April 2023 11:23:55 CEST, Florian Weimer via Fortran 
 wrote:
>* Martin Liška:

>But that's okay for me as well.

Even better.


Re: [PATCH v5 00/11] RISC-V: Implement ISA Manual Table A.6 Mappings

2023-04-28 Thread Patrick O'Neill



On 4/28/23 10:44, Patrick O'Neill wrote:

On 4/28/23 09:29, Palmer Dabbelt wrote:

On Fri, 28 Apr 2023 09:14:00 PDT (-0700), jeffreya...@gmail.com wrote:

On 4/27/23 10:22, Patrick O'Neill wrote:

...

LLVM mapping notes

LLVM emits corresponding fences for atomic_signal_fence instructions.
This seems to be an oversight since AFAIK atomic_signal_fence acts as a
compiler directive. GCC does not emit any fences for atomic_signal_fence
instructions.

This starts to touch on a larger concern.  Specifically I'd really like
the two compilers to be compatible in terms of the code they generate
for the various atomics.

What I worry about is code being written (by design or accident) that is
dependent on the particular behavior of one compiler; then, if that
code gets built with the other compiler, we end up with different
behavior.  Worse yet, if/when this happens, it's likely to be tough to
expose, reproduce & debug.

Agreed.

I'll open an issue with LLVM and see what they have to say about this
particular behavior. Ideally we'd have perfectly compatible compilers
(for atomic ops) by the end of this :)

AFAICT GCC hasn't ever been emitting fences for these instructions.
(& This behavior isn't touched by the patchset).

I re-ran the set of tests I was using and couldn't replicate LLVM's
behavior that was noted here. I think I had mixed up atomic_thread_fence
with atomic_signal_fence at some point.

That was the only difference I could find during my testing of this
patchset (other than the strengthened SEQ_CST store), so I think
GCC/LLVM atomics will be fully compatible once this patchset is applied.
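
Since the two fences are easy to mix up, here is a small C++ sketch of the distinction. The variable and function names are hypothetical; the comments describe the behavior the C++ standard specifies, and what a given compiler actually emits should be checked in its output.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;
std::atomic<bool> ready{false};

// Orders the payload write before the flag only with respect to a signal
// handler running on the same thread. It constrains compiler reordering and
// is expected to need no hardware fence instruction.
inline void publish_for_signal_handler(int v) {
    payload = v;
    std::atomic_signal_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

// Orders the payload write before the flag with respect to other threads,
// so a hardware fence is expected on weakly ordered targets.
inline void publish_for_other_threads(int v) {
    payload = v;
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}
```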


Re: [PATCH v5 11/11] RISC-V: Table A.6 conformance tests

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/27/23 10:23, Patrick O'Neill wrote:

These tests cover basic cases to ensure the atomic mappings follow the
strengthened Table A.6 mappings that are compatible with Table A.7.

2023-04-27 Patrick O'Neill 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-a-6-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-5.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-1.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-2.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-3.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-4.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-5.c: New test.
* gcc.target/riscv/amo-table-a-6-load-1.c: New test.
* gcc.target/riscv/amo-table-a-6-load-2.c: New test.
* gcc.target/riscv/amo-table-a-6-load-3.c: New test.
* gcc.target/riscv/amo-table-a-6-store-1.c: New test.
* gcc.target/riscv/amo-table-a-6-store-2.c: New test.
* gcc.target/riscv/amo-table-a-6-store-compat-3.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: New test.
OK.  And as mentioned earlier, there is a framework where you can verify 
ordering as well.  Your call whether or not you want to switch to that 
form.  If you do choose to check ordering, consider that patch 
pre-approved.


Jeff


Re: [PATCH v5 10/11] RISC-V: Weaken atomic loads

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/27/23 10:23, Patrick O'Neill wrote:

This change brings atomic loads in line with table A.6 of the ISA
manual.

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/sync.md (atomic_load): Implement atomic
load mapping.

OK.
jeff


Re: [PATCH v5 09/11] RISC-V: Weaken mem_thread_fence

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/27/23 10:22, Patrick O'Neill wrote:

This change brings atomic fences in line with table A.6 of the ISA
manual.

Relax mem_thread_fence according to the memmodel given.

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/sync.md (mem_thread_fence_1): Change fence
depending on the given memory model.

OK
jeff


Re: [PATCH v5 08/11] RISC-V: Weaken LR/SC pairs

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/27/23 10:22, Patrick O'Neill wrote:

Introduce the %I and %J flags for setting the .aqrl bits on LR/SC pairs
as needed.

Atomic compare and exchange ops provide success and failure memory
models. C++17 and later place no restrictions on the relative strength
of each model, so ensure we cover both by using a model that enforces
the ordering of both given models.

This change brings LR/SC ops in line with table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_union_memmodels): Expose
riscv_union_memmodels function to sync.md.
* config/riscv/riscv.cc (riscv_union_memmodels): Add function to
get the union of two memmodels in sync.md.
(riscv_print_operand): Add %I and %J flags that output the
optimal LR/SC flag bits for a given memory model.
* config/riscv/sync.md: Remove static .aqrl bits on LR op/.rl
bits on SC op and replace with optimized %I, %J flags.

OK.

Note for the future.  Operands don't have to appear in-order in a 
define_insn.  So the kind of reordering you did here may not have been 
strictly necessary.   As you found out, when you renumber the operands, 
you have to adjust the assembly template, which can be error prone. 
Knowing that, I checked them pretty closely and they look right to me.




Jeff




Signed-off-by: Patrick O'Neill 
---
v3 Changelog:
* Consolidate tests in [PATCH v3 10/10]
---
v5 Changelog:
* Also optimize subword LR/SC ops based on given memory model.
---
  gcc/config/riscv/riscv-protos.h |   3 +
  gcc/config/riscv/riscv.cc   |  44 
  gcc/config/riscv/sync.md| 114 +++-
  3 files changed, 114 insertions(+), 47 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index f87661bde2c..5fa9e1122ab 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -22,6 +22,8 @@ along with GCC; see the file COPYING3.  If not see
  #ifndef GCC_RISCV_PROTOS_H
  #define GCC_RISCV_PROTOS_H
  
+#include "memmodel.h"
+
  /* Symbol types we understand.  The order of this list must match that of
 the unspec enum in riscv.md, subsequent to UNSPEC_ADDRESS_FIRST.  */
  enum riscv_symbol_type {
@@ -81,6 +83,7 @@ extern bool riscv_v_ext_vector_mode_p (machine_mode);
  extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
  extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
  extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
+extern enum memmodel riscv_union_memmodels (enum memmodel, enum memmodel);
  
  /* Routines implemented in riscv-c.cc.  */
  void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9eba03ac189..69e9b2aa548 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4289,6 +4289,36 @@ riscv_print_operand_reloc (FILE *file, rtx op, bool hi_reloc)
fputc (')', file);
  }
  
+/* Return the memory model that encapuslates both given models.  */
+
+enum memmodel
+riscv_union_memmodels (enum memmodel model1, enum memmodel model2)
+{
+  model1 = memmodel_base (model1);
+  model2 = memmodel_base (model2);
+
+  enum memmodel weaker = model1 <= model2 ? model1: model2;
+  enum memmodel stronger = model1 > model2 ? model1: model2;
+
+  switch (stronger)
+{
+  case MEMMODEL_SEQ_CST:
+  case MEMMODEL_ACQ_REL:
+   return stronger;
+  case MEMMODEL_RELEASE:
+   if (weaker == MEMMODEL_ACQUIRE || weaker == MEMMODEL_CONSUME)
+ return MEMMODEL_ACQ_REL;
+   else
+ return stronger;
+  case MEMMODEL_ACQUIRE:
+  case MEMMODEL_CONSUME:
+  case MEMMODEL_RELAXED:
+   return stronger;
+  default:
+   gcc_unreachable ();
+}
+}
+
  /* Return true if the .AQ suffix should be added to an AMO to implement the
 acquire portion of memory model MODEL.  */
  
@@ -4342,6 +4372,8 @@ riscv_memmodel_needs_amo_release (enum memmodel model)
 'R'Print the low-part relocation associated with OP.
 'C'Print the integer branch condition for comparison OP.
 'A'Print the atomic operation suffix for memory model OP.
+   'I' Print the LR suffix for memory model OP.
+   'J' Print the SC suffix for memory model OP.
 'z'Print x0 if OP is zero, otherwise print OP normally.
 'i'Print i if the operand is not a register.
 'S'Print shift-index of single-bit mask OP.
@@ -4511,6 +4543,18 @@ riscv_print_operand (FILE *file, rtx op, int letter)
fputs (".rl", file);
break;
  
+case 'I':
+  if (model == MEMMODEL_SEQ_CST)
+   fputs (".aqrl", file);
+  else if (riscv_memmodel_needs_amo_acquire (model))
+   fputs (".aq", file);
+  break;
+
+case 'J':
+  if (riscv_memmodel_needs_amo_release (model))
+   fputs (".rl", file);
+  break;
+
  case 'i':
if (code != REG)
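
For readers following the hunks above, here is a minimal standalone sketch of the same idea outside of GCC. It is not the patch itself: `enum model`, `union_models`, `lr_suffix` and `sc_suffix` are hypothetical stand-ins for `enum memmodel`, `riscv_union_memmodels` and the `%I`/`%J` cases, and the acquire/release predicates are an assumption, since the bodies of `riscv_memmodel_needs_amo_acquire` and `riscv_memmodel_needs_amo_release` are not part of this excerpt.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for GCC's enum memmodel, ordered weakest first.  */
enum model { RELAXED, CONSUME, ACQUIRE, RELEASE, ACQ_REL, SEQ_CST };

/* Return a model that enforces the ordering of both M1 and M2.  */
enum model
union_models (enum model m1, enum model m2)
{
  enum model weaker = m1 <= m2 ? m1 : m2;
  enum model stronger = m1 > m2 ? m1 : m2;

  /* Release plus acquire/consume needs both halves, so promote.  */
  if (stronger == RELEASE && (weaker == ACQUIRE || weaker == CONSUME))
    return ACQ_REL;
  return stronger;
}

/* Assumed acquire predicate: consume, acquire, acq_rel, seq_cst.  */
static int
needs_acquire (enum model m)
{
  return m == CONSUME || m == ACQUIRE || m == ACQ_REL || m == SEQ_CST;
}

/* Assumed release predicate: release, acq_rel, seq_cst.  */
static int
needs_release (enum model m)
{
  return m == RELEASE || m == ACQ_REL || m == SEQ_CST;
}

/* Suffix for the LR instruction, mirroring the 'I' case above.  */
const char *
lr_suffix (enum model m)
{
  if (m == SEQ_CST)
    return ".aqrl";
  return needs_acquire (m) ? ".aq" : "";
}

/* Suffix for the SC instruction, mirroring the 'J' case above.  */
const char *
sc_suffix (enum model m)
{
  return needs_release (m) ? ".rl" : "";
}

/* Small self-check for the release/relaxed compare-exchange case.  */
void
self_check (void)
{
  enum model m = union_models (RELEASE, RELAXED);
  assert (strcmp (lr_suffix (m), "") == 0);
  assert (strcmp (sc_suffix (m), ".rl") == 0);
}
```

Under these assumptions, a compare-exchange with success model release and failure model acquire would be promoted to acq_rel, giving `.aq` on the LR and `.rl` on the SC.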

[PATCH] add glibc-stdint.h to vax and lm32 linux target (PR target/105525)

2023-04-28 Thread Mikael Pettersson via Gcc-patches
PR target/105525 is a build regression for the vax and lm32 linux
targets present in gcc-12/13/head, where the builds fail due to
unsatisfied references to __INTPTR_TYPE__ and __UINTPTR_TYPE__,
caused by these two targets failing to provide glibc-stdint.h.

Fixed thusly, tested by building crosses, which now succeeds.

Ok for trunk? (Note I don't have commit rights.)

2023-04-28  Mikael Pettersson  

PR target/105525
* config.gcc (vax-*-linux*): Add glibc-stdint.h.
(lm32-*-uclinux*): Likewise.
---
 gcc/config.gcc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 6fd1594480a..671c7e3b018 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2287,7 +2287,7 @@ lm32-*-rtems*)
tmake_file="${tmake_file} lm32/t-rtems"
  ;;
 lm32-*-uclinux*)
-tm_file="elfos.h ${tm_file} gnu-user.h linux.h lm32/uclinux-elf.h"
+tm_file="elfos.h ${tm_file} gnu-user.h linux.h glibc-stdint.h lm32/uclinux-elf.h"
tmake_file="${tmake_file} lm32/t-lm32"
 ;;
 m32r-*-elf*)
@@ -3488,7 +3488,7 @@ v850*-*-*)
use_gcc_stdint=wrap
;;
 vax-*-linux*)
-   tm_file="${tm_file} elfos.h gnu-user.h linux.h vax/elf.h vax/linux.h"
+   tm_file="${tm_file} elfos.h gnu-user.h linux.h glibc-stdint.h vax/elf.h vax/linux.h"
extra_options="${extra_options} vax/elf.opt"
;;
 vax-*-netbsdelf*)
-- 
2.40.0



Re: [PATCH v5 00/11] RISC-V: Implement ISA Manual Table A.6 Mappings

2023-04-28 Thread Patrick O'Neill

On 4/28/23 09:29, Palmer Dabbelt wrote:

On Fri, 28 Apr 2023 09:14:00 PDT (-0700), jeffreya...@gmail.com wrote:



On 4/27/23 10:22, Patrick O'Neill wrote:

This patchset aims to make the RISCV atomics implementation stronger
than the recommended mapping present in table A.6 of the ISA manual.
https://github.com/riscv/riscv-isa-manual/blob/c7cf84547b3aefacab5463add1734c1602b67a49/src/memory.tex#L1083-L1157 



Context
-
GCC defined RISC-V mappings [1] before the Memory Model task group
finalized their work and provided the ISA Manual Table A.6/A.7 mappings [2].


For at least a year now, we've known that the mappings were different,
but it wasn't clear if these unique mappings had correctness issues.

Andrea Parri found an issue with the GCC mappings, showing that
atomic_compare_exchange_weak_explicit(-,-,-,release,relaxed) mappings do
not enforce release ordering guarantees. (Meaning the GCC mappings have
a correctness issue).
https://inbox.sourceware.org/gcc-patches/Y1GbJuhcBFpPGJQ0@andrea/
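
To make the problematic case concrete, here is a minimal C11 sketch of a compare-exchange with a release success order and a relaxed failure order. The names `payload`, `ready` and `publish` are hypothetical and only illustrate the shape of the call; this is not Andrea's litmus test.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

int payload;        /* plain data published before the flag flips */
atomic_int ready;   /* 0 = not published, 1 = published */

/* Write VALUE, then try to flip READY from 0 to 1.  The release order
   on success is what must keep the PAYLOAD write ordered before the
   flag update as seen by an acquiring reader.  */
bool
publish (int value)
{
  int expected = 0;

  payload = value;
  while (!atomic_compare_exchange_weak_explicit (&ready, &expected, 1,
                                                 memory_order_release,
                                                 memory_order_relaxed))
    {
      /* The weak form may fail spuriously, which leaves EXPECTED at 0;
         a real failure stores the current value into EXPECTED.  */
      if (expected != 0)
        return false;
    }
  return true;
}

/* Single-threaded sanity check of the helper.  */
void
self_check (void)
{
  assert (publish (42));
  assert (atomic_load (&ready) == 1);
}
```

A reader that observes `ready == 1` through an acquire load expects to also see the `payload` write. That expectation is the release guarantee the message above says the old mappings did not enforce.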

Right.  I recall this discussion, but thanks for the back reference.


Yep, and it's an important one: that's why we're calling the change a 
bug fix and dropping the current GCC mappings.  If we didn't have the 
bug we'd be talking about an ABI break, and since the GCC mappings 
predate the ISA mappings we'd likely need an additional compatibility 
mode.


So I guess we're lucky that we have a concurrency bug.  I think it's 
the first time I've said that ;)



Why not A.6?
-
We can update our mappings now, so the obvious choice would be to
implement Table A.6 (what LLVM implements/ISA manual recommends).

The reason why that isn't the best path forward for GCC is due to a
proposal by Hans Boehm to add L{d|w|b|h}.aq/rl and S{d|w|b|h}.aq/rl.

For context, there is discussion about fast-tracking the addition of
these instructions. The RISCV architectural review committee supports
adopting a "new and common atomics ABI for gcc and LLVM toolchains ...
that assumes the addition of the preceding instructions". That common
ABI is likely to be A.7.
   https://lists.riscv.org/g/tech-privileged/message/1284

Transitioning from A.6 to A.7 will cause an ABI break. We can hedge
against that risk by emitting a conservative fence after SEQ_CST stores
to make the mapping compatible with both A.6 and A.7.

So I like that we can have compatible sequences across A.6 and A.7.  Of
course the concern is performance ;-)




What does a mapping compatible with both A.6 & A.7 look like?
-
It is exactly the same as Table A.6, but SEQ_CST stores have a trailing
fence rw,rw. It's strictly stronger than Table A.6.
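
As a small illustration of the store being discussed, the sketch below issues a sequentially consistent store through C11 atomics. The instruction sequences in the comment are an assumption based on the description above (Table A.6 plus a trailing fence), not compiler output, and the names `shared`, `store_seq_cst` and `load_seq_cst` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

atomic_int shared;

/* Assumed RISC-V sequences for this store:
     Table A.6 only:         fence rw,w ; sw
     A.6 and A.7 compatible: fence rw,w ; sw ; fence rw,rw
   The trailing fence is the extra cost being debated in this thread.  */
void
store_seq_cst (int v)
{
  atomic_store_explicit (&shared, v, memory_order_seq_cst);
}

int
load_seq_cst (void)
{
  return atomic_load_explicit (&shared, memory_order_seq_cst);
}

/* Single-threaded sanity check.  */
void
self_check (void)
{
  store_seq_cst (5);
  assert (load_seq_cst () == 5);
}
```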

Right.  So my worry here is silicon that is either already available or
coming online shortly.   Those implementations simply aren't going to be
able to use the A.7 mapping, so they pay a penalty.  Does it make sense
to have the compatibility fences conditional?


IIRC this was discussed somewhere in some thread, but I think there's 
really three ABIs that could be implemented here (ignoring the current 
GCC mappings as they're broken):


* ABI compatible with the current mappings in the ISA manual (A.6).
  This will presumably perform best on extant hardware, given that it's
  what the words in the PDF say to do.
* ABI compatible with the proposed mappings for the ISA manual (A.7).
  This may perform better on new hardware.
* ABI compatible with both A.6 and A.7.  This is likely slow on both
  new and old hardware, but allows cross-linking.  If there are no
  performance issues this would be the only mode we need, but that
  seems unlikely.


IMO those should be encoded somewhere in the ELF.  I'd just do it as 
two bits in the header, but last time I proposed header bits the psABI 
folks wanted to do something different.  I don't think where we encode 
this matters all that much, but if we're going to treat these as real 
long-term ABIs we should have some way to encode that.


There's also the orthogonal axis of whether we use the new 
instructions.  Those aren't in specs yet so I think we can hold off on 
them for a bit, but they're the whole point of doing the ABI break so 
we should at least think them over.  I think we're OK because we've 
just split out the ABI from the ISA here, but I'm not sure if I'm 
missing something.


Now that I wrote that, though, I remember talking to Patrick about it 
and we drew a bunch of stuff on the whiteboard and then got confused.  
So sorry if I'm just out of the loop here...

This looks up-to-date with how I understand it.




Benchmark Interpretation

As expected, out of order machines are significantly faster with the
REL_STORE mappings. Unexpectedly, the in-order machines are
significantly slower with REL_STORE rather than REL_STORE_FENCE.

Yea, that's a bit of a surprise.



Most machines in the wild are expected to use Table A.7 once the
instructions are introduced.
Incurring this added cost now will make it easier for compiled RISC-V
binaries to 

Re: [PATCH v5 06/11] RISC-V: Strengthen atomic stores

2023-04-28 Thread Palmer Dabbelt

On Fri, 28 Apr 2023 10:40:15 PDT (-0700), jeffreya...@gmail.com wrote:



On 4/27/23 10:22, Patrick O'Neill wrote:

This change makes atomic stores strictly stronger than table A.6 of the
ISA manual. This mapping makes the overall patchset compatible with
table A.7 as well.

2023-04-27 Patrick O'Neill 

PR 89835

Should be "PR target/89835"



gcc/ChangeLog:

* config/riscv/sync.md:

Needs some text here :-)


I'm not objecting to this patch, but I think we've got an open
question about whether or not this approach is too expensive for
existing or soon arriving implementations.

If the decision on that topic is to just pay the cost, then this patch
is fine.  If we decide to make compatibility optional to avoid the
additional cost, then this will need suitable adjustments.


IMO the only hardware that's going to be here by gcc-14 and to have 
enough concurrency for these to matter is the Ventana stuff.  I think 
you're the only one who can figure out if these are slow, at least until 
that stuff is available outside the lab.


So are they too slow for you?



Jeff


Re: [PATCH v5 07/11] RISC-V: Eliminate AMO op fences

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/27/23 10:22, Patrick O'Neill wrote:

Atomic operations with the appropriate bits set already enforce release
semantics. Remove unnecessary release fences from atomic ops.

This change brings AMO ops in line with table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/riscv.cc
(riscv_memmodel_needs_amo_release): Change function name.
(riscv_print_operand): Remove unneeded %F case.
* config/riscv/sync.md: Remove unneeded fences.
OK.  Though note this depends on a resolution of patch #6.  You could 
potentially leave the %F support in riscv_print_operand and install the 
rest of this patch while we settle the question around #6.


Jeff


Re: [PATCH v5 06/11] RISC-V: Strengthen atomic stores

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/27/23 10:22, Patrick O'Neill wrote:

This change makes atomic stores strictly stronger than table A.6 of the
ISA manual. This mapping makes the overall patchset compatible with
table A.7 as well.

2023-04-27 Patrick O'Neill 

PR 89835

Should be "PR target/89835"



gcc/ChangeLog:

* config/riscv/sync.md:

Needs some text here :-)


I'm not objecting to this patch, but I think we've got an open 
question about whether or not this approach is too expensive for 
existing or soon arriving implementations.


If the decision on that topic is to just pay the cost, then this patch 
is fine.  If we decide to make compatibility optional to avoid the 
additional cost, then this will need suitable adjustments.


Jeff



Re: [PATCH v5 05/11] RISC-V: Add AMO release bits

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/27/23 10:22, Patrick O'Neill wrote:

This patch sets the relevant .rl bits on amo operations.

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): change behavior
of %A to include release bits.

Capitalize "change" in the ChangeLog entry.  OK with that nit fixed.

jeff


Re: [PATCHv2] openmp: Add support for 'present' modifier

2023-04-28 Thread Tobias Burnus

Hi Kwok,

On 17.02.23 12:45, Kwok Cheung Yeung wrote:

This is a revised version of the patch for the 'present' modifier for
OpenMP. Compared to the first version, three improvements have been made:

- A bug which caused bootstrapping with a '-m32' multilib on x86-64 to
fail due to pointer size issues has been fixed.
- The Fortran parse tree dump now shows clauses with 'present' applied.
- The reordering of OpenMP clauses has been moved to
gimplify_scan_omp_clauses, where the other clause reordering rules are
applied.


thanks for the patch; I have a bunch of smaller review comments, requiring small
code changes or more tedious but still simple changes.

Namely:

In the ChangeLog:

 (c_parser_omp_target_data): Allow map clauses with 'present'
 modifiers.
 (c_parser_omp_target_enter_data): Likewise.
 (c_parser_omp_target_exit_data): Likewise.
 (c_parser_omp_target): Likewise.


Those can be combined; a separate entry is only required per file, not per
function name.


+ if (kind == OMP_CLAUSE_FROM || kind == OMP_CLAUSE_TO)
+   OMP_CLAUSE_SET_MOTION_MODIFIER (u, OMP_CLAUSE_MOTION_NONE);


This should not be needed, as 'build_omp_clause' memsets the data to '\0' and
OMP_CLAUSE_MOTION_NONE == 0 (as it should).

However, as you really only have two values, denoting that the modifier has
been specified or not, you should really use an available existing flag. For
instance, other code uses base.deprecated_flag – which could also be used here.

Macro-wise, this would then permit using:
  OMP_CLAUSE_MOTION_PRESENT (node) = 1;
or
  OMP_CLAUSE_TO_PRESENT (node) = 1;
  OMP_CLAUSE_FROM_PRESENT (node) = 1;
and 'if (OMP_... (node))', which is shorter and IMHO also more readable.

* * *

I think c_parser_omp_var_list_parens / cp_parser_omp_var_list /
gfc_match_omp_variable_list should not be modified.

For C/C++, you just could do the '(' + {'present', ':' } parsing before the
call to
  c_parser_omp_variable_list / cp_parser_omp_var_list_no_open
and then loop over 'list' after the call - and similarly for Fortran.

And besides not cluttering a generic function, we will also soon add support for
'mapper' (→ Julian's patch set adds generic mapper support) and 'iterator' is
also missing. And we really do not want those in the generic function!


+   kind = always_present_modifier ? GOMP_MAP_ALWAYS_PRESENT_FROM
+  : present_modifier ? GOMP_MAP_PRESENT_FROM
+  : always_modifier ? GOMP_MAP_ALWAYS_FROM
+  : GOMP_MAP_FROM;


Can you wrap the RHS in parentheses, i.e. 'kind = (' ... ');' to aid some
editors in terms of indenting. (I assume 'emacs' is that editor, which I don't
use.)


+   tkind
+ = OMP_CLAUSE_MOTION_MODIFIER (c) == OMP_CLAUSE_MOTION_PRESENT
+   ? GOMP_MAP_PRESENT_TO : GOMP_MAP_TO;


Likewise.


* * *


@@ -1358,6 +1371,7 @@ typedef struct gfc_omp_namelist
   ENUM_BITFIELD (gfc_omp_linear_op) op:4;
   bool old_modifier;
 } linear;
+  gfc_omp_motion_modifier motion_modifier;
struct gfc_common_head *common;
bool lastprivate_conditional;
  } u;



I think a 'bool present;' would do here. Can you additionally move the
pointers first and then the bitfields/enums later? That way,
less space is wasted by padding and we might even save space despite
adding another variable.
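
A minimal sketch of the padding effect, using two hypothetical structs rather than the real gfc_omp_namelist layout: the same three members are declared once with a pointer between two bools and once with the pointer first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Pointer sandwiched between small members: each bool is typically
   followed by padding up to the pointer's alignment.  */
struct mixed_order
{
  bool old_modifier;
  void *common;
  bool lastprivate_conditional;
};

/* Pointer first, small members packed together at the end.  */
struct pointers_first
{
  void *common;
  bool old_modifier;
  bool lastprivate_conditional;
};

/* Number of bytes saved per object by the reordering.  */
size_t
bytes_saved (void)
{
  return sizeof (struct mixed_order) - sizeof (struct pointers_first);
}

/* The reordered layout should never be larger than the mixed one.  */
void
check_layout (void)
{
  assert (sizeof (struct pointers_first) <= sizeof (struct mixed_order));
}
```

On a typical LP64 target the first layout takes 24 bytes and the second 16, so an extra bool added next to the existing ones can fit into what was previously padding.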


@@ -2893,20 +2912,38 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
   if (close_modifier++ == 1)
 second_close_locus = current_locus;
 }
+ else if (gfc_match ("present ") == MATCH_YES)
+   {
+ if (present_modifier++ == 1)
+   second_close_locus = current_locus;
+   }
...


This code is broken in terms of error diagnostics. You need to handle this
differently to get a diagnostic for:
  'map(present, present : var)'

Can you fix the code + add a testcase (e.g. by augmenting the 'always' testcase
files testsuite/gfortran.dg/gomp/map-{7,8}.f90)?



+   gomp_fatal ("present clause: !omp_target_is_present "


I personally find the error a bit unclear. How about something more explicit
like: 'present clause: not present on the device' - or something like that?


+/* { dg-do run { target offload_target_any } } */



This needs to be '{ target offload_device }' - i.e. the default device needs to
be an offload device (!= the initial device).

(Side note: We may need to consider whether offload_device_nonshared_as might
make sense, but the current omp_target_is_present and
omp_target_(dis)associate_ptr imply that we will still go through this route
with USM for explicitly mapped variables (but do not do any actual mapping in
that case). But that we can handle once the USM code gets merged and we get
FAILs.)

Tobias


Re: [PATCH v5 04/11] RISC-V: Enforce atomic compare_exchange SEQ_CST

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/27/23 10:22, Patrick O'Neill wrote:

This patch enforces SEQ_CST for atomic compare_exchange ops.

Replace Fence/LR.aq/SC.aq pairs with SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/sync.md: Change FENCE/LR.aq/SC.aq into
sequentially consistent LR.aqrl/SC.rl pair.
OK.  Note that generally you should note which pattern you're changing 
in a ChangeLog entry, similar to how we note the function being changed. 
 So something like this might be better:


* config/riscv/sync.md (atomic_cas_value_strong): ...

Jeff


[PATCH 0/2] Porting of builtin_zero_pattern to match

2023-04-28 Thread Andrew Pinski via Gcc-patches
These two patches implement the base support of builtin_zero_pattern
into match.pd. To implement the other part requires match-and-simplify
inside phiopt to support moving 2 statements from the middle-bb. The
match.pd part is already included. I will try to get to it next week.

Also __builtin_clrsb has not been moved yet either and I will get to
that next week as well.

Andrew Pinski (2):
  PHIOPT: Allow moving of some builtin calls
  MATCH: add some of what phiopt's builtin_zero_pattern did

 gcc/match.pd   | 41 +++--
 gcc/tree-ssa-phiopt.cc | 35 +++
 2 files changed, 70 insertions(+), 6 deletions(-)

-- 
2.39.1



[PATCH 2/2] MATCH: add some of what phiopt's builtin_zero_pattern did

2023-04-28 Thread Andrew Pinski via Gcc-patches
This adds the patterns for
POPCOUNT BSWAP FFS PARITY CLZ and CTZ.
For "a != 0 ? FUNC(a) : CST".
CLRSB, CLRSBL, and CLRSBLL will be moved next.

Note this is not enough to remove
cond_removal_in_builtin_zero_pattern as we need to handle
the case where there is an NOP_CONVERT inside the conditional
to move out of the condition inside match_simplify_replacement.
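
For illustration, these are the kinds of source-level forms the new patterns are aimed at, written with the GCC builtins. The function names are hypothetical. The zero guard is redundant for POPCOUNT, FFS, PARITY and BSWAP because each already yields 0 for a zero argument; the CLZ case only applies when the constant matches the target's defined value at zero, which the sketch does not assume.

```c
#include <assert.h>
#include <stdint.h>

/* Guarded forms: candidates for "a != 0 ? FUNC(a) : 0" -> FUNC(a).  */
int popcount_guarded (unsigned a) { return a != 0 ? __builtin_popcount (a) : 0; }
int ffs_guarded (int a) { return a != 0 ? __builtin_ffs (a) : 0; }
int parity_guarded (unsigned a) { return a != 0 ? __builtin_parity (a) : 0; }
uint32_t bswap_guarded (uint32_t a) { return a != 0 ? __builtin_bswap32 (a) : 0; }

/* __builtin_clz (0) is undefined at the source level, so here the guard
   carries meaning; it can only be dropped when the target's internal
   function defines the same value at zero.  */
int
clz_guarded (unsigned a)
{
  return a != 0 ? __builtin_clz (a) : (int) (sizeof (unsigned) * 8);
}

/* The guarded and unguarded forms agree for every input, including 0.  */
void
check_equivalence (unsigned a)
{
  assert (popcount_guarded (a) == __builtin_popcount (a));
  assert (ffs_guarded ((int) a) == __builtin_ffs ((int) a));
  assert (parity_guarded (a) == __builtin_parity (a));
  assert (bswap_guarded (a) == __builtin_bswap32 (a));
}
```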

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd: Add patterns for "a != 0 ? FUNC(a) : CST"
for FUNC of POPCOUNT BSWAP FFS PARITY CLZ and CTZ.
---
 gcc/match.pd | 41 +++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index e17597ead26..0e782cde71d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -52,6 +52,8 @@ along with GCC; see the file COPYING3.  If not see
   gt   ge   eq ne le   lt   unordered ordered   ungt unge unlt unle uneq ltgt)
 (define_operator_list simple_comparison lt   le   eq ne ge   gt)
 (define_operator_list swapped_simple_comparison gt   ge   eq ne le   lt)
+(define_operator_list BSWAP BUILT_IN_BSWAP16 BUILT_IN_BSWAP32
+   BUILT_IN_BSWAP64 BUILT_IN_BSWAP128)
 
 #include "cfn-operators.pd"
 
@@ -4313,8 +4315,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (complex (convert:itype @0) (negate (convert:itype @1)
 
 /* BSWAP simplifications, transforms checked by gcc.dg/builtin-bswap-8.c.  */
-(for bswap (BUILT_IN_BSWAP16 BUILT_IN_BSWAP32
-   BUILT_IN_BSWAP64 BUILT_IN_BSWAP128)
+(for bswap (BSWAP)
  (simplify
   (bswap (bswap @0))
   @0)
@@ -7780,6 +7781,42 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (bit_xor (PARITY:s @0) (PARITY:s @1))
   (PARITY (bit_xor @0 @1)))
 
+/* a != 0 ? FUN(a) : 0 -> Fun(a) for some builtin functions. */
+(for func (POPCOUNT BSWAP FFS PARITY)
+ (simplify
+  (cond (ne @0 integer_zerop@1) (func@4 (convert? @2)) integer_zerop@3)
+  @4))
+
+#if GIMPLE
+/* a != 0 ? CLZ(a) : CST -> .CLZ(a) where CST is the result of the internal function for 0. */
+(for func (CLZ)
+ (simplify
+  (cond (ne @0 integer_zerop@1) (func (convert?@4 @2)) INTEGER_CST@3)
+  (with { int val;
+ internal_fn ifn = IFN_LAST;
+ if (direct_internal_fn_supported_p (IFN_CLZ, type, OPTIMIZE_FOR_BOTH)
+ && CLZ_DEFINED_VALUE_AT_ZERO (SCALAR_INT_TYPE_MODE (type),
+   val) == 2)
+   ifn = IFN_CLZ;
+   }
+   (if (ifn == IFN_CLZ && wi::to_widest (@3) == val)
+(IFN_CLZ @4)
+
+/* a != 0 ? CTZ(a) : CST -> .CTZ(a) where CST is the result of the internal function for 0. */
+(for func (CTZ)
+ (simplify
+  (cond (ne @0 integer_zerop@1) (func (convert?@4 @2)) INTEGER_CST@3)
+  (with { int val;
+ internal_fn ifn = IFN_LAST;
+ if (direct_internal_fn_supported_p (IFN_CTZ, type, OPTIMIZE_FOR_BOTH)
+ && CTZ_DEFINED_VALUE_AT_ZERO (SCALAR_INT_TYPE_MODE (type),
+   val) == 2)
+   ifn = IFN_CTZ;
+   }
+   (if (ifn == IFN_CTZ && wi::to_widest (@3) == val)
+(IFN_CTZ @4)
+#endif
+
 /* Common POPCOUNT/PARITY simplifications.  */
 /* popcount(X) is (X>>C2)&1 when C1 == 1<

[PATCH 1/2] PHIOPT: Allow moving of some builtin calls

2023-04-28 Thread Andrew Pinski via Gcc-patches
While working on moving
cond_removal_in_builtin_zero_pattern to match, I noticed
that functions were not allowed to move as we reject all
non-assignments.
This change allows a few calls which are known not
to throw/trap. Right now it is restricted to ones
which cond_removal_in_builtin_zero_pattern handles but
adding more is just adding it to the switch statement.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p):
Allow some builtin/internal function calls which
are known not to trap/throw.
(phiopt_worker::match_simplify_replacement):
Use name instead of getting the lhs again.
---
 gcc/tree-ssa-phiopt.cc | 35 +++
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 4b43f1abdbc..024a4362093 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -548,6 +548,7 @@ empty_bb_or_one_feeding_into_p (basic_block bb,
 {
   stmt = nullptr;
   gimple *stmt_to_move = nullptr;
+  tree lhs;
 
   if (empty_block_p (bb))
 return true;
@@ -592,17 +593,43 @@ empty_bb_or_one_feeding_into_p (basic_block bb,
   if (gimple_uses_undefined_value_p (stmt_to_move))
 return false;
 
-  /* Allow assignments and not no calls.
+  /* Allow assignments but allow some builtin/internal calls.
  As const calls don't match any of the above, yet they could
  still have some side-effects - they could contain
  gimple_could_trap_p statements, like floating point
  exceptions or integer division by zero.  See PR70586.
  FIXME: perhaps gimple_has_side_effects or gimple_could_trap_p
- should handle this.  */
+ should handle this.
+ Allow some known builtin/internal calls that are known not to
+ trap: logical functions (e.g. bswap and bit counting). */
   if (!is_gimple_assign (stmt_to_move))
-return false;
+{
+  if (!is_gimple_call (stmt_to_move))
+   return false;
+  combined_fn cfn = gimple_call_combined_fn (stmt_to_move);
+  switch (cfn)
+   {
+   default:
+ return false;
+   case CFN_BUILT_IN_BSWAP16:
+   case CFN_BUILT_IN_BSWAP32:
+   case CFN_BUILT_IN_BSWAP64:
+   case CFN_BUILT_IN_BSWAP128:
+   CASE_CFN_FFS:
+   CASE_CFN_PARITY:
+   CASE_CFN_POPCOUNT:
+   CASE_CFN_CLZ:
+   CASE_CFN_CTZ:
+   case CFN_BUILT_IN_CLRSB:
+   case CFN_BUILT_IN_CLRSBL:
+   case CFN_BUILT_IN_CLRSBLL:
+ lhs = gimple_call_lhs (stmt_to_move);
+ break;
+   }
+}
+  else
+lhs = gimple_assign_lhs (stmt_to_move);
 
-  tree lhs = gimple_assign_lhs (stmt_to_move);
   gimple *use_stmt;
   use_operand_p use_p;
 
-- 
2.39.1



RE: [PATCH 10/10] arm testsuite: Shifts and get_FPSCR ACLE optimisation fixes

2023-04-28 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Friday, April 28, 2023 12:30 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Stam Markianos-Wright  wri...@arm.com>
> Subject: [PATCH 10/10] arm testsuite: Shifts and get_FPSCR ACLE optimisation
> fixes
> 
> From: Stam Markianos-Wright 
> 
> These newly updated tests were rewritten by Andrea. Some of them
> needed further manual fixing as follows:
> 
> * The #shift immediate value not in the check-function-bodies as expected
> * Some shifts getting optimised to mov immediates, e.g.
>   `uqshll (1, 1);` -> movs r0, #2; movs r1, #0

Shouldn't this test be testing something that cannot be constant-folded away,
i.e. have non-constant arguments?
I think we should have conformance tests first and foremost; follow-up
tests for such optimisations are welcome but should be added separately.

> * The ACLE was specifying sub-optimal code: lsr+and instead of ubfx. In
>   this case the test rewritten from the ACLE had the lsr+and pattern,
>   but the compiler was able to optimise to ubfx. Hence I've changed the
>   test to now match on ubfx.

That looks ok.
Thanks,
Kyrill

> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/srshr.c: Update shift value.
>   * gcc.target/arm/mve/intrinsics/srshrl.c: Update shift value.
>   * gcc.target/arm/mve/intrinsics/uqshl.c: Update shift value and mov
> imm.
>   * gcc.target/arm/mve/intrinsics/uqshll.c: Update shift value and mov
> imm.
>   * gcc.target/arm/mve/intrinsics/urshr.c: Update shift value.
>   * gcc.target/arm/mve/intrinsics/urshrl.c: Update shift value.
>   * gcc.target/arm/mve/intrinsics/vadciq_m_s32.c: Update to ubfx.
>   * gcc.target/arm/mve/intrinsics/vadciq_m_u32.c: Update to ubfx.
>   * gcc.target/arm/mve/intrinsics/vadciq_s32.c: Update to ubfx.
>   * gcc.target/arm/mve/intrinsics/vadciq_u32.c: Update to ubfx.
>   * gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: Update to ubfx.
>   * gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: Update to ubfx.
>   * gcc.target/arm/mve/intrinsics/vadcq_s32.c: Update to ubfx.
>   * gcc.target/arm/mve/intrinsics/vadcq_u32.c: Update to ubfx.
>   * gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c: Update to ubfx.
>   * gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c: Update to ubfx.
>   * gcc.target/arm/mve/intrinsics/vsbciq_s32.c: Update to ubfx.
>   * gcc.target/arm/mve/intrinsics/vsbciq_u32.c: Update to ubfx.
>   * gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: Update to ubfx.
>   * gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: Update to ubfx.
>   * gcc.target/arm/mve/intrinsics/vsbcq_s32.c: Update to ubfx.
>   * gcc.target/arm/mve/intrinsics/vsbcq_u32.c: Update to ubfx.
> ---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c   | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c  | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c   | 4 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshll.c  | 5 +++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/urshr.c   | 4 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/urshrl.c  | 4 ++--
>  .../gcc.target/arm/mve/intrinsics/vadciq_m_s32.c  | 8 ++--
>  .../gcc.target/arm/mve/intrinsics/vadciq_m_u32.c  | 8 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_s32.c  | 8 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_u32.c  | 8 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_m_s32.c | 8 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_m_u32.c | 8 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_s32.c   | 8 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_u32.c   | 8 ++--
>  .../gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c  | 8 ++--
>  .../gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c  | 8 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_s32.c  | 8 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_u32.c  | 8 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c | 8 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c | 8 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_s32.c   | 8 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_u32.c   | 8 ++--
>  22 files changed, 43 insertions(+), 106 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
> index 94e3f42fd33..734375d58c0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
> @@ -12,7 +12,7 @@ extern "C" {
>  /*
>  **foo:
>  **   ...
> -**   srshr   (?:ip|fp|r[0-9]+), #shift(?:@.*|)
> +**   srshr   (?:ip|fp|r[0-9]+), #1(?:@.*|)
>  **   ...
>  */
>  int32_t
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
> 

RE: [PATCH 09/10] arm testsuite: XFAIL or relax registers in some tests

2023-04-28 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Friday, April 28, 2023 12:30 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Stam Markianos-Wright  wri...@arm.com>
> Subject: [PATCH 09/10] arm testsuite: XFAIL or relax registers in some tests
> 
> From: Stam Markianos-Wright 
> 
> Hi all,
> 
> This is a simple testsuite tidy-up patch, addressing two types of errors:
> 
> * The vcmp vector-scalar tests failing due to the compiler's preference
> of vector-vector comparisons, over vector-scalar comparisons. This is
> due to the lack of cost model for MVE and the compiler not knowing that
> the RTL vec_duplicate is free in those instructions. For now, we simply
> XFAIL these checks.

I'd like to see this deficiency tracked in Bugzilla before we mark these as 
XFAIL.

> * The tests for pr108177 had strict usage of q0 and r0 registers,
> meaning that they would FAIL with -mfloat-abi=softfp. The register checks
> have now been relaxed.

This part is ok.
Thanks,
Kyrill

> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/srshr.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/srshrl.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/uqshl.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/uqshll.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/urshr.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/urshrl.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/vadciq_m_s32.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/vadciq_m_u32.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/vadciq_s32.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/vadciq_u32.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/vadcq_s32.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/vadcq_u32.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/vsbciq_s32.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/vsbciq_u32.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/vsbcq_s32.c: XFAIL check.
>   * gcc.target/arm/mve/intrinsics/vsbcq_u32.c: XFAIL check.
>   * gcc.target/arm/mve/pr108177-1.c: Relax registers.
>   * gcc.target/arm/mve/pr108177-10.c: Relax registers.
>   * gcc.target/arm/mve/pr108177-11.c: Relax registers.
>   * gcc.target/arm/mve/pr108177-12.c: Relax registers.
>   * gcc.target/arm/mve/pr108177-13.c: Relax registers.
>   * gcc.target/arm/mve/pr108177-14.c: Relax registers.
>   * gcc.target/arm/mve/pr108177-2.c: Relax registers.
>   * gcc.target/arm/mve/pr108177-3.c: Relax registers.
>   * gcc.target/arm/mve/pr108177-4.c: Relax registers.
>   * gcc.target/arm/mve/pr108177-5.c: Relax registers.
>   * gcc.target/arm/mve/pr108177-6.c: Relax registers.
>   * gcc.target/arm/mve/pr108177-7.c: Relax registers.
>   * gcc.target/arm/mve/pr108177-8.c: Relax registers.
>   * gcc.target/arm/mve/pr108177-9.c: Relax registers.
> ---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c  | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c  | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c  | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c | 2 +-
>  

Re: [PATCH] testsuite: adjust NOP expectations for RISC-V

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/28/23 10:43, Palmer Dabbelt wrote:

On Fri, 28 Apr 2023 08:20:24 PDT (-0700), jeffreya...@gmail.com wrote:



On 4/27/23 01:39, Jan Beulich via Gcc-patches wrote:

On 26.04.2023 17:45, Palmer Dabbelt wrote:
On Wed, 26 Apr 2023 08:26:26 PDT (-0700), gcc-patches@gcc.gnu.org wrote:



On 4/25/23 08:50, Jan Beulich via Gcc-patches wrote:

RISC-V will emit ".option nopic" when -fno-pie is in effect, which
matches the generic pattern. Just like done for Alpha, special-case
RISC-V.
---
A couple more targets look to be affected as well, simply because their
"no-operation" insn doesn't match the expectation. With the apparently
necessary further special casing I then also question the presence of
"SWYM" in the generic pattern.

An alternative here might be to use dg-additional-options to add e.g.
-fpie. I don't think I know all possible implications of doing so,
though.


Looks like there's already a no-pie for SPARC.  Nothing's jumping out as
to why, but I'm not super familiar with `-fpatchable-function-entry`.


I think this is fine.  Go ahead and install it.


We run into this sort of thing somewhat frequently.  Maybe we want a DG
matcher that avoids matching assembler directives?  Or maybe even a
"scan-assembler-nop-times" type thing, given that different ports have
different names for the instruction?

I don't see reason to block fixing the test on something bigger, though,
so seems fine for trunk.  Presumably we'd want to backport this as well?


Perhaps, but in order to do so I'd need to be given the respective okay.

Given how often we're trying to avoid matching directives, particularly
directives which refer to filenames this sounds like a good idea to me.


I think the ask there was for an OK to backport this fix to 13?  So I 
guess more concretely:


OK for trunk.  OK to backport for 13?

Sure, OK for backporting as well.
jeff


Re: [PATCH v5 02/11] RISC-V: Enforce Libatomic LR/SC SEQ_CST

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/27/23 10:22, Patrick O'Neill wrote:

Replace LR.aq/SC.rl pairs with the SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill 

libgcc/ChangeLog:

* config/riscv/atomic.c: Change LR.aq/SC.rl pairs into
sequentially consistent LR.aqrl/SC.rl pairs.
OK.  When you install this, make sure you also install #3 of the kit 
which mirrors these changes for the inline subword atomics.


jeff


RE: [PATCH 06/10] arm: Fix overloading of MVE scalar constant parameters on vbicq, vmvnq_m

2023-04-28 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Friday, April 28, 2023 12:30 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Stam Markianos-Wright  wri...@arm.com>
> Subject: [PATCH 06/10] arm: Fix overloading of MVE scalar constant
> parameters on vbicq, vmvnq_m
> 
> From: Stam Markianos-Wright 
> 
> We found this as part of the wider testsuite updates.
> 
> The applicable tests are authored by Andrea earlier in this patch series
> 
> Ok for trunk?

Ok.
Thanks,
Kyrill

> 
> gcc/ChangeLog:
> 
>   * config/arm/arm_mve.h (__arm_vbicq): Change coerce on
>   scalar constant.
>   (__arm_vmvnq_m): Likewise.
> ---
>  gcc/config/arm/arm_mve.h | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 3d386f320c3..3a1cffb4063 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -35906,10 +35906,10 @@ extern void *__ARM_undef;
>  #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>__typeof(p1) __p1 = (p1); \
>_Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vbicq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce1 (__p1, int)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vbicq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce1 (__p1, int)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vbicq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce1 (__p1, int)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vbicq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce1 (__p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vbicq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3 (p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vbicq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3 (p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vbicq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3 (p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vbicq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3 (p1, int)), \
>int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vbicq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vbicq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vbicq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -38825,10 +38825,10 @@ extern void *__ARM_undef;
>  #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>__typeof(p1) __p1 = (p1); \
>_Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vbicq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce1 (__p1, int)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vbicq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce1 (__p1, int)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vbicq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce1 (__p1, int)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vbicq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce1 (__p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vbicq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3 (p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vbicq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3 (p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vbicq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3 (p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vbicq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3 (p1, int)), \
>int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vbicq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vbicq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vbicq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -40962,10 +40962,10 @@ extern void *__ARM_undef;
>int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vmvnq_m_u8 

RE: [PATCH 05/10] arm: Add vorrq_n overloading into vorrq _Generic

2023-04-28 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Friday, April 28, 2023 12:30 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Stam Markianos-Wright  wri...@arm.com>
> Subject: [PATCH 05/10] arm: Add vorrq_n overloading into vorrq _Generic
> 
> From: Stam Markianos-Wright 
> 
> We found this as part of the wider testsuite updates.
> 
> The applicable tests are authored by Andrea earlier in this patch series
> 
> Ok for trunk?

Ok as a stopgap measure. I'm looking forward to the work from Christophe 
overhauling this whole part.
Thanks,
Kyrill
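
For readers less familiar with the arm_mve.h overloading scheme, here is a
minimal sketch of the dispatch technique, not the header's actual code. The
names vec_u16, TYPEID, orr, orr_vv and orr_vn are hypothetical stand-ins. It
assumes a compiler that folds _Generic results into integer constants inside
an array declarator, as GCC does. Unlike the real header, the sketch selects a
function and then calls it, so it needs no coercion helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an MVE vector type. */
typedef struct { uint16_t lane[8]; } vec_u16;

/* Small integer ids, one per supported argument type. */
enum { TID_VEC_U16 = 1, TID_INT_N = 2, TID_OTHER = 3 };

#define TYPEID(x) \
  _Generic((x), vec_u16: TID_VEC_U16, int: TID_INT_N, default: TID_OTHER)

/* Vector | vector. */
static vec_u16 orr_vv(vec_u16 a, vec_u16 b)
{
  for (int i = 0; i < 8; i++)
    a.lane[i] = (uint16_t)(a.lane[i] | b.lane[i]);
  return a;
}

/* Vector | scalar constant (the "_n" flavour). */
static vec_u16 orr_vn(vec_u16 a, int imm)
{
  for (int i = 0; i < 8; i++)
    a.lane[i] = (uint16_t)(a.lane[i] | (uint16_t)imm);
  return a;
}

/* Encode both argument type ids as the dimensions of a 2-D array
   pointer type and let one _Generic pick the matching function. */
#define orr(p0, p1)                                   \
  _Generic((int (*)[TYPEID(p0)][TYPEID(p1)])0,        \
    int (*)[TID_VEC_U16][TID_VEC_U16]: orr_vv,        \
    int (*)[TID_VEC_U16][TID_INT_N]:   orr_vn)((p0), (p1))

/* Usage: the same macro name reaches both variants. */
static void orr_demo(void)
{
  vec_u16 a = {{1, 2, 3, 4, 5, 6, 7, 8}};
  vec_u16 b = {{0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10}};
  assert(orr(a, b).lane[0] == 0x11);     /* vector variant */
  assert(orr(a, 0x100).lane[7] == 0x108); /* scalar variant */
}
```

Adding an _n variant then amounts to adding one more association whose second
dimension is the scalar type id, which is what the patch does for vorrq.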

> 
> gcc/ChangeLog:
> 
>   * config/arm/arm_mve.h (__arm_vorrq): Add _n variant.
> ---
>  gcc/config/arm/arm_mve.h | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 8778216304b..3d386f320c3 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -35852,6 +35852,10 @@ extern void *__ARM_undef;
>int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vorrq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
>int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vorrq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
>int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vorrq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vorrq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vorrq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vorrq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vorrq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vorrq_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce(__p1, float16x8_t)), \
>int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vorrq_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce(__p1, float32x4_t)));})
> 
> @@ -38637,7 +38641,11 @@ extern void *__ARM_undef;
>int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vorrq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
>int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vorrq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
>int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vorrq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vorrq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)));})
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vorrq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vorrq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vorrq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vorrq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vorrq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)));})
> 
>  #define __arm_vornq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>__typeof(p1) __p1 = (p1); \
> --
> 2.25.1



RE: [PATCH 04/10] arm: Stop vadcq, vsbcq intrinsics from overwriting the FPSCR NZ flags

2023-04-28 Thread Kyrylo Tkachov via Gcc-patches
Hi Andrea, Stam,

> -Original Message-
> From: Andrea Corallo 
> Sent: Friday, April 28, 2023 12:30 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Stam Markianos-Wright  wri...@arm.com>
> Subject: [PATCH 04/10] arm: Stop vadcq, vsbcq intrinsics from overwriting the
> FPSCR NZ flags
> 
> From: Stam Markianos-Wright 
> 
> Hi all,
> 
> We noticed that calls to the vadcq and vsbcq intrinsics, both of
> which use __builtin_arm_set_fpscr_nzcvqc to set the Carry flag in
> the FPSCR, would produce the following code:
> 
> ```
> < r2 is the *carry input >
> vmrs  r3, FPSCR_nzcvqc
> bic   r3, r3, #536870912
> orr   r3, r3, r2, lsl #29
> vmsr  FPSCR_nzcvqc, r3
> ```
> 
> when the MVE ACLE instead gives a different instruction sequence of:
> ```
> < Rt is the *carry input >
> VMRS Rs,FPSCR_nzcvqc
> BFI Rs,Rt,#29,#1
> VMSR FPSCR_nzcvqc,Rs
> ```
> 
> the bic + orr pair is slower and it's also wrong, because, if the
> *carry input is greater than 1, then we risk overwriting the top two
> bits of the FPSCR register (the N and Z flags).
> 
> This turned out to be a problem in the header file and the solution was
> to simply add a `& 0x1u` to the `*carry` input: then the compiler knows
> that we only care about the lowest bit and can optimise to a BFI.
> 
> Ok for trunk?

Ok, but I think this needs testsuite coverage for the bug?
Thanks,
Kyrill
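
As a rough illustration of what such coverage would need to exercise, here is
a minimal host-side sketch of the arithmetic only. It models the FPSCR value
as a plain uint32_t and uses hypothetical helper names (set_carry_unmasked,
set_carry_masked); it does not touch the real builtins. It assumes the usual
flag layout of N, Z, C in bits 31, 30, 29.

```c
#include <assert.h>
#include <stdint.h>

#define FPSCR_C_SHIFT 29u
#define FPSCR_C_MASK  (UINT32_C(1) << FPSCR_C_SHIFT)  /* 0x20000000 */

/* Pre-patch arithmetic: the carry input is shifted in unmasked, so any
   bits above bit 0 land in Z (bit 30) and N (bit 31). */
static uint32_t set_carry_unmasked(uint32_t fpscr, uint32_t carry)
{
  return (fpscr & ~FPSCR_C_MASK) | (carry << FPSCR_C_SHIFT);
}

/* Post-patch arithmetic: only bit 0 of the carry input is used. */
static uint32_t set_carry_masked(uint32_t fpscr, uint32_t carry)
{
  return (fpscr & ~FPSCR_C_MASK) | ((carry & 0x1u) << FPSCR_C_SHIFT);
}

/* Usage: a carry input of 2 corrupts Z before the fix, not after. */
static void carry_demo(void)
{
  assert(set_carry_unmasked(0u, 2u) == 0x40000000u); /* Z set by mistake */
  assert(set_carry_masked(0u, 2u) == 0u);            /* flags untouched */
  assert(set_carry_masked(0u, 3u) == FPSCR_C_MASK);  /* only C set */
}
```

A real testcase would presumably pass a carry value greater than 1 to the
intrinsics and check that a BFI is emitted, but that is beyond this sketch.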

> 
> Thanks,
> Stam Markianos-Wright
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm_mve.h (__arm_vadcq_s32): Fix arithmetic.
>   (__arm_vadcq_u32): Likewise.
>   (__arm_vadcq_m_s32): Likewise.
>   (__arm_vadcq_m_u32): Likewise.
>   (__arm_vsbcq_s32): Likewise.
>   (__arm_vsbcq_u32): Likewise.
>   (__arm_vsbcq_m_s32): Likewise.
>   (__arm_vsbcq_m_u32): Likewise.
> ---
>  gcc/config/arm/arm_mve.h | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 1262d668121..8778216304b 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -16055,7 +16055,7 @@ __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vadcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
>  {
> -  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () &
> ~0x20000000u) | (*__carry << 29));
> +  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () &
> ~0x20000000u) | ((*__carry & 0x1u) << 29));
>int32x4_t __res = __builtin_mve_vadcq_sv4si (__a, __b);
>*__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
>return __res;
> @@ -16065,7 +16065,7 @@ __extension__ extern __inline uint32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vadcq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry)
>  {
> -  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () &
> ~0x20000000u) | (*__carry << 29));
> +  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () &
> ~0x20000000u) | ((*__carry & 0x1u) << 29));
>uint32x4_t __res = __builtin_mve_vadcq_uv4si (__a, __b);
>*__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
>return __res;
> @@ -16075,7 +16075,7 @@ __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vadcq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b,
> unsigned * __carry, mve_pred16_t __p)
>  {
> -  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () &
> ~0x20000000u) | (*__carry << 29));
> +  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () &
> ~0x20000000u) | ((*__carry & 0x1u) << 29));
>int32x4_t __res = __builtin_mve_vadcq_m_sv4si (__inactive, __a, __b, __p);
>*__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
>return __res;
> @@ -16085,7 +16085,7 @@ __extension__ extern __inline uint32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vadcq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b,
> unsigned * __carry, mve_pred16_t __p)
>  {
> -  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () &
> ~0x20000000u) | (*__carry << 29));
> +  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () &
> ~0x20000000u) | ((*__carry & 0x1u) << 29));
>uint32x4_t __res =  __builtin_mve_vadcq_m_uv4si (__inactive, __a, __b,
> __p);
>*__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
>return __res;
> @@ -16131,7 +16131,7 @@ __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vsbcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
>  {
> -  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () &
> ~0x20000000u) | (*__carry << 29));
> +  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () &
> ~0x20000000u) | ((*__carry & 0x1u) << 29));
>

Re: [PATCH] testsuite: adjust NOP expectations for RISC-V

2023-04-28 Thread Palmer Dabbelt

On Fri, 28 Apr 2023 08:20:24 PDT (-0700), jeffreya...@gmail.com wrote:



On 4/27/23 01:39, Jan Beulich via Gcc-patches wrote:

On 26.04.2023 17:45, Palmer Dabbelt wrote:

On Wed, 26 Apr 2023 08:26:26 PDT (-0700), gcc-patches@gcc.gnu.org wrote:



On 4/25/23 08:50, Jan Beulich via Gcc-patches wrote:

RISC-V will emit ".option nopic" when -fno-pie is in effect, which
matches the generic pattern. Just like done for Alpha, special-case
RISC-V.
---
A couple more targets look to be affected as well, simply because their
"no-operation" insn doesn't match the expectation. With the apparently
necessary further special casing I then also question the presence of
"SWYM" in the generic pattern.

An alternative here might be to use dg-additional-options to add e.g.
-fpie. I don't think I know all possible implications of doing so,
though.


Looks like there's already a no-pie for SPARC.  Nothing's jumping out as
to why, but I'm not super familiar with `-fpatchable-function-entry`.


I think this is fine.  Go ahead and install it.


We run into this sort of thing somewhat frequently.  Maybe we want a DG
matcher that avoids matching assembler directives?  Or maybe even a
"scan-assembler-nop-times" type thing, given that different ports have
different names for the instruction?

I don't see reason to block fixing the test on something bigger, though,
so seems fine for trunk.  Presumably we'd want to backport this as well?


Perhaps, but in order to do so I'd need to be given the respective okay.

Given how often we're trying to avoid matching directives, particularly
directives which refer to filenames this sounds like a good idea to me.


I think the ask there was for an OK to backport this fix to 13?  So I 
guess more concretely:


OK for trunk.  OK to backport for 13?


RE: [PATCH 03/10] arm: Mve backend + testsuite fixes 2

2023-04-28 Thread Kyrylo Tkachov via Gcc-patches
Hi Andrea,

> -Original Message-
> From: Andrea Corallo 
> Sent: Friday, April 28, 2023 12:30 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 03/10] arm: Mve backend + testsuite fixes 2
> 
> Hi all,
> 
> this patch improves a number of MVE tests in the testsuite for more
> precise and better coverage using check-function-bodies instead of
> scan-assembler checks.  Also all instructions prescribed in the
> ACLE[1] are now checked.
> 
> Also a number of simple fixes are done in the backend to fix
> capitalization and spacing.

Looks good, but I don't think the original patch made it to the list archives:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/thread.html
You can commit this patch but please resend a compressed form of it to the list 
for archival.
Thanks,
Kyrill



Re: [PATCH v5 00/11] RISC-V: Implement ISA Manual Table A.6 Mappings

2023-04-28 Thread Palmer Dabbelt

On Fri, 28 Apr 2023 09:14:00 PDT (-0700), jeffreya...@gmail.com wrote:



On 4/27/23 10:22, Patrick O'Neill wrote:

This patchset aims to make the RISCV atomics implementation stronger
than the recommended mapping present in table A.6 of the ISA manual.
https://github.com/riscv/riscv-isa-manual/blob/c7cf84547b3aefacab5463add1734c1602b67a49/src/memory.tex#L1083-L1157

Context
-
GCC defined RISC-V mappings [1] before the Memory Model task group
finalized their work and provided the ISA Manual Table A.6/A.7 mappings[2].

For at least a year now, we've known that the mappings were different,
but it wasn't clear if these unique mappings had correctness issues.

Andrea Parri found an issue with the GCC mappings, showing that
atomic_compare_exchange_weak_explicit(-,-,-,release,relaxed) mappings do
not enforce release ordering guarantees. (Meaning the GCC mappings have
a correctness issue).
   https://inbox.sourceware.org/gcc-patches/Y1GbJuhcBFpPGJQ0@andrea/
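
For reference, the affected operation looks like this at the C level. This is
a minimal sketch with a hypothetical helper name (release_cas); it only shows
the call shape with release ordering on success and relaxed ordering on
failure, and says nothing about the instruction sequence GCC emits for it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: store `desired` only if the slot still holds
   `expected`, with release ordering on success and relaxed on failure. */
static bool release_cas(atomic_int *slot, int expected, int desired)
{
  int observed = expected;
  /* The weak form may fail spuriously, so retry while the observed
     value still equals what the caller expected. */
  while (!atomic_compare_exchange_weak_explicit(slot, &observed, desired,
                                                memory_order_release,
                                                memory_order_relaxed)) {
    if (observed != expected)
      return false;  /* genuine mismatch: someone else changed the slot */
  }
  return true;
}

/* Usage: the first exchange succeeds, the second sees a stale value. */
static void cas_demo(void)
{
  atomic_int slot;
  atomic_init(&slot, 5);
  assert(release_cas(&slot, 5, 7));
  assert(!release_cas(&slot, 5, 9));
  assert(atomic_load(&slot) == 7);
}
```

Writes made before a successful exchange are expected to be visible to any
thread that later acquires the new value; that is the release guarantee the
report found missing.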

Right.  I recall this discussion, but thanks for the back reference.


Yep, and it's an important one: that's why we're calling the change a 
bug fix and dropping the current GCC mappings.  If we didn't have the 
bug we'd be talking about an ABI break, and since the GCC mappings 
predate the ISA mappings we'd likely need an additional compatibility 
mode.


So I guess we're lucky that we have a concurrency bug.  I think it's the 
first time I've said that ;)



Why not A.6?
-
We can update our mappings now, so the obvious choice would be to
implement Table A.6 (what LLVM implements/ISA manual recommends).

The reason why that isn't the best path forward for GCC is due to a
proposal by Hans Boehm to add L{d|w|b|h}.aq/rl and S{d|w|b|h}.aq/rl.

For context, there is discussion about fast-tracking the addition of
these instructions. The RISCV architectural review committee supports
adopting a "new and common atomics ABI for gcc and LLVM toolchains ...
that assumes the addition of the preceding instructions". That common
ABI is likely to be A.7.
   https://lists.riscv.org/g/tech-privileged/message/1284

Transitioning from A.6 to A.7 will cause an ABI break. We can hedge
against that risk by emitting a conservative fence after SEQ_CST stores
to make the mapping compatible with both A.6 and A.7.

So I like that we can have compatible sequences across A.6 and A.7.  Of
course the concern is performance ;-)




What does a mapping compatible with both A.6 & A.7 look like?
-
It is exactly the same as Table A.6, but SEQ_CST stores have a trailing
fence rw,rw. It's strictly stronger than Table A.6.
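
To make the affected case concrete, here is a minimal C11 sketch of a SEQ_CST
store, with hypothetical names (publish, consume, ready_flag, payload). The
instruction sequences in the comments are an assumption based on my reading of
Table A.6 and the description above, not compiler output.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int ready_flag;  /* zero-initialised: nothing published yet */
static int payload;

/* Writer: fill in the payload, then publish it with a SEQ_CST store. */
static void publish(int value)
{
  payload = value;
  /* Assumed RISC-V lowering of this store:
       Table A.6:           fence rw,w ; sw
       A.6/A.7 compatible:  fence rw,w ; sw ; fence rw,rw
     The trailing fence is the only difference described above. */
  atomic_store_explicit(&ready_flag, 1, memory_order_seq_cst);
}

/* Reader: return the payload if published, else the fallback. */
static int consume(int fallback)
{
  if (atomic_load_explicit(&ready_flag, memory_order_seq_cst))
    return payload;
  return fallback;
}

/* Usage (single-threaded, just to show the call pattern). */
static void publish_demo(void)
{
  assert(consume(-1) == -1);
  publish(42);
  assert(consume(-1) == 42);
}
```

Only SEQ_CST stores pay for the extra fence under this scheme; the other
orderings keep their Table A.6 sequences.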

Right.  So my worry here is silicon that is either already available or
coming online shortly.   Those implementations simply aren't going to be
able to use the A.7 mapping, so they pay a penalty.  Does it make sense
to have the compatibility fences conditional?


IIRC this was discussed somewhere in some thread, but I think there's 
really three ABIs that could be implemented here (ignoring the current 
GCC mappings as they're broken):


* ABI compatible with the current mappings in the ISA manual (A.6).  
 This will presumably perform best on extant hardware, given that it's 
 what the words in the PDF say to do.
* ABI compatible with the proposed mappings for the ISA manual (A.7).  
 This may perform better on new hardware.
* ABI compatible with both A.6 and A.7.  This is likely slow on both new 
 and old hardware, but allows cross-linking.  If there's no performance 
 issues this would be the only mode we need, but that seems unlikely.


IMO those should be encoded somewhere in the ELF.  I'd just do it as two 
bits in the header, but last time I proposed header bits the psABI folks 
wanted to do something different.  I don't think where we encode this 
matters all that much, but if we're going to treat these as real 
long-term ABIs we should have some way to encode that.


There's also the orthogonal axis of whether we use the new instructions.  
Those aren't in specs yet so I think we can hold off on them for a bit, 
but they're the whole point of doing the ABI break so we should at least 
think them over.  I think we're OK because we've just split out the ABI 
from the ISA here, but I'm not sure if I'm missing something.


Now that I wrote that, though, I remember talking to Patrick about it 
and we drew a bunch of stuff on the whiteboard and then got confused.  
So sorry if I'm just out of the loop here...








Benchmark Interpretation

As expected, out of order machines are significantly faster with the
REL_STORE mappings. Unexpectedly, the in-order machines are
significantly slower with REL_STORE rather than REL_STORE_FENCE.

Yea, that's a bit of a surprise.



Most machines in the wild are expected to use Table A.7 once the
instructions are introduced.
Incurring this added cost now will make it easier for compiled RISC-V
binaries to transition to the A.7 memory model mapping.

The performance benefits of moving to A.7 can be more 

RE: [PATCH 02/10] arm: Fix vstrwq* backend + testsuite

2023-04-28 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Friday, April 28, 2023 12:30 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 02/10] arm: Fix vstrwq* backend + testsuite
> 
> Hi all,
> 
> this patch fixes the vstrwq* MVE intrinsics failing to emit the
> correct sequence of instructions due to missing predicates. Also the
> immediate range is fixed to be multiples of 2 between [-252, 252].
> 

Ok.
Thanks,
Kyrill

> Best Regards
> 
>   Andrea
> 
> gcc/ChangeLog:
> 
>   * config/arm/constraints.md (mve_vldrd_immediate): Move it to
>   predicates.md.
>   (Ri): Move constraint definition from predicates.md.
>   (Rl): Define new constraint.
>   * config/arm/mve.md (mve_vstrwq_scatter_base_wb_p_v4si):
> Add
>   missing constraint.
>   (mve_vstrwq_scatter_base_wb_p_fv4sf): Add missing Up constraint
>   for op 1, use mve_vstrw_immediate predicate and Rl constraint for
>   op 2. Fix asm output spacing.
>   (mve_vstrdq_scatter_base_wb_p_v2di): Add missing
> constraint.
>   * config/arm/predicates.md (Ri): Move constraint to constraints.md.
>   (mve_vldrd_immediate): Move it from
>   constraints.md.
>   (mve_vstrw_immediate): New predicate.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vstrwq_f32.c: Use
>   check-function-bodies instead of scan-assembler checks.  Use
>   extern "C" for C++ testing.
>   * gcc.target/arm/mve/intrinsics/vstrwq_p_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_f32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_f32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_f32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_s32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_u32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_s32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_u32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_f32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_s32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_u32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_s32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_u32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_f32.c:
> Likewise.
>   *
> gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_f32.c:
> Likewise.
>   *
> gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_s32.c:
> Likewise.
>   *
> gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_u32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_s32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_u32.c:
> Likewise.
>   * gcc.target/arm/mve/intrinsics/vstrwq_u32.c: Likewise.
> ---
>  gcc/config/arm/constraints.md | 20 --
>  gcc/config/arm/mve.md | 10 ++---
>  gcc/config/arm/predicates.md  | 14 +++
>  .../arm/mve/intrinsics/vstrwq_f32.c   | 32 ---
>  .../arm/mve/intrinsics/vstrwq_p_f32.c | 40 ---
>  .../arm/mve/intrinsics/vstrwq_p_s32.c | 40 ---
>  .../arm/mve/intrinsics/vstrwq_p_u32.c | 40 ---
>  .../arm/mve/intrinsics/vstrwq_s32.c   | 32 ---
>  .../mve/intrinsics/vstrwq_scatter_base_f32.c  | 28 +++--
>  .../intrinsics/vstrwq_scatter_base_p_f32.c| 36 +++--
>  .../intrinsics/vstrwq_scatter_base_p_s32.c| 36 +++--
>  .../intrinsics/vstrwq_scatter_base_p_u32.c| 36 +++--
>  .../mve/intrinsics/vstrwq_scatter_base_s32.c  | 28 +++--
>  .../mve/intrinsics/vstrwq_scatter_base_u32.c  | 28 +++--
>  .../intrinsics/vstrwq_scatter_base_wb_f32.c   | 32 ---
>  .../intrinsics/vstrwq_scatter_base_wb_p_f32.c | 40 ---
>  .../intrinsics/vstrwq_scatter_base_wb_p_s32.c | 40 

Re: [PATCH v5 01/11] RISC-V: Eliminate SYNC memory models

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/27/23 10:22, Patrick O'Neill wrote:

Remove references to MEMMODEL_SYNC_* models by converting via
memmodel_base().

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/riscv.cc: Remove MEMMODEL_SYNC_* cases and
sanitize memmodel input with memmodel_base.
OK.  Not sure if you want to commit it now or wait for the full set to 
get ACK'd (since there are some questions on the trailing sync approach).


Jeff


Re: [PATCH v5 00/11] RISC-V: Implement ISA Manual Table A.6 Mappings

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/27/23 10:22, Patrick O'Neill wrote:

This patchset aims to make the RISCV atomics implementation stronger
than the recommended mapping present in table A.6 of the ISA manual.
https://github.com/riscv/riscv-isa-manual/blob/c7cf84547b3aefacab5463add1734c1602b67a49/src/memory.tex#L1083-L1157

Context
-
GCC defined RISC-V mappings [1] before the Memory Model task group
finalized their work and provided the ISA Manual Table A.6/A.7 mappings[2].

For at least a year now, we've known that the mappings were different,
but it wasn't clear if these unique mappings had correctness issues.

Andrea Parri found an issue with the GCC mappings, showing that
atomic_compare_exchange_weak_explicit(-,-,-,release,relaxed) mappings do
not enforce release ordering guarantees. (Meaning the GCC mappings have
a correctness issue).
   https://inbox.sourceware.org/gcc-patches/Y1GbJuhcBFpPGJQ0@andrea/

Right.  I recall this discussion, but thanks for the back reference.
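
For illustration, the construct at issue is a weak compare-and-exchange with release ordering on success and relaxed ordering on failure. A minimal C11 sketch follows; the helper name publish_increment is hypothetical and does not come from the patch series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: atomically add DELTA to *COUNTER with a CAS loop.
   Success uses release ordering and failure uses relaxed ordering, the
   combination discussed above.  Returns the value that was replaced.  */
static int
publish_increment (atomic_int *counter, int delta)
{
  assert (counter != NULL);
  int expected = atomic_load_explicit (counter, memory_order_relaxed);
  /* The weak form may fail spuriously, so it is retried in a loop.  On
     failure EXPECTED is refreshed with the value currently stored.  */
  while (!atomic_compare_exchange_weak_explicit (counter, &expected,
                                                 expected + delta,
                                                 memory_order_release,
                                                 memory_order_relaxed))
    ;
  return expected;
}
```

Whatever instruction sequence the compiler picks for this call has to keep earlier writes ordered before the successful exchange; that is the guarantee the report says the old mapping missed.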



Why not A.6?
-
We can update our mappings now, so the obvious choice would be to
implement Table A.6 (what LLVM implements/ISA manual recommends).

The reason why that isn't the best path forward for GCC is due to a
proposal by Hans Boehm to add L{d|w|b|h}.aq/rl and S{d|w|b|h}.aq/rl.

For context, there is discussion about fast-tracking the addition of
these instructions. The RISCV architectural review committee supports
adopting a "new and common atomics ABI for gcc and LLVM toolchains ...
that assumes the addition of the preceding instructions". That common
ABI is likely to be A.7.
   https://lists.riscv.org/g/tech-privileged/message/1284

Transitioning from A.6 to A.7 will cause an ABI break. We can hedge
against that risk by emitting a conservative fence after SEQ_CST stores
to make the mapping compatible with both A.6 and A.7.
So I like that we can have compatible sequences across A.6 and A.7.  Of 
course the concern is performance ;-)





What does a mapping compatible with both A.6 & A.7 look like?
-
It is exactly the same as Table A.6, but SEQ_CST stores have a trailing
fence rw,rw. It's strictly stronger than Table A.6.
Right.  So my worry here is silicon that is either already available or 
coming online shortly.   Those implementations simply aren't going to be 
able to use the A.7 mapping, so they pay a penalty.  Does it make sense 
to have the compatibility fences conditional?
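
To make the affected source construct concrete, here is a small C11 sketch. The helper names are hypothetical; the only mapping detail assumed is the one stated above, namely that a SEQ_CST store gets a trailing fence rw,rw in the compatible mapping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: a sequentially consistent store.  This is the
   operation that would carry the extra trailing "fence rw,rw" under the
   A.6/A.7-compatible mapping described in the cover letter.  */
static void
store_seq_cst (atomic_int *p, int v)
{
  assert (p != NULL);
  atomic_store_explicit (p, v, memory_order_seq_cst);
}

/* Hypothetical helper: a release store, shown for contrast.  The cover
   letter only describes the trailing fence for SEQ_CST stores.  */
static void
store_release (atomic_int *p, int v)
{
  assert (p != NULL);
  atomic_store_explicit (p, v, memory_order_release);
}
```

Code that only needs release semantics can avoid the extra fence by not asking for seq_cst, which is one way the cost discussed here could be limited in practice.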






Benchmark Interpretation

As expected, out of order machines are significantly faster with the
REL_STORE mappings. Unexpectedly, the in-order machines are
significantly slower with REL_STORE rather than REL_STORE_FENCE.

Yea, that's a bit of a surprise.



Most machines in the wild are expected to use Table A.7 once the
instructions are introduced.
Incurring this added cost now will make it easier for compiled RISC-V
binaries to transition to the A.7 memory model mapping.

The performance benefits of moving to A.7 can be more clearly seen using
an almost-all-load microbenchmark (included on page 3 of Hans’
proposal). The code for that microbenchmark is attached below [5].
   
https://lists.riscv.org/g/tech-unprivileged/attachment/382/0/load-acquire110422.pdf
   https://lists.riscv.org/g/tech-unprivileged/topic/92916241
Yea.  I'm not questioning the value of the new instructions that are on 
the horizon, just the value of trying to make everything A.7 compatible.





Conformance test cases notes

The conformance tests in this patch are a good sanity check but do not
guarantee exactly following Table A.6. It checks that the right
instructions are emitted (ex. fence rw,r) but not the order of those
instructions.
Note there is a way to check ordering as well.  You might look at the 
check-function-bodies approach.  I think there are some recent examples 
in the gcc risc-v specific tests.





LLVM mapping notes

LLVM emits corresponding fences for atomic_signal_fence instructions.
This seems to be an oversight since AFAIK atomic_signal_fence acts as a
compiler directive. GCC does not emit any fences for atomic_signal_fence
instructions.
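
As a sketch of what "compiler directive" means here: atomic_signal_fence only constrains reordering between a thread and a signal handler running on that same thread. The struct and function names below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical mailbox shared between normal code and a signal handler
   that runs on the same thread.  */
struct mailbox
{
  int payload;
  atomic_int ready;
};

/* Write the payload, then raise the flag.  The signal fence keeps the
   compiler from sinking the payload store below the flag store.  */
static void
publish_for_handler (struct mailbox *box, int value)
{
  assert (box != NULL);
  box->payload = value;
  atomic_signal_fence (memory_order_release);
  atomic_store_explicit (&box->ready, 1, memory_order_relaxed);
}

/* Handler side: only read the payload once the flag is seen.  */
static int
read_in_handler (struct mailbox *box, int fallback)
{
  assert (box != NULL);
  if (!atomic_load_explicit (&box->ready, memory_order_relaxed))
    return fallback;
  atomic_signal_fence (memory_order_acquire);
  return box->payload;
}
```

Since both sides execute on one hart, the ordering is between the compiler and the program; atomic_thread_fence is the call intended for ordering across threads.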
This starts to touch on a larger concern.  Specifically I'd really like 
the two compilers to be compatible in terms of the code they generate 
for the various atomics.


What I worry about is code being written (by design or accident) that is 
dependent on the particular behavior of one compiler; if that code then 
gets built with the other compiler, we end up with different 
behavior.  Worse yet, if/when this happens, it's likely to be tough to 
expose, reproduce & debug.


Do you have any sense of where Clang/LLVM is going to go WRT providing 
an A.6 mapping that is compatible with A.7 by using the additional fences?



Jeff


[PATCH] riscv: Allow vector constants in riscv_const_insns.

2023-04-28 Thread Robin Dapp via Gcc-patches
Hi,

I figured I'm going to start sending some patches that build on top
of the upcoming RISC-V autovectorization.  This one is obviously
not supposed to be installed before the basic support lands but
it's small enough that it shouldn't hurt to send it now.

This patch allows vector constants in riscv_const_insns in order
for them to be properly recognized as immediate operands such that
we can emit vmv.v.i instructions via autovec.
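
A rough scalar sketch of the range check involved, for illustration only. The function names are hypothetical, and the assumption that the constant must be a splat (all elements equal) is mine; the range -16 to 15 is the one given in the patch comment below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if VALUE fits the 5-bit signed immediate
   range that the patch comment gives for vmv.v.i.  */
static bool
fits_vmv_v_i (int64_t value)
{
  return value >= -16 && value <= 15;
}

/* Hypothetical helper: true if the N-element constant ELTS is a splat of
   one value that fits the immediate range.  Assumes N > 0.  */
static bool
splat_fits_vmv_v_i (const int64_t *elts, int n)
{
  assert (elts != NULL && n > 0);
  for (int i = 1; i < n; i++)
    if (elts[i] != elts[0])
      return false;
  return fits_vmv_v_i (elts[0]);
}
```

The test in the patch exercises the same boundaries: -16 through -1 via VMV_NEG and 0 through 15 via VMV_POS.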

Bootstrapped and regtested on riscv32gcv and riscv64gcv.

Regards
 Robin

--

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_const_insns): Add permissible
vector constants.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vmv-imm.c: New test.
---
 gcc/config/riscv/riscv.cc |  10 +-
 .../gcc.target/riscv/rvv/autovec/vmv-imm.c| 109 ++
 2 files changed, 118 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index eb7364ca110..6f9c6743028 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1228,7 +1228,15 @@ riscv_const_insns (rtx x)
 case CONST_DOUBLE:
 case CONST_VECTOR:
   /* We can use x0 to load floating-point zero.  */
-  return x == CONST0_RTX (GET_MODE (x)) ? 1 : 0;
+  if (x == CONST0_RTX (GET_MODE (x)))
+   return 1;
+  /* Constants from -16 to 15 can be loaded with vmv.v.i.
+The Wc0, Wc1 constraints are already covered by the
+vi constraint so we do not need to check them here
+separately.  */
+  else if (TARGET_VECTOR && satisfies_constraint_vi (x))
+   return 1;
+  return 0;
 
 case CONST:
   /* See if we can refer to X directly.  */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm.c
new file mode 100644
index 000..42ca56d4b5c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm.c
@@ -0,0 +1,109 @@
+/* { dg-do run } */
+/* { dg-additional-options "-std=c99 --param=riscv-autovec-preference=scalable 
-fno-vect-cost-model -fno-builtin --save-temps" } */
+
+#include <stdint.h>
+#include <assert.h>
+
+#define VMV_POS(TYPE,VAL)  \
+  __attribute__ ((noipa))   \
+  void vmv_##VAL (TYPE dst[], int n)   \
+  { \
+for (int i = 0; i < n; i++) \
+  dst[i] = VAL;\
+  }
+
+#define VMV_NEG(TYPE,VAL)  \
+  __attribute__ ((noipa))   \
+  void vmv_m##VAL (TYPE dst[], int n)  \
+  { \
+for (int i = 0; i < n; i++) \
+  dst[i] = -VAL;   \
+  }
+
+#define TEST_ALL() \
+VMV_NEG(int8_t,16) \
+VMV_NEG(int8_t,15) \
+VMV_NEG(int8_t,14) \
+VMV_NEG(int8_t,13) \
+VMV_NEG(int16_t,12) \
+VMV_NEG(int16_t,11) \
+VMV_NEG(int16_t,10) \
+VMV_NEG(int16_t,9) \
+VMV_NEG(int32_t,8) \
+VMV_NEG(int32_t,7) \
+VMV_NEG(int32_t,6) \
+VMV_NEG(int32_t,5) \
+VMV_NEG(int64_t,4) \
+VMV_NEG(int64_t,3) \
+VMV_NEG(int64_t,2) \
+VMV_NEG(int64_t,1) \
+VMV_POS(uint8_t,0) \
+VMV_POS(uint8_t,1) \
+VMV_POS(uint8_t,2) \
+VMV_POS(uint8_t,3) \
+VMV_POS(uint16_t,4)\
+VMV_POS(uint16_t,5)\
+VMV_POS(uint16_t,6)\
+VMV_POS(uint16_t,7)\
+VMV_POS(uint32_t,8)\
+VMV_POS(uint32_t,9)\
+VMV_POS(uint32_t,10)   \
+VMV_POS(uint32_t,11)   \
+VMV_POS(uint64_t,12)   \
+VMV_POS(uint64_t,13)   \
+VMV_POS(uint64_t,14)   \
+VMV_POS(uint64_t,15)
+
+TEST_ALL()
+
+#define SZ 32
+
+#define TEST_POS(TYPE,VAL) \
+  TYPE a##TYPE##VAL[SZ];   \
+  vmv_##VAL (a##TYPE##VAL, SZ);\
+  for (int i = 0; i < SZ; i++) \
+assert (a##TYPE##VAL[i] == VAL);
+
+#define TEST_NEG(TYPE,VAL) \
+  TYPE am##TYPE##VAL[SZ];  \
+  vmv_m##VAL (am##TYPE##VAL, SZ);  \
+  for (int i = 0; i < SZ; i++) \
+assert (am##TYPE##VAL[i] == -VAL);
+
+int main ()
+{
+  TEST_NEG(int8_t, 16)
+  TEST_NEG(int8_t, 15)
+  TEST_NEG(int8_t, 14)
+  TEST_NEG(int8_t, 13)
+  TEST_NEG(int16_t, 12)
+  TEST_NEG(int16_t, 11)
+  TEST_NEG(int16_t, 10)
+  TEST_NEG(int16_t, 9)
+  TEST_NEG(int32_t, 8)
+  TEST_NEG(int32_t, 7)
+  TEST_NEG(int32_t, 6)
+  TEST_NEG(int32_t, 5)
+  TEST_NEG(int64_t, 4)
+  TEST_NEG(int64_t, 3)
+  TEST_NEG(int64_t, 2)
+  TEST_NEG(int64_t, 1)
+  TEST_POS(uint8_t, 0)
+  TEST_POS(uint8_t, 1)
+  TEST_POS(uint8_t, 2)
+  TEST_POS(uint8_t, 3)
+  TEST_POS(uint16_t, 4)
+  TEST_POS(uint16_t, 5)
+  TEST_POS(uint16_t, 6)
+  TEST_POS(uint16_t, 7)
+  TEST_POS(uint32_t, 8)
+  TEST_POS(uint32_t, 9)
+  TEST_POS(uint32_t, 10)
+  TEST_POS(uint32_t, 11)
+  TEST_POS(uint64_t, 12)
+  

Re: [ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2023-04-28 Thread Andre Vieira (lists) via Gcc-patches




On 25/04/2023 13:30, Richard Biener wrote:

On Mon, 24 Apr 2023, Richard Sandiford wrote:


Richard Biener  writes:

On Thu, Apr 20, 2023 at 3:24 PM Andre Vieira (lists) via Gcc-patches
 wrote:


Rebased all three patches and made some small changes to the second one:
- removed sub and abd optabs from commutative_optab_p, I suspect this
was a copy paste mistake,
- removed what I believe to be a superfluous switch case in vectorizable
conversion, the one that was here:
+  if (code.is_fn_code ())
+ {
+  internal_fn ifn = as_internal_fn (code.as_fn_code ());
+  int ecf_flags = internal_fn_flags (ifn);
+  gcc_assert (ecf_flags & ECF_MULTI);
+
+  switch (code.as_fn_code ())
+   {
+   case CFN_VEC_WIDEN_PLUS:
+ break;
+   case CFN_VEC_WIDEN_MINUS:
+ break;
+   case CFN_LAST:
+   default:
+ return false;
+   }
+
+  internal_fn lo, hi;
+  lookup_multi_internal_fn (ifn, &lo, &hi);
+  *code1 = as_combined_fn (lo);
+  *code2 = as_combined_fn (hi);
+  optab1 = lookup_multi_ifn_optab (lo, !TYPE_UNSIGNED (vectype));
+  optab2 = lookup_multi_ifn_optab (hi, !TYPE_UNSIGNED (vectype));
   }

I don't think we need to check they are a specific fn code, as we look-up
optabs and if they succeed then surely we can vectorize?

OK for trunk?


In the first patch I see some uses of safe_as_tree_code like

+  if (ch.is_tree_code ())
+return op1 == NULL_TREE ? gimple_build_assign (lhs,
ch.safe_as_tree_code (),
+  op0) :
+ gimple_build_assign (lhs, ch.safe_as_tree_code (),
+  op0, op1);
+  else
+  {
+internal_fn fn = as_internal_fn (ch.safe_as_fn_code ());
+gimple* stmt;

where the context actually requires a valid tree code.  Please change those
to force to tree code / ifn code.  Just use explicit casts here and the other
places that are similar.  Before the as_internal_fn just put a
gcc_assert (ch.is_internal_fn ()).


Also, doesn't the above ?: simplify to the "else" arm?  Null trailing
arguments would be ignored for unary operators.

I wasn't sure what to make of the op0 handling:


+/* Build a GIMPLE_ASSIGN or GIMPLE_CALL with the tree_code,
+   or internal_fn contained in ch, respectively.  */
+gimple *
+vect_gimple_build (tree lhs, code_helper ch, tree op0, tree op1)
+{
+  if (op0 == NULL_TREE)
+return NULL;


Can that happen, and if so, does returning null make sense?
Maybe an assert would be safer.


Yeah, I was hoping to have a look whether the new gimple_build
overloads could be used to make this all better (but hoped we can
finally get this series in in some way).

Richard.


Yeah, in the newest version of the first patch of the series I found 
that most of the time I can get away with only really needing to 
distinguish between tree_code and internal_fn when building gimple, for 
which it currently uses vect_gimple_build, but it does feel like that 
could easily be a gimple function.


Having said that, as I partially mention in the patch, I didn't rewrite 
the optabs-tree supportable_half_widening and supportable_conversion (or 
whatever they are called) because those also at some point need to 
access the stmt and there is a massive difference in how we handle 
gassigns and gcall's from that perspective, but maybe we can generalize 
that too somehow...


Anyway have a look at the new versions (posted just some minutes after 
the email I'm replying to haha! timing :P)


[PATCH] testsuite: Handle empty assembly lines in check-function-bodies

2023-04-28 Thread Hans-Peter Nilsson via Gcc-patches
Ok to commit?
-- >8 --
I tried to make use of check-function-bodies for cris-elf and was a
bit surprised to see it failing.  There's a deliberate empty line
after the filled delay slot of the return-function which was
mishandled.  I thought "aha" and tried to add an empty line
(containing just a "**" prefix) to the match, but that didn't help.
While it was added as input from the function's assembly output
to-be-matched like any other line, it couldn't be matched: I had to
use "...", which works but is...distracting.

Some digging shows that an empty assembly line can't be deliberately
matched because all matcher lines (lines starting with the prefix,
the ubiquitous "**") are canonicalized by trimming leading
whitespace (the "string trim" in check-function-bodies) and instead
adding a leading TAB character, thus empty lines end up containing
just a TAB.  For usability it's better to treat empty lines as fluff
than to uglify the test-case and the code to properly match them.
Double-checking, no test-case tries to match a line containing just
TAB (by providing a line containing just "**\s*", i.e. zero or
more whitespace characters).
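
For readers who do not read Tcl regexps daily, here is a hand-written C sketch of what the updated fluff pattern in the patch below, {^\s*(?:\.|//|@|$)}, accepts. The function name is hypothetical and this is not the testsuite code, just an equivalent predicate for a single line without its trailing newline.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if LINE would be treated as uninteresting
   ("fluff") by the updated pattern.  */
static bool
is_fluff (const char *line)
{
  assert (line != NULL);
  /* ^\s*  : skip any leading whitespace.  */
  while (isspace ((unsigned char) *line))
    line++;
  /* (?:\.|//|@|$)  : a directive, a comment, or, with the patch, the
     end of the line, i.e. an empty or whitespace-only line.  */
  return line[0] == '.'
         || (line[0] == '/' && line[1] == '/')
         || line[0] == '@'
         || line[0] == '\0';
}
```

The last alternative is the only behavioural change: without it an empty assembly line is kept as input to be matched, which is the case described above.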

* lib/scanasm.exp (parse_function_bodies): Set fluff to include
empty lines (besides optionally leading whitespace).
---
 gcc/testsuite/lib/scanasm.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index fb53544d40c7..be2b83a5dd48 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -791,7 +791,7 @@ proc parse_function_bodies { filename result } {
 set terminator {^\s*\.size}
 
 # Regexp for lines that aren't interesting.
-set fluff {^\s*(?:\.|//|@)}
+set fluff {^\s*(?:\.|//|@|$)}
 
 set fd [open $filename r]
 set in_function 0
-- 
2.30.2



[PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-28 Thread Pan Li via Gcc-patches
From: Pan Li 

When some RVV integer compare operators act on the same vector register
without a mask, they can be simplified to VMSET.

This patch allows eq, le, leu, ge and geu to be simplified in this way by
adding one macro to the riscv backend that simplify_rtx consumes.

Given we have:
vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl)
{
  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
}

Before this patch:
vsetvli  zero,a2,e8,m8,ta,ma
vl8re8.v v8,0(a1)
vmseq.vv v8,v8,v8
vsetvli  a5,zero,e8,m8,ta,ma
vsm.v    v8,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,m8,ta,ma
vmset.m v1  <- optimized to vmset.m
vsetvli a5,zero,e8,m8,ta,ma
vsm.v   v1,0(a0)
ret

As above, we may have one instruction eliminated and require fewer vector
registers.
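
The reasoning can be checked with a scalar model; this is a sketch, not RVV intrinsic code, and both function names are hypothetical. It models only the first vl mask bits and leaves bits at or beyond vl as zero, so it says nothing about how tail bits behave in hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of an unmasked vmseq.vv on int8 elements:
   bit I of the result is set when A[I] == B[I], for I < VL (VL <= 64).  */
static uint64_t
model_vmseq (const int8_t *a, const int8_t *b, size_t vl)
{
  assert (a != NULL && b != NULL && vl <= 64);
  uint64_t mask = 0;
  for (size_t i = 0; i < vl; i++)
    if (a[i] == b[i])
      mask |= (uint64_t) 1 << i;
  return mask;
}

/* Hypothetical scalar model of vmset.m: the first VL bits are set.  */
static uint64_t
model_vmset (size_t vl)
{
  assert (vl <= 64);
  return vl == 64 ? UINT64_MAX : (((uint64_t) 1 << vl) - 1);
}
```

Comparing a vector with itself makes every element comparison true for eq, and the same holds for le, leu, ge and geu, so the compare result equals the vmset result over the active elements.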

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
  consumed by simplify_rtx.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
  Adjust test check condition.
---
 gcc/config/riscv/riscv.h| 5 +
 .../riscv/rvv/base/integer_compare_insn_shortcut.c  | 6 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 13038a39e5c..4473115d3a9 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1096,4 +1096,9 @@ extern void riscv_remove_unneeded_save_restore_calls 
(void);
 #define DWARF_REG_TO_UNWIND_COLUMN(REGNO) \
   ((REGNO == RISCV_DWARF_VLENB) ? (FIRST_PSEUDO_REGISTER + 1) : REGNO)
 
+/* Like s390, riscv also defines this macro for the vector comparison.  Then
+   the simplify-rtx relational_result will canonicalize the result to the
+   CONST1_RTX for the simplification.  */
+#define VECTOR_STORE_FLAG_VALUE(MODE) CONSTM1_RTX (GET_MODE_INNER (MODE))
+
 #endif /* ! GCC_RISCV_H */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
index 8954adad09d..1bca8467a16 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
@@ -283,9 +283,5 @@ vbool64_t test_shortcut_for_riscv_vmsgeu_case_6(vuint8mf8_t 
v1, size_t vl) {
   return __riscv_vmsgeu_vv_u8mf8_b64(v1, v1, vl);
 }
 
-/* { dg-final { scan-assembler-times {vmseq\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 
} } */
-/* { dg-final { scan-assembler-times {vmsle\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 
} } */
-/* { dg-final { scan-assembler-times {vmsleu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 
7 } } */
-/* { dg-final { scan-assembler-times {vmsge\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 
} } */
-/* { dg-final { scan-assembler-times {vmsgeu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 
7 } } */
 /* { dg-final { scan-assembler-times {vmclr\.m\sv[0-9]} 35 } } */
+/* { dg-final { scan-assembler-times {vmset\.m\sv[0-9]} 35 } } */
-- 
2.34.1



Re: [PATCH] testsuite: adjust NOP expectations for RISC-V

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/27/23 01:39, Jan Beulich via Gcc-patches wrote:

On 26.04.2023 17:45, Palmer Dabbelt wrote:

On Wed, 26 Apr 2023 08:26:26 PDT (-0700), gcc-patches@gcc.gnu.org wrote:



On 4/25/23 08:50, Jan Beulich via Gcc-patches wrote:

RISC-V will emit ".option nopic" when -fno-pie is in effect, which
matches the generic pattern. Just like done for Alpha, special-case
RISC-V.
---
A couple more targets look to be affected as well, simply because their
"no-operation" insn doesn't match the expectation. With the apparently
necessary further special casing I then also question the presence of
"SWYM" in the generic pattern.

An alternative here might be to use dg-additional-options to add e.g.
-fpie. I don't think I know all possible implications of doing so,
though.


Looks like there's already a no-pie for SPARC.  Nothing's jumping out as
to why, but I'm not super familiar with `-fpatchable-function-entry`.


I think this is fine.  Go ahead and install it.


We run into this sort of thing somewhat frequently.  Maybe we want a DG
matcher that avoids matching assembler directives?  Or maybe even a
"scan-assembler-nop-times" type thing, given that different ports have
different names for the instruction?

I don't see reason to block fixing the test on something bigger, though,
so seems fine for trunk.  Presumably we'd want to backport this as well?


Perhaps, but in order to do so I'd need to be given the respective okay.
Given how often we're trying to avoid matching directives, particularly 
directives which refer to filenames this sounds like a good idea to me.


jeff


Re: RISC-V: Eliminate redundant zero extension of minu/maxu operands

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/28/23 06:29, Jivan Hakobyan via Gcc-patches wrote:

On RV64, the following code:

   unsigned Min(unsigned a, unsigned b) {
   return a < b ? a : b;
   }

Compiles to:
   Min:
zext.w  a1,a1
zext.w  a0,a0
minu    a0,a1,a0
sext.w  a0,a0
ret

This patch removes unnecessary zero extensions of minu/maxu operands.
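
A sketch of why the extensions are redundant, assuming the usual RV64 convention that 32-bit values are held sign-extended in 64-bit registers. The helper names are hypothetical. Sign extension preserves the unsigned order of 32-bit values, so an unsigned 64-bit minimum of the sign-extended operands is already the sign-extended 32-bit minimum.

```c
#include <assert.h>
#include <stdint.h>

/* Model of sext.w.  The uint32_t to int32_t conversion is
   implementation-defined in C; GCC wraps modulo 2^32.  */
static uint64_t
sext32 (uint32_t x)
{
  return (uint64_t) (int64_t) (int32_t) x;
}

/* Model of minu on 64-bit registers.  */
static uint64_t
minu64 (uint64_t a, uint64_t b)
{
  return a < b ? a : b;
}

/* Sequence before the patch: zext.w both operands, minu, sext.w.  */
static uint64_t
min_with_zext (uint32_t a, uint32_t b)
{
  return sext32 ((uint32_t) minu64 ((uint64_t) a, (uint64_t) b));
}

/* Sequence without the extensions: minu on sign-extended inputs.  */
static uint64_t
min_without_zext (uint32_t a, uint32_t b)
{
  return minu64 (sext32 (a), sext32 (b));
}

/* Hypothetical checker: both sequences leave the same register value.  */
static void
check_equivalent (uint32_t a, uint32_t b)
{
  assert (min_with_zext (a, b) == min_without_zext (a, b));
}
```

The same argument applies to maxu, since it only relies on the ordering being preserved.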

gcc/ChangeLog:

  * config/riscv/bitmanip.md: Added expanders for minu/maxu instructions

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-min-max-02.c: Updated scanning check.
* gcc.target/riscv/zbb-min-max-03.c: New tests.
Thanks.  We had almost the exact same patch internally, the differences 
were just in naming rather than anything functional.  I went ahead and 
installed your variant and I'll drop Raphael's from my queue.


I realize this may make some planned work WRT eliminating unnecessary 
extensions in gimple somewhat harder.  If that work progresses to the 
point where this patch is a problem, then we'll re-evaluate.  But it's 
crazy to hold this up -- it's a measurable win on x264 in particular.


Thanks again,
jeff


[PATCH (pushed)] contrib: port doxygen script to Python3

2023-04-28 Thread Martin Liška
Pushed to master as obvious.

Martin

contrib/ChangeLog:

* filter_gcc_for_doxygen: Use python3 and not python2.
* filter_params.py: Likewise.
---
 contrib/filter_gcc_for_doxygen | 2 +-
 contrib/filter_params.py   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/contrib/filter_gcc_for_doxygen b/contrib/filter_gcc_for_doxygen
index d1109a50c88..d3670604b01 100755
--- a/contrib/filter_gcc_for_doxygen
+++ b/contrib/filter_gcc_for_doxygen
@@ -8,5 +8,5 @@
 # process is put on stdout.
 
 dir=`dirname $0`
-python $dir/filter_params.py $1
+env python3 $dir/filter_params.py $1
 exit 0
diff --git a/contrib/filter_params.py b/contrib/filter_params.py
index a82a8d5728c..359d28b07d2 100644
--- a/contrib/filter_params.py
+++ b/contrib/filter_params.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python
+#!/usr/bin/env python3
 """
 Filters out some of the #defines used throughout the GCC sources:
 - GTY(()) marks declarations for gengtype.c
-- 
2.40.0



RE: [PATCH 2/5] match.pd: Remove commented out line pragmas unless -vv is used.

2023-04-28 Thread Tamar Christina via Gcc-patches
> On the check for verbose==2, should that be verbose >= 2 ?
> 

That's fair enough. Made the change.
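
A small sketch of why >= is the more robust test. The option parsing here is hypothetical and simply accumulates a verbosity level; genmatch's real parsing may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser: "-v" adds one level, "-vv" adds two.  */
static int
parse_verbosity (int argc, char **argv)
{
  assert (argv != NULL);
  int verbose = 0;
  for (int i = 1; i < argc; i++)
    {
      if (strcmp (argv[i], "-v") == 0)
        verbose += 1;
      else if (strcmp (argv[i], "-vv") == 0)
        verbose += 2;
    }
  return verbose;
}

/* Emit the commented-out line directives at level 2 and above, so a
   level higher than 2 does not silently turn them off again.  */
static bool
emit_line_comments (int verbose)
{
  return verbose >= 2;
}
```

With an equality test, any future level above 2 would disable the output that level 2 enables.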

Thanks,
Tamar.

>   paul
> 
> > On Apr 28, 2023, at 6:38 AM, Tamar Christina via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > genmatch currently outputs commented out line directives that have no
> > effect but the compiler still has to parse only to discard.
> >
> > They are however handy when debugging genmatch output.  As such this
> > moves them behind the -vv flag.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR bootstrap/84402
> > * genmatch.cc (output_line_directive): Only emit commented directive
> > when -vv.
> > (main): Initialize verbose.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc index
> >
> 638606b2502f640e59527fc5a0b23fa3bedd0cee..6d62cdea2082d92e5ecc1
> 102c802
> > 05115a4e3040 100644
> > --- a/gcc/genmatch.cc
> > +++ b/gcc/genmatch.cc
> > @@ -209,7 +209,7 @@ output_line_directive (FILE *f, location_t location,
> >   else
> > fprintf (f, "%s:%d", file, loc.line);
> > }
> > -  else
> > +  else if (verbose == 2)
> > /* Other gen programs really output line directives here, at least for
> >development it's right now more convenient to have line information
> >from the generated file.  Still keep the directives as comment
> > for now @@ -5221,6 +5221,7 @@ main (int argc, char **argv)
> > return 1;
> >
> >   bool gimple = true;
> > +  verbose = 0;
> >   char *input = argv[argc-1];
> >   for (int i = 1; i < argc - 1; ++i)
> > {
> >
> >
> >
> >
> > --
> > 



Re: [PATCH 2/5] match.pd: Remove commented out line pragmas unless -vv is used.

2023-04-28 Thread Paul Koning via Gcc-patches
On the check for verbose==2, should that be verbose >= 2 ?

paul

> On Apr 28, 2023, at 6:38 AM, Tamar Christina via Gcc-patches 
>  wrote:
> 
> Hi All,
> 
> genmatch currently outputs commented out line directives that have no effect
> but the compiler still has to parse only to discard.
> 
> They are however handy when debugging genmatch output.  As such this moves 
> them
> behind the -vv flag.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR bootstrap/84402
>   * genmatch.cc (output_line_directive): Only emit commented directive
>   when -vv.
>   (main): Initialize verbose.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index 
> 638606b2502f640e59527fc5a0b23fa3bedd0cee..6d62cdea2082d92e5ecc1102c80205115a4e3040
>  100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -209,7 +209,7 @@ output_line_directive (FILE *f, location_t location,
>   else
>   fprintf (f, "%s:%d", file, loc.line);
> }
> -  else
> +  else if (verbose == 2)
> /* Other gen programs really output line directives here, at least for
>development it's right now more convenient to have line information
>from the generated file.  Still keep the directives as comment for now
> @@ -5221,6 +5221,7 @@ main (int argc, char **argv)
> return 1;
> 
>   bool gimple = true;
> +  verbose = 0;
>   char *input = argv[argc-1];
>   for (int i = 1; i < argc - 1; ++i)
> {
> 
> 
> 
> 
> -- 
> 



[PATCH v2] MIPS: add speculation_barrier support

2023-04-28 Thread YunQiang Su
speculation_barrier for MIPS needs sync+jr.hb (r2+),
so we implement __speculation_barrier in libgcc, like arm32 does.
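
For context, a sketch of user code that would end up relying on this. As I understand it, GCC falls back to the speculation_barrier pattern when expanding __builtin_speculation_safe_value on targets without a more specific expansion; treat that as an assumption. The SAFE_INDEX macro and load_checked are hypothetical names, and the fallback branch provides no hardening at all.

```c
#include <assert.h>
#include <stddef.h>

/* Use the GCC builtin when the compiler reports it; otherwise fall back
   to a plain expression (no hardening) so the sketch still builds.  */
#if defined(__has_builtin)
# if __has_builtin(__builtin_speculation_safe_value)
#  define SAFE_INDEX(i) __builtin_speculation_safe_value (i)
# endif
#endif
#ifndef SAFE_INDEX
# define SAFE_INDEX(i) (i)
#endif

/* Hypothetical helper: bounds-checked load where the index is forced to
   a safe value if the bounds check was mispredicted.  */
static int
load_checked (const int *table, size_t len, size_t idx, int fallback)
{
  assert (table != NULL);
  if (idx < len)
    return table[SAFE_INDEX (idx)];
  return fallback;
}
```

On a target that uses the libgcc approach from this patch, each such use would presumably turn into a call to __speculation_barrier.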

gcc/ChangeLog:
* config/mips/mips-protos.h (mips_emit_speculation_barrier): New
prototype.
* config/mips/mips.cc (speculation_barrier_libfunc): New static
variable.
(mips_init_libfuncs): Initialize it.
(mips_emit_speculation_barrier): New function.
* config/mips/mips.md (speculation_barrier): Call
mips_emit_speculation_barrier.

libgcc/ChangeLog:
* config/mips/lib1funcs.S: New file.
define __speculation_barrier and include mips16.S.
* config/mips/t-mips: define LIB1ASMSRC as mips/lib1funcs.S.
define LIB1ASMFUNCS as _speculation_barrier.
set version info for __speculation_barrier.
* config/mips/libgcc-mips.ver: New file.
* config/mips/t-mips16: don't define LIB1ASMSRC as mips16.S is
included in lib1funcs.S now.
---
 gcc/config/mips/mips-protos.h  |  2 +
 gcc/config/mips/mips.cc| 13 +++
 gcc/config/mips/mips.md| 12 ++
 libgcc/config/mips/lib1funcs.S | 60 ++
 libgcc/config/mips/libgcc-mips.ver | 21 +++
 libgcc/config/mips/t-mips  |  7 
 libgcc/config/mips/t-mips16|  3 +-
 7 files changed, 116 insertions(+), 2 deletions(-)
 create mode 100644 libgcc/config/mips/lib1funcs.S
 create mode 100644 libgcc/config/mips/libgcc-mips.ver

diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index 20483469105..da7902c235b 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -388,4 +388,6 @@ extern void mips_register_frame_header_opt (void);
 extern void mips_expand_vec_cond_expr (machine_mode, machine_mode, rtx *);
 extern void mips_expand_vec_cmp_expr (rtx *);
 
+extern void mips_emit_speculation_barrier_function (void);
+
 #endif /* ! GCC_MIPS_PROTOS_H */
diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index ca822758b41..139707fda34 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -13611,6 +13611,9 @@ mips_autovectorize_vector_modes (vector_modes *modes, 
bool)
   return 0;
 }
 
+
+static GTY(()) rtx speculation_barrier_libfunc;
+
 /* Implement TARGET_INIT_LIBFUNCS.  */
 
 static void
@@ -13680,6 +13683,7 @@ mips_init_libfuncs (void)
   synchronize_libfunc = init_one_libfunc ("__sync_synchronize");
   init_sync_libfuncs (UNITS_PER_WORD);
 }
+  speculation_barrier_libfunc = init_one_libfunc ("__speculation_barrier");
 }
 
 /* Build up a multi-insn sequence that loads label TARGET into $AT.  */
@@ -19092,6 +19096,15 @@ mips_avoid_hazard (rtx_insn *after, rtx_insn *insn, 
int *hilo_delay,
   }
 }
 
+/* Emit a speculation barrier.
+   JR.HB is needed, so we need to put
+   speculation_barrier_libfunc in libgcc */
+void
+mips_emit_speculation_barrier_function ()
+{
+  emit_library_call (speculation_barrier_libfunc, LCT_NORMAL, VOIDmode);
+}
+
 /* A SEQUENCE is breakable iff the branch inside it has a compact form
and the target has compact branches.  */
 
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index ac1d77afc7d..5d04ac566dd 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -160,6 +160,8 @@
   ;; The `.insn' pseudo-op.
   UNSPEC_INSN_PSEUDO
   UNSPEC_JRHB
+
+  VUNSPEC_SPECULATION_BARRIER
 ])
 
 (define_constants
@@ -7455,6 +7457,16 @@
   mips_expand_conditional_move (operands);
   DONE;
 })
+
+(define_expand "speculation_barrier"
+  [(unspec_volatile [(const_int 0)] VUNSPEC_SPECULATION_BARRIER)]
+  ""
+  "
+  mips_emit_speculation_barrier_function ();
+  DONE;
+  "
+)
+
 
 ;;
 ;;  
diff --git a/libgcc/config/mips/lib1funcs.S b/libgcc/config/mips/lib1funcs.S
new file mode 100644
index 000..45d74e2e762
--- /dev/null
+++ b/libgcc/config/mips/lib1funcs.S
@@ -0,0 +1,60 @@
+/* Copyright (C) 1995-2023 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#include "mips16.S"
+
+#ifdef L_speculation_barrier
+
+/* MIPS16e1 has no sync/jr.hb instructions, 

Re: [PATCH] MIPS: add speculation_barrier support

2023-04-28 Thread YunQiang Su
Jiaxun Yang wrote on Fri, Apr 28, 2023 at 20:36:
>
>
>
> > On Apr 28, 2023, at 13:33, YunQiang Su wrote:
> >
> > speculation_barrier for MIPS needs sync+jr.hb (r2+),
> > so we implement __speculation_barrier in libgcc, like arm32 does.
> >
> > gcc/ChangeLog:
> > * config/mips/mips-protos.h (mips_emit_speculation_barrier): New
> >prototype.
> > * config/mips/mips.cc (speculation_barrier_libfunc): New static
> >variable.
> > (mips_init_libfuncs): Initialize it.
> > (mips_emit_speculation_barrier): New function.
> > * config/arm/arm.md (speculation_barrier): Call
> >mips_emit_speculation_barrier.
>
> ^ arm? Typo.
>

Oh, you are right.
I copied the commit message from:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=ebdb6f237772df251378d2c08350d345135bcb9e

A new patch will be sent.

> Thanks
> Jiaxun
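
For context on why a target wants a `speculation_barrier` pattern at all: GCC's default expansion of `__builtin_speculation_safe_value` uses that pattern when the target offers nothing more specific. Below is a minimal sketch of user code relying on the builtin. The function `load_checked` and the table are illustrative only and are not part of the patch; the `__has_builtin` guard is an assumption added so the sketch still compiles with compilers that lack the builtin.

```cpp
#include <cstddef>

// Illustrative lookup table; not taken from the patch.
static const int table[4] = {10, 20, 30, 40};

// Return table[idx] if idx is in range, otherwise -1.
int load_checked(std::size_t idx)
{
  if (idx < sizeof table / sizeof table[0])
    {
#if defined(__has_builtin)
# if __has_builtin(__builtin_speculation_safe_value)
      // If the bounds check was mis-speculated, use index 0 instead of
      // the attacker-controlled value.
      idx = __builtin_speculation_safe_value(idx, (std::size_t) 0);
# endif
#endif
      return table[idx];
    }
  return -1;
}
```

On a target without any barrier support GCC still compiles such code, but warns that incorrect speculation may not be restricted.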


RE: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

2023-04-28 Thread Li, Pan2 via Gcc-patches
Cool, Thank you!

Pan

-----Original Message-----
From: Kito Cheng  
Sent: Friday, April 28, 2023 8:37 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

pushed, thanks!


[PATCH v2] GCC-13/changes: Add note about iostream usage

2023-04-28 Thread Jonathan Wakely via Gcc-patches

On 26/04/23 20:06 +0100, Jonathan Wakely wrote:

On 26/04/23 09:53 -0700, Andrew Pinski wrote:

This adds a note about iostream usage so it does not catch others
by surprise as it already has.

OK?


Thanks, I agree we should add something, but have some comments below.


---
htdocs/gcc-13/changes.html | 5 +
1 file changed, 5 insertions(+)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 70732ec0..7c83f7c4 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -25,6 +25,11 @@ You may also want to check out our

Caveats

+libstdc++ uses constructors inside the library to initialize 
std::cout/std::cin, etc.
+ instead of having it done in each source which uses iostream header.


We should use code font for std::cout, std::cin and iostream, and
style it as <iostream> not just iostream.


+ This requires you to make sure the dynamic loader to load the new 
libstdc++v3 library
+ (examples of how to do this is to use -Wl,-rpath,... while linking or 
LD_LIBRARY_PATH
+ while running the program).  


I think it would be better to link to 
https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dynamic_or_shared.html#manual.intro.using.linkage.dynamic

How about:

 For C++, construction of the global iostream objects
 std::cout, std::cin etc. is now done
 inside the standard library, instead of in every source file that
 includes the <iostream> header. This change
 improves the start-up performance of C++ programs, but it means that
 code compiled with GCC 13.1 will crash if the correct version of
 libstdc++.so is not used at runtime. See the documentation
 (https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dynamic_or_shared.html#manual.intro.using.linkage.dynamic)
 about using the right libstdc++.so at runtime.
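
To illustrate the design change the proposed text describes, here is a minimal sketch of a reference-counted initializer of the kind `std::ios_base::Init` provides. Everything in namespace `sketch` is hypothetical and only models the idea; it is not libstdc++'s code.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {

// Stand-in for a global stream object such as std::cout.
struct Stream { bool ready = false; };

static Stream out;
static int refcount = 0;      // number of live Init objects
static int constructed = 0;   // total Init constructors run

// Models the role of std::ios_base::Init: the first object constructed
// sets up the streams, the last one destroyed tears them down.
struct Init {
  Init()  { ++constructed; if (refcount++ == 0) out.ready = true; }
  ~Init() { if (--refcount == 0) out.ready = false; }
};

// Old scheme: each translation unit that includes the header carries
// its own static Init object, so the constructor runs once per unit.
// Returns how many constructors ran, or -1 if the stream never became
// ready (no Init object existed).
int constructors_run(std::size_t translation_units)
{
  constructed = 0;
  std::vector<Init> per_tu(translation_units);  // one Init per unit
  return out.ready ? constructed : -1;
}

}  // namespace sketch
```

With the new scheme the equivalent of `constructors_run(1)` happens once inside the library, which is why a program built against the new headers depends on a matching libstdc++.so to get its streams initialized.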
 


Here's a proper patch proposal along those lines.

OK for wwwdocs?


commit cf408a8d7e9ee3c7efd5b4a3fa5697f4a85a036a
Author: Jonathan Wakely 
Date:   Fri Apr 28 13:47:12 2023 +0100

Add caveat about C++ iostream init changes (PR108969)

Co-authored-by: Andrew Pinski 

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 70732ec0..f9533494 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -56,6 +56,18 @@ You may also want to check out our
   https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/ARM-iWMMXt-Built-in-Functions.html;>
   iWMMXt built-in functions.
 
+For C++, construction of the global iostream objects
+  std::cout, std::cin etc. is now done
+  inside the standard library, instead of in every source file that
+  includes the iostream header. This change
+  improves the start-up performance of C++ programs, but it means that
+  code compiled with GCC 13.1 will crash if the correct version of
+  libstdc++.so is not used at runtime. See the
+  https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dynamic_or_shared.html#manual.intro.using.linkage.dynamic;>documentation
+  about using the right libstdc++.so at runtime.
+  Future GCC releases will mitigate the problem so that the program
+  cannot be run at all with an older libstdc++.so.
+
 
 
 


Re: [PATCH] OpenACC: Stand-alone attach/detach clause fixes for Fortran [PR109622]

2023-04-28 Thread Thomas Schwinge
Hi Julian!

On 2023-04-27T11:36:47-0700, Julian Brown  wrote:
> This patch fixes several cases where multiple attach or detach mapping
> nodes were being created for stand-alone attach or detach clauses
> in Fortran.  After the introduction of stricter checking later during
> compilation, these extra nodes could cause ICEs, as seen in the PR.
>
> The patch also fixes cases that "happened to work" previously where
> the user attaches/detaches a pointer to array using a descriptor, and
> (I think!) the "_data" field has offset zero, hence the same address as
> the descriptor as a whole.

Thanks for looking into this.

I haven't reviewed the patch itself, but noticed one thing:

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/pr109622-2.f90

> +!$acc enter data copyin(var)

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/pr109622-3.f90

> +!$acc enter data copyin(var, tgt)

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/pr109622.f90

> +!$acc enter data copyin(var, var2)

You'll want to move these into 'libgomp/testsuite/libgomp.oacc-fortran/'
to actually test them with '-fopenacc' instead of '-fopenmp'.  ;-)


Chalk up one for the idea that I once had, to have '-fopenacc',
'-fopenmp', '-fopenmp-simd' enable '-Wunknown-pragmas' by default.


Regards
 Thomas


Re: [PATCH] c++: outer args for level-lowered ttp [PR109651]

2023-04-28 Thread Patrick Palka via Gcc-patches
On Thu, 27 Apr 2023, Patrick Palka wrote:

> On Thu, Apr 27, 2023 at 4:46 PM Patrick Palka  wrote:
> >
> > > Now that r14-11-g2245459c85a3f4 made us coerce the template
> > arguments of a bound ttp again after level-lowering, this unfortunately
> > causes a crash from coerce_template_args_for_ttp in the below testcase.
> >
> > During the level-lowering substitution T=int into the bound ttp TT
> > as part of substitution into the lambda signature, current_template_parms
> > is just U=U rather than the ideal TT=TT, U=U.  And because we don't
> > > consistently set DECL_CONTEXT for level-lowered ttps (it's kind of a
> > > chicken-and-egg problem in this case), we attempt to use
> > > current_template_parms to obtain the outer arguments during
> > > coerce_template_args_for_ttp.  But the depth 1 of
> > > current_template_parms is less than the depth 2 of the level-lowered TT,
> > > and we end up segfaulting from there.
> >
> > So for level-lowered ttps it seems we need to get the outer arguments a
> > different way -- namely, we can look at the trailing parms of its
> > DECL_TEMPLATE_PARMS.
> 
> Note this is not an ideal solution because TREE_CHAIN of
> DECL_TEMPLATE_PARMS in this case is just "2 <empty>, 1 U", so we're
> missing tparm information for the level that the ttp belongs to :/ So
> the only difference compared to using current_template_parms in this
> case is the extra empty level of args corresponding to the ttp's
> level.

And on the other hand, this issue seems specific to lambdas because
it's in tsubst_lambda_expr that we substitute the function type _before_
substituting and installing the template parameters, which is opposite
to the typical order that tsubst_template_decl does things in.  And
that's ultimately the reason the current_template_parms fallback in
coerce_template_args_for_ttp misbehaves in this testcase.

So the following seems to be a better fix.  With it, current_template_parms
is correctly 2 TT, 1 U during substitution of the lambda's function type,
which makes coerce_template_args_for_ttp happy when level lowering
the bound ttp within the function type.

-- >8 --

Subject: [PATCH] c++: bound ttp in lambda function type [PR109651]

PR c++/109651

gcc/cp/ChangeLog:

* pt.cc (tsubst_template_decl): Add default argument to
lambda_fntype parameter.  Add defaulted lambda_tparms parameter.
Prefer to use lambda_tparms instead of substituting
DECL_TEMPLATE_PARMS.
(tsubst_decl) <case TEMPLATE_DECL>: Adjust tsubst_template_decl
call.
(tsubst_lambda_expr): For a generic lambda, substitute
DECL_TEMPLATE_PARMS and update current_template_parms
before substituting the function type.  Pass the substituted
DECL_TEMPLATE_PARMS to tsubst_template_decl.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-generic-ttp1.C: New test.
* g++.dg/cpp2a/lambda-generic-ttp2.C: New test.
---
 gcc/cp/pt.cc  | 30 ++-
 .../g++.dg/cpp2a/lambda-generic-ttp1.C| 11 +++
 .../g++.dg/cpp2a/lambda-generic-ttp2.C| 13 
 3 files changed, 47 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-generic-ttp1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-generic-ttp2.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 678cb7930e3..43713d9ab72 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -14629,7 +14629,8 @@ tsubst_function_decl (tree t, tree args, tsubst_flags_t complain,
 
 static tree
 tsubst_template_decl (tree t, tree args, tsubst_flags_t complain,
- tree lambda_fntype)
+ tree lambda_fntype = NULL_TREE,
+ tree lambda_tparms = NULL_TREE)
 {
   /* We can get here when processing a member function template,
  member class template, or template template parameter.  */
@@ -14719,8 +14720,10 @@ tsubst_template_decl (tree t, tree args, tsubst_flags_t complain,
   auto tparm_guard = make_temp_override (current_template_parms);
   DECL_TEMPLATE_PARMS (r)
 = current_template_parms
-= tsubst_template_parms (DECL_TEMPLATE_PARMS (t), args,
-complain);
+= (lambda_tparms
+   ? lambda_tparms
+   : tsubst_template_parms (DECL_TEMPLATE_PARMS (t), args,
+   complain));
 
   bool class_p = false;
   tree inner = decl;
@@ -14888,7 +14891,7 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
   switch (TREE_CODE (t))
 {
 case TEMPLATE_DECL:
-  r = tsubst_template_decl (t, args, complain, /*lambda*/NULL_TREE);
+  r = tsubst_template_decl (t, args, complain);
   break;
 
 case FUNCTION_DECL:
@@ -20130,12 +20133,24 @@ tsubst_lambda_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
  ? DECL_TI_TEMPLATE (oldfn)
  : NULL_TREE);
 
+  tree tparms = NULL_TREE;
+  if (oldtmpl)
+tparms = tsubst_template_parms (DECL_TEMPLATE_PARMS (oldtmpl), args, 

[PATCH 2/3] Refactor widen_plus as internal_fn

2023-04-28 Thread Andre Vieira (lists) via Gcc-patches

This patch replaces the existing tree_code widen_plus and widen_minus
patterns with internal_fn versions.

DEF_INTERNAL_OPTAB_HILO_FN is like DEF_INTERNAL_OPTAB_FN except it 
provides convenience wrappers for defining conversions that require a 
hi/lo split, like widening and narrowing operations.  Each definition 
for <NAME> will require an optab named <OPTAB> and two other optabs that 
you specify for signed and unsigned. The hi/lo pair is necessary because 
the widening operations take n narrow elements as inputs and return n/2 
wide elements as outputs. The 'lo' operation operates on the first n/2 
elements of input. The 'hi' operation operates on the second n/2 
elements of input. Defining an internal_fn along with hi/lo variations 
allows a single internal function to be returned from a vect_recog 
function that will later be expanded to hi/lo.
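
The hi/lo split described above can be modelled in scalar code. This is only a sketch of the semantics as stated in the text (lo takes the first n/2 lanes, hi the second n/2); the names `widen_plus_half`, `widen_plus_lo` and `widen_plus_hi` are illustrative and do not exist in GCC.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of one half of a widening add on N narrow lanes.
// Each half consumes N/2 narrow lanes and yields N/2 wide lanes.
template <std::size_t N>
std::array<std::int16_t, N / 2>
widen_plus_half(const std::array<std::int8_t, N> &a,
                const std::array<std::int8_t, N> &b, bool hi)
{
  static_assert(N % 2 == 0, "lane count must be even");
  std::array<std::int16_t, N / 2> r{};
  const std::size_t base = hi ? N / 2 : 0;
  for (std::size_t i = 0; i < N / 2; ++i)
    // Widen before adding so the sum cannot overflow the narrow type.
    r[i] = static_cast<std::int16_t>(static_cast<std::int16_t>(a[base + i])
                                     + static_cast<std::int16_t>(b[base + i]));
  return r;
}

// The 'lo' operation: first N/2 input lanes.
template <std::size_t N>
std::array<std::int16_t, N / 2>
widen_plus_lo(const std::array<std::int8_t, N> &a,
              const std::array<std::int8_t, N> &b)
{
  return widen_plus_half(a, b, false);
}

// The 'hi' operation: second N/2 input lanes.
template <std::size_t N>
std::array<std::int16_t, N / 2>
widen_plus_hi(const std::array<std::int8_t, N> &a,
              const std::array<std::int8_t, N> &b)
{
  return widen_plus_half(a, b, true);
}
```

A full widening add of N lanes is then the concatenation of the lo and hi results, which is why a single internal function can stand for the pair until expansion.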


DEF_INTERNAL_OPTAB_HILO_FN is used in internal-fn.def to register a 
widening internal_fn. It is defined differently in different places and 
internal-fn.def is sourced from those places so the parameters given can 
be reused.
  internal-fn.cc: defined to expand to hi/lo signed/unsigned optabs, 
later defined to generate the 'expand_' functions for the hi/lo 
versions of the fn.
  internal-fn.def: defined to invoke DEF_INTERNAL_OPTAB_FN for the 
original and hi/lo variants of the internal_fn


 For example:
 IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
 for aarch64:
   IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>addl_hi_<mode> -> (u/s)addl2
   IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>addl_lo_<mode> -> (u/s)addl


This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS 
tree codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
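
The "defined differently in different places" mechanism mentioned above is the X-macro technique. A minimal sketch follows; to stay self-contained it uses a list macro (`SKETCH_HILO_FNS`, hypothetical) in place of a separately included `.def` file, and all `SK_`/`sketch_` names are made up for illustration.

```cpp
#include <cstring>

// Stand-in for the entries that a .def file would provide.
#define SKETCH_HILO_FNS(X) \
  X(VEC_WIDEN_PLUS) \
  X(VEC_WIDEN_MINUS)

// First expansion: one enumerator for the function and for each half.
enum sketch_fn {
#define DEF_HILO(NAME) SK_##NAME, SK_##NAME##_LO, SK_##NAME##_HI,
  SKETCH_HILO_FNS(DEF_HILO)
#undef DEF_HILO
  SK_LAST
};

// Second expansion of the same list: a parallel table of names.
static const char *const sketch_fn_names[] = {
#define DEF_HILO(NAME) #NAME, #NAME "_LO", #NAME "_HI",
  SKETCH_HILO_FNS(DEF_HILO)
#undef DEF_HILO
  "<invalid>"
};

// Look up a function by name; returns SK_LAST when nothing matches.
sketch_fn sketch_lookup(const char *name)
{
  for (int i = 0; i < SK_LAST; ++i)
    if (std::strcmp(sketch_fn_names[i], name) == 0)
      return static_cast<sketch_fn>(i);
  return SK_LAST;
}
```

Because both tables come from one list, adding an entry keeps the enum and the name table in step without further edits.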


gcc/ChangeLog:

2023-04-28  Andre Vieira  
Joel Hutton  
Tamar Christina  

* internal-fn.cc (INCLUDE_MAP): Include maps for use in optab
lookup.
(DEF_INTERNAL_OPTAB_HILO_FN): Macro to define an internal_fn that
expands into multiple internal_fns (for widening).
(ifn_cmp): Function to compare ifn's for sorting/searching.
(lookup_hilo_ifn_optab): Add lookup function.
(lookup_hilo_internal_fn): Add lookup function.
(commutative_binary_fn_p): Add widen_plus fn's.
(widening_fn_p): New function.
(decomposes_to_hilo_fn_p): New function.
* internal-fn.def (DEF_INTERNAL_OPTAB_HILO_FN): Define widening
plus,minus functions.
(VEC_WIDEN_PLUS): Replacement for VEC_WIDEN_PLUS tree code.
(VEC_WIDEN_MINUS): Replacement for VEC_WIDEN_MINUS tree code.
* internal-fn.h (GCC_INTERNAL_FN_H): Add headers.
(lookup_hilo_ifn_optab): Add prototype.
(lookup_hilo_internal_fn): Likewise.
(widening_fn_p): Likewise.
(decomposes_to_hilo_fn_p): Likewise.
* optabs.cc (commutative_optab_p): Add widening plus, minus optabs.
* optabs.def (OPTAB_CD): widen add, sub optabs
* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
patterns with a hi/lo split.
(vect_recog_widen_plus_pattern): Refactor to return
IFN_VEC_WIDEN_PLUS.
(vect_recog_widen_minus_pattern): Refactor to return new
IFN_VEC_WIDEN_MINUS.
* tree-vect-stmts.cc (vectorizable_conversion): Add widen plus/minus
ifn support.
(supportable_widening_operation): Add widen plus/minus ifn support.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-widen-add.c: Test that new
IFN_VEC_WIDEN_PLUS is being used.
* gcc.target/aarch64/vect-widen-sub.c: Test that new
IFN_VEC_WIDEN_MINUS is being used.

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 6e81dc05e0e0714256759b0594816df451415a2d..e4d815cd577d266d2bccf6fb68d62aac91a8b4cf 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+#define INCLUDE_MAP
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -70,6 +71,26 @@ const int internal_fn_flags_array[] = {
   0
 };
 
+const enum internal_fn internal_fn_hilo_keys_array[] = {
+#undef DEF_INTERNAL_OPTAB_HILO_FN
+#define DEF_INTERNAL_OPTAB_HILO_FN(NAME, FLAGS, OPTAB, SOPTAB, UOPTAB, TYPE) \
+  IFN_##NAME##_LO, \
+  IFN_##NAME##_HI,
+#include "internal-fn.def"
+  IFN_LAST
+#undef DEF_INTERNAL_OPTAB_HILO_FN
+};
+
+const optab internal_fn_hilo_values_array[] = {
+#undef DEF_INTERNAL_OPTAB_HILO_FN
+#define DEF_INTERNAL_OPTAB_HILO_FN(NAME, FLAGS, OPTAB, SOPTAB, UOPTAB, TYPE) \
+  SOPTAB##_lo_optab, UOPTAB##_lo_optab, \
+  SOPTAB##_hi_optab, UOPTAB##_hi_optab,
+#include "internal-fn.def"
+  unknown_optab, unknown_optab
+#undef DEF_INTERNAL_OPTAB_HILO_FN
+};
+
 /* Return the internal function called NAME, or IFN_LAST if there's
no such function.  */
 
@@ -90,6 +111,61 @@ lookup_internal_fn (const char *name)
   return 

[PATCH] Add emulated scatter capability to the vectorizer

2023-04-28 Thread Richard Biener via Gcc-patches
This adds a scatter vectorization capability to the vectorizer
without target support by decomposing the offset and data vectors
and then performing scalar stores in the order of vector lanes.
This is aimed at cases where vectorizing the rest of the loop
offsets the cost of vectorizing the scatter.
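
As a rough scalar model of what the emulation amounts to (a sketch only, not the code the vectorizer generates; `emulated_scatter` is a made-up name), each lane's offset and data element is extracted and stored individually, in lane order:

```cpp
#include <array>
#include <cstddef>

// Scalar emulation of a scatter store: lane i of DATA is written to
// base[offsets[i]], and lanes are processed in increasing lane order.
template <typename T, std::size_t Lanes>
void emulated_scatter(T *base,
                      const std::array<std::size_t, Lanes> &offsets,
                      const std::array<T, Lanes> &data)
{
  for (std::size_t lane = 0; lane < Lanes; ++lane)
    base[offsets[lane]] = data[lane];
}
```

Storing in lane order matters when two lanes share an offset: the later lane wins, which matches what the original scalar loop would have done.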

The offset load is still vectorized and costed as such, but like
with emulated gather those will be turned back to scalar loads
by forwprop.

Slightly fixed compared to the version posted in autumn,
re-bootstrapped & tested on x86_64-unknown-linux-gnu and pushed.

Richard.

* tree-vect-data-refs.cc (vect_analyze_data_refs): Always
consider scatters.
* tree-vect-stmts.cc (vect_model_store_cost): Pass in the
gather-scatter info and cost emulated scatters accordingly.
(get_load_store_type): Support emulated scatters.
(vectorizable_store): Likewise.  Emulate them by extracting
scalar offsets and data, doing scalar stores.

* gcc.dg/vect/pr25413a.c: Un-XFAIL everywhere.
* gcc.dg/vect/vect-71.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s4113.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s491.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-vas.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/pr25413a.c  |   3 +-
 .../gcc.dg/vect/tsvc/vect-tsvc-s4113.c|   2 +-
 .../gcc.dg/vect/tsvc/vect-tsvc-s491.c |   2 +-
 .../gcc.dg/vect/tsvc/vect-tsvc-vas.c  |   2 +-
 gcc/testsuite/gcc.dg/vect/vect-71.c   |   2 +-
 gcc/tree-vect-data-refs.cc|   4 +-
 gcc/tree-vect-stmts.cc| 117 ++
 7 files changed, 97 insertions(+), 35 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr25413a.c b/gcc/testsuite/gcc.dg/vect/pr25413a.c
index e444b2c3e8e..ffb517c9ce0 100644
--- a/gcc/testsuite/gcc.dg/vect/pr25413a.c
+++ b/gcc/testsuite/gcc.dg/vect/pr25413a.c
@@ -123,7 +123,6 @@ int main (void)
   return 0;
 } 
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! vect_scatter_store } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_scatter_store } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "vector alignment may not be reachable" 1 "vect" { target { ! vector_alignment_reachable  } } } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 "vect" { target { ! vector_alignment_reachable } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c
index b64682a65df..ddb7e9dc0e8 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c
@@ -39,4 +39,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! aarch64_sve }  } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c
index 8465e137070..29e90ff0aff 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c
@@ -39,4 +39,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! aarch64_sve }  } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vas.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vas.c
index 5ff38851f43..b72ee21a9a3 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vas.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vas.c
@@ -39,4 +39,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! aarch64_sve }  } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-71.c b/gcc/testsuite/gcc.dg/vect/vect-71.c
index f15521176df..581473fa4a1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-71.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-71.c
@@ -36,4 +36,4 @@ int main (void)
   return main1 ();
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_scatter_store } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index c03ffb3aaf1..6721ab6efc4 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4464,9 +4464,7 @@ vect_analyze_data_refs (vec_info *vinfo, poly_uint64 *min_vf, bool *fatal)
  && !TREE_THIS_VOLATILE (DR_REF (dr));
  bool maybe_scatter
= DR_IS_WRITE (dr)
- && !TREE_THIS_VOLATILE (DR_REF (dr))
- && (targetm.vectorize.builtin_scatter != NULL
- 

[PATCH 1/3] Refactor to allow internal_fn's

2023-04-28 Thread Andre Vieira (lists) via Gcc-patches

Hi,

I'm posting the patches separately now with ChangeLogs.

I made the suggested changes and tried to simplify the code a bit 
further. Where internal to tree-vect-stmts I changed most functions to 
use code_helper to avoid having to check at places we didn't need to. I 
was trying to simplify things further by also modifying 
supportable_half_widening_operation and supportable_convert_operation 
but the result of that was that I ended up moving the code to cast to 
tree code inside them rather than at the call site and it didn't look 
simpler, so I left those. Though if we did make those changes we'd no 
longer need to keep around the tc1 variable in 
vectorizable_conversion... Let me know what you think.
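
For readers who have not met `code_helper`, the idea is a single value that can carry either a tree code or an internal function, so callers need no separate checks. The following is a simplified, hypothetical model of that idea and of the dispatch done in `vect_gimple_build`; the enums, the `CodeHelper` class and `build_stmt` are stand-ins and not GCC's definitions.

```cpp
#include <string>

// Hypothetical stand-ins for GCC's tree_code and internal_fn enums.
enum class TreeCode { PLUS_EXPR = 1, MINUS_EXPR };
enum class InternalFn { VEC_WIDEN_PLUS = 1, VEC_WIDEN_MINUS };

// One value that holds either a tree code (stored as a positive
// number) or an internal function (stored as a negative number).
class CodeHelper {
public:
  CodeHelper(TreeCode c) : rep_(static_cast<int>(c)) {}
  CodeHelper(InternalFn f) : rep_(-static_cast<int>(f)) {}
  bool is_tree_code() const { return rep_ > 0; }
  bool is_internal_fn() const { return rep_ < 0; }
  TreeCode tree_code() const { return static_cast<TreeCode>(rep_); }
  InternalFn internal_fn() const { return static_cast<InternalFn>(-rep_); }
private:
  int rep_;
};

// Builds a textual "statement": an assignment for a tree code, a call
// for an internal function.  The caller passes a CodeHelper and does
// not need to know which kind it holds.
std::string build_stmt(const std::string &lhs, CodeHelper ch,
                       const std::string &op0, const std::string &op1)
{
  if (ch.is_tree_code())
    return lhs + " = " + op0
           + (ch.tree_code() == TreeCode::PLUS_EXPR ? " + " : " - ") + op1;
  return lhs + " = ."
         + (ch.internal_fn() == InternalFn::VEC_WIDEN_PLUS
              ? "VEC_WIDEN_PLUS" : "VEC_WIDEN_MINUS")
         + " (" + op0 + ", " + op1 + ")";
}
```

Keeping the conversion implicit at the call site is what lets existing callers that pass a plain tree code continue to work unchanged.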


gcc/ChangeLog:

2023-04-28  Andre Vieira  
Joel Hutton  

* tree-vect-patterns.cc (vect_gimple_build): New Function.
(vect_recog_widen_op_pattern): Refactor to use code_helper.
* tree-vect-stmts.cc (vect_gen_widened_results_half): Likewise.
(vect_create_vectorized_demotion_stmts): Likewise.
(vect_create_vectorized_promotion_stmts): Likewise.
(vect_create_half_widening_stmts): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_call): Likewise.
(supportable_widening_operation): Likewise.
(supportable_narrowing_operation): Likewise.
(simple_integer_narrowing): Likewise.
* tree-vectorizer.h (supportable_widening_operation): Likewise.
(supportable_narrowing_operation): Likewise.
(vect_gimple_build): New function prototype.
* tree.h (code_helper::safe_as_tree_code): New function.
(code_helper::safe_as_fn_code): New function.

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 8802141cd6edb298866025b8a55843eae1f0eb17..b35023adade94c1996cd076c4b7419560e819c6b 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -25,6 +25,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl.h"
 #include "tree.h"
 #include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimple-fold.h"
 #include "ssa.h"
 #include "expmed.h"
 #include "optabs-tree.h"
@@ -1391,7 +1393,7 @@ vect_recog_sad_pattern (vec_info *vinfo,
 static gimple *
 vect_recog_widen_op_pattern (vec_info *vinfo,
 stmt_vec_info last_stmt_info, tree *type_out,
-tree_code orig_code, tree_code wide_code,
+tree_code orig_code, code_helper wide_code,
 bool shift_p, const char *name)
 {
   gimple *last_stmt = last_stmt_info->stmt;
@@ -1434,7 +1436,7 @@ vect_recog_widen_op_pattern (vec_info *vinfo,
   vecctype = get_vectype_for_scalar_type (vinfo, ctype);
 }
 
-  enum tree_code dummy_code;
+  code_helper dummy_code;
   int dummy_int;
   auto_vec dummy_vec;
   if (!vectype
@@ -1455,8 +1457,7 @@ vect_recog_widen_op_pattern (vec_info *vinfo,
   2, oprnd, half_type, unprom, vectype);
 
   tree var = vect_recog_temp_ssa_var (itype, NULL);
-  gimple *pattern_stmt = gimple_build_assign (var, wide_code,
- oprnd[0], oprnd[1]);
+  gimple *pattern_stmt = vect_gimple_build (var, wide_code, oprnd[0], oprnd[1]);
 
   if (vecctype != vecitype)
 pattern_stmt = vect_convert_output (vinfo, last_stmt_info, ctype,
@@ -6406,3 +6407,20 @@ vect_pattern_recog (vec_info *vinfo)
   /* After this no more add_stmt calls are allowed.  */
   vinfo->stmt_vec_info_ro = true;
 }
+
+/* Build a GIMPLE_ASSIGN or GIMPLE_CALL with the tree_code,
+   or internal_fn contained in ch, respectively.  */
+gimple *
+vect_gimple_build (tree lhs, code_helper ch, tree op0, tree op1)
+{
+  gcc_assert (op0 != NULL_TREE);
+  if (ch.is_tree_code ())
+return gimple_build_assign (lhs, (tree_code) ch, op0, op1);
+
+  gcc_assert (ch.is_internal_fn ());
+  gimple* stmt = gimple_build_call_internal (as_internal_fn ((combined_fn) ch),
+op1 == NULL_TREE ? 1 : 2,
+op0, op1);
+  gimple_call_set_lhs (stmt, lhs);
+  return stmt;
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 6b7dbfd4a231baec24e740ffe0ce0b0bf7a1de6b..ce47f4940fa9a1baca4ba1162065cfc3b4072eba 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3258,13 +3258,13 @@ vectorizable_bswap (vec_info *vinfo,
 
 static bool
 simple_integer_narrowing (tree vectype_out, tree vectype_in,
- tree_code *convert_code)
+ code_helper *convert_code)
 {
   if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype_out))
   || !INTEGRAL_TYPE_P (TREE_TYPE (vectype_in)))
 return false;
 
-  tree_code code;
+  code_helper code;
   int multi_step_cvt = 0;
   auto_vec  interm_types;
   if (!supportable_narrowing_operation (NOP_EXPR, vectype_out, vectype_in,
@@ -3478,7 +3478,7 @@ vectorizable_call (vec_info *vinfo,
   tree callee = 

[PATCH 3/3] Remove widen_plus/minus_expr tree codes

2023-04-28 Thread Andre Vieira (lists) via Gcc-patches

This is a rebase of Joel's previous patch.

This patch removes the old widen plus/minus tree codes which have been
replaced by internal functions.

gcc/ChangeLog:

2023-04-28  Andre Vieira  
Joel Hutton  

* doc/generic.texi: Remove old tree codes.
* expr.cc (expand_expr_real_2): Remove old tree code cases.
* gimple-pretty-print.cc (dump_binary_rhs): Likewise.
* optabs-tree.cc (optab_for_tree_code): Likewise.
(supportable_half_widening_operation): Likewise.
* tree-cfg.cc (verify_gimple_assign_binary): Likewise.
* tree-inline.cc (estimate_operator_cost): Likewise.
(op_symbol_code): Likewise.
* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Likewise.
(vect_analyze_data_ref_accesses): Likewise.
* tree-vect-generic.cc (expand_vector_operations_1): Likewise.
* cfgexpand.cc (expand_debug_expr): Likewise.
* tree-vect-stmts.cc (vectorizable_conversion): Likewise.
(supportable_widening_operation): Likewise.
* gimple-range-op.cc (gimple_range_op_handler::maybe_non_standard):
Likewise.
* tree-vect-patterns.cc (vect_widened_op_tree): Refactor to replace
usage in vect_recog_sad_pattern.
(vect_recog_sad_pattern): Replace tree code widening pattern with
internal function.
(vect_recog_average_pattern): Likewise.
* tree-pretty-print.cc (dump_generic_node): Remove tree code definition.
* tree.def (WIDEN_PLUS_EXPR, WIDEN_MINUS_EXPR, VEC_WIDEN_PLUS_HI_EXPR,
VEC_WIDEN_PLUS_LO_EXPR, VEC_WIDEN_MINUS_HI_EXPR,
VEC_WIDEN_MINUS_LO_EXPR): Likewise.

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 1a1b26b1c6c23ce273bcd08dc9a973f777174007..25b1558dcb941ea491a19aeeb2cd8f4d2dbdf7c6 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -5365,10 +5365,6 @@ expand_debug_expr (tree exp)
 case VEC_WIDEN_MULT_ODD_EXPR:
 case VEC_WIDEN_LSHIFT_HI_EXPR:
 case VEC_WIDEN_LSHIFT_LO_EXPR:
-case VEC_WIDEN_PLUS_HI_EXPR:
-case VEC_WIDEN_PLUS_LO_EXPR:
-case VEC_WIDEN_MINUS_HI_EXPR:
-case VEC_WIDEN_MINUS_LO_EXPR:
 case VEC_PERM_EXPR:
 case VEC_DUPLICATE_EXPR:
 case VEC_SERIES_EXPR:
@@ -5405,8 +5401,6 @@ expand_debug_expr (tree exp)
 case WIDEN_MULT_EXPR:
 case WIDEN_MULT_PLUS_EXPR:
 case WIDEN_MULT_MINUS_EXPR:
-case WIDEN_PLUS_EXPR:
-case WIDEN_MINUS_EXPR:
   if (SCALAR_INT_MODE_P (GET_MODE (op0))
  && SCALAR_INT_MODE_P (mode))
{
@@ -5419,10 +5413,6 @@ expand_debug_expr (tree exp)
op1 = simplify_gen_unary (ZERO_EXTEND, mode, op1, inner_mode);
  else
op1 = simplify_gen_unary (SIGN_EXTEND, mode, op1, inner_mode);
- if (TREE_CODE (exp) == WIDEN_PLUS_EXPR)
-   return simplify_gen_binary (PLUS, mode, op0, op1);
- else if (TREE_CODE (exp) == WIDEN_MINUS_EXPR)
-   return simplify_gen_binary (MINUS, mode, op0, op1);
  op0 = simplify_gen_binary (MULT, mode, op0, op1);
  if (TREE_CODE (exp) == WIDEN_MULT_EXPR)
return op0;
diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 2c14b7abce2db0a3da0a21e916907947cb56a265..3816abaaf4d364d604a44942317f96f3f303e5b6 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1811,10 +1811,6 @@ a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}.
 @tindex VEC_RSHIFT_EXPR
 @tindex VEC_WIDEN_MULT_HI_EXPR
 @tindex VEC_WIDEN_MULT_LO_EXPR
-@tindex VEC_WIDEN_PLUS_HI_EXPR
-@tindex VEC_WIDEN_PLUS_LO_EXPR
-@tindex VEC_WIDEN_MINUS_HI_EXPR
-@tindex VEC_WIDEN_MINUS_LO_EXPR
 @tindex VEC_UNPACK_HI_EXPR
 @tindex VEC_UNPACK_LO_EXPR
 @tindex VEC_UNPACK_FLOAT_HI_EXPR
@@ -1861,33 +1857,6 @@ vector of @code{N/2} products. In the case of @code{VEC_WIDEN_MULT_LO_EXPR} the
 low @code{N/2} elements of the two vector are multiplied to produce the
 vector of @code{N/2} products.
 
-@item VEC_WIDEN_PLUS_HI_EXPR
-@itemx VEC_WIDEN_PLUS_LO_EXPR
-These nodes represent widening vector addition of the high and low parts of
-the two input vectors, respectively.  Their operands are vectors that contain
-the same number of elements (@code{N}) of the same integral type. The result
-is a vector that contains half as many elements, of an integral type whose size
-is twice as wide.  In the case of @code{VEC_WIDEN_PLUS_HI_EXPR} the high
-@code{N/2} elements of the two vectors are added to produce the vector of
-@code{N/2} products.  In the case of @code{VEC_WIDEN_PLUS_LO_EXPR} the low
-@code{N/2} elements of the two vectors are added to produce the vector of
-@code{N/2} products.
-
-@item VEC_WIDEN_MINUS_HI_EXPR
-@itemx VEC_WIDEN_MINUS_LO_EXPR
-These nodes represent widening vector subtraction of the high and low parts of
-the two input vectors, respectively.  Their operands are vectors that contain
-the same number of elements (@code{N}) of the same integral type. The high/low
-elements of the second vector are subtracted from 

RE: [PATCH 3/3]middle-end RFC - match.pd: automatically partition *-match.cc files.

2023-04-28 Thread Richard Biener via Gcc-patches
On Fri, 28 Apr 2023, Tamar Christina wrote:

> > > [1] https://gcc.gnu.org/legacy-ml/gcc-patches/2018-04/msg01125.html
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > 
> > Some comments - I have to leave the Makefile bits to somebody else to see
> > whether they are portable as-is.
> > 
> > The private functions now in gimple-match-exports.cc are not supposed to be
> > public API, so the additions to gimple-match.h should be avoided - can
> > you add the declarations to gimple-match-head.cc instead?  At least I don't
> > see how the refactoring needs to add anything to gimple-match.h?
> > 
> > -decision_tree::gen (FILE *f, bool gimple)
> > +decision_tree::gen (FILE **files, int n_parts, bool gimple)
> > 
> > can you use a vec<> please to avoid passing n_parts separately?
> > 
> > +  /* Set a default value for the tool to 5, but GCC itself uses
> > + whatever default is determined by the configure variable
> > + DEFAULT_MATCHPD_PARTITIONS.  */
> > +  int n_parts = 5;
> > +  char *input = argv[argc-2];
> > ...
> >   fprintf (stderr, "Usage: genmatch "
> > -  "[--gimple] [--generic] [-v[v]] input\n");
> > +  "[--gimple] [--generic] [--splits=<n>] [-v[v]] input outdir\n");
> > 
> > I don't like this - I'm using ./build/genmatch --gimple test.pd | less to debug
> > genmatch changes with a small test input and like to preserve that.  Can
> > you instead change the usage to
> > 
> >   genmatch --gimple match.pd gimple-match-1.c gimple-match-2.c
> > gimple-match-3.c ...
> > 
> > thus
> > 
> > -  "[--gimple] [--generic] [-v[v]] input\n");
> > +  "[--gimple] [--generic] [-v[v]] input [output...]\n");
> > 
> > and when no output is specified continue to use stdout?  Possibly when
> > more than one output is given require a --header outfile argument to
> > specify the header file to use (and for one output make emit_func
> > not ICE but instead not emit to the header, aka header_file == NULL?).
> > Ideally without makefile changes that would produce the same
> > gimple-match.cc as before (minus the -head.cc changes of course).
> > 
> > The gimple-match-head.cc/exports changes could be split out as
> > far as I can see?  Likewise the Makefile changes if the argument
> > > control is changed as I suggest?
> >
> 
> All changes done.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR bootstrap/84402
>   * genmatch.cc (emit_func, SIZED_BASED_CHUNKS, get_out_file): New.
>   (decision_tree::gen): Accept list of files instead of single and update
>   to write function definition to header and main file.
>   (write_predicate): Likewise.
>   (write_header): Emit pragmas and new includes.
>   (main): Create file buffers and cleanup.
>   (showUsage): New.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index 
> 716fb97aac4c3c2baae82e068df3ce158b9afee9..f56b4bc992d87cb7d707e59be2d61c44a45b68e6
>  100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -183,6 +183,33 @@ fprintf_indent (FILE *f, unsigned int indent, const char 
> *format, ...)
>va_end (ap);
>  }
>  
> +/* Like fprintf, but print to two files, one header one C implementation.  */
> +FILE *header_file = NULL;
> +
> +static void
> +#if GCC_VERSION >= 4001
> +__attribute__((format (printf, 4, 5)))
> +#endif
> +emit_func (FILE *f, bool open, bool close, const char *format, ...)
> +{
> +  va_list ap1, ap2;
> +  if (header_file != stdout)

would making this header_file == NULL also work?
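
For illustration only, here is a minimal standalone sketch of what the NULL-check variant could look like. It assumes a NULL header_file means "do not emit declarations"; the int flags and the static global are simplifications for this sketch, not the code from the patch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the patch's global.  NULL means "no header".  */
static FILE *header_file = NULL;

/* Print one declaration line to F and, if a header was requested, a
   matching "extern ...;" line to HEADER_FILE.  */
static void
emit_func (FILE *f, int open, int close, const char *format, ...)
{
  va_list ap;
  assert (f != NULL);

  if (header_file != NULL)
    {
      if (open)
        fprintf (header_file, "extern ");
      va_start (ap, format);
      vfprintf (header_file, format, ap);
      va_end (ap);
      if (close)
        fprintf (header_file, ";\n");
    }

  /* The implementation file always receives the text.  */
  va_start (ap, format);
  vfprintf (f, format, ap);
  va_end (ap);
  fputc ('\n', f);
}
```

With this shape, the single-output case (writing to stdout) needs no special sentinel: leaving header_file unset simply skips the header part.
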

> +{
> +  if (open)
> + fprintf (header_file, "extern ");
> +  va_start (ap2, format);
> +  vfprintf (header_file, format, ap2);
> +  va_end (ap2);
> +  if (close)
> + fprintf (header_file, ";\n");
> +}
> +
> +  va_start (ap1, format);
> +  vfprintf (f, format, ap1);
> +  va_end (ap1);
> +  fputc ('\n', f);
> +}
> +
>  static void
>  output_line_directive (FILE *f, location_t location,
>  bool dumpfile = false, bool fnargs = false)
> @@ -217,6 +244,34 @@ output_line_directive (FILE *f, location_t location,
>  fprintf (f, "/* #line %d \"%s\" */\n", loc.line, loc.file);
>  }
>  
> +/* Find the file to write into next.  We try to evenly distribute the 
> contents
> +   over the different files.  */
> +
> +#define SIZED_BASED_CHUNKS 1
> +
> +int current_file = 0;
> +FILE *get_out_file (vec <FILE *> &parts)
> +{
> +#ifdef SIZED_BASED_CHUNKS
> +   FILE *f = NULL;
> +   long min = 0;
> +   /* We've started writing all the files at pos 0, so ftell is equivalent
> +  to the size and should be much faster.  */
> +   for (unsigned i = 0; i < parts.length (); i++)
> + {
> + long res = ftell (parts[i]);

Looks good to me, but I wonder what this will do if the single
part is 'stdout' - I think ftell will error (return -1) and set
errno 

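To make the ftell concern concrete, here is a rough sketch of a size-based selection that tolerates a non-seekable output. The helper name pick_output and the round-robin fallback are assumptions for this sketch, not part of the patch. Whether ftell actually fails on stdout depends on what stdout is attached to: a regular file is seekable, while a pipe or terminal typically is not.

```c
#include <assert.h>
#include <stdio.h>

/* Return the index of the output that has received the least data so
   far.  If any stream cannot report a position (ftell returns -1, as it
   does for pipes on POSIX systems), fall back to round-robin using the
   counter *NEXT_RR.  */
static size_t
pick_output (FILE **parts, size_t n_parts, size_t *next_rr)
{
  assert (n_parts > 0);
  size_t best = 0;
  long min = -1;

  for (size_t i = 0; i < n_parts; i++)
    {
      long pos = ftell (parts[i]);
      if (pos < 0)
        {
          /* Not seekable: sizes are unknown, so just rotate.  */
          size_t idx = *next_rr % n_parts;
          *next_rr = idx + 1;
          return idx;
        }
      if (min < 0 || pos < min)
        {
          min = pos;
          best = i;
        }
    }
  return best;
}
```

With a single non-seekable part, this always returns index 0, so the "one output on stdout" debugging workflow keeps working.
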
Re: [PATCH] MIPS: add speculation_barrier support

2023-04-28 Thread Jiaxun Yang via Gcc-patches



> On April 28, 2023, at 13:33, YunQiang Su wrote:
> 
> speculation_barrier for MIPS needs sync+jr.hb (r2+),
> so we implement __speculation_barrier in libgcc, like arm32 does.
> 
> gcc/ChangeLog:
> * config/mips/mips-protos.h (mips_emit_speculation_barrier): New
>prototype.
> * config/mips/mips.cc (speculation_barrier_libfunc): New static
>variable.
> (mips_init_libfuncs): Initialize it.
> (mips_emit_speculation_barrier): New function.
> * config/arm/arm.md (speculation_barrier): Call
>mips_emit_speculation_barrier.

^ arm? Typo.

Thanks
Jiaxun

Re: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

2023-04-28 Thread Kito Cheng via Gcc-patches
pushed, thanks!


[PATCH] MIPS: add speculation_barrier support

2023-04-28 Thread YunQiang Su
speculation_barrier for MIPS needs sync+jr.hb (r2+),
so we implement __speculation_barrier in libgcc, like arm32 does.

gcc/ChangeLog:
* config/mips/mips-protos.h (mips_emit_speculation_barrier): New
prototype.
* config/mips/mips.cc (speculation_barrier_libfunc): New static
variable.
(mips_init_libfuncs): Initialize it.
(mips_emit_speculation_barrier): New function.
* config/arm/arm.md (speculation_barrier): Call
mips_emit_speculation_barrier.

libgcc/ChangeLog:
* config/mips/lib1funcs.S: New file.
define __speculation_barrier and include mips16.S.
* config/mips/t-mips: define LIB1ASMSRC as mips/lib1funcs.S.
define LIB1ASMFUNCS as _speculation_barrier.
set version info for __speculation_barrier.
* config/mips/libgcc-mips.ver: New file.
* config/mips/t-mips16: don't define LIB1ASMSRC as mips16.S is
included in lib1funcs.S now.
---
 gcc/config/mips/mips-protos.h  |  2 +
 gcc/config/mips/mips.cc| 13 +++
 gcc/config/mips/mips.md| 12 ++
 libgcc/config/mips/lib1funcs.S | 60 ++
 libgcc/config/mips/libgcc-mips.ver | 21 +++
 libgcc/config/mips/t-mips  |  7 
 libgcc/config/mips/t-mips16|  3 +-
 7 files changed, 116 insertions(+), 2 deletions(-)
 create mode 100644 libgcc/config/mips/lib1funcs.S
 create mode 100644 libgcc/config/mips/libgcc-mips.ver

diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index 20483469105..da7902c235b 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -388,4 +388,6 @@ extern void mips_register_frame_header_opt (void);
 extern void mips_expand_vec_cond_expr (machine_mode, machine_mode, rtx *);
 extern void mips_expand_vec_cmp_expr (rtx *);
 
+extern void mips_emit_speculation_barrier_function (void);
+
 #endif /* ! GCC_MIPS_PROTOS_H */
diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index ca822758b41..139707fda34 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -13611,6 +13611,9 @@ mips_autovectorize_vector_modes (vector_modes *modes, 
bool)
   return 0;
 }
 
+
+static GTY(()) rtx speculation_barrier_libfunc;
+
 /* Implement TARGET_INIT_LIBFUNCS.  */
 
 static void
@@ -13680,6 +13683,7 @@ mips_init_libfuncs (void)
   synchronize_libfunc = init_one_libfunc ("__sync_synchronize");
   init_sync_libfuncs (UNITS_PER_WORD);
 }
+  speculation_barrier_libfunc = init_one_libfunc ("__speculation_barrier");
 }
 
 /* Build up a multi-insn sequence that loads label TARGET into $AT.  */
@@ -19092,6 +19096,15 @@ mips_avoid_hazard (rtx_insn *after, rtx_insn *insn, 
int *hilo_delay,
   }
 }
 
+/* Emit a speculation barrier.
+   JR.HB is needed, so we need to put
+   speculation_barrier_libfunc in libgcc */
+void
+mips_emit_speculation_barrier_function ()
+{
+  emit_library_call (speculation_barrier_libfunc, LCT_NORMAL, VOIDmode);
+}
+
 /* A SEQUENCE is breakable iff the branch inside it has a compact form
and the target has compact branches.  */
 
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index ac1d77afc7d..5d04ac566dd 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -160,6 +160,8 @@
   ;; The `.insn' pseudo-op.
   UNSPEC_INSN_PSEUDO
   UNSPEC_JRHB
+
+  VUNSPEC_SPECULATION_BARRIER
 ])
 
 (define_constants
@@ -7455,6 +7457,16 @@
   mips_expand_conditional_move (operands);
   DONE;
 })
+
+(define_expand "speculation_barrier"
+  [(unspec_volatile [(const_int 0)] VUNSPEC_SPECULATION_BARRIER)]
+  ""
+  "
+  mips_emit_speculation_barrier_function ();
+  DONE;
+  "
+)
+
 
 ;;
 ;;  
diff --git a/libgcc/config/mips/lib1funcs.S b/libgcc/config/mips/lib1funcs.S
new file mode 100644
index 000..45d74e2e762
--- /dev/null
+++ b/libgcc/config/mips/lib1funcs.S
@@ -0,0 +1,60 @@
+/* Copyright (C) 1995-2023 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#include "mips16.S"
+
+#ifdef L_speculation_barrier
+
+/* MIPS16e1 has no sync/jr.hb instructions, and 

RISC-V: Eliminate redundant zero extension of minu/maxu operands

2023-04-28 Thread Jivan Hakobyan via Gcc-patches
On RV64, the following code:

  unsigned Min(unsigned a, unsigned b) {
  return a < b ? a : b;
  }

Compiles to:
  Min:
   zext.w  a1,a1
   zext.w  a0,a0
   minu    a0,a1,a0
   sext.w  a0,a0
   ret

This patch removes unnecessary zero extensions of minu/maxu operands.
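
As a sanity check on why the zero extensions are not needed, the following sketch models the idea in plain C. It assumes the RV64 convention that 32-bit values are kept sign-extended in 64-bit registers; the helper names are made up for this example. Sign extension preserves unsigned ordering of 32-bit values, so a 64-bit unsigned minimum of two sign-extended operands yields the sign-extended 32-bit minimum directly.

```c
#include <assert.h>
#include <stdint.h>

/* Model a 32-bit value held sign-extended in a 64-bit register.  */
static uint64_t
sext32 (uint32_t x)
{
  return (x & 0x80000000u) ? (UINT64_C (0xFFFFFFFF00000000) | x) : x;
}

/* Model of the 64-bit minu instruction.  */
static uint64_t
minu64 (uint64_t a, uint64_t b)
{
  return a < b ? a : b;
}

/* Unsigned 32-bit minimum computed on sign-extended operands, with no
   zero extension before and no sign extension after.  */
static uint32_t
min_via_sext (uint32_t a, uint32_t b)
{
  uint64_t r = minu64 (sext32 (a), sext32 (b));
  /* The result is one of the inputs, so it is already sign-extended.  */
  assert (r == sext32 ((uint32_t) r));
  return (uint32_t) r;
}
```

Values with the top bit clear stay below 2^31, values with the top bit set all map above them, and the order within each group is unchanged, which is why the comparison result matches the 32-bit one.
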

gcc/ChangeLog:

 * config/riscv/bitmanip.md: Added expanders for minu/maxu instructions

gcc/testsuite/ChangeLog:

   * gcc.target/riscv/zbb-min-max-02.c: Updated scanning check.
   * gcc.target/riscv/zbb-min-max-03.c: New tests.


--
With the best regards
Jivan Hakobyan
diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 44ad350c747..8580bb37ba0 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -401,7 +401,30 @@
   DONE;
 })
 
-(define_insn "<bitmanip_optab><mode>3"
+(define_expand "<bitmanip_optab>di3"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(bitmanip_minmax:DI (match_operand:DI 1 "register_operand" "r")
+(match_operand:DI 2 "register_operand" "r")))]
+  "TARGET_64BIT && TARGET_ZBB")
+
+(define_expand "<bitmanip_optab>si3"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
+(match_operand:SI 2 "register_operand" "r")))]
+  "TARGET_ZBB"
+{
+  if (TARGET_64BIT)
+{
+  rtx t = gen_reg_rtx (DImode);
+  operands[1] = force_reg (DImode, gen_rtx_SIGN_EXTEND (DImode, operands[1]));
+  operands[2] = force_reg (DImode, gen_rtx_SIGN_EXTEND (DImode, operands[2]));
+  emit_insn (gen_<bitmanip_optab>di3 (t, operands[1], operands[2]));
+  emit_move_insn (operands[0], gen_lowpart (SImode, t));
+  DONE;
+}
+})
+
+(define_insn "*<bitmanip_optab><mode>3"
   [(set (match_operand:X 0 "register_operand" "=r")
 (bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
 			   (match_operand:X 2 "reg_or_0_operand" "rJ")))]
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c b/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
index b462859f10f..edfbf807d45 100644
--- a/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
+++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-march=rv64gc_zba_zbb -mabi=lp64" } */
-/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Os" "-Oz" "-Og" } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
 
 int f(unsigned int* a)
 {
@@ -9,6 +9,6 @@ int f(unsigned int* a)
 }
 
 /* { dg-final { scan-assembler-times "minu" 1 } } */
-/* { dg-final { scan-assembler-times "sext.w" 1 } } */
+/* { dg-final { scan-assembler-not "sext.w" } } */
 /* { dg-final { scan-assembler-not "zext.w" } } */
 
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max-03.c b/gcc/testsuite/gcc.target/riscv/zbb-min-max-03.c
index c7de1004048..38c932b9580 100644
--- a/gcc/testsuite/gcc.target/riscv/zbb-min-max-03.c
+++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max-03.c
@@ -6,5 +6,18 @@ int f(int x) {
  return x >= 0 ? x : 0;
 }
 
+unsigned f2(unsigned x, unsigned y) {
+  return x > y ? x : y;
+}
+
+unsigned f3(unsigned x, unsigned y) {
+  return x < y ? x : y;
+}
+
 /* { dg-final { scan-assembler-times "max\t" 1 } } */
 /* { dg-final { scan-assembler-not "li\t" } } */
+/* { dg-final { scan-assembler-times "maxu\t" 1 } } */
+/* { dg-final { scan-assembler-times "minu\t" 1 } } */
+/* { dg-final { scan-assembler-not "zext.w" } } */
+/* { dg-final { scan-assembler-not "sext.w" } } */
+


[committed] libstdc++: Strip absolute paths from files shown in Doxygen docs

2023-04-28 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

This avoids showing absolute paths from the expansion of
@srcdir@/libsupc++/ in the doxygen File List view.

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in (STRIP_FROM_PATH): Remove prefixes
from header paths.
---
 libstdc++-v3/doc/doxygen/user.cfg.in | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in 
b/libstdc++-v3/doc/doxygen/user.cfg.in
index 75108604a07..14981c96f95 100644
--- a/libstdc++-v3/doc/doxygen/user.cfg.in
+++ b/libstdc++-v3/doc/doxygen/user.cfg.in
@@ -152,7 +152,7 @@ FULL_PATH_NAMES= NO
 # will be relative from the directory where doxygen is started.
 # This tag requires that the tag FULL_PATH_NAMES is set to YES.
 
-STRIP_FROM_PATH=
+STRIP_FROM_PATH= @srcdir@/doc/ @srcdir@/libsupc++/ include/
 
 # The STRIP_FROM_INC_PATH tag can be used to strip a user-defined part of the
 # path mentioned in the documentation of a class, which tells the reader which
@@ -837,6 +837,7 @@ WARN_LOGFILE   =
 # spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
 # Note: If this tag is empty the current directory is searched.
 
+# N.B. update STRIP_FROM_PATH to sanitize paths outside the build tree.
 INPUT  = @srcdir@/doc/doxygen/doxygroups.cc \
  @srcdir@/libsupc++/compare \
  @srcdir@/libsupc++/cxxabi.h \
-- 
2.40.0



[committed] libstdc++: Minor fixes to doxygen comments

2023-04-28 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/uses_allocator.h: Add missing @file comment.
* include/bits/regex.tcc: Remove stray doxygen comments.
* include/experimental/memory_resource: Likewise.
* include/std/bit: Tweak doxygen @cond comments.
* include/std/expected: Likewise.
* include/std/numbers: Likewise.
---
 libstdc++-v3/include/bits/regex.tcc   | 4 
 libstdc++-v3/include/bits/uses_allocator.h| 5 +
 libstdc++-v3/include/experimental/memory_resource | 2 --
 libstdc++-v3/include/std/bit  | 4 ++--
 libstdc++-v3/include/std/expected | 4 ++--
 libstdc++-v3/include/std/numbers  | 2 +-
 6 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/libstdc++-v3/include/bits/regex.tcc 
b/libstdc++-v3/include/bits/regex.tcc
index a58ed3f555f..6f0a486eb04 100644
--- a/libstdc++-v3/include/bits/regex.tcc
+++ b/libstdc++-v3/include/bits/regex.tcc
@@ -116,8 +116,6 @@ namespace __detail
   /// @endcond
 } // namespace __detail
 
-  /// @cond
-
   template
   template
 typename regex_traits<_Ch_type>::string_type
@@ -665,7 +663,5 @@ namespace __detail
_M_result = nullptr;
 }
 
-  /// @endcond
-
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
diff --git a/libstdc++-v3/include/bits/uses_allocator.h 
b/libstdc++-v3/include/bits/uses_allocator.h
index d1841bbee5d..d3b26c7d974 100644
--- a/libstdc++-v3/include/bits/uses_allocator.h
+++ b/libstdc++-v3/include/bits/uses_allocator.h
@@ -22,6 +22,11 @@
 // see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 // <http://www.gnu.org/licenses/>.
 
+/** @file include/bits/uses_allocator_args.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{memory}
+ */
+
 #ifndef _USES_ALLOCATOR_H
 #define _USES_ALLOCATOR_H 1
 
diff --git a/libstdc++-v3/include/experimental/memory_resource 
b/libstdc++-v3/include/experimental/memory_resource
index 070cf791c65..9f1cb42373e 100644
--- a/libstdc++-v3/include/experimental/memory_resource
+++ b/libstdc++-v3/include/experimental/memory_resource
@@ -45,14 +45,12 @@
 #include 
 #include 
 
-/// @cond
 namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template class malloc_allocator;
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace __gnu_cxx
-/// @endcond
 
 namespace std {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
diff --git a/libstdc++-v3/include/std/bit b/libstdc++-v3/include/std/bit
index 0c58971bd59..5eb40218be9 100644
--- a/libstdc++-v3/include/std/bit
+++ b/libstdc++-v3/include/std/bit
@@ -144,7 +144,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 #endif
 
-  /// @cond undoc
+  /// @cond undocumented
 
   template
 constexpr _Tp
@@ -374,7 +374,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #define __cpp_lib_bitops 201907L
 
-  /// @cond undoc
+  /// @cond undocumented
   template
 using _If_is_unsigned_integer
   = enable_if_t<__is_unsigned_integer<_Tp>::value, _Up>;
diff --git a/libstdc++-v3/include/std/expected 
b/libstdc++-v3/include/std/expected
index 058188248bb..c6d26b0d224 100644
--- a/libstdc++-v3/include/std/expected
+++ b/libstdc++-v3/include/std/expected
@@ -139,7 +139,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*/
   inline constexpr unexpect_t unexpect{};
 
-/// @cond undoc
+/// @cond undocumented
 namespace __expected
 {
   template
@@ -254,7 +254,7 @@ namespace __expected
 
   template unexpected(_Er) -> unexpected<_Er>;
 
-/// @cond undoc
+/// @cond undocumented
 namespace __expected
 {
   template
diff --git a/libstdc++-v3/include/std/numbers b/libstdc++-v3/include/std/numbers
index d9d202f5392..d7d9e81e540 100644
--- a/libstdc++-v3/include/std/numbers
+++ b/libstdc++-v3/include/std/numbers
@@ -49,7 +49,7 @@ namespace numbers
 {
 #define __cpp_lib_math_constants 201907L
 
-  /// @cond undoc
+  /// @cond undocumented
   template
 using _Enable_if_floating = enable_if_t, _Tp>;
   /// @endcond
-- 
2.40.0



[committed] libstdc++: Improve doxygen docs for <random>

2023-04-28 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

Add @headerfile and @since tags. Add gamma_distribution to the correct
group (poisson distributions). Add a group for the sampling
distributions and add the missing definitions of their probability
functions. Add uniform_int_distribution back to the uniform
distributions group.

libstdc++-v3/ChangeLog:

* include/bits/random.h (gamma_distribution): Add to the right
doxygen group.
(discrete_distribution, piecewise_constant_distribution)
(piecewise_linear_distribution): Create a new doxygen group and
fix the incomplete doxygen comments.
* include/bits/uniform_int_dist.h (uniform_int_distribution):
Add to doxygen group.
---
 libstdc++-v3/include/bits/random.h   | 127 ++-
 libstdc++-v3/include/bits/uniform_int_dist.h |  11 ++
 2 files changed, 132 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/bits/random.h 
b/libstdc++-v3/include/bits/random.h
index 42f37c1e77e..f77005adec5 100644
--- a/libstdc++-v3/include/bits/random.h
+++ b/libstdc++-v3/include/bits/random.h
@@ -256,6 +256,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* parameters @p __a and @p __c must be less than @p __m.
*
* The size of the state is @f$1@f$.
+   *
+   * @headerfile random
+   * @since C++11
*/
   template
 class linear_congruential_engine
@@ -471,6 +474,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* @tparam __c  The second left-shift tempering matrix mask.
* @tparam __l  The second right-shift tempering matrix parameter.
* @tparam __f  Initialization multiplier.
+   *
+   * @headerfile random
+   * @since C++11
*/
   template
 class subtract_with_carry_engine
@@ -890,7 +899,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* Produces random numbers from some base engine by discarding blocks of
* data.
*
-   * 0 <= @p __r <= @p __p
+   * @pre @f$ 0 \leq r \leq p @f$
+   *
+   * @headerfile random
+   * @since C++11
*/
   template
 class discard_block_engine
@@ -1114,6 +1126,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /**
* Produces random numbers by combining random numbers from some base
* engine to produce random numbers with a specified number of bits @p __w.
+   *
+   * @headerfile random
+   * @since C++11
*/
   template
 class independent_bits_engine
@@ -1338,6 +1353,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*
* The values from the base engine are stored in a sequence of size @p __k
* and shuffled by an algorithm that depends on those values.
+   *
+   * @headerfile random
+   * @since C++11
*/
   template
 class shuffle_order_engine
@@ -1625,6 +1643,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /**
* A standard interface to a platform-specific non-deterministic
* random number generator (if any are available).
+   *
+   * @headerfile random
+   * @since C++11
*/
   class random_device
   {
@@ -1750,6 +1771,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* A continuous random distribution on the range [min, max) with equal
* probability throughout the range.  The URNG should be real-valued and
* deliver number in the range [0, 1).
+   *
+   * @headerfile random
+   * @since C++11
*/
   template
 class uniform_real_distribution
@@ -1984,6 +2008,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* p(x|\mu,\sigma) = \frac{1}{\sigma \sqrt{2 \pi}}
*e^{- \frac{{x - \mu}^ {2}}{2 \sigma ^ {2}} } 
* @f]
+   *
+   * @headerfile random
+   * @since C++11
*/
   template
 class normal_distribution
@@ -2208,6 +2235,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* p(x|m,s) = \frac{1}{sx\sqrt{2\pi}}
*\exp{-\frac{(\ln{x} - m)^2}{2s^2}} 
* @f]
+   *
+   * @headerfile random
+   * @since C++11
*/
   template
 class lognormal_distribution
@@ -2414,6 +2444,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return !(__d1 == __d2); }
 #endif
 
+  /// @} group random_distributions_normal
+
+  /**
+   * @addtogroup random_distributions_poisson Poisson Distributions
+   * @ingroup random_distributions
+   * @{
+   */
+
   /**
* @brief A gamma continuous distribution for random numbers.
*
@@ -2422,6 +2460,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* p(x|\alpha,\beta) = \frac{1}{\beta\Gamma(\alpha)}
* (x/\beta)^{\alpha - 1} e^{-x/\beta} 
* @f]
+   *
+   * @headerfile random
+   * @since C++11
*/
   template
 class gamma_distribution
@@ -2645,14 +2686,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  inline bool
  operator!=(const std::gamma_distribution<_RealType>& __d1,
const std::gamma_distribution<_RealType>& __d2)
-{ return !(__d1 == __d2); }
+ { return !(__d1 == __d2); }
 #endif
 
+  /// @} group random_distributions_poisson
+
+  /**
+   * @addtogroup random_distributions_normal Normal Distributions
+   * @ingroup random_distributions
+   * @{
+   */
+
   /**
* @brief A chi_squared_distribution random 

[committed] libstdc++: Simplify preprocessor/namespace nesting in <bits/move.h>

2023-04-28 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

There's no good reason to conditionally close and reopen namespace std
within an #if block. Just include the <type_traits> header at the top
instead.

libstdc++-v3/ChangeLog:

* include/bits/move.h: Simplify opening/closing namespace std.
---
 libstdc++-v3/include/bits/move.h | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/libstdc++-v3/include/bits/move.h b/libstdc++-v3/include/bits/move.h
index 6bc70e8e724..4a8fceff96a 100644
--- a/libstdc++-v3/include/bits/move.h
+++ b/libstdc++-v3/include/bits/move.h
@@ -33,6 +33,8 @@
 #include <bits/c++config.h>
 #if __cplusplus < 201103L
 # include <bits/concept_check.h>
+#else
+# include <type_traits> // Brings in std::declval too.
 #endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
@@ -51,15 +53,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #if __cplusplus >= 201103L
 
-_GLIBCXX_END_NAMESPACE_VERSION
-} // namespace
-
-#include <type_traits> // Brings in std::declval too.
-
-namespace std _GLIBCXX_VISIBILITY(default)
-{
-_GLIBCXX_BEGIN_NAMESPACE_VERSION
-
   /**
*  @addtogroup utilities
*  @{
-- 
2.40.0



Re: [PATCH 3/5] genmatch: split shared code to gimple-match-exports.cc

2023-04-28 Thread Richard Biener via Gcc-patches
On Fri, 28 Apr 2023, Tamar Christina wrote:

> Hi All,
> 
> In preparation for automatically splitting match.pd files, I split the
> non-static helper functions that are shared between the match.pd functions off
> into another file.
> 
> This file can be compiled in parallel and also allows us to later avoid
> duplicate symbols errors.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR bootstrap/84402
>   * Makefile.in (OBJS): Add gimple-match-exports.o.
>   * genmatch.cc (decision_tree::gen): Export gimple_gimplify helpers.
>   * gimple-match-head.cc (gimple_simplify, gimple_resimplify1,
>   gimple_resimplify2, gimple_resimplify3, gimple_resimplify4,
>   gimple_resimplify5, constant_for_folding, convert_conditional_op,
>   maybe_resimplify_conditional_op, gimple_match_op::resimplify,
>   maybe_build_generic_op, build_call_internal, maybe_push_res_to_seq,
>   do_valueize, try_conditional_simplification, gimple_extract,
>   gimple_extract_op, canonicalize_code, commutative_binary_op_p,
>   commutative_ternary_op_p, first_commutative_argument,
>   associative_binary_op_p, directly_supported_p,
>   get_conditional_internal_fn): Moved to gimple-match-exports.cc
>   * gimple-match-exports.cc: New file.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 
> d8b76d83d6850c3ce5318f3acd7cdc2a8cbc140b..70559a014c0e32d8d825766e0c1516fc2ee05421
>  100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -223,6 +223,7 @@ libgcov-util.o-warn = -Wno-error
>  libgcov-driver-tool.o-warn = -Wno-error
>  libgcov-merge-tool.o-warn = -Wno-error
>  gimple-match.o-warn = -Wno-unused
> +gimple-match-exports.o-warn = -Wno-unused
>  generic-match.o-warn = -Wno-unused
>  dfp.o-warn = -Wno-strict-aliasing
>  
> @@ -1310,6 +1311,7 @@ ANALYZER_OBJS = \
>  # the last objects to finish building.
>  OBJS = \
>   gimple-match.o \
> + gimple-match-exports.o \
>   generic-match.o \
>   insn-attrtab.o \
>   insn-automata.o \
> @@ -2661,7 +2663,7 @@ s-tm-texi: build/genhooks$(build_exeext) 
> $(srcdir)/doc/tm.texi.in
> false; \
>   fi
>  
> -gimple-match.cc: s-match gimple-match-head.cc ; @true
> +gimple-match.cc: s-match gimple-match-head.cc gimple-match-exports.cc ; @true
>  generic-match.cc: s-match generic-match-head.cc ; @true
>  
>  s-match: build/genmatch$(build_exeext) $(srcdir)/match.pd cfn-operators.pd
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index 
> 1f52ca2eebc2794159747338babb56c610387f3b..716fb97aac4c3c2baae82e068df3ce158b9afee9
>  100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -3955,7 +3955,7 @@ decision_tree::gen (FILE *f, bool gimple)
>if (! has_kids_p)
>   {
> if (gimple)
> - fprintf (f, "\nstatic bool\n"
> + fprintf (f, "\nbool\n"
>   "gimple_simplify (gimple_match_op*, gimple_seq*,\n"
>   " tree (*)(tree), code_helper,\n"
>   " const tree");
> @@ -3978,7 +3978,7 @@ decision_tree::gen (FILE *f, bool gimple)
>/* Then generate the main entry with the outermost switch and
>   tail-calls to the split-out functions.  */
>if (gimple)
> - fprintf (f, "\nstatic bool\n"
> + fprintf (f, "\nbool\n"
>"gimple_simplify (gimple_match_op *res_op, gimple_seq *seq,\n"
>" tree (*valueize)(tree) ATTRIBUTE_UNUSED,\n"
>" code_helper code, const tree type");
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> new file mode 100644
> index 
> ..7aeb4ddb152468ddef19e6361428498371a1ebf2
> --- /dev/null
> +++ b/gcc/gimple-match-exports.cc
> @@ -0,0 +1,1253 @@
> +/* Helpers for the autogenerated gimple-match.cc file.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "target.h"
> +#include "rtl.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "ssa.h"
> +#include "cgraph.h"
> +#include "vec-perm-indices.h"
> 

Re: [PATCH 3/5] match.pd: CSE the dump output check.

2023-04-28 Thread Richard Biener via Gcc-patches
On Fri, 28 Apr 2023, Tamar Christina wrote:

> Hi All,
> 
> This is a small improvement in QoL codegen for match.pd to save time not
> re-evaluating the condition for printing debug information in every function.
> 
> There is a small but consistent runtime and compile time win here.  The 
> runtime
> win comes from not having to do the condition over again, and on Arm platforms
> we now use the new test-and-branch support for booleans to only have a single
> instruction here.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

Thanks,
Richard.
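
For readers unfamiliar with the generated code, a small standalone sketch of the shape this change produces follows. The flag value, the globals and the function are hypothetical stand-ins for this example, not GCC's definitions. The dump condition is evaluated once at function entry and each pattern then tests only the cached boolean.

```c
#include <assert.h>
#include <stdio.h>

#define TDF_FOLDING 0x1u	/* Hypothetical value for this sketch.  */

static FILE *dump_file;		/* Stand-ins for the real globals.  */
static unsigned dump_flags;

static int
simplify_sketch (int x)
{
  /* Evaluated once per call instead of once per pattern.  */
  const int debug_dump = dump_file && (dump_flags & TDF_FOLDING);
  assert (!debug_dump || dump_file != NULL);

  if (x == 0)
    {
      if (debug_dump)
        fprintf (dump_file, "Applying pattern zero\n");
      return 1;
    }
  if (x == 1)
    {
      if (debug_dump)
        fprintf (dump_file, "Applying pattern one\n");
      return 1;
    }
  return 0;
}
```
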

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR bootstrap/84402
>   * genmatch.cc (decision_tree::gen, write_predicate): Generate new
>   debug_dump var.
>   (dt_simplify::gen_1): Use it.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index 
> 6d62cdea2082d92e5ecc1102c80205115a4e3040..1f52ca2eebc2794159747338babb56c610387f3b
>  100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -3431,7 +3431,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
> operand *result)
>needs_label = true;
>  }
>  
> -  fprintf_indent (f, indent, "if (UNLIKELY (dump_file && (dump_flags & 
> TDF_FOLDING))) "
> +  fprintf_indent (f, indent, "if (UNLIKELY (debug_dump)) "
>  "fprintf (dump_file, \"%s ",
>  s->kind == simplify::SIMPLIFY
>  ? "Applying pattern" : "Matching expression");
> @@ -3892,6 +3892,8 @@ decision_tree::gen (FILE *f, bool gimple)
>   }
>  
>fprintf (f, ")\n{\n");
> +  fprintf_indent (f, 2, "const bool debug_dump = "
> + "dump_file && (dump_flags & TDF_FOLDING);\n");
>s->s->gen_1 (f, 2, gimple, s->s->s->result);
>if (gimple)
>   fprintf (f, "  return false;\n");
> @@ -3937,6 +3939,8 @@ decision_tree::gen (FILE *f, bool gimple)
>   fprintf (f, ", tree _p%d", i);
> fprintf (f, ")\n");
> fprintf (f, "{\n");
> +   fprintf_indent (f, 2, "const bool debug_dump = "
> + "dump_file && (dump_flags & TDF_FOLDING);\n");
> dop->gen_kids (f, 2, gimple, 0);
> if (gimple)
>   fprintf (f, "  return false;\n");
> @@ -4046,6 +4050,8 @@ write_predicate (FILE *f, predicate_id *p, 
> decision_tree , bool gimple)
>  gimple ? ", tree (*valueize)(tree) ATTRIBUTE_UNUSED" : "");
>/* Conveniently make 'type' available.  */
>fprintf_indent (f, 2, "const tree type = TREE_TYPE (t);\n");
> +  fprintf_indent (f, 2, "const bool debug_dump = "
> + "dump_file && (dump_flags & TDF_FOLDING);\n");
>  
>if (!gimple)
>  fprintf_indent (f, 2, "if (TREE_SIDE_EFFECTS (t)) return false;\n");
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH 2/5] match.pd: Remove commented out line pragmas unless -vv is used.

2023-04-28 Thread Richard Biener via Gcc-patches
On Fri, 28 Apr 2023, Tamar Christina wrote:

> Hi All,
> 
> genmatch currently outputs commented out line directives that have no effect
> but the compiler still has to parse only to discard.
> 
> They are however handy when debugging genmatch output.  As such this moves 
> them
> behind the -vv flag.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR bootstrap/84402
>   * genmatch.cc (output_line_directive): Only emit commented directive
>   when -vv.
>   (main): Initialize verbose.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index 
> 638606b2502f640e59527fc5a0b23fa3bedd0cee..6d62cdea2082d92e5ecc1102c80205115a4e3040
>  100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -209,7 +209,7 @@ output_line_directive (FILE *f, location_t location,
>else
>   fprintf (f, "%s:%d", file, loc.line);
>  }
> -  else
> +  else if (verbose == 2)
>  /* Other gen programs really output line directives here, at least for
> development it's right now more convenient to have line information
> from the generated file.  Still keep the directives as comment for now
> @@ -5221,6 +5221,7 @@ main (int argc, char **argv)
>  return 1;
>  
>bool gimple = true;
> +  verbose = 0;

That's redundant - globals are default zero-initialized.

OK with removing this line.
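
To illustrate the point about default initialization, here is a minimal
sketch in C. The names `verbose_sketch` and `verbose_is_default` are
hypothetical stand-ins and are not taken from genmatch.cc; the only claim
is the language rule that objects with static storage duration start out
as zero when no initializer is given.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a file-scope verbosity flag.  It has static
   storage duration and no initializer, so it starts out as 0.  */
static int verbose_sketch;

/* Returns true while the flag still holds its implicit initial value.  */
bool
verbose_is_default (void)
{
  return verbose_sketch == 0;
}
```

An explicit `verbose_sketch = 0;` at the start of the program would
therefore not change anything.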

Thanks,
Richard.



Re: [PATCH] v2: Add targetm.libm_function_max_error

2023-04-28 Thread Jakub Jelinek via Gcc-patches
On Fri, Apr 28, 2023 at 12:29:58PM +0100, Richard Sandiford wrote:
> Jakub Jelinek via Gcc-patches  writes:
> > Hi!
> >
> > On Thu, Apr 27, 2023 at 10:34:59AM +, Richard Biener wrote:
> >> OK. As said the patch itself looks good to me, let's go ahead.  We
> >> have plenty of time to backtrack until GCC 14.
> >
> > Thanks.  Unfortunately when I started using it, I've discovered that the
> > CASE_CFN_xxx_ALL macros don't include the CFN_xxx cases, just
> > CFN_BUILT_IN_xxx* cases.
> >
> > So here is an updated version of the patch I'll bootstrap/regtest tonight
> > which instead uses CASE_CFN_xxx: CASE_CFN_xxx_FN:
> 
> Shouldn't we change something in that case?  The point of these macros
> is to wrap things up a single easy-to-use name, so something feels wrong
> if we're having to use a repeated pattern like this.

Maybe.  But unfortunately not all builtins have those CFN_xxx enumerators,
some have just CFN_BUILT_IN_xxx{,L,F}, others have
CFN_BUILT_IN_xxx{,L,F,F16,F32,F64,F128} and others have that plus CFN_xxx.
So we'd perhaps need some other macros for the all but CFN_xxx and perhaps
use ALL only for the cases where it is really all of them.
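
As a rough sketch of the shape of the problem (not the actual GCC
definitions), case-label macros of this kind can be modelled as below.
Every identifier prefixed with `SK_`, and the function `is_sqrt_like`, is
hypothetical and only mirrors the naming pattern discussed above.

```c
/* Hypothetical enumerators modelled on the naming in the thread.  */
enum sk_combined_fn
{
  SK_CFN_BUILT_IN_SQRTF,
  SK_CFN_BUILT_IN_SQRT,
  SK_CFN_BUILT_IN_SQRTL,
  SK_CFN_SQRT,          /* Enumerator without the BUILT_IN_ prefix.  */
  SK_CFN_BUILT_IN_EXP
};

/* Covers only the BUILT_IN_ variants, so SK_CFN_SQRT is left out.  */
#define SK_CASE_BUILT_IN_SQRT_ALL \
  case SK_CFN_BUILT_IN_SQRTF: \
  case SK_CFN_BUILT_IN_SQRT: \
  case SK_CFN_BUILT_IN_SQRTL

/* Hypothetical wrapper that adds the remaining enumerator as well.  */
#define SK_CASE_SQRT_EVERY \
  SK_CASE_BUILT_IN_SQRT_ALL: \
  case SK_CFN_SQRT

/* Returns 1 for every sqrt flavour, 0 for anything else.  */
int
is_sqrt_like (enum sk_combined_fn fn)
{
  switch (fn)
    {
    SK_CASE_SQRT_EVERY:
      return 1;
    default:
      return 0;
    }
}
```

With a wrapper like `SK_CASE_SQRT_EVERY` a caller writes one label
instead of repeating two macros, but such a wrapper can only exist for
functions that actually have all the enumerators.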

Jakub



[PATCH] ipa/109652 - ICE in modification phase of IPA SRA

2023-04-28 Thread Richard Biener via Gcc-patches
There's another questionable IL transform by IPA SRA, replacing
foo (p_1(D)->x) with foo (VIEW_CONVERT  (ISRA.PARM.1))
where ISRA.PARM.1 is a register.  Conversion of a register to
an aggregate type is questionable but not entirely unreasonable
and not within the set of IL I am rejecting when fixing PR109644.

The following lets this slip through in IPA SRA transform by
restricting re-gimplification to the case of register type
results.  To not break the previous testcase again we need to
optimize the BIT_FIELD_REF , ...> case
to elide the conversion.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

PR ipa/109652
* ipa-param-manipulation.cc
(ipa_param_body_adjustments::modify_expression): Allow
conversion of a register to a non-register type.  Elide
conversions inside BIT_FIELD_REFs.

* gcc.dg/torture/pr109652.c: New testcase.
---
 gcc/ipa-param-manipulation.cc   |  7 +++--
 gcc/testsuite/gcc.dg/torture/pr109652.c | 40 +
 2 files changed, 45 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr109652.c

diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
index 33dcab9c33c..a286af7f5d9 100644
--- a/gcc/ipa-param-manipulation.cc
+++ b/gcc/ipa-param-manipulation.cc
@@ -1836,9 +1836,11 @@ ipa_param_body_adjustments::modify_expression (tree *expr_p, bool convert,
   || TREE_CODE (expr) == IMAGPART_EXPR
   || TREE_CODE (expr) == REALPART_EXPR)
 {
+  /* For a BIT_FIELD_REF do not bother to VIEW_CONVERT the base,
+instead reference the replacement directly.  */
+  convert = TREE_CODE (expr) != BIT_FIELD_REF;
   expr_p = &TREE_OPERAND (expr, 0);
   expr = *expr_p;
-  convert = true;
 }
 
   ipa_param_body_replacement *pbr = get_expr_replacement (expr, false);
@@ -1861,7 +1863,8 @@ ipa_param_body_adjustments::modify_expression (tree *expr_p, bool convert,
   gcc_checking_assert (tree_to_shwi (TYPE_SIZE (TREE_TYPE (expr)))
   == tree_to_shwi (TYPE_SIZE (TREE_TYPE (repl))));
   tree vce = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (expr), repl);
-  if (is_gimple_reg (repl))
+  if (is_gimple_reg (repl)
+ && is_gimple_reg_type (TREE_TYPE (expr)))
{
  gcc_assert (extra_stmts);
  vce = force_gimple_operand (vce, extra_stmts, true, NULL_TREE);
diff --git a/gcc/testsuite/gcc.dg/torture/pr109652.c b/gcc/testsuite/gcc.dg/torture/pr109652.c
new file mode 100644
index 000..8a6524d2212
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr109652.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+
+typedef int UInt;
+UInt skeletal_RI5_instr;
+__attribute__((__noreturn__)) void vex_assert_fail();
+typedef struct {
+  union {
+struct {
+  UInt imm5;
+} I5;
+  } ARMri5;
+} ARMRI5;
+typedef enum { ARMin_Alu, ARMin_Shift } ARMInstrTag;
+void iregEnc();
+static UInt skeletal_RI5(ARMRI5 *ri) {
+  UInt imm5 = ri->ARMri5.I5.imm5;
+  __builtin_expect(imm5, 1) ?: vex_assert_fail();
+  iregEnc(ri->ARMri5);
+  return skeletal_RI5_instr;
+}
+ARMInstrTag emit_ARMInstr_i_0;
+void *emit_ARMInstr_disp_cp_chain_me_to_slowEP() {
+  switch (emit_ARMInstr_i_0) {
+  case ARMin_Alu:
+UInt instr, subopc;
+UInt rD, rN;
+goto bad;
+instr |= subopc | rN;
+  case ARMin_Shift:
+rD = 0;
+UInt rM = 0;
+ARMRI5 argR;
+instr = skeletal_RI5(&argR);
+instr |= rD | rM;
+goto done;
+  }
+bad:
+done:
+  return 0;
+}
-- 
2.35.3


[PATCH 10/10] arm testsuite: Shifts and get_FPSCR ACLE optimisation fixes

2023-04-28 Thread Andrea Corallo via Gcc-patches
From: Stam Markianos-Wright 

These newly updated tests were rewritten by Andrea. Some of them
needed further manual fixing as follows:

* The #shift immediate value was not in the check-function-bodies as expected
* Some shifts getting optimised to mov immediates, e.g.
  `uqshll (1, 1);` -> movs r0, #2; movs r1, #0
* The ACLE was specifying sub-optimal code: lsr+and instead of ubfx. In
  this case the test rewritten from the ACLE had the lsr+and pattern,
  but the compiler was able to optimise to ubfx. Hence I've changed the
  test to now match on ubfx.
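
As a small sketch of why the two forms are interchangeable, the C below
models a shift-then-mask extract (the lsr+and pattern) next to a model of
an unsigned bit-field extract. Both function names are made up for this
illustration, and both assume 1 <= width and lsb + width <= 32.

```c
#include <stdint.h>

/* Shift-then-mask form, corresponding to lsr followed by and.  */
uint32_t
extract_shift_mask (uint32_t value, unsigned lsb, unsigned width)
{
  uint32_t mask = width >= 32 ? UINT32_MAX : ((UINT32_C (1) << width) - 1);
  return (value >> lsb) & mask;
}

/* Model of an unsigned bit-field extract: push the field to the top of
   the word, then shift it back down so the upper bits become zero.  */
uint32_t
extract_ubfx_model (uint32_t value, unsigned lsb, unsigned width)
{
  return (value << (32 - lsb - width)) >> (32 - width);
}
```

Reading a single flag at bit 29, i.e. lsb 29 and width 1, gives the same
result through either function, which is why the compiler is free to pick
the single-instruction form.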

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/srshr.c: Update shift value.
* gcc.target/arm/mve/intrinsics/srshrl.c: Update shift value.
* gcc.target/arm/mve/intrinsics/uqshl.c: Update shift value and mov imm.
* gcc.target/arm/mve/intrinsics/uqshll.c: Update shift value and mov imm.
* gcc.target/arm/mve/intrinsics/urshr.c: Update shift value.
* gcc.target/arm/mve/intrinsics/urshrl.c: Update shift value.
* gcc.target/arm/mve/intrinsics/vadciq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadciq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadciq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadciq_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_u32.c: Update to ubfx.
---
 gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c   | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c  | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c   | 4 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshll.c  | 5 +++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/urshr.c   | 4 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/urshrl.c  | 4 ++--
 .../gcc.target/arm/mve/intrinsics/vadciq_m_s32.c  | 8 ++--
 .../gcc.target/arm/mve/intrinsics/vadciq_m_u32.c  | 8 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_s32.c  | 8 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_u32.c  | 8 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_m_s32.c | 8 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_m_u32.c | 8 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_s32.c   | 8 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_u32.c   | 8 ++--
 .../gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c  | 8 ++--
 .../gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c  | 8 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_s32.c  | 8 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_u32.c  | 8 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c | 8 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c | 8 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_s32.c   | 8 ++--
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_u32.c   | 8 ++--
 22 files changed, 43 insertions(+), 106 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
index 94e3f42fd33..734375d58c0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
@@ -12,7 +12,7 @@ extern "C" {
 /*
 **foo:
 ** ...
-** srshr   (?:ip|fp|r[0-9]+), #shift(?:@.*|)
+** srshr   (?:ip|fp|r[0-9]+), #1(?:@.*|)
 ** ...
 */
 int32_t
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
index 65f28ccbfde..a91943c38a0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
@@ -12,7 +12,7 @@ extern "C" {
 /*
 **foo:
 ** ...
-** srshrl  (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #shift(?: @.*|)
+** srshrl  (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #1(?: @.*|)
 ** ...
 */
 int64_t
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
index b23c9d97ba6..58aa7a61e42 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
+++ 

[PATCH 02/10] arm: Fix vstrwq* backend + testsuite

2023-04-28 Thread Andrea Corallo via Gcc-patches
Hi all,

this patch fixes the vstrwq* MVE intrinsics failing to emit the
correct sequence of instructions due to missing predicates. Also the
immediate range is fixed to be multiples of 2 in the range [-252, 252].

Best Regards

  Andrea

gcc/ChangeLog:

* config/arm/constraints.md (mve_vldrd_immediate): Move it to
predicates.md.
(Ri): Move constraint definition from predicates.md.
(Rl): Define new constraint.
* config/arm/mve.md (mve_vstrwq_scatter_base_wb_p_v4si): Add
missing constraint.
(mve_vstrwq_scatter_base_wb_p_fv4sf): Add missing Up constraint
for op 1, use mve_vstrw_immediate predicate and Rl constraint for
op 2. Fix asm output spacing.
(mve_vstrdq_scatter_base_wb_p_v2di): Add missing constraint.
* config/arm/predicates.md (Ri): Move constraint to constraints.md.
(mve_vldrd_immediate): Move it from
constraints.md.
(mve_vstrw_immediate): New predicate.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vstrwq_f32.c: Use
check-function-bodies instead of scan-assembler checks.  Use
extern "C" for C++ testing.
* gcc.target/arm/mve/intrinsics/vstrwq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_u32.c: Likewise.
---
 gcc/config/arm/constraints.md | 20 --
 gcc/config/arm/mve.md | 10 ++---
 gcc/config/arm/predicates.md  | 14 +++
 .../arm/mve/intrinsics/vstrwq_f32.c   | 32 ---
 .../arm/mve/intrinsics/vstrwq_p_f32.c | 40 ---
 .../arm/mve/intrinsics/vstrwq_p_s32.c | 40 ---
 .../arm/mve/intrinsics/vstrwq_p_u32.c | 40 ---
 .../arm/mve/intrinsics/vstrwq_s32.c   | 32 ---
 .../mve/intrinsics/vstrwq_scatter_base_f32.c  | 28 +++--
 .../intrinsics/vstrwq_scatter_base_p_f32.c| 36 +++--
 .../intrinsics/vstrwq_scatter_base_p_s32.c| 36 +++--
 .../intrinsics/vstrwq_scatter_base_p_u32.c| 36 +++--
 .../mve/intrinsics/vstrwq_scatter_base_s32.c  | 28 +++--
 .../mve/intrinsics/vstrwq_scatter_base_u32.c  | 28 +++--
 .../intrinsics/vstrwq_scatter_base_wb_f32.c   | 32 ---
 .../intrinsics/vstrwq_scatter_base_wb_p_f32.c | 40 ---
 .../intrinsics/vstrwq_scatter_base_wb_p_s32.c | 40 ---
 .../intrinsics/vstrwq_scatter_base_wb_p_u32.c | 40 ---
 .../intrinsics/vstrwq_scatter_base_wb_s32.c   | 32 ---
 .../intrinsics/vstrwq_scatter_base_wb_u32.c   | 32 ---
 .../intrinsics/vstrwq_scatter_offset_f32.c| 32 ---
 .../intrinsics/vstrwq_scatter_offset_p_f32.c  | 40 ---
 

[PATCH 04/10] arm: Stop vadcq, vsbcq intrinsics from overwriting the FPSCR NZ flags

2023-04-28 Thread Andrea Corallo via Gcc-patches
From: Stam Markianos-Wright 

Hi all,

We noticed that calls to the vadcq and vsbcq intrinsics, both of
which use __builtin_arm_set_fpscr_nzcvqc to set the Carry flag in
the FPSCR, would produce the following code:

```
< r2 is the *carry input >
vmrs    r3, FPSCR_nzcvqc
bic     r3, r3, #536870912
orr     r3, r3, r2, lsl #29
vmsr    FPSCR_nzcvqc, r3
```

when the MVE ACLE instead gives a different instruction sequence of:
```
< Rt is the *carry input >
VMRS Rs,FPSCR_nzcvqc
BFI Rs,Rt,#29,#1
VMSR FPSCR_nzcvqc,Rs
```

The bic + orr pair is slower and it is also wrong: if the *carry input
is greater than 1, then we risk overwriting the top two bits of the
FPSCR register (the N and Z flags).

This turned out to be a problem in the header file and the solution was
to simply add a `& 0x1u` to the `*carry` input: then the compiler knows
that we only care about the lowest bit and can optimise to a BFI.
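
The effect of the mask can be sketched in plain C without any MVE
builtins. The helper names and the `CARRY_BIT`/`CARRY_MASK` macros below
are hypothetical and model only the arithmetic on a 32-bit FPSCR-like
value, with the carry flag at bit 29 as described above.

```c
#include <stdint.h>

#define CARRY_BIT 29u
#define CARRY_MASK (UINT32_C (1) << CARRY_BIT)   /* 0x20000000 */

/* Pre-patch arithmetic: the carry value is shifted in unmasked, so any
   bits above bit 0 land in bits 30 and 31.  */
uint32_t
set_carry_unmasked (uint32_t fpscr, uint32_t carry)
{
  return (fpscr & ~CARRY_MASK) | (carry << CARRY_BIT);
}

/* Post-patch arithmetic: only bit 0 of the carry value is inserted.  */
uint32_t
set_carry_masked (uint32_t fpscr, uint32_t carry)
{
  return (fpscr & ~CARRY_MASK) | ((carry & 0x1u) << CARRY_BIT);
}
```

For example, with a carry value of 3 the unmasked form yields 0x60000000
from a zero register, setting bit 30 as well, whereas the masked form
yields 0x20000000 and leaves the upper two bits alone.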

Ok for trunk?

Thanks,
Stam Markianos-Wright

gcc/ChangeLog:

* config/arm/arm_mve.h (__arm_vadcq_s32): Fix arithmetic.
(__arm_vadcq_u32): Likewise.
(__arm_vadcq_m_s32): Likewise.
(__arm_vadcq_m_u32): Likewise.
(__arm_vsbcq_s32): Likewise.
(__arm_vsbcq_u32): Likewise.
(__arm_vsbcq_m_s32): Likewise.
(__arm_vsbcq_m_u32): Likewise.
---
 gcc/config/arm/arm_mve.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 1262d668121..8778216304b 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -16055,7 +16055,7 @@ __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | ((*__carry & 0x1u) << 29));
   int32x4_t __res = __builtin_mve_vadcq_sv4si (__a, __b);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -16065,7 +16065,7 @@ __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | ((*__carry & 0x1u) << 29));
   uint32x4_t __res = __builtin_mve_vadcq_uv4si (__a, __b);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -16075,7 +16075,7 @@ __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, unsigned * __carry, mve_pred16_t __p)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | ((*__carry & 0x1u) << 29));
   int32x4_t __res = __builtin_mve_vadcq_m_sv4si (__inactive, __a, __b, __p);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -16085,7 +16085,7 @@ __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, unsigned * __carry, mve_pred16_t __p)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | ((*__carry & 0x1u) << 29));
   uint32x4_t __res =  __builtin_mve_vadcq_m_uv4si (__inactive, __a, __b, __p);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -16131,7 +16131,7 @@ __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vsbcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | ((*__carry & 0x1u) << 29));
   int32x4_t __res = __builtin_mve_vsbcq_sv4si (__a, __b);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -16141,7 +16141,7 @@ __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vsbcq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | ((*__carry & 0x1u) << 29));
   uint32x4_t __res = 
