[PATCH] Add a new conversion for conditional ternary set into ifcvt [PR106536]

2022-11-22 Thread HAO CHEN GUI via Gcc-patches
Hi,
  There is a new insn on my target, which has a nested if_then_else and
sets -1, 0 or 1 according to a comparison.

   [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
 (if_then_else:SI (lt (match_operand:CC 1 "cc_reg_operand" "y")
  (const_int 0))
  (const_int -1)
  (if_then_else (gt (match_dup 1)
(const_int 0))
(const_int 1)
(const_int 0))))]

  In the ifcvt pass, the corresponding code typically consists of a
comparison, a branch, a setcc and a constant set.

8: r122:CC=cmp(r120:DI#0,r121:DI#0)
9: pc={(r122:CC<0)?L29:pc}

   14: r118:SI=r122:CC>0

   29: L29:
5: r118:SI=0xffffffffffffffff

  This patch adds the new conversion to ifcvt and converts this kind of
branch into a nested if-then-else insn if the target supports such a
pattern.
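
As a rough illustration (not part of the patch), the C function below is the
kind of three-way comparison that leads to the comparison / branch / setcc /
constant-set sequence shown above. The name three_way is made up for this
sketch.

```c
/* Illustrative only: a three-way comparison returning -1, 0 or 1.
   The name three_way is hypothetical and not taken from the patch.  */
int
three_way (long a, long b)
{
  if (a < b)        /* corresponds to the branch on "cc < 0" */
    return -1;      /* the constant set at the branch target */
  return a > b;     /* the setcc: 1 if greater, 0 if equal */
}
```

With the conversion, all three results would come from one nested
if-then-else insn instead of a branch plus two sets.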

  HAVE_ternary_conditional_set indicates whether the target has such a
nested if-then-else insn. It is set by genconfig. When
HAVE_ternary_conditional_set is set, noce_try_ternary_cset is executed to
detect a suitable pattern and convert it to the nested if-then-else insn.
The hook TARGET_NOCE_TERNARY_CSET_P detects the target-specific pattern and
outputs the conditions and the integers to set for the nested if-then-else.
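
The shape test that genconfig performs can be sketched on a toy expression
tree. Everything below (toy_code, toy_rtx, toy_is_ternary_cset) is a
hypothetical stand-in, not GCC's rtx API; it only mirrors the idea of
"const_int in the then-arm, and a nested if_then_else with two const_int arms
in the else-arm".

```c
#include <stddef.h>

/* Toy stand-ins for RTL codes; these are not GCC's real definitions.  */
enum toy_code { TOY_CONST_INT, TOY_IF_THEN_ELSE, TOY_COMPARE, TOY_OTHER };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op[3];  /* op[0] = cond, op[1] = then, op[2] = else */
};

/* Return 1 if X has the shape
     (if_then_else cond (const_int)
                        (if_then_else cond (const_int) (const_int)))
   and 0 otherwise.  */
int
toy_is_ternary_cset (const struct toy_rtx *x)
{
  if (x == NULL || x->code != TOY_IF_THEN_ELSE)
    return 0;

  const struct toy_rtx *then_arm = x->op[1];
  const struct toy_rtx *else_arm = x->op[2];
  if (then_arm == NULL || else_arm == NULL)
    return 0;

  return (then_arm->code == TOY_CONST_INT
          && else_arm->code == TOY_IF_THEN_ELSE
          && else_arm->op[1] != NULL
          && else_arm->op[1]->code == TOY_CONST_INT
          && else_arm->op[2] != NULL
          && else_arm->op[2]->code == TOY_CONST_INT);
}
```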

  Bootstrapped and tested on powerpc64-linux BE/LE and x86 with no
regressions. Is this okay for trunk? Any recommendations? Thanks a lot.

ChangeLog
2022-11-23  Haochen Gui 

gcc/
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_NOCE_TERNARY_CSET_P): Document new hook.
* genconfig.cc (have_ternary_cset_flag): New.
(walk_insn_part): Detect nested if-then-else with const_int setting
and set have_ternary_cset_flag.
(HAVE_ternary_conditional_set): Define.
* ifcvt.cc (noce_emit_ternary_cset): New function to emit nested
if-then-else insns.
(noce_try_ternary_cset): Detect ternary conditional set and emit the
insn.
(noce_process_if_block): Try to do ternary conditional set conversion
when a target supports ternary conditional set insn.
* target.def (noce_ternary_cset_p): New hook.
* targhooks.cc (default_noce_ternary_cset_p): New function.
* targhooks.h (default_noce_ternary_cset_p): New declaration.


patch.diff
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 92bda1a7e14..9823eccbe68 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -7094,6 +7094,15 @@ the @code{POLY_VALUE_MIN}, @code{POLY_VALUE_MAX} and
 implementation returns the lowest possible value of @var{val}.
 @end deftypefn

+@deftypefn {Target Hook} bool TARGET_NOCE_TERNARY_CSET_P (struct noce_if_info *@var{if_info}, rtx *@var{outer_cond}, rtx *@var{inner_cond}, int *@var{int1}, int *@var{int2}, int *@var{int3})
+This hook returns true if the if-then-else-join blocks described in
+@code{if_info} can be converted to a ternary conditional set implemented by
+a nested if-then-else insn.  The @code{int1}, @code{int2} and @code{int3}
+are three possible results of the nested if-then-else insn.
+@code{outer_cond} and @code{inner_cond} are the conditions for the outer and
+inner if-then-else.
+@end deftypefn
+
 @node Scheduling
 @section Adjusting the Instruction Scheduler

diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 112462310b1..1d6f28cc50a 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4631,6 +4631,8 @@ Define this macro if a non-short-circuit operation produced by

 @hook TARGET_ESTIMATED_POLY_VALUE

+@hook TARGET_NOCE_TERNARY_CSET_P
+
 @node Scheduling
 @section Adjusting the Instruction Scheduler

diff --git a/gcc/genconfig.cc b/gcc/genconfig.cc
index b7c6b48eec6..902c832cf5a 100644
--- a/gcc/genconfig.cc
+++ b/gcc/genconfig.cc
@@ -33,6 +33,7 @@ static int max_recog_operands;  /* Largest operand number seen.  */
 static int max_dup_operands;/* Largest number of match_dup in any insn.  */
 static int max_clobbers_per_insn;
 static int have_cmove_flag;
+static int have_ternary_cset_flag;
 static int have_cond_exec_flag;
 static int have_lo_sum_flag;
 static int have_rotate_flag;
@@ -136,6 +137,12 @@ walk_insn_part (rtx part, int recog_p, int non_pc_set_src)
  && GET_CODE (XEXP (part, 1)) == MATCH_OPERAND
  && GET_CODE (XEXP (part, 2)) == MATCH_OPERAND)
have_cmove_flag = 1;
+  else if (recog_p && non_pc_set_src
+  && GET_CODE (XEXP (part, 1)) == CONST_INT
+  && GET_CODE (XEXP (part, 2)) == IF_THEN_ELSE
+  && GET_CODE (XEXP (XEXP (part, 2), 1)) == CONST_INT
+  && GET_CODE (XEXP (XEXP (part, 2), 2)) == CONST_INT)
+   have_ternary_cset_flag = 1;
   break;

 case COND_EXEC:
@@ -328,6 +335,11 @@ main (int argc, const char **argv)
   else
 printf ("#define HAVE_conditional_move 0\n");

+  if (have_ternary_cset_flag)
+printf ("#define HAVE_ternary_conditional_set 1\n");
+  else
+printf 

[PATCH v1] LoongArch: Fixed a compilation failure with '%c' in inline assembly [PR107731].

2022-11-22 Thread Lulu Cheng
gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_classify_address):
Add processing for CONST_INT.
(loongarch_print_operand): Add processing of '%c'.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/tst-asm-const.c: Moved to...
* gcc.target/loongarch/pr107731.c: ...here.
---
 gcc/config/loongarch/loongarch.cc  | 14 ++
 .../loongarch/{tst-asm-const.c => pr107731.c}  |  6 +++---
 2 files changed, 17 insertions(+), 3 deletions(-)
 rename gcc/testsuite/gcc.target/loongarch/{tst-asm-const.c => pr107731.c} (78%)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 8d5d8d965dd..7f02b0bab23 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2029,6 +2029,11 @@ loongarch_classify_address (struct loongarch_address_info *info, rtx x,
   return (loongarch_valid_base_register_p (info->reg, mode, strict_p)
  && loongarch_valid_lo_sum_p (info->symbol_type, mode,
   info->offset));
+case CONST_INT:
+  /* Small-integer addresses don't occur very often, but they
+are legitimate if $r0 is a valid base register.  */
+  info->type = ADDRESS_CONST_INT;
+  return IMM12_OPERAND (INTVAL (x));
 
 default:
   return false;
@@ -4889,6 +4894,7 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool hi64_part,
 
'A' Print a _DB suffix if the memory model requires a release.
'b' Print the address of a memory operand, without offset.
+   'c'  Print an integer.
'C' Print the integer branch condition for comparison OP.
'd' Print CONST_INT OP in decimal.
'F' Print the FPU branch condition for comparison OP.
@@ -4935,6 +4941,14 @@ loongarch_print_operand (FILE *file, rtx op, int letter)
fputs ("_db", file);
   break;
 
+case 'c':
+  if (CONST_INT_P (op))
+   fprintf (file, HOST_WIDE_INT_PRINT_DEC, INTVAL (op));
+  else
+   output_operand_lossage ("unsupported operand for code '%c'", letter);
+
+  break;
+
 case 'C':
   loongarch_print_int_branch_condition (file, code, letter);
   break;
diff --git a/gcc/testsuite/gcc.target/loongarch/tst-asm-const.c b/gcc/testsuite/gcc.target/loongarch/pr107731.c
similarity index 78%
rename from gcc/testsuite/gcc.target/loongarch/tst-asm-const.c
rename to gcc/testsuite/gcc.target/loongarch/pr107731.c
index 2e04b99e301..80d84c48c6e 100644
--- a/gcc/testsuite/gcc.target/loongarch/tst-asm-const.c
+++ b/gcc/testsuite/gcc.target/loongarch/pr107731.c
@@ -1,13 +1,13 @@
-/* Test asm const. */
 /* { dg-do compile } */
 /* { dg-final { scan-assembler-times "foo:.*\\.long 1061109567.*\\.long 52" 1 } } */
+
 int foo ()
 {
   __asm__ volatile (
   "foo:"
   "\n\t"
- ".long %a0\n\t"
- ".long %a1\n\t"
+ ".long %c0\n\t"
+ ".long %c1\n\t"
  :
  :"i"(0x3f3f3f3f), "i"(52)
  :
-- 
2.31.1
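
A note on the new CONST_INT case in loongarch_classify_address: it accepts an
integer as an address only when IMM12_OPERAND holds. Assuming that macro tests
for a signed 12-bit range (my reading, not something the patch states), the
check amounts to the sketch below. The name fits_simm12 is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: does VALUE fit in a signed 12-bit immediate, i.e. the range
   [-2048, 2047]?  This models what IMM12_OPERAND is assumed to check;
   it is not the macro's real definition.  */
bool
fits_simm12 (int64_t value)
{
  return value >= -2048 && value <= 2047;
}
```

Under that assumption, 52 would be a legitimate small-integer address while
0x3f3f3f3f would not; printing either with '%c' does not depend on this check.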



Re: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build

2022-11-22 Thread Jeff Law via Gcc-patches



On 11/22/22 14:20, Eugene Rozenfeld wrote:

I took another look at this. We actually collect perf data when building the 
libraries. So, we have ./prev-gcc/perf.data, ./prev-libcpp/perf.data, 
./prev-libiberty/perf.data, etc. But when creating gcov data for  
-fauto-profile build of cc1plus or cc1 we only use ./prev-gcc/perf.data . So, a 
better solution would be either having a single perf.data for all builds (gcc 
and libraries) or merging perf.data files before attempting autostagefeedback. 
What would you recommend?


ISTM that if neither approach loses data, then they're functionally 
equivalent -- meaning that we can select whichever is easier to wire 
into our build system.


A single perf.data might serialize the build.  So perhaps separate, then 
merge right before autostagefeedback.



But I'm willing to go with whatever you think is best.

Jeff




Re: [PATCH v3] RISC-V modified add3 for large stack frame optimization [PR105733]

2022-11-22 Thread Jeff Law via Gcc-patches



On 11/3/22 18:53, Kevin Lee wrote:

This is identical to the patch at
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604814.html, but in
the correct plaintext format.


The loop still seems a bit odd which may point to further improvements
that could be made to this patch.  Consider this fragment of the loop:

Thank you for the review Jeff! I am currently looking into this issue
in a different patch. I'll come back with some improvement.
  
gcc/ChangeLog:

Jim Wilson 
Michael Collison 
Kevin Lee 

	* config/riscv/predicates.md (const_lui_operand): New Predicate.

(add_operand): Ditto.
(reg_or_const_int_operand): Ditto.
* config/riscv/riscv-protos.h (riscv_eliminable_reg): New
function.
* config/riscv/riscv-selftests.cc (calculate_x_in_sequence):
Consider Parallel insns.
* config/riscv/riscv.cc (riscv_eliminable_reg): New function.
(riscv_adjust_libcall_cfi_prologue): Use gen_rtx_SET and
gen_rtx_fmt_ee instead of gen_add3_insn.
(riscv_adjust_libcall_cfi_epilogue): Ditto.
* config/riscv/riscv.md (addsi3): Remove.
(add3): New instruction for large stack frame
optimization.
(add3_internal): Ditto.
(adddi3): Remove.
(add3_internal2): New instruction for insns generated in
the prologue and epilogue pass.
---

diff --git a/gcc/config/riscv/riscv-selftests.cc 
b/gcc/config/riscv/riscv-selftests.cc
index 636874ebc0f..50457db708e 100644
--- a/gcc/config/riscv/riscv-selftests.cc
+++ b/gcc/config/riscv/riscv-selftests.cc
@@ -116,6 +116,9 @@ calculate_x_in_sequence (rtx reg)
rtx pat = PATTERN (insn);
rtx dest = SET_DEST (pat);
  
+  if (GET_CODE (pat) == PARALLEL)

+   dest = SET_DEST (XVECEXP (pat, 0, 0));


So this assumes you've got a parallel where the first vector element is a
SET.  That may well be true, but it's probably safer to verify with
something like



    gcc_assert (GET_CODE (XVECEXP (pat, 0, 0)) == SET);


That way we're not surprised in the future if we have more patterns that 
use PARALLEL, perhaps with the first not being a simple set.




+{
+  return REG_P (x) && (REGNO (x) == FRAME_POINTER_REGNUM
+  || REGNO (x) == ARG_POINTER_REGNUM
+  || (REGNO (x) >= FIRST_VIRTUAL_REGISTER
+  && REGNO (x) <= LAST_VIRTUAL_REGISTER));


Instead I'd write it as


  return (REG_P (x)
          && (REGNO (x) == FRAME_POINTER_REGNUM
              || REGNO (x) == ARG_POINTER_REGNUM
              || (REGNO (x) >= FIRST_VIRTUAL_REGISTER
                  && REGNO (x) <= LAST_VIRTUAL_REGISTER)));


That's just the style most GCC folks are used to reading.



@@ -4887,8 +4897,9 @@ riscv_adjust_libcall_cfi_prologue ()
}
  
/* Debug info for adjust sp.  */

-  adjust_sp_rtx = gen_add3_insn (stack_pointer_rtx,
-stack_pointer_rtx, GEN_INT (-saved_size));
+  adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx,
+   gen_rtx_fmt_ee (PLUS, GET_MODE (stack_pointer_rtx),
+ stack_pointer_rtx, GEN_INT (saved_size)));
dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
  dwarf);
return dwarf;
@@ -4990,8 +5001,9 @@ riscv_adjust_libcall_cfi_epilogue ()
int saved_size = cfun->machine->frame.save_libcall_adjustment;
  
/* Debug info for adjust sp.  */

-  adjust_sp_rtx = gen_add3_insn (stack_pointer_rtx,
-stack_pointer_rtx, GEN_INT (saved_size));
+  adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx,
+   gen_rtx_fmt_ee (PLUS, GET_MODE (stack_pointer_rtx),
+ stack_pointer_rtx, GEN_INT (saved_size)));
dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
  dwarf);


I think this duplicates a change from Philipp & his team. This was to fix
ICEs in the dwarf2 CFI generator, right?  Please double check and remove
it if it does duplicate a change from Philipp & his team.





+
+(define_insn_and_split "add3_internal"
+  [(set (match_operand:GPR  0 "register_operand" "=r,r,,!")
+   (plus:GPR (match_operand:GPR 1 "register_operand" " %r,r,r,0")
+ (match_operand:GPR 2 "add_operand"  " r,I,L,L")))
+  (clobber (match_scratch:GPR 3 "=X,X,X,"))]
+  ""
+{
+  if ((which_alternative == 2) || (which_alternative == 3))
+return "#";
+
+  if (TARGET_64BIT && mode == SImode)
+return "add%i2w\t%0,%1,%2";
+  else
+return "add%i2\t%0,%1,%2";
+}
+  "&& reload_completed && const_lui_operand (operands[2], mode)"
+  [(const_int 0)]
+{
+  if (REGNO (operands[0]) != REGNO (operands[1]))
+{
+  emit_insn (gen_mov (operands[0], operands[2]));
+  emit_insn (gen_add3_internal (operands[0], operands[0], 
operands[1]));
+}
+  else
+{
+  emit_insn (gen_mov (operands[3], 

Re: [committed][RISC-V] Fix recent rvv/base/spill testcase failures

2022-11-22 Thread Kito Cheng via Gcc-patches
LGTM, thanks :)

On Wed, Nov 23, 2022 at 7:21 AM Jeff Law  wrote:
>
> As Jaiwei noted, many (all?) of the rvv/base/spill tests started failing
> after the introduction of shrink-wrapping.
>
>
> The core issue is we're expecting the frame to have a constant size, but
> it doesn't.  So when using the to_constant method we abort.
>
>
> The safest thing to do is to set no shrink-wrapping components when the
> frame size is not fixed.  We might be able to do better later -- iff we
> know the offset to the GPRs/FPRs is fixed and fits into the appropriate
> number of bits.
>
>
> Bootstrapped and regression tested (C-only) on riscv64-linux-gnu.  As
> expected, it fixes a bucketload of failures in rvv/base/spill-*.c.
>
>
> Installed on the trunk,
>
> Jeff
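
The fix described above can be pictured with a toy model of a frame size that
may depend on a runtime quantity such as the vector length. The types and
names below (toy_poly_size, toy_is_constant,
toy_allow_shrink_wrap_components) are hypothetical and much simpler than
GCC's poly_int; this is only a sketch of the "bail out when the size is not a
compile-time constant" idea.

```c
#include <stdbool.h>

/* Toy model of a frame size of the form c0 + c1 * N, where N is only
   known at run time.  This is not GCC's poly_int.  */
struct toy_poly_size
{
  long c0;  /* constant part */
  long c1;  /* coefficient of the runtime-variable part */
};

/* The size is a compile-time constant only if the variable part is zero.  */
bool
toy_is_constant (struct toy_poly_size s)
{
  return s.c1 == 0;
}

/* Offer shrink-wrapping components only for fixed-size frames, so that
   nothing later needs to force a variable size into a constant.  */
bool
toy_allow_shrink_wrap_components (struct toy_poly_size frame_size)
{
  return toy_is_constant (frame_size);
}
```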


Re: [pushed][PATCH v3] LoongArch: Add prefetch instructions.

2022-11-22 Thread chenglulu

Pushed r13-4259.

On 2022/11/16 10:10, Lulu Cheng wrote:

v2 -> v3:
1. Remove preldx support.

---
Enable sw prefetching at -O3 and higher.

Co-Authored-By: xujiahao 

gcc/ChangeLog:

* config/loongarch/constraints.md (ZD): New constraint.
* config/loongarch/loongarch-def.c: Initialize the number of parallel prefetches.
* config/loongarch/loongarch-tune.h (struct loongarch_cache):
Define the number of parallel prefetches.
* config/loongarch/loongarch.cc (loongarch_option_override_internal):
Set up parameters to be used in prefetching algorithm.
* config/loongarch/loongarch.md (prefetch): New template.
---
  gcc/config/loongarch/constraints.md   | 10 ++
  gcc/config/loongarch/loongarch-def.c  |  2 ++
  gcc/config/loongarch/loongarch-tune.h |  1 +
  gcc/config/loongarch/loongarch.cc | 28 +++
  gcc/config/loongarch/loongarch.md | 14 ++
  5 files changed, 55 insertions(+)

diff --git a/gcc/config/loongarch/constraints.md 
b/gcc/config/loongarch/constraints.md
index 43cb7b5f0f5..46f7f63ae31 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -86,6 +86,10 @@
  ;;"ZB"
  ;;  "An address that is held in a general-purpose register.
  ;;  The offset is zero"
+;;"ZD"
+;; "An address operand whose address is formed by a base register
+;;  and offset that is suitable for use in instructions with the same
+;;  addressing mode as @code{preld}."
  ;; "<" "Matches a pre-dec or post-dec operand." (Global non-architectural)
  ;; ">" "Matches a pre-inc or post-inc operand." (Global non-architectural)
  
@@ -190,3 +194,9 @@ (define_memory_constraint "ZB"

The offset is zero"
(and (match_code "mem")
 (match_test "REG_P (XEXP (op, 0))")))
+
+(define_address_constraint "ZD"
+  "An address operand whose address is formed by a base register
+   and offset that is suitable for use in instructions with the same
+   addressing mode as @code{preld}."
+   (match_test "loongarch_12bit_offset_address_p (op, mode)"))
diff --git a/gcc/config/loongarch/loongarch-def.c 
b/gcc/config/loongarch/loongarch-def.c
index cbf995d81b5..80ab10a52a8 100644
--- a/gcc/config/loongarch/loongarch-def.c
+++ b/gcc/config/loongarch/loongarch-def.c
@@ -62,11 +62,13 @@ loongarch_cpu_cache[N_TUNE_TYPES] = {
.l1d_line_size = 64,
.l1d_size = 64,
.l2d_size = 256,
+  .simultaneous_prefetches = 4,
},
[CPU_LA464] = {
.l1d_line_size = 64,
.l1d_size = 64,
.l2d_size = 256,
+  .simultaneous_prefetches = 4,
},
  };
  
diff --git a/gcc/config/loongarch/loongarch-tune.h b/gcc/config/loongarch/loongarch-tune.h

index 6f3530f5c02..8e3eb29472b 100644
--- a/gcc/config/loongarch/loongarch-tune.h
+++ b/gcc/config/loongarch/loongarch-tune.h
@@ -45,6 +45,7 @@ struct loongarch_cache {
  int l1d_line_size;  /* bytes */
  int l1d_size;   /* KiB */
  int l2d_size;   /* kiB */
+int simultaneous_prefetches; /* number of parallel prefetch */
  };
  
  #endif /* LOONGARCH_TUNE_H */

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 8d5d8d965dd..8ee32c90573 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -63,6 +63,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "context.h"
  #include "builtins.h"
  #include "rtl-iter.h"
+#include "opts.h"
  
  /* This file should be included last.  */

  #include "target-def.h"
@@ -6100,6 +6101,33 @@ loongarch_option_override_internal (struct gcc_options *opts)
if (loongarch_branch_cost == 0)
  loongarch_branch_cost = loongarch_cost->branch_cost;
  
+  /* Set up parameters to be used in prefetching algorithm.  */

+  int simultaneous_prefetches
+= loongarch_cpu_cache[LARCH_ACTUAL_TUNE].simultaneous_prefetches;
+
+  SET_OPTION_IF_UNSET (opts, &global_options_set,
+  param_simultaneous_prefetches,
+  simultaneous_prefetches);
+
+  SET_OPTION_IF_UNSET (opts, &global_options_set,
+  param_l1_cache_line_size,
+  loongarch_cpu_cache[LARCH_ACTUAL_TUNE].l1d_line_size);
+
+  SET_OPTION_IF_UNSET (opts, &global_options_set,
+  param_l1_cache_size,
+  loongarch_cpu_cache[LARCH_ACTUAL_TUNE].l1d_size);
+
+  SET_OPTION_IF_UNSET (opts, &global_options_set,
+  param_l2_cache_size,
+  loongarch_cpu_cache[LARCH_ACTUAL_TUNE].l2d_size);
+
+
+  /* Enable sw prefetching at -O3 and higher.  */
+  if (opts->x_flag_prefetch_loop_arrays < 0
+  && (opts->x_optimize >= 3 || opts->x_flag_profile_use)
+  && !opts->x_optimize_size)
+opts->x_flag_prefetch_loop_arrays = 1;
+
if (TARGET_DIRECT_EXTERN_ACCESS && flag_shlib)
  error ("%qs cannot be used for compiling a shared library",
   "-mdirect-extern-access");
diff --git 
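
For readers skimming the hunk above, the rule for turning on software
prefetching by default can be restated as a small stand-alone function. This
is a sketch, not the committed code: toy_resolve_prefetch and its parameters
are hypothetical names standing in for the x_flag_prefetch_loop_arrays,
x_optimize, x_flag_profile_use and x_optimize_size fields.

```c
/* Sketch of the "-O3 and higher" default.  FLAG < 0 means the user gave
   neither -fprefetch-loop-arrays nor -fno-prefetch-loop-arrays; an explicit
   user choice (0 or 1) is always kept.  */
int
toy_resolve_prefetch (int flag, int optimize, int profile_use,
                      int optimize_size)
{
  if (flag < 0 && (optimize >= 3 || profile_use) && !optimize_size)
    return 1;
  return flag;
}
```

An explicit -fno-prefetch-loop-arrays therefore still wins at -O3, and
optimizing for size keeps the flag unset.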

Re: [PATCH V2] Use subscalar mode to move struct block for parameter

2022-11-22 Thread Jiufu Guo via Gcc-patches
Hi Jeff,

Thanks a lot for your comments!

Jeff Law  writes:

> On 11/20/22 20:07, Jiufu Guo wrote:
>> Jiufu Guo  writes:
>>
>>> Hi,
>>>
>>> As mentioned in the previous version patch:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
>>> The suboptimal code is generated for "assigning from parameter" or
>>> "assigning to return value".
>>> This patch enhances the assignment from parameters like the below
>>> cases:
>>> case1.c
>>> typedef struct SA {double a[3];long l; } A;
>>> A ret_arg (A a) {return a;}
>>> void st_arg (A a, A *p) {*p = a;}
>>>
>>> case2.c
>>> typedef struct SA {double a[3];} A;
>>> A ret_arg (A a) {return a;}
>>> void st_arg (A a, A *p) {*p = a;}
>>>
>>> For this patch, bootstrap and regtest pass on ppc64{,le}
>>> and x86_64.
>>> * Besides asking for help reviewing this patch, I would also like
>>> comments on enhancing "assigning to returns".
>> I updated the patch to fix the issue for returns.  This patch
>> adds a flag DECL_USEDBY_RETURN_P to indicate if a var is used
>> by a return stmt.  This patch fixes the issue in the expand pass only,
>> so we would try to update the patch to avoid this flag.
>>
>> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
>> index dd29c03..09b8ec64cea 100644
>> --- a/gcc/cfgexpand.cc
>> +++ b/gcc/cfgexpand.cc
>> @@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars)
>>   frame_phase = off ? align - off : 0;
>> }
>>   +  /* Collect VARs on returns.  */
>> +  if (DECL_RESULT (current_function_decl))
>> +{
>> +  edge_iterator ei;
>> +  edge e;
>> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>> +if (greturn *ret = safe_dyn_cast <greturn *> (last_stmt (e->src)))
>> +  {
>> +tree val = gimple_return_retval (ret);
>> +if (val && VAR_P (val))
>> +  DECL_USEDBY_RETURN_P (val) = 1;
>> +  }
>> +}
>> +
>> /* Set TREE_USED on all variables in the local_decls.  */
>> FOR_EACH_LOCAL_DECL (cfun, i, var)
>>   TREE_USED (var) = 1;
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index d9407432ea5..20973649963 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -6045,6 +6045,52 @@ expand_assignment (tree to, tree from, bool nontemporal)
>> return;
>>   }
>>   +  if ((TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from)
>> +   && TYPE_MODE (TREE_TYPE (from)) == BLKmode
>> +   && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
>> +   || REG_P (DECL_INCOMING_RTL (from))))
>> +  || (VAR_P (to) && DECL_USEDBY_RETURN_P (to)
>> +  && TYPE_MODE (TREE_TYPE (to)) == BLKmode
>> +  && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl)))
>> +   == PARALLEL))
>> +{
>> +  push_temp_slots ();
>> +  rtx par_ret;
>> +  machine_mode mode;
>> +  par_ret = TREE_CODE (from) == PARM_DECL
>> +  ? DECL_INCOMING_RTL (from)
>> +  : DECL_RTL (DECL_RESULT (current_function_decl));
>> +  mode = GET_CODE (par_ret) == PARALLEL
>> +   ? GET_MODE (XEXP (XVECEXP (par_ret, 0, 0), 0))
>> +   : word_mode;
>> +  int mode_size = GET_MODE_SIZE (mode).to_constant ();
>> +  int size = INTVAL (expr_size (from));
>> +
>> +  /* Whether and how the parameter uses a submode depends on the size and
>> + position of the parameter.  A heuristic number is used here.  */
>> +  int hurstc_num = 8;
>
> Where did this come from and what does it mean?
Sorry for not making this clear. An aggregate argument may be passed in
registers partially or totally; see assign_parm_adjust_entry_rtl.
For example, if a parameter takes 12 words and the target/ABI only
allows 8 GPRs for arguments, then the parameter can use at most 8 registers
and the remaining part is passed on the stack.
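
The 12-word example can be worked through with a small sketch. The helper
toy_split_arg is hypothetical and ignores everything a real ABI also
considers (alignment, registers already taken by earlier arguments, and so
on).

```c
/* Sketch: split an aggregate of WORDS words between argument registers
   and the stack, given MAX_REGS registers available for it.  */
void
toy_split_arg (int words, int max_regs, int *in_regs, int *on_stack)
{
  *in_regs = words < max_regs ? words : max_regs;  /* at most MAX_REGS */
  *on_stack = words - *in_regs;                    /* the remaining part */
}
```

For 12 words and 8 registers this gives 8 words in registers and 4 on the
stack, which is where the number 8 in the patch comes from.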

>
>
> Note that BLKmode subword values passed in registers can be either
> right or left justified.  I think you also need to worry about
> endianness here.
The subword mode is only used to move the block (read from the source memory
and then store to the destination memory in a register mode), and this keeps
the same endianness in the register, as move_block_from_reg does. So the
patch does not check the endianness.

If you have any concerns or suggestions, please point them out, thanks!

BR,
Jeff (Jiufu)
>
>
> Jeff


[PATCH] Change the behavior of predicate check failure on cbranchcc4 operand0 in prepare_cmp_insn

2022-11-22 Thread HAO CHEN GUI via Gcc-patches
Hi,
  I want to enable "have_cbranchcc4" on rs6000. But not all combinations of
comparison codes and sub CC modes benefit from generating cbranchcc4 insns
on rs6000. There is a predicate for operand0 of cbranchcc4 to bypass
some combinations. It causes an assertion failure in prepare_cmp_insn. I think
we shouldn't assume that all comparison codes and sub CC modes are supported
and throw an assertion failure in prepare_cmp_insn. Instead, it could check
the predicate and go to fail if the predicate can't be satisfied. This patch
changes the behavior of that code.

  Bootstrapped and tested on powerpc64-linux BE/LE and x86 with no regressions.
Is this okay for trunk? Any recommendations? Thanks a lot.


ChangeLog
2022-11-23  Haochen Gui 

gcc/
* optabs.cc (prepare_cmp_insn): Go to fail rather than asserting when
the predicate check of "cbranchcc4" operand[0] fails.

patch.diff
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 165f8d1fa22..3ec8f6b17ba 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -4484,8 +4484,9 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
 {
   enum insn_code icode = optab_handler (cbranch_optab, CCmode);
   test = gen_rtx_fmt_ee (comparison, VOIDmode, x, y);
-  gcc_assert (icode != CODE_FOR_nothing
-  && insn_operand_matches (icode, 0, test));
+  gcc_assert (icode != CODE_FOR_nothing);
+  if (!insn_operand_matches (icode, 0, test))
+   goto fail;
   *ptest = test;
   return;
 }
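
The change in control flow can be sketched outside of GCC as follows. The
names toy_operand_matches and toy_prepare_cmp are hypothetical, and the
predicate's rule (accept even codes only) is arbitrary; the point is just
that a rejected comparison is reported to the caller instead of tripping an
assertion.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the operand[0] predicate of cbranchcc4: it
   accepts only some comparison codes (here, arbitrarily, the even ones).  */
bool
toy_operand_matches (int comparison)
{
  return comparison % 2 == 0;
}

/* Sketch of the new behavior.  Returns true and stores the comparison in
   *PTEST on success; returns false (the "goto fail" path) and leaves
   *PTEST untouched when the predicate rejects the comparison.  */
bool
toy_prepare_cmp (int comparison, int *ptest)
{
  if (!toy_operand_matches (comparison))
    return false;
  *ptest = comparison;
  return true;
}
```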


Re: [PATCH] configure: Implement --enable-host-pie

2022-11-22 Thread Marek Polacek via Gcc-patches
On Sun, Nov 20, 2022 at 08:06:55AM -0700, Jeff Law wrote:
> 
> On 11/10/22 19:52, Marek Polacek via Gcc-patches wrote:
> > This is a rebased version of the patch I posted in March:
> > 
> > which Alex sort of approved here:
> > 
> > but it was too late to commit the patch in GCC 12.
> > 
> > There are no changes except that I've converted the documentation
> > part into the ReST format, and of course regenerated configure.
> > 
> > With --enable-host-pie enabled:
> > $ file ./gcc/cc1 ./gcc/cc1plus
> > ./gcc/cc1: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, with debug_info, not stripped
> > ./gcc/cc1plus: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, with debug_info, not stripped
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu w/ and w/o --enable-host-pie,
> > ok for trunk?
> > 
> > -- >8 --
> > 
> > This patch implements the --enable-host-pie configure option which
> > makes the compiler executables PIE.  This can be used to enhance
> > protection against ROP attacks, and can be viewed as part of a wider
> > trend to harden binaries.
> > 
> > It is similar to the option --enable-host-shared, except that --e-h-s
> > won't add -shared to the linker flags whereas --e-h-p will add -pie.
> > It is different from --enable-default-pie because that option just
> > adds an implicit -fPIE/-pie when the compiler is invoked, but the
> > compiler itself isn't PIE.
> > 
> > Since r12-5768-gfe7c3ecf, PCH works well with PIE, so there are no PCH
> > regressions.
> > 
> > When building the compiler, the build process may use various in-tree
> > libraries; these need to be built with -fPIE so that it's possible to
> > use them when building a PIE.  For instance, when --with-included-gettext
> > is in effect, intl object files must be compiled with -fPIE.  Similarly,
> > when building in-tree gmp, isl, mpfr and mpc, they must be compiled with
> > -fPIE.
> > 
> > I plan to add an option to link with -Wl,-z,now.
> > 
> > ChangeLog:
> > 
> > * Makefile.def: Pass $(PICFLAG) to AM_CFLAGS for gmp, mpfr, mpc, and
> > isl.
> > * Makefile.in: Regenerate.
> > * Makefile.tpl: Set PICFLAG.
> > * configure.ac (--enable-host-pie): New check.  Set PICFLAG after this
> > check.
> > * configure: Regenerate.
> > 
> > c++tools/ChangeLog:
> > 
> > * Makefile.in: Rename PIEFLAG to PICFLAG.  Set LD_PICFLAG.  Use it.
> > Use pic/libiberty.a if PICFLAG is set.
> > * configure.ac (--enable-default-pie): Set PICFLAG instead of PIEFLAG.
> > (--enable-host-pie): New check.
> > * configure: Regenerate.
> > 
> > fixincludes/ChangeLog:
> > 
> > * Makefile.in: Set and use PICFLAG and LD_PICFLAG.  Use the "pic"
> > build of libiberty if PICFLAG is set.
> > * configure.ac:
> > * configure: Regenerate.
> > 
> > gcc/ChangeLog:
> > 
> > * Makefile.in: Set LD_PICFLAG.  Use it.  Set enable_host_pie.
> > Remove NO_PIE_CFLAGS and NO_PIE_FLAG.  Pass LD_PICFLAG to
> > ALL_LINKERFLAGS.  Use the "pic" build of libiberty if --enable-host-pie.
> > * configure.ac (--enable-host-shared): Don't set PICFLAG here.
> > (--enable-host-pie): New check.  Set PICFLAG and LD_PICFLAG after this
> > check.
> > * configure: Regenerate.
> > * doc/install/configuration.rst: Document --enable-host-pie.
> > 
> > gcc/d/ChangeLog:
> > 
> > * Make-lang.in: Remove NO_PIE_CFLAGS.
> > 
> > intl/ChangeLog:
> > 
> > * Makefile.in: Use @PICFLAG@ in COMPILE as well.
> > * configure.ac (--enable-host-shared): Don't set PICFLAG here.
> > (--enable-host-pie): New check.  Set PICFLAG after this check.
> > * configure: Regenerate.
> > 
> > libcody/ChangeLog:
> > 
> > * Makefile.in: Pass LD_PICFLAG to LDFLAGS.
> > * configure.ac (--enable-host-shared): Don't set PICFLAG here.
> > (--enable-host-pie): New check.  Set PICFLAG and LD_PICFLAG after this
> > check.
> > * configure: Regenerate.
> > 
> > libcpp/ChangeLog:
> > 
> > * configure.ac (--enable-host-shared): Don't set PICFLAG here.
> > (--enable-host-pie): New check.  Set PICFLAG after this check.
> > * configure: Regenerate.
> > 
> > libdecnumber/ChangeLog:
> > 
> > * configure.ac (--enable-host-shared): Don't set PICFLAG here.
> > (--enable-host-pie): New check.  Set PICFLAG after this check.
> > * configure: Regenerate.
> > 
> > libiberty/ChangeLog:
> > 
> > * configure.ac: Also set shared when enable_host_pie.
> > * configure: Regenerate.
> > 
> > zlib/ChangeLog:
> > 
> > * configure.ac (--enable-host-shared): Don't set PICFLAG here.
> > (--enable-host-pie): New check.  Set PICFLAG after this check.
> > * configure: 

Re: [PATCH v4] LoongArch: Optimize immediate load.

2022-11-22 Thread chenglulu



On 2022/11/23 00:44, Xi Ruoyao wrote:

While I still can't fully understand the immediate load issue and how
this patch fix it, I've tested this patch (alongside the prefetch
instruction patch) with bootstrap-ubsan.  And the compiled result of
imm-load1.c seems OK.

And it's doing correct thing for Glibc "improved generic string
functions" patch, producing some really tight loop now.

While debugging, I found that hoisting the immediate load instructions
out of the loop is done by the loop2_invariant optimization.


One of the conditions for hoisting is that the destination register
cannot be used more than once, and the sequence before the change
looked like this:


(insn 12 11 13 3 (set (reg:DI 90)
        (const_int 16842752 [0x1010000])) "test.c":13:12 discrim 1 131 {*movdi_64bit}
     (nil))
(insn 13 12 14 3 (set (reg:DI 91)
        (ior:DI (reg:DI 90)
            (const_int 257 [0x101]))) "test.c":13:12 discrim 1 88 {iordi3}
     (expr_list:REG_DEAD (reg:DI 90)
        (expr_list:REG_EQUAL (const_int 16843009 [0x1010101])
            (nil))))
(insn 14 13 15 3 (set (reg:DI 91)
        (ior:DI (zero_extend:DI (subreg:SI (reg:DI 91) 0))
            (const_int 282578783305728 [0x1010100000000]))) "test.c":13:12 discrim 1 150 {lu32i_d}
     (expr_list:REG_EQUAL (const_int 282578800148737 [0x1010101010101])
        (nil)))
(insn 15 14 17 3 (set (reg:DI 91)
        (ior:DI (and:DI (reg:DI 91)
                (const_int 4503599627370495 [0xfffffffffffff]))
            (const_int 72057594037927936 [0x100000000000000]))) "test.c":13:12 discrim 1 151 {lu52i_d}
     (expr_list:REG_EQUAL (const_int 72340172838076673 [0x101010101010101])
        (nil)))

Therefore, the last two instructions do not meet the extraction conditions.
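
To make the sequence easier to follow, here is the same arithmetic written as
plain C, using the decimal constants from the four insns. The function name
toy_build_imm is made up; the point is that the second register (reg 91) is
written three times on the way to the final value.

```c
#include <stdint.h>

/* Replays insns 12-15 with ordinary 64-bit arithmetic.  */
uint64_t
toy_build_imm (void)
{
  uint64_t r90 = UINT64_C (16842752);          /* insn 12: 0x1010000 */
  uint64_t r91 = r90 | UINT64_C (257);         /* insn 13: 0x1010101 */

  /* insn 14: zero-extend the low 32 bits, then OR in bits 32..51.  */
  r91 = (uint64_t) (uint32_t) r91 | UINT64_C (282578783305728);

  /* insn 15: keep the low 52 bits, then OR in bits 52..63.  */
  r91 = (r91 & UINT64_C (4503599627370495)) | UINT64_C (72057594037927936);

  return r91;
}
```

The result equals 72340172838076673, matching the REG_EQUAL note on insn 15.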

But because of how our instructions are implemented, I now delay splitting
the immediate load until after loop2_invariant, which avoids this problem.




Re: [PATCH] i386: Only enable small loop unrolling in backend [PR 107602]

2022-11-22 Thread Hongyu Wang via Gcc-patches
Hi Jeff,

> The reversion of the loop-init.cc changes is fine.  The x86 maintainers
> will need to chime in on the rest.  Consider installing the loop-init.cc
> reversion immediately as the current state has regressed s390 and
> potentially other targets.

I've posted a patch in
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606478.html
to change the rs6000 target and there is a discussion ongoing. If we
draw a conclusion that we only want to make changes in the backend, I
will install this one.

-- 
Regards,

Hongyu, Wang


Re: [PATCH] rs6000: Adjust loop_unroll_adjust to match middle-end change [PR 107692]

2022-11-22 Thread Hongyu Wang via Gcc-patches
Hi, Segher and Richard

> > Something in your patch was wrong, please fix that (or revert the
> > patch).  You should not have to touch config/rs6000/ at all.
>
> Sure something is wrong, but I think there's the opportunity to
> simplify rs6000/ and s390x/, the only other two implementors of
> the hook used.

If I understand correctly, the wrong part is we should not break the
logic of -funroll-loops and check OPTION_SET_P in
targetm.loop_unroll_adjust to pretend the loop-unrolling is disabled
with -fno-unroll-loops.
I don't have a good idea to resolve this, perhaps add another hook and
check OPTION_SET_P (flag_unroll_loops) && munroll_only_small_loops
there and use that hook in rtl_loop_unroll::gate (), but still it
doesn't work if we want to strictly follow the logic that
-munroll-only-small-loops should not enable loop unrolling.
IMHO the middle-end part with target hook looks quite tricky (and of
course the OPTION_SET_P in the target hook). So Richard if you agree,
I'd like to install the reversion patch posted in
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606774.html
and move all of them to the backend first.

-- 
Regards,

Hongyu, Wang


Re: [PATCH][RFC] Unify MAX_NUM_CHAINS and MAX_CHAIN_LEN to --param uninit-max-predicate-size

2022-11-22 Thread Jeff Law via Gcc-patches



On 9/5/22 07:25, Richard Biener via Gcc-patches wrote:

The following exposes the MAX_NUM_CHAINS and MAX_CHAIN_LEN to the user
by adding a --param uninit-max-predicate-size and re-doing the
limits on the whole predicate expression size rather than limiting
the number of OR and AND elements separately.  The following goes
a step further and for a single AND chain allows an arbitrary size,
limiting that only with the computational --param
uninit-control-dep-attempts parameter.  That might be a bit high
in practice, but it seems odd to continue searching for smaller and
smaller paths until we exhaust the search space or
uninit-control-dep-attempts.

I'm testing this on x86_64-unknown-linux-gnu at the moment.

Any comments?

Thanks,
Richard.

* params.opt (uninit-max-predicate-size): New.
* doc/invoke.texi (--param uninit-max-predicate-size): Document.
* gimple-predicate-analysis.h
(predicate::init_from_control_deps): Adjust.
* gimple-predicate-analysis.cc (MAX_NUM_CHAINS, MAX_CHAIN_LEN):
Remove.
(format_edge_vecs): Adjust.
(simple_control_dep_chain): Do not limit.
(compute_control_dep_chain): Adjust limiting to the overall
predicate expression size _after_ adding an element to the
vector of AND chains.
(predicate::init_from_control_deps): Adjust.
(uninit_analysis::init_use_preds): Likewise.
(uninit_analysis::init_from_phi_def): Likewise.
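
A rough sketch of the limiting scheme described above, assuming the predicate is an OR of AND chains and that the limit applies to the total number of elements, checked after a chain has been added. The names chain_set and add_chain are hypothetical and are not taken from gimple-predicate-analysis.cc; max_size stands in for --param uninit-max-predicate-size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a predicate is an OR of AND chains; only the
   total element count across all chains is tracked here.  */
struct chain_set
{
  size_t num_chains;   /* number of AND chains (OR elements) */
  size_t total_elems;  /* sum of the lengths of all chains */
};

/* Record a chain of CHAIN_LEN elements, then check the overall size
   against MAX_SIZE.  Returns false when the predicate has grown past
   the limit, which a caller would treat as "give up".  */
bool
add_chain (struct chain_set *set, size_t chain_len, size_t max_size)
{
  assert (set != NULL);
  set->num_chains++;
  set->total_elems += chain_len;

  /* A single AND chain is allowed to be arbitrarily long; in the
     description above only the number of search attempts bounds it.  */
  if (set->num_chains == 1)
    return true;

  /* The check happens after adding the new chain.  */
  return set->total_elems <= max_size;
}
```

With a limit of 8, a lone chain of 100 elements is accepted, while a second chain pushes the total over the limit and is rejected.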


I think we probably have too many knobs already, though magic numbers 
are even worse.  I suspect we (gcc developers) will be the biggest user 
of this if we go forward.  Essentially given a testcase we can crank up 
the limits to see if the test is hitting limits or exposing a deeper 
problem.


So based on removal of the magic #s, it looks good to me.


jeff




Re: [committed] Fix comment typos noticed by Bernhard

2022-11-22 Thread Jeff Law


On 11/22/22 16:28, Andrew Pinski wrote:

On Tue, Nov 22, 2022 at 3:25 PM Jeff Law  wrote:

Minor comment typo fixes as noticed by Bernhard.


Installed onto the trunk,

Hmm:
-  int alternative
+  int bool

That seems wrong and might cause a build failure.


Fixed thusly.

jeff

ps.  I keep thinking that the workflow right now is fairly cumbersome 
moving things between the laptop and one of the servers I use.  I could 
make an argument that a topic branch with a merge-request would be 
better.  But we're not set up for that :(



commit ee86077c2c2e38376cb5a575af62e9187c98e8df
Author: Jeff Law 
Date:   Tue Nov 22 16:30:38 2022 -0700

Fix thinko in last patch

gcc/
* tree-ssa-dom.cc (record_edge_info): Fix thinko in last commit.

diff --git a/gcc/tree-ssa-dom.cc b/gcc/tree-ssa-dom.cc
index e82f4f73a66..c9e52d1ee94 100644
--- a/gcc/tree-ssa-dom.cc
+++ b/gcc/tree-ssa-dom.cc
@@ -676,7 +676,7 @@ record_edge_info (basic_block bb)
 if it never traverses the backedge to begin with.  This
 implies that any PHI nodes create equivalances that we
 can attach to the loop exit edge.  */
- int bool
+ bool alternative
= (EDGE_PRED (bb, 0)->flags & EDGE_DFS_BACK) ? 1 : 0;
 
  gphi_iterator gsi;


Re: [committed] Fix comment typos noticed by Bernhard

2022-11-22 Thread Jeff Law



On 11/22/22 16:28, Andrew Pinski wrote:

On Tue, Nov 22, 2022 at 3:25 PM Jeff Law  wrote:

Minor comment typo fixes as noticed by Bernhard.


Installed onto the trunk,

Hmm:
-  int alternative
+  int bool

That seems wrong and might cause a build failure.


Ugh.  Probably will.  I'll double-check momentarily.  That's what I get 
for scanning the stashed patch thinking it was just comment fixes and 
not bootstrapping.  My bad entirely.



jeff



Re: [committed] Fix comment typos noticed by Bernhard

2022-11-22 Thread Andrew Pinski via Gcc-patches
On Tue, Nov 22, 2022 at 3:25 PM Jeff Law  wrote:
>
> Minor comment typo fixes as noticed by Bernhard.
>
>
> Installed onto the trunk,

Hmm:
-  int alternative
+  int bool

That seems wrong and might cause a build failure.

Thanks,
Andrew


>
>
> Jeff
>
>


[committed] Fix comment typos noticed by Bernhard

2022-11-22 Thread Jeff Law

Minor comment typo fixes as noticed by Bernhard.


Installed onto the trunk,


Jeff


commit a03b35a28db262546415e8f16829cbb027a75025
Author: Jeff Law 
Date:   Tue Nov 22 16:22:18 2022 -0700

Fix comment typos noticed by Bernhard

gcc/
* tree-ssa-dom.cc (record_edge_info): Fix comment typos.

diff --git a/gcc/tree-ssa-dom.cc b/gcc/tree-ssa-dom.cc
index c7f095d79fc..e82f4f73a66 100644
--- a/gcc/tree-ssa-dom.cc
+++ b/gcc/tree-ssa-dom.cc
@@ -673,10 +673,10 @@ record_edge_info (basic_block bb)
{
  /* At this point we know the exit condition is loop
 invariant.  The only way to get out of the loop is
-if never traverses the backedge to begin with.  This
-implies that any PHI nodes create equivalances we can
-attach to the loop exit edge.  */
- int alternative
+if it never traverses the backedge to begin with.  This
+implies that any PHI nodes create equivalances that we
+can attach to the loop exit edge.  */
+ int bool
= (EDGE_PRED (bb, 0)->flags & EDGE_DFS_BACK) ? 1 : 0;
 
  gphi_iterator gsi;


[committed][RISC-V] Fix recent rvv/base/spill testcase failures

2022-11-22 Thread Jeff Law
As Jaiwei noted, many (all?) of the rvv/base/spill tests started failing 
after the introduction of shrink-wrapping.



The core issue is we're expecting the frame to have a constant size, but 
it doesn't.  So when using the to_constant method we abort.



The safest thing to do is to set no shrink-wrapping components when the 
frame size is not fixed.  We might be able to do better later -- iff we 
know the offset to the GPRs/FPRs is fixed and fits into the appropriate 
number of bits.
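
A minimal sketch of that guard, assuming a simplified offset of the form A + B * X where X is only known at run time. The struct and function names are hypothetical stand-ins, not the types used in riscv.cc; the point is only that the constant-ness check has to come before any to_constant-style conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a frame offset of the form A + B * X, where
   X is only known at run time (for example a vector length).  */
struct frame_offset
{
  int64_t constant_part;  /* A */
  int64_t variable_part;  /* B; zero means the offset is a constant */
};

bool
offset_is_constant (const struct frame_offset *off)
{
  return off->variable_part == 0;
}

int64_t
offset_to_constant (const struct frame_offset *off)
{
  /* Mirrors the failure mode in the report: asking for a constant
     when the offset is variable aborts.  */
  assert (offset_is_constant (off));
  return off->constant_part;
}

/* Return true if separate shrink-wrapping components may be considered
   for a frame whose GPR save area sits at GP_SP_OFFSET.  */
bool
may_use_separate_components (const struct frame_offset *gp_sp_offset)
{
  if (!offset_is_constant (gp_sp_offset))
    return false;  /* variable-size frame: report no components */

  (void) offset_to_constant (gp_sp_offset);  /* safe after the guard */
  return true;
}
```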



Bootstrapped and regression tested (C-only) on riscv64-linux-gnu.  As 
expected, it fixes a bucketload of failures in rvv/base/spill-*.c.



Installed on the trunk,

Jeff

commit ca73d4c80ea06087d9dd22594e5670bb15e21066
Author: Jeff Law 
Date:   Tue Nov 22 18:12:45 2022 -0500

Fix recent rvv/base/spill testcase failures

The core issue is we're expecting the frame to have a constant size, but it
doesn't.  So when using the to_constant method we abort.

The safest thing to do is to set no shrink-wrapping components when the
frame size is not fixed.  We might be able to do better later -- iff we
know the offset to the GPRs/FPRs is fixed and fits into the appropriate
number of bits.

Bootstrapped and regression tested (C-only) on riscv64-linux-gnu.  As
expected, it fixes a bucketload of failures in rvv/base/spill-*.c.

gcc/
* config/riscv/riscv.cc (riscv_get_separate_components): Do not
do shrink-wrapping for a frame with a variable size.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7ec4ce97e6c..7bfc0e9f595 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5340,7 +5340,8 @@ riscv_get_separate_components (void)
   bitmap_clear (components);
 
   if (riscv_use_save_libcall (&cfun->machine->frame)
-  || cfun->machine->interrupt_handler_p)
+  || cfun->machine->interrupt_handler_p
+  || !cfun->machine->frame.gp_sp_offset.is_constant ())
 return components;
 
   offset = cfun->machine->frame.gp_sp_offset.to_constant ();


Re: [PATCH v2] tree-object-size: Support strndup and strdup

2022-11-22 Thread Siddhesh Poyarekar

On 2022-11-22 15:43, Jeff Law wrote:


On 11/21/22 07:27, Siddhesh Poyarekar wrote:

On 2022-11-20 10:42, Jeff Law wrote:


On 11/4/22 06:48, Siddhesh Poyarekar wrote:
Use string length of input to strdup to determine the usable size of the
resulting object.  Avoid doing the same for strndup since there's a
chance that the input may be too large, resulting in an unnecessary
overhead or worse, the input may not be NULL terminated, resulting in a
crash where there would otherwise have been none.
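
As an illustration of the strdup case only, the usable size of the result is the input's string length plus one byte for the terminating NUL. The helper below is a hypothetical model of that size expression, not code from tree-object-size.cc, and it deliberately says nothing about strndup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size, in bytes, of the object a successful strdup (SRC) returns:
   the string length plus the terminating NUL.  */
size_t
strdup_result_size (const char *src)
{
  assert (src != NULL);
  return strlen (src) + 1;
}
```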

gcc/ChangeLog:

* tree-object-size.cc (todo): New variable.
(object_sizes_execute): Use it.
(strdup_object_size): New function.
(call_object_size): Use it.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c (test_strdup,
test_strndup, test_strdup_min, test_strndup_min): New tests.
(main): Call them.
* gcc.dg/builtin-dynamic-object-size-1.c: Silence overread
warnings.
* gcc.dg/builtin-dynamic-object-size-2.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-3.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-4.c: Likewise.
* gcc.dg/builtin-object-size-1.c: Silence overread warnings.
Declare free, strdup and strndup.
(test11): New test.
(main): Call it.
* gcc.dg/builtin-object-size-2.c: Silence overread warnings.
Declare free, strdup and strndup.
(test9): New test.
(main): Call it.
* gcc.dg/builtin-object-size-3.c: Silence overread warnings.
Declare free, strdup and strndup.
(test11): New test.
(main): Call it.
* gcc.dg/builtin-object-size-4.c: Silence overread warnings.
Declare free, strdup and strndup.
(test9): New test.
(main): Call it.


I'm struggling to see how the SSA updating is correct.  Yes we need 
to update the virtuals due to the introduction of the call to strlen, 
particularly when SRC is not a string constant.  But do we need to do 
more?


Don't we end up gimplifying the 1 + strlenfn (src) expression? Can 
that possibly create new SSA_NAMEs?  Do those need to be put into SSA 
form? I feel like I'm missing something here...


We do all of that manually in gimplify_size_expressions, the only 
thing left to do is updating virtuals AFAICT.


I guess it's actually buried down in force_gimple_operand and I guess 
for temporaries they're not going to be alive across the new gimple 
sequence and each destination gets its own SSA_NAME, so it ought to be 
safe.  Just had to work a bit further through things.


OK for the trunk.


Thanks, pushed with the trivial fixup that Prathamesh suggested, i.e. 
replaced 'if (!strndup)' with 'else'.


Sid


Re: [PATCH] c: Propagate erroneous types to declaration specifiers [PR107805]

2022-11-22 Thread Joseph Myers
On Tue, 22 Nov 2022, Florian Weimer via Gcc-patches wrote:

> Without this change, finish_declspecs cannot tell whether there
> was an erroneous type specified, or no type at all.  This may result
> in additional diagnostics for implicit ints, or missing diagnostics
> for multiple types.
> 
>   PR c/107805
> 
> gcc/c/
>   * c-decl.cc (declspecs_add_type): Propagate error_mark_node
>   from type to specs.
> 
> gcc/testsuite/
>   * gcc.dg/pr107805-1.c: New test.
>   * gcc.dg/pr107805-1.c: Likewise.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] range-op: Implement floating point division fold_range [PR107569]

2022-11-22 Thread Joseph Myers
On Tue, 22 Nov 2022, Jan-Benedict Glaw wrote:

> I'm running a slightly hacked [glibc]/scripts/build-many-glibcs.py to
> do CI builds for glibc as well by now (hacked to allow for GCC master
> being used) and this GCC commit
> (2d5c4a16dd833aa083f13dd3e78e3ef38afe6ebb) triggers glibc's
> elf/check-localplt testcase to fail, though just for
> sparc64-linux-gnu. (As I just started with glibc checks, it took me a
> while to realize this was a real regression and not a flaw in my
> setup.)

I think the appropriate fix is to update the relevant localplt.data (to 
add the relevant libgcc symbol marked with "?" as optional), I don't think 
there's a GCC bug here.

-- 
Joseph S. Myers
jos...@codesourcery.com


[committed] analyzer: only look for named functions in root ns [PR107788]

2022-11-22 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-4249-gec7c796de020cb.

gcc/analyzer/ChangeLog:
PR analyzer/107788
* known-function-manager.cc (known_function_manager::get_match):
Don't look up fndecls by name when they're not in the root
namespace.

gcc/testsuite/ChangeLog:
PR analyzer/107788
* g++.dg/analyzer/named-functions.C: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/known-function-manager.cc  | 15 ---
 gcc/testsuite/g++.dg/analyzer/named-functions.C | 12 
 2 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/analyzer/named-functions.C

diff --git a/gcc/analyzer/known-function-manager.cc 
b/gcc/analyzer/known-function-manager.cc
index e17350da5ec..c1074bcb6e5 100644
--- a/gcc/analyzer/known-function-manager.cc
+++ b/gcc/analyzer/known-function-manager.cc
@@ -91,6 +91,7 @@ known_function_manager::add (enum internal_fn ifn,
 const known_function *
 known_function_manager::get_match (tree fndecl, const call_details &cd) const
 {
+  /* Look for a matching built-in.  */
   if (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL))
 {
   if (const known_function *candidate
@@ -99,10 +100,18 @@ known_function_manager::get_match (tree fndecl, const call_details &cd) const
fndecl))
  return candidate;
 }
+
+  /* Look for a match by name.  */
+
+  /* Reject fndecls that aren't in the root namespace.  */
+  if (DECL_CONTEXT (fndecl)
+  && TREE_CODE (DECL_CONTEXT (fndecl)) != TRANSLATION_UNIT_DECL)
+return NULL;
   if (tree identifier = DECL_NAME (fndecl))
-  if (const known_function *candidate = get_by_identifier (identifier))
-   if (candidate->matches_call_types_p (cd))
- return candidate;
+if (const known_function *candidate = get_by_identifier (identifier))
+  if (candidate->matches_call_types_p (cd))
+   return candidate;
+
   return NULL;
 }
 
diff --git a/gcc/testsuite/g++.dg/analyzer/named-functions.C 
b/gcc/testsuite/g++.dg/analyzer/named-functions.C
new file mode 100644
index 000..661a9307b81
--- /dev/null
+++ b/gcc/testsuite/g++.dg/analyzer/named-functions.C
@@ -0,0 +1,12 @@
+#define NULL ((void *)0)
+
+namespace my
+{
+  int socket (int, int, int);
+};
+
+void test_my_socket ()
+{
+  /* This shouldn't match the known function "::socket".  */
+  my::socket (0, 0, 0); /* { dg-bogus "leak" } */
+}
-- 
2.26.3



[committed] analyzer: fix ICE on 'bind(INT_CST, ...)' [PR107783]

2022-11-22 Thread David Malcolm via Gcc-patches
This was crashing inside fd_phase_mismatch's ctor with assertion
failure when the state was "fd-constant".

Fix the ICE by not complaining about constants passed to these APIs.
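
A reduced sketch of the idea, assuming a much smaller set of states than the real state machine in sm-fd.cc. The enum and function names are hypothetical; the only point is that a constant fd is treated like the start state, so no phase-mismatch diagnostic is built for it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of states.  */
enum fd_state
{
  FD_START,
  FD_STOP,
  FD_CONSTANT,  /* fd given as an integer constant, e.g. bind (0, ...) */
  FD_NEW_STREAM_SOCKET,
  FD_CONNECTED_STREAM_SOCKET,
  FD_NUM_STATES
};

/* Return true if a "bind"- or "connect"-style call on an fd in
   OLD_STATE should be reported as happening in the wrong phase.  */
bool
should_complain_about_phase (enum fd_state old_state)
{
  assert (old_state < FD_NUM_STATES);
  switch (old_state)
    {
    case FD_START:
    case FD_STOP:
    case FD_CONSTANT:  /* nothing is known about a constant: stay quiet */
    case FD_NEW_STREAM_SOCKET:
      return false;
    default:
      return true;
    }
}
```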

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-4248-g64fb291c5839e1.

gcc/analyzer/ChangeLog:
PR analyzer/107783
* sm-fd.cc (fd_state_machine::check_for_new_socket_fd): Don't
complain when old state is "fd-constant".
(fd_state_machine::on_listen): Likewise.
(fd_state_machine::on_accept): Likewise.

gcc/testsuite/ChangeLog:
PR analyzer/107783
* gcc.dg/analyzer/fd-accept.c (test_accept_on_constant): New.
* gcc.dg/analyzer/fd-bind.c (test_bind_on_constant): New.
* gcc.dg/analyzer/fd-connect.c (test_connect_on_constant): New.
* gcc.dg/analyzer/fd-listen.c (test_listen_on_connected_socket):
Fix typo.
(test_listen_on_constant): New.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/sm-fd.cc  | 9 ++---
 gcc/testsuite/gcc.dg/analyzer/fd-accept.c  | 5 +
 gcc/testsuite/gcc.dg/analyzer/fd-bind.c| 5 +
 gcc/testsuite/gcc.dg/analyzer/fd-connect.c | 5 +
 gcc/testsuite/gcc.dg/analyzer/fd-listen.c  | 7 ++-
 5 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
index 3e500575428..f7779be7d26 100644
--- a/gcc/analyzer/sm-fd.cc
+++ b/gcc/analyzer/sm-fd.cc
@@ -1798,7 +1798,8 @@ fd_state_machine::check_for_new_socket_fd (const call_details &cd,
|| old_state == m_new_datagram_socket
|| old_state == m_new_unknown_socket
|| old_state == m_start
-   || old_state == m_stop))
+   || old_state == m_stop
+   || old_state == m_constant_fd))
 {
   /* Complain about "bind" or "connect" in wrong phase.  */
   tree diag_arg = sm_ctxt->get_diagnostic_tree (fd_sval);
@@ -1900,6 +1901,7 @@ fd_state_machine::on_listen (const call_details &cd,
   if (!check_for_socket_fd (cd, successful, sm_ctxt, fd_sval, node, old_state))
 return false;
   if (!(old_state == m_start
+   || old_state == m_constant_fd
|| old_state == m_stop
|| old_state == m_bound_stream_socket
|| old_state == m_bound_unknown_socket
@@ -2015,8 +2017,9 @@ fd_state_machine::on_accept (const call_details &cd,
   if (!check_for_socket_fd (cd, successful, sm_ctxt, fd_sval, node, old_state))
 return false;
 
-  if (old_state == m_start)
-/* If we were in the start state, assume we had the expected state.  */
+  if (old_state == m_start || old_state == m_constant_fd)
+/* If we were in the start state (or a constant), assume we had the
+   expected state.  */
 sm_ctxt->set_next_state (cd.get_call_stmt (), fd_sval,
 m_listening_stream_socket);
   else if (old_state == m_stop)
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-accept.c 
b/gcc/testsuite/gcc.dg/analyzer/fd-accept.c
index 36cc7af7184..e56caaca6af 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-accept.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-accept.c
@@ -67,3 +67,8 @@ int test_accept_on_accept (int fd_a)
 
   return fd_b;
 }
+
+int test_accept_on_constant ()
+{
+  return accept (0, NULL, 0);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-bind.c 
b/gcc/testsuite/gcc.dg/analyzer/fd-bind.c
index 6f91bc4b794..fa69ca4c0f8 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-bind.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-bind.c
@@ -72,3 +72,8 @@ void test_bind_after_accept (int fd, const char *sockname)
 
   close (afd);
 }
+
+int test_bind_on_constant ()
+{
+  return bind (0, NULL, 0);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-connect.c 
b/gcc/testsuite/gcc.dg/analyzer/fd-connect.c
index 1ab54d01f36..5b1c335ba76 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-connect.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-connect.c
@@ -44,3 +44,8 @@ void test_connect_after_bind (const char *sockname,
 
   close (fd);  
 }
+
+int test_connect_on_constant ()
+{
+  return connect (0, NULL, 0);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-listen.c 
b/gcc/testsuite/gcc.dg/analyzer/fd-listen.c
index 1f54a8f2953..31eb90d6cb3 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-listen.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-listen.c
@@ -52,7 +52,7 @@ void test_listen_on_new_datagram_socket (void)
   close (fd);
 }
 
-void test_listed_on_connected_socket (int fd)
+void test_listen_on_connected_socket (int fd)
 {
   int afd = accept (fd, NULL, 0);
   if (afd == -1)
@@ -61,3 +61,8 @@ void test_listed_on_connected_socket (int fd)
   /* { dg-message "'listen' expects a bound stream socket file descriptor but 
'afd' is connected" "final event" { target *-*-* } .-1 } */
   close (afd);
 }
+
+int test_listen_on_constant ()
+{
+  return listen (0, 10);
+}
-- 
2.26.3



[committed] analyzer: fix 'errno' on Solaris and OS X [PR107807]

2022-11-22 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-4247-g7c9717fcb5cf94.

gcc/analyzer/ChangeLog:
PR analyzer/107807
* region-model-impl-calls.cc (register_known_functions): Register
"___errno" and "__error" as synonyms  for "__errno_location".

gcc/testsuite/ChangeLog:
PR analyzer/107807
* gcc.dg/analyzer/errno-___errno.c: New test.
* gcc.dg/analyzer/errno-__error.c: New test.
* gcc.dg/analyzer/errno-global-var.c: New test.
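
For readers unfamiliar with the pattern being matched: on these systems "errno" is a macro that dereferences the pointer returned by an accessor function. The sketch below uses a made-up accessor, my_errno_location, so that it does not clash with any real libc symbol; it is only meant to show why a store through the macro is visible to a later read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accessor playing the role of ___errno, __error or
   __errno_location: it returns the address of the error variable.  */
int *
my_errno_location (void)
{
  static int error_value;
  return &error_value;
}

/* The macro makes "my_errno" usable as an lvalue, as with errno.  */
#define my_errno (*my_errno_location ())

/* Store VAL through the macro and read it back.  */
int
store_and_reload (int val)
{
  assert (my_errno_location () != NULL);
  my_errno = val;
  return my_errno;
}
```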

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model-impl-calls.cc   | 14 +
 .../gcc.dg/analyzer/errno-___errno.c  | 29 +++
 gcc/testsuite/gcc.dg/analyzer/errno-__error.c | 28 ++
 .../gcc.dg/analyzer/errno-global-var.c| 26 +
 4 files changed, 97 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/errno-___errno.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/errno-__error.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/errno-global-var.c

diff --git a/gcc/analyzer/region-model-impl-calls.cc 
b/gcc/analyzer/region-model-impl-calls.cc
index 6962ffd400f..23a21d752cf 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -1953,6 +1953,20 @@ register_known_functions (known_function_manager &kfm)
 kfm.add ("error_at_line", make_unique (5));
   }
 
+  /* Other implementations of C standard library.  */
+  {
+/* According to PR 107807 comment #2, Solaris implements "errno"
+   like this:
+extern int *___errno(void) __attribute__((__const__));
+#define errno (*(___errno()))
+   and OS X like this:
+extern int * __error(void);
+#define errno (*__error())
+   Add these as synonyms for "__errno_location".  */
+kfm.add ("___errno", make_unique ());
+kfm.add ("__error", make_unique ());
+  }
+
   /* C++ support functions.  */
   {
 kfm.add ("operator new", make_unique ());
diff --git a/gcc/testsuite/gcc.dg/analyzer/errno-___errno.c 
b/gcc/testsuite/gcc.dg/analyzer/errno-___errno.c
new file mode 100644
index 000..17ff8b7de9d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/errno-___errno.c
@@ -0,0 +1,29 @@
+#include "analyzer-decls.h"
+
+/* According to PR 107807 comment #2, Solaris implements "errno"
+   like this:  */
+
+extern int *___errno(void) __attribute__((__const__));
+#define errno (*(___errno()))
+
+
+extern void external_fn (void);
+
+int test_reading_errno (void)
+{
+  return errno;
+}
+
+void test_setting_errno (int val)
+{
+  errno = val;
+}
+
+void test_storing_to_errno (int val)
+{
+  __analyzer_eval (errno == val); /* { dg-warning "UNKNOWN" } */
+  errno = val;
+  __analyzer_eval (errno == val); /* { dg-warning "TRUE" } */
+  external_fn ();
+  __analyzer_eval (errno == val); /* { dg-warning "UNKNOWN" } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/errno-__error.c 
b/gcc/testsuite/gcc.dg/analyzer/errno-__error.c
new file mode 100644
index 000..19bc4f937f6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/errno-__error.c
@@ -0,0 +1,28 @@
+#include "analyzer-decls.h"
+
+/* According to PR 107807 comment #2, OS X implements "errno"
+   like this:  */
+
+extern int * __error(void);
+#define errno (*__error())
+
+extern void external_fn (void);
+
+int test_reading_errno (void)
+{
+  return errno;
+}
+
+void test_setting_errno (int val)
+{
+  errno = val;
+}
+
+void test_storing_to_errno (int val)
+{
+  __analyzer_eval (errno == val); /* { dg-warning "UNKNOWN" } */
+  errno = val;
+  __analyzer_eval (errno == val); /* { dg-warning "TRUE" } */
+  external_fn ();
+  __analyzer_eval (errno == val); /* { dg-warning "UNKNOWN" } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/errno-global-var.c 
b/gcc/testsuite/gcc.dg/analyzer/errno-global-var.c
new file mode 100644
index 000..fdf1b17cecc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/errno-global-var.c
@@ -0,0 +1,26 @@
+#include "analyzer-decls.h"
+
+/* "errno" declared as a global var.  */
+
+extern int errno;
+
+extern void external_fn (void);
+
+int test_reading_errno (void)
+{
+  return errno;
+}
+
+void test_setting_errno (int val)
+{
+  errno = val;
+}
+
+void test_storing_to_errno (int val)
+{
+  __analyzer_eval (errno == val); /* { dg-warning "UNKNOWN" } */
+  errno = val;
+  __analyzer_eval (errno == val); /* { dg-warning "TRUE" } */
+  external_fn ();
+  __analyzer_eval (errno == val); /* { dg-warning "UNKNOWN" } */  
+}
-- 
2.26.3



[committed] analyzer: eliminate region_model::impl_call_* special cases

2022-11-22 Thread David Malcolm via Gcc-patches
Eliminate all of the remaining special cases in class region_model that
handle various specific functions, replacing them with uses of
known_function subclasses.

Add various type-checks that ought to prevent ICEs for cases where
functions match the name of a standard C library or POSIX function, but
have incompatible arguments.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-4246-g6bd31b33daa3c7.

gcc/analyzer/ChangeLog:
* analyzer.h (class internal_known_function): New.
(register_varargs_builtins): New decl.
* engine.cc (exploded_node::on_stmt_pre): Remove
"out_terminate_path" param from call to region_model::on_stmt_pre.
(feasibility_state::maybe_update_for_edge): Likewise.
* known-function-manager.cc: Include "basic-block.h", "gimple.h",
and "analyzer/region-model.h".
(known_function_manager::known_function_manager): Initialize
m_combined_fns_arr.
(known_function_manager::~known_function_manager): Clean up
m_combined_fns_arr.
(known_function_manager::get_by_identifier): Make const.
(known_function_manager::add): New overloaded definitions for
enum built_in_function and enum internal_fn.
(known_function_manager::get_by_fndecl): Delete.
(known_function_manager::get_match): New.
(known_function_manager::get_internal_fn): New.
(known_function_manager::get_normal_builtin): New.
* known-function-manager.h
(known_function_manager::get_by_identifier): Make private and
add const qualifier.
(known_function_manager::get_by_fndecl): Delete.
(known_function_manager::add): Add overloaded decls for
enum built_in_function name and enum internal_fn.
(known_function_manager::get_match): New decl.
(known_function_manager::get_internal_fn): New decl.
(known_function_manager::get_normal_builtin): New decl.
(known_function_manager::m_combined_fns_arr): New field.
* region-model-impl-calls.cc (call_details::arg_is_size_p): New.
(class kf_alloca): New.
(region_model::impl_call_alloca): Convert to...
(kf_alloca::impl_call_pre): ...this.
(kf_analyzer_dump_capacity::matches_call_types_p): Rewrite check
to use call_details::arg_is_pointer_p.
(region_model::impl_call_builtin_expect): Convert to...
(class kf_expect): ...this.
(class kf_calloc): New, adding check that both arguments are
size_t.
(region_model::impl_call_calloc): Convert to...
(kf_calloc::impl_call_pre): ...this.
(kf_connect::matches_call_types_p): Rewrite check to use
call_details::arg_is_pointer_p.
(region_model::impl_call_error): Convert to...
(class kf_error): ...this, and...
(kf_error::impl_call_pre): ...this.
(class kf_fgets): New, adding checks that args 0 and 2 are
pointers.
(region_model::impl_call_fgets): Convert to...
(kf_fgets::impl_call_pre): ...this.
(class kf_fread): New, adding checks on the argument types.
(region_model::impl_call_fread): Convert to...
(kf_fread::impl_call_pre): ...this.
(class kf_free): New, adding check that the argument is a pointer.
(region_model::impl_call_free): Convert to...
(kf_free::impl_call_post): ...this.
(class kf_getchar): New.
(class kf_malloc): New, adding check that the argument is a
size_t.
(region_model::impl_call_malloc): Convert to...
(kf_malloc::impl_call_pre): ...this.
(class kf_memcpy): New, adding checks on arguments.
(region_model::impl_call_memcpy): Convert to...
(kf_memcpy::impl_call_pre): ...this.
(class kf_memset): New.
(region_model::impl_call_memset): Convert to...
(kf_memset::impl_call_pre): ...this.
(kf_pipe::matches_call_types_p): Rewrite check to use
call_details::arg_is_pointer_p.
(kf_putenv::matches_call_types_p): Likewise.
(class kf_realloc): New, adding checks on the argument types.
(region_model::impl_call_realloc): Convert to...
(kf_realloc::impl_call_post): ...this.
(class kf_strchr): New.
(region_model::impl_call_strchr): Convert to...
(kf_strchr::impl_call_post): ...this.
(class kf_stack_restore): New.
(class kf_stack_save): New.
(class kf_stdio_output_fn): New.
(class kf_strcpy): New,
(region_model::impl_call_strcpy): Convert to...
(kf_strcpy::impl_call_pre): ...this.
(class kf_strlen): New.
(region_model::impl_call_strlen): Convert to...
(kf_strlen::impl_call_pre): ...this.
(class kf_ubsan_bounds): New.
(region_model::impl_deallocation_call): Reimplement to avoid call
to impl_call_free.
(register_known_functions): Add handlers for IFN_BUILTIN_EXPECT

Re: [PING] [PATCH RESEND] riscv: improve the cost model for loading a 64bit constant in rv32.

2022-11-22 Thread Jeff Law via Gcc-patches



On 11/17/22 00:32, Lin Sinan via Gcc-patches wrote:

The motivation of this patch is to correct the wrong estimation of
the number of instructions needed for loading a 64bit constant in
rv32 in the current cost model (riscv_integer_cost). According to
the current implementation, if a constant requires more than 3
instructions(riscv_const_insn and riscv_legitimate_constant_p),
then the constant will be put into constant pool when expanding
gimple to rtl(legitimate_constant_p hook and emit_move_insn).
So the inaccurate cost model leads to the suboptimal codegen
in rv32 and the wrong estimation part could be corrected through
this fix.

e.g. the current codegen for loading 0x839290001 in rv32

   lui a5,%hi(.LC0)
   lw  a0,%lo(.LC0)(a5)
   lw  a1,%lo(.LC0+4)(a5)
.LC0:
   .word   958988289
   .word   8

output after this patch

   li a0,958988288
   addi a0,a0,1
   li a1,8
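
The split into two 32-bit halves can be sketched as follows. This is a standalone model of the loval/hival arithmetic in the hunk quoted further down, written with plain C integer types instead of HOST_WIDE_INT and sext_hwi; the function names are hypothetical. Note that because the low half is sign-extended, the high half differs from the raw upper 32 bits by one whenever bit 31 of the value is set.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of V to 64 bits.  */
int64_t
sext32 (uint64_t v)
{
  uint32_t low = (uint32_t) v;
  return low < UINT32_C (0x80000000)
         ? (int64_t) low
         : (int64_t) low - INT64_C (0x100000000);
}

/* Split VALUE so that VALUE == *HI * 2^32 + *LO, with *LO being the
   sign-extended low 32 bits.  */
void
split_const64 (int64_t value, int64_t *lo, int64_t *hi)
{
  *lo = sext32 ((uint64_t) value);
  /* Subtract the low half first so a negative *LO carries into *HI.
     Unsigned arithmetic keeps the wraparound well defined.  */
  uint64_t rest = (uint64_t) value - (uint64_t) *lo;
  *hi = sext32 (rest >> 32);
  /* The two halves recombine to the original value modulo 2^64.  */
  assert ((uint64_t) value == ((uint64_t) *hi << 32) + (uint64_t) *lo);
}
```

For 0x839290001 this gives a low half of 958988289 and a high half of 8, matching the two .word values in the constant pool above.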

gcc/ChangeLog:

 * config/riscv/riscv.cc (riscv_build_integer): Handle the case of 
loading 64bit constant in rv32.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rv32-load-64bit-constant.c: New test.

Signed-off-by: Lin Sinan 
---
  gcc/config/riscv/riscv.cc | 23 +++
  .../riscv/rv32-load-64bit-constant.c  | 38 +++
  2 files changed, 61 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/riscv/rv32-load-64bit-constant.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 32f9ef9ade9..9dffabdc5e3 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -618,6 +618,29 @@ riscv_build_integer (struct riscv_integer_op *codes, 
HOST_WIDE_INT value,
}
  }
  
+  if ((value > INT32_MAX || value < INT32_MIN) && !TARGET_64BIT)


Nit.   It's common practice to have the TARGET test first in a series of 
tests.  It may also be advisable to break this into two lines.  
Something like this:



  if (!TARGET_64BIT
      && (value > INT32_MAX || value < INT32_MIN))


That's the style most GCC folks are more accustomed to reading.




+{
+  unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
+  struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS],
+   hicode[RISCV_MAX_INTEGER_OPS];
+  int hi_cost, lo_cost;
+
+  hi_cost = riscv_build_integer_1 (hicode, hival, mode);
+  if (hi_cost < cost)
+   {
+ lo_cost = riscv_build_integer_1 (alt_codes, loval, mode);
+ if (lo_cost + hi_cost < cost)


Just so I'm sure.  "cost" here refers strictly to other synthesized 
forms? If so, then ISTM that we'd want to generate the new style when 
lo_cost + hi_cost < cost OR when lo_cost + hi_cost is less than loading 
the constant from memory -- which is almost certainly more than "3" 
since the sequence from memory will be at least 3 instructions, two of 
which will hit memory.



Jeff



Re: [PATCH] RISC-V: Add the Zihpm and Zicntr extensions

2022-11-22 Thread Palmer Dabbelt

On Tue, 22 Nov 2022 13:50:28 PST (-0800), jeffreya...@gmail.com wrote:


On 11/22/22 08:29, Palmer Dabbelt wrote:

On Tue, 22 Nov 2022 07:20:15 PST (-0800), jeffreya...@gmail.com wrote:


On 11/20/22 18:36, Kito Cheng wrote:

So the idea here is just to define the extension so that it gets
defined
in the ISA strings and passed through to the assembler, right?

That will also define arch test macro:

https://github.com/riscv-non-isa/riscv-c-api-doc/blob/master/riscv-c-api.md#architecture-extension-test-macro



Sorry I should have been clearer and included the test macro(s) as well.

So a better summary would be that while it doesn't change the codegen
behavior in the compiler, it does provide the mechanisms to pass along
isa strings to other tools such as the assembler and signal via the test
macros that this extension is available.
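
A small sketch of how user code might consume such test macros. The macro names __riscv_zicntr and __riscv_zihpm are an assumption based on the __riscv_<ext> naming convention in the C API document linked above; they are not quoted from the patch. On a toolchain or target that does not define them, the function simply reports zero.

```c
#include <assert.h>

/* Count how many of the two counter extensions the compiler advertised
   through architecture extension test macros.  The macro names are
   assumed, following the __riscv_<ext> convention.  */
int
counter_extensions_advertised (void)
{
  int n = 0;
#if defined (__riscv_zicntr)
  n++;
#endif
#if defined (__riscv_zihpm)
  n++;
#endif
  assert (n >= 0 && n <= 2);
  return n;
}
```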


IMO the important bit here is that we're not adding any compatibility
flags, like we did when fence.i was removed from the ISA.  That's fine
as long as we never remove these instructions from the base ISA in the
software, but that's what's suggested by Andrew in the post.


Right.  IIUC these instructions were never supposed to be in the base
ISA, but, in effect, snuck through.  We're retro-actively adding them as
an extension, at least in terms of ISA strings & test macros.  We're
currently (forever?) going to allow them in the assembler without
strictly requiring the extension be on.


That'd be the idea.


It's a super weird one, but there's a bunch of cases in RISC-V where
we're told to just ignore words in the ISA manual.  Definitely a trap
for users (and we already had some Linux folks get bit by the counter
changes here), but that's just how RISC-V works.


Mistakes happen.  The key is to adjust for them as best as we can.   
I'd lean towards a stricter enforcement, bringing these
instructions/extension in line with how we handle the others. It'd
potentially mean source incompatibilities that would need to be fixed,
but they shouldn't be difficult and we're still early enough in the game
that we *could* take that route.  Andrew's position is more
accommodating of existing code and while I may not 100% agree with his
position, I understand it.


So while I'd lean towards a stricter checking, I can live with this
approach.  I wouldn't mind hearing from Kito, Philipp and others though.


That's the sort of thing we've traditionally done: essentially just read 
the actual words in the PDF and produce implementations that match 
those, tagging versions when things change (the fence.i stuff is a good 
example).  After some amount of time we can then move the default spec 
version over to the new one.  That's a little bit of churn for users, 
but it shouldn't be all that bad.


IMO that's the sane way to go, I'd certainly expect to be able to read 
the words in the PDFs and go implement things according to them.  It's 
pretty clearly not what the ISA folks want, though.


There's also the secondary issue of getting ISA strings to match between 
the various bits of the software stack that uses them.  We're trying to 
move away from ISA strings as a stable uABI in Linux for exactly this 
reason, but ISA strings have already ended up all over the place so 
there's only so much we can do.


Re: [PATCH V2] Use subscalar mode to move struct block for parameter

2022-11-22 Thread Jeff Law via Gcc-patches



On 11/20/22 20:07, Jiufu Guo wrote:

Jiufu Guo  writes:


Hi,

As mentioned in the previous version patch:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
Suboptimal code is generated for "assigning from parameter" or
"assigning to return value".
This patch enhances the assignment from parameters in cases like the
ones below:
case1.c
typedef struct SA {double a[3];long l; } A;
A ret_arg (A a) {return a;}
void st_arg (A a, A *p) {*p = a;}

case2.c
typedef struct SA {double a[3];} A;
A ret_arg (A a) {return a;}
void st_arg (A a, A *p) {*p = a;}

For this patch, bootstrap and regtest pass on ppc64{,le}
and x86_64.
* Besides asking for help reviewing this patch, I would like
comments on enhancing "assigning to returns".

I updated the patch to fix the issue for returns.  This patch
adds a flag DECL_USEDBY_RETURN_P to indicate if a var is used
by a return stmt.  This patch fixes the issue in the expand pass only,
so we will try to update the patch to avoid this flag.

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index dd29c03..09b8ec64cea 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars)
  frame_phase = off ? align - off : 0;
}
  
+  /* Collect VARs on returns.  */

+  if (DECL_RESULT (current_function_decl))
+{
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+   if (greturn *ret = safe_dyn_cast <greturn *> (last_stmt (e->src)))
+ {
+   tree val = gimple_return_retval (ret);
+   if (val && VAR_P (val))
+ DECL_USEDBY_RETURN_P (val) = 1;
+ }
+}
+
/* Set TREE_USED on all variables in the local_decls.  */
FOR_EACH_LOCAL_DECL (cfun, i, var)
  TREE_USED (var) = 1;
diff --git a/gcc/expr.cc b/gcc/expr.cc
index d9407432ea5..20973649963 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6045,6 +6045,52 @@ expand_assignment (tree to, tree from, bool nontemporal)
return;
  }
  
+  if ((TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from)

+   && TYPE_MODE (TREE_TYPE (from)) == BLKmode
+   && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
+  || REG_P (DECL_INCOMING_RTL (from))))
+  || (VAR_P (to) && DECL_USEDBY_RETURN_P (to)
+ && TYPE_MODE (TREE_TYPE (to)) == BLKmode
+ && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl)))
+  == PARALLEL))
+{
+  push_temp_slots ();
+  rtx par_ret;
+  machine_mode mode;
+  par_ret = TREE_CODE (from) == PARM_DECL
+ ? DECL_INCOMING_RTL (from)
+ : DECL_RTL (DECL_RESULT (current_function_decl));
+  mode = GET_CODE (par_ret) == PARALLEL
+  ? GET_MODE (XEXP (XVECEXP (par_ret, 0, 0), 0))
+  : word_mode;
+  int mode_size = GET_MODE_SIZE (mode).to_constant ();
+  int size = INTVAL (expr_size (from));
+
+  /* If/How the parameter using submode, it dependes on the size and
+position of the parameter.  Here using heurisitic number.  */
+  int hurstc_num = 8;


Where did this come from and what does it mean?


Note that BLKmode subword values passed in registers can be either right 
or left justified.  I think you also need to worry about endianness here.



Jeff




[PATCH] Fortran: error recovery on associate with bad selector [PR107577]

2022-11-22 Thread Harald Anlauf via Gcc-patches
Dear all,

please find attached an obvious patch by Steve for a technical
regression that resulted from improvements in error recovery
of bad uses of associate.

Regtested on x86_64-pc-linux-gnu.

Will commit soon unless there are comments.

As a sidenote: the testcase shows that we resolve the associate
names quite often, likely more often than necessary, resulting
in many error messages produced for the same line of code.  In
the present case, each use of the bad name produces two errors,
one where it is used, and one at the associate statement.
That is probably not helpful for the user.

Thanks,
Harald

From 9ff8d2ec56d139b54e2f66f747142687a38d2106 Mon Sep 17 00:00:00 2001
From: Steve Kargl 
Date: Tue, 22 Nov 2022 22:31:51 +0100
Subject: [PATCH] Fortran: error recovery on associate with bad selector
 [PR107577]

gcc/fortran/ChangeLog:

	PR fortran/107577
	* resolve.cc (find_array_spec): Choose appropriate locus either of
	bad array reference or of non-array entity in error message.

gcc/testsuite/ChangeLog:

	PR fortran/107577
	* gfortran.dg/pr107577.f90: New test.
---
 gcc/fortran/resolve.cc |  3 ++-
 gcc/testsuite/gfortran.dg/pr107577.f90 | 13 +
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr107577.f90

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 24e5aa03556..3396c6ce4a7 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -5005,8 +5005,9 @@ find_array_spec (gfc_expr *e)
   case REF_ARRAY:
 	if (as == NULL)
 	  {
+	locus loc = ref->u.ar.where.lb ? ref->u.ar.where : e->where;
 	gfc_error ("Invalid array reference of a non-array entity at %L",
-		   &ref->u.ar.where);
+		   &loc);
 	return false;
 	  }

diff --git a/gcc/testsuite/gfortran.dg/pr107577.f90 b/gcc/testsuite/gfortran.dg/pr107577.f90
new file mode 100644
index 000..94e6620a0ee
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr107577.f90
@@ -0,0 +1,13 @@
+! { dg-do compile }
+! PR fortran/107577 - ICE in find_array_spec
+! Contributed by G.Steinmetz
+
+program p
+  implicit none
+  associate (y => f(4))! { dg-error "has no IMPLICIT type" }
+if (lbound (y, 1) /= 1) stop 1 ! { dg-error "Invalid array reference" }
+if (y(1) /= 1) stop 2  ! { dg-error "Invalid array reference" }
+  end associate
+end
+
+! { dg-error "has no type" " " { target *-*-* } 7 }
--
2.35.3



[committed] libstdc++: Add workaround for fs::path constraint recursion [PR106201]

2022-11-22 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to gcc-12 branch.

-- >8 --

This works around a compiler bug where overload resolution attempts
implicit conversion to path in order to call a function with a path&
parameter. Such conversion would produce a prvalue, which would not be
able to bind to the lvalue reference anyway. Attempting to check the
conversion causes a constraint recursion because the arguments to the
path constructor are checked to see if they're iterators, which checks
if they're swappable, which tries to use the swap function that
triggered the conversion in the first place.

This replaces the swap function with an abbreviated function template
that is constrained with same_as<path> auto& so that the invalid
conversion is never considered.

libstdc++-v3/ChangeLog:

PR libstdc++/106201
* include/bits/fs_path.h (filesystem::swap(path&, path&)):
Replace with abbreviated function template.
* include/experimental/bits/fs_path.h (filesystem::swap):
Likewise.
* testsuite/27_io/filesystem/iterators/106201.cc: New test.
* testsuite/experimental/filesystem/iterators/106201.cc: New test.
---
 libstdc++-v3/include/bits/fs_path.h|  7 +++
 libstdc++-v3/include/experimental/bits/fs_path.h   |  9 -
 .../testsuite/27_io/filesystem/iterators/106201.cc | 14 ++
 .../experimental/filesystem/iterators/106201.cc| 14 ++
 4 files changed, 43 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc
 create mode 100644 libstdc++-v3/testsuite/experimental/filesystem/iterators/106201.cc

diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index 65682c2a185..1b4a1b69f37 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -737,7 +737,14 @@ namespace __detail
   /// @{
   /// @relates std::filesystem::path
 
+#if __cpp_concepts >= 201907L
+  // Workaround for PR libstdc++/106201
+  inline void
+  swap(same_as<path> auto& __lhs, same_as<path> auto& __rhs) noexcept
+  { __lhs.swap(__rhs); }
+#else
   inline void swap(path& __lhs, path& __rhs) noexcept { __lhs.swap(__rhs); }
+#endif
 
   size_t hash_value(const path& __p) noexcept;
 
diff --git a/libstdc++-v3/include/experimental/bits/fs_path.h b/libstdc++-v3/include/experimental/bits/fs_path.h
index a493e17a37e..ba6acb2158d 100644
--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -537,7 +537,14 @@ namespace __detail
   /// @relates std::experimental::filesystem::path @{
 
   /// Swap overload for paths
-  inline void swap(path& __lhs, path& __rhs) noexcept { __lhs.swap(__rhs); }
+#if __cpp_concepts >= 201907L
+  // Workaround for PR libstdc++/106201
+  inline void
+  swap(same_as<path> auto& __lhs, same_as<path> auto& __rhs) noexcept
+  { __lhs.swap(__rhs); }
+#else
+   inline void swap(path& __lhs, path& __rhs) noexcept { __lhs.swap(__rhs); }
+#endif
 
   /// Compute a hash value for a path
   size_t hash_value(const path& __p) noexcept;
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc b/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc
new file mode 100644
index 000..c5fefd9ac3f
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc
@@ -0,0 +1,14 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+// { dg-require-filesystem-ts "" }
+
+// PR libstdc++/106201 constraint recursion in path(Source const&) constructor.
+
+#include 
+#include 
+#include 
+namespace fs = std::filesystem;
+using I = std::counted_iterator;
+static_assert( std::swappable );
+using R = std::counted_iterator;
+static_assert( std::swappable );
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/iterators/106201.cc b/libstdc++-v3/testsuite/experimental/filesystem/iterators/106201.cc
new file mode 100644
index 000..017b72ef5f6
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/filesystem/iterators/106201.cc
@@ -0,0 +1,14 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+// { dg-require-filesystem-ts "" }
+
+// PR libstdc++/106201 constraint recursion in path(Source const&) constructor.
+
+#include 
+#include 
+#include 
+namespace fs = std::experimental::filesystem;
+using I = std::counted_iterator;
+static_assert( std::swappable );
+using R = std::counted_iterator;
+static_assert( std::swappable );
-- 
2.38.1
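
The effect of the workaround described above can be illustrated outside libstdc++. The sketch below is an assumption-laden toy, not the library code: `demo::Path` is a hypothetical stand-in for `path`, and the constraint is written with `std::enable_if_t` so it builds as C++17, whereas the patch uses a C++20 abbreviated function template. The mechanism is the same in both spellings: the parameter types are deduced, so an argument that is not exactly a `Path` lvalue drops out of overload resolution before any implicit conversion to `Path` is considered.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

namespace demo {

// Toy stand-in for a path type: implicitly constructible from std::string.
struct Path {
    std::string text;
    Path(std::string s) : text(std::move(s)) {}
    void swap(Path& other) noexcept { text.swap(other.text); }
};

// Participates in overload resolution only when both arguments are exactly
// Path lvalues; deduction fails for anything else, so no conversion is tried.
template <typename T, typename U,
          typename = std::enable_if_t<std::is_same<T, Path>::value &&
                                      std::is_same<U, Path>::value>>
void swap(T& lhs, U& rhs) noexcept { lhs.swap(rhs); }

// Detects whether demo::swap is callable with two lvalues of type T.
template <typename T, typename = void>
struct finds_demo_swap : std::false_type {};

template <typename T>
struct finds_demo_swap<
    T, std::void_t<decltype(demo::swap(std::declval<T&>(), std::declval<T&>()))>>
    : std::true_type {};

}  // namespace demo

// Swaps two Path objects through the constrained overload (found via ADL).
inline bool swap_example() {
    demo::Path a{std::string("left")};
    demo::Path b{std::string("right")};
    swap(a, b);
    return a.text == "right" && b.text == "left";
}

static_assert(demo::finds_demo_swap<demo::Path>::value,
              "Path lvalues select the constrained swap");
static_assert(!demo::finds_demo_swap<std::string>::value,
              "std::string is convertible to Path but is rejected by deduction");
```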



Re: [PATCH] RISC-V: Add the Zihpm and Zicntr extensions

2022-11-22 Thread Jeff Law via Gcc-patches



On 11/22/22 08:29, Palmer Dabbelt wrote:

On Tue, 22 Nov 2022 07:20:15 PST (-0800), jeffreya...@gmail.com wrote:


On 11/20/22 18:36, Kito Cheng wrote:
So the idea here is just to define the extension so that it gets defined
in the ISA strings and passed through to the assembler, right?

That will also define the arch test macro:

https://github.com/riscv-non-isa/riscv-c-api-doc/blob/master/riscv-c-api.md#architecture-extension-test-macro 



Sorry I should have been clearer and included the test macro(s) as well.

So a better summary would be that while it doesn't change the codegen
behavior in the compiler, it does provide the mechanisms to pass along
isa strings to other tools such as the assembler and signal via the test
macros that this extension is available.


IMO the important bit here is that we're not adding any compatibility 
flags, like we did when fence.i was removed from the ISA.  That's fine 
as long as we never remove these instructions from the base ISA in the 
software, but that's what's suggested by Andrew in the post.


Right.  IIUC these instructions were never supposed to be in the base 
ISA, but, in effect, snuck through.  We're retro-actively adding them as 
an extension, at least in terms of ISA strings & test macros.  We're 
currently (forever?) going to allow them in the assembler without 
strictly requiring the extension be on.



It's a super weird one, but there's a bunch of cases in RISC-V where 
we're told to just ignore words in the ISA manual.  Definitely a trap 
for users (and we already had some Linux folks get bit by the counter 
changes here), but that's just how RISC-V works.


Mistakes happen.  The key is to adjust for them as best as we can.    
I'd lean towards a stricter enforcement, bringing these 
instructions/extension in line with how we handle the others. It'd 
potentially mean source incompatibilities that would need to be fixed, 
but they shouldn't be difficult and we're still early enough in the game 
that we *could* take that route.  Andrew's position is more 
accommodating of existing code and while I may not 100% agree with his 
position, I understand it.



So while I'd lean towards a stricter checking, I can live with this 
approach.  I wouldn't mind hearing from Kito, Philipp and others though.



Jeff



RE: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build

2022-11-22 Thread Eugene Rozenfeld via Gcc-patches
I took another look at this. We actually collect perf data when building the 
libraries. So, we have ./prev-gcc/perf.data, ./prev-libcpp/perf.data, 
./prev-libiberty/perf.data, etc. But when creating gcov data for  
-fauto-profile build of cc1plus or cc1 we only use ./prev-gcc/perf.data . So, a 
better solution would be either having a single perf.data for all builds (gcc 
and libraries) or merging perf.data files before attempting autostagefeedback. 
What would you recommend?

Thanks,

Eugene

-----Original Message-----
From: Jeff Law  
Sent: Tuesday, November 22, 2022 12:01 PM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org; 
Andi Kleen 
Subject: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build

On 11/21/22 14:57, Eugene Rozenfeld via Gcc-patches wrote:
> 1. Fix gcov version
> 2. Don't attempt to create an autoprofile file for cc1 since cc1plus
> (not cc1) is not invoked when building cc1
> 3. Fix documentation typo
>
> Tested on x86_64-pc-linux-gnu.
>
> gcc/ChangeLog:
>
>   * c/Make-lang.in: Don't attempt to create an autoprofile file for cc1
>   * cp/Make-lang.in: Fix gcov version
>   * lto/Make-lang.in: Fix gcov version
>   * doc/install.texi: Fix documentation typo

Just to be 100% sure.  While the compiler is built with cc1plus, various 
runtime libraries are still built with the C compiler and thus would use cc1.  
AFAICT it looks like we don't try to build the runtime libraries to get any 
data about the behavior of the C compiler.  Can you confirm?


Assuming that's correct, this is fine for the trunk.


Thanks,

Jeff



Re: [PATCH v2] tree-object-size: Support strndup and strdup

2022-11-22 Thread Jeff Law via Gcc-patches



On 11/21/22 07:27, Siddhesh Poyarekar wrote:

On 2022-11-20 10:42, Jeff Law wrote:


On 11/4/22 06:48, Siddhesh Poyarekar wrote:
Use string length of input to strdup to determine the usable size of the
resulting object.  Avoid doing the same for strndup since there's a
chance that the input may be too large, resulting in an unnecessary
overhead or worse, the input may not be NULL terminated, resulting in a
crash where there would otherwise have been none.

gcc/ChangeLog:

* tree-object-size.cc (todo): New variable.
(object_sizes_execute): Use it.
(strdup_object_size): New function.
(call_object_size): Use it.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c (test_strdup,
test_strndup, test_strdup_min, test_strndup_min): New tests.
(main): Call them.
* gcc.dg/builtin-dynamic-object-size-1.c: Silence overread
warnings.
* gcc.dg/builtin-dynamic-object-size-2.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-3.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-4.c: Likewise.
* gcc.dg/builtin-object-size-1.c: Silence overread warnings.
Declare free, strdup and strndup.
(test11): New test.
(main): Call it.
* gcc.dg/builtin-object-size-2.c: Silence overread warnings.
Declare free, strdup and strndup.
(test9): New test.
(main): Call it.
* gcc.dg/builtin-object-size-3.c: Silence overread warnings.
Declare free, strdup and strndup.
(test11): New test.
(main): Call it.
* gcc.dg/builtin-object-size-4.c: Silence overread warnings.
Declare free, strdup and strndup.
(test9): New test.
(main): Call it.


I'm struggling to see how the SSA updating is correct.  Yes we need 
to update the virtuals due to the introduction of the call to strlen, 
particularly when SRC is not a string constant.  But do we need to do 
more?


Don't we end up gimplifying the 1 + strlenfn (src) expression? Can 
that possibly create new SSA_NAMEs?  Do those need to be put into SSA 
form? I feel like I'm missing something here...


We do all of that manually in gimplify_size_expressions, the only 
thing left to do is updating virtuals AFAICT.


I guess it's actually buried down in force_gimple_operand and I guess 
for temporaries they're not going to be alive across the new gimple 
sequence and each destination gets its own SSA_NAME, so it ought to be 
safe.  Just had to work a bit further through things.


OK for the trunk.


Thanks,
jeff
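
As a plain illustration of the size expression discussed above (the `1 + strlenfn (src)` form), the following sketch computes the number of bytes a strdup result has to hold. It is not code from tree-object-size.cc; `strdup_usable_size` is a hypothetical helper, and it assumes `src` is a valid NUL-terminated string, which is exactly the assumption the commit message declines to make for strndup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Bytes a strdup (src) result must hold: the characters plus the
// terminating NUL.  Assumes src points to a NUL-terminated string.
inline std::size_t strdup_usable_size(const char* src) {
    return 1 + std::strlen(src);
}
```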




Re: [PATCH 2/5] c++: Set the locus of the function result decl

2022-11-22 Thread Jason Merrill via Gcc-patches

On 11/20/22 12:06, Bernhard Reutner-Fischer wrote:

Hi Jason!

The "meh" of result-decl-plugin-test-2.C should likely be omitted,
grokdeclarator would need some changes to add richloc hints and we would not
be able to make a reliable guess what to remove precisely.
C.f. /* Check all other uses of type modifiers.  */
Furthermore it is unrelated to DECL_RESULT so not of direct interest
here. The other tests in test-2.C, f() and huh() should work though.

I don't know if it's acceptable to change ipa-pure-const to make the
missing noreturn warning more precise and emit a fixit-hint. At least it
would be a real test for the DECL_RESULT and would spare us the plugin.


The main problem I see with that change is that the syntax of the fixit 
might be wrong for non-C-family front-ends.


Here's a version of the patch that fixes template/method handling, and 
adjusts -Waggregate-return as well:
From 5075d2ac12f655f8f83f6f3be27e2c1141e1ce99 Mon Sep 17 00:00:00 2001
From: Bernhard Reutner-Fischer 
Date: Sun, 20 Nov 2022 18:06:04 +0100
Subject: [PATCH] c++: Set the locus of the function result decl
To: gcc-patches@gcc.gnu.org

gcc/cp/ChangeLog:

	* decl.cc (grokdeclarator): Build RESULT_DECL.
	(start_preparsed_function): Copy location from template.

gcc/ChangeLog:

	* function.cc (init_function_start): Use DECL_RESULT location
	for -Waggregate-return warning.
	* ipa-pure-const.cc (suggest_attribute): Add fixit-hint for the
	noreturn attribute.

gcc/testsuite/ChangeLog:

	* c-c++-common/pr68833-1.c: Adjust noreturn warning line number.
	* gcc.dg/noreturn-1.c: Likewise.
	* g++.dg/diagnostic/return-type-loc1.C: New test.
	* g++.dg/other/resultdecl-1.C: New test.

Co-authored-by: Jason Merrill 
---
 gcc/cp/decl.cc| 26 +--
 gcc/function.cc   |  3 +-
 gcc/ipa-pure-const.cc | 14 +++-
 gcc/testsuite/c-c++-common/pr68833-1.c|  2 +-
 .../g++.dg/diagnostic/return-type-loc1.C  | 20 
 gcc/testsuite/g++.dg/other/resultdecl-1.C | 32 +++
 gcc/testsuite/gcc.dg/noreturn-1.c |  2 +-
 7 files changed, 93 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/return-type-loc1.C
 create mode 100644 gcc/testsuite/g++.dg/other/resultdecl-1.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 544efdc9914..2c5cd930e0a 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -14774,6 +14774,18 @@ grokdeclarator (const cp_declarator *declarator,
 	else if (constinit_p)
 	  DECL_DECLARED_CONSTINIT_P (decl) = true;
   }
+else if (TREE_CODE (decl) == FUNCTION_DECL)
+  {
+	location_t loc = smallest_type_location (declspecs);
+	if (loc != UNKNOWN_LOCATION)
+	  {
+	tree restype = TREE_TYPE (TREE_TYPE (decl));
+	tree resdecl = build_decl (loc, RESULT_DECL, 0, restype);
+	DECL_ARTIFICIAL (resdecl) = 1;
+	DECL_IGNORED_P (resdecl) = 1;
+	DECL_RESULT (decl) = resdecl;
+	  }
+  }
 
 /* Record constancy and volatility on the DECL itself .  There's
no need to do this when processing a template; we'll do this
@@ -17328,9 +17340,19 @@ start_preparsed_function (tree decl1, tree attrs, int flags)
 
   if (DECL_RESULT (decl1) == NULL_TREE)
 {
-  tree resdecl;
+  /* In a template instantiation, copy the return type location.  When
+	 parsing, the location will be set in grokdeclarator.  */
+  location_t loc = input_location;
+  if (DECL_TEMPLATE_INSTANTIATION (decl1)
+	  && !DECL_CXX_CONSTRUCTOR_P (decl1)
+	  && !DECL_CXX_DESTRUCTOR_P (decl1))
+	{
+	  tree tmpl = template_for_substitution (decl1);
+	  tree res = DECL_RESULT (DECL_TEMPLATE_RESULT (tmpl));
+	  loc = DECL_SOURCE_LOCATION (res);
+	}
 
-  resdecl = build_decl (input_location, RESULT_DECL, 0, restype);
+  tree resdecl = build_decl (loc, RESULT_DECL, 0, restype);
   DECL_ARTIFICIAL (resdecl) = 1;
   DECL_IGNORED_P (resdecl) = 1;
   DECL_RESULT (decl1) = resdecl;
diff --git a/gcc/function.cc b/gcc/function.cc
index 9c8773bbc59..dc333c27e92 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -4997,7 +4997,8 @@ init_function_start (tree subr)
   /* Warn if this value is an aggregate type,
  regardless of which calling convention we are using for it.  */
   if (AGGREGATE_TYPE_P (TREE_TYPE (DECL_RESULT (subr))))
-warning (OPT_Waggregate_return, "function returns an aggregate");
+warning_at (DECL_SOURCE_LOCATION (DECL_RESULT (subr)),
+		OPT_Waggregate_return, "function returns an aggregate");
 }
 
 /* Expand code to verify the stack_protect_guard.  This is invoked at
diff --git a/gcc/ipa-pure-const.cc b/gcc/ipa-pure-const.cc
index 572a6da274f..8f6e8f63d91 100644
--- a/gcc/ipa-pure-const.cc
+++ b/gcc/ipa-pure-const.cc
@@ -63,6 +63,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-fnsummary.h"
 #include "symtab-thunks.h"
 #include "dbgcnt.h"
+#include "gcc-rich-location.h"
 
 /* Lattice values for const and pure 

Re: [PATCH] Fix count comparison in ipa-cp

2022-11-22 Thread Jeff Law via Gcc-patches



On 11/21/22 14:26, Eugene Rozenfeld via Gcc-patches wrote:

The existing comparison was incorrect for non-PRECISE counts
(e.g., AFDO): we could end up with a 0 base_count, which could
lead to asserts, e.g., in good_cloning_opportunity_p.

gcc/ChangeLog:

 * ipa-cp.cc (ipcp_propagate_stage): Fix profile count comparison.


OK.  Probably somewhat painful to pull together a reliable test for 
this, right?



Jeff




Re: [PATCH] Fix autoprofiledbootstrap build

2022-11-22 Thread Jeff Law via Gcc-patches



On 11/21/22 14:57, Eugene Rozenfeld via Gcc-patches wrote:

1. Fix gcov version
2. Don't attempt to create an autoprofile file for cc1 since cc1plus
(not cc1) is not invoked when building cc1
3. Fix documentation typo

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:

* c/Make-lang.in: Don't attempt to create an autoprofile file for cc1
* cp/Make-lang.in: Fix gcov version
* lto/Make-lang.in: Fix gcov version
* doc/install.texi: Fix documentation typo


Just to be 100% sure.  While the compiler is built with cc1plus, various 
runtime libraries are still built with the C compiler and thus would use 
cc1.  AFAICT it looks like we don't try to build the runtime libraries 
to get any data about the behavior of the C compiler.  Can you confirm?



Assuming that's correct, this is fine for the trunk.


Thanks,

Jeff



Re: [PATCH] testsuite: Fix missing EFFECTIVE_TARGETS variable errors

2022-11-22 Thread Maciej W. Rozycki
On Mon, 21 Nov 2022, Jeff Law wrote:

> > gcc/testsuite/
> > * lib/target-supports.exp
> > (check_effective_target_mpaired_single): Add `args' argument and
> > pass it to `check_no_compiler_messages' replacing
> > `-mpaired-single'.
> > (add_options_for_mips_loongson_mmi): Add `args' argument and
> > pass it to `check_no_compiler_messages'.
> > (check_effective_target_mips_msa): Add `args' argument and pass
> > it to `check_no_compiler_messages' replacing `-mmsa'.
> > (check_effective_target_mpaired_single_runtime)
> > (add_options_for_mpaired_single): Pass `-mpaired-single' to
> > `check_effective_target_mpaired_single'.
> > (check_effective_target_mips_loongson_mmi_runtime)
> > (add_options_for_mips_loongson_mmi): Pass `-mloongson-mmi' to
> > `check_effective_target_mips_loongson_mmi'.
> > (check_effective_target_mips_msa_runtime)
> > (add_options_for_mips_msa): Pass `-mmsa' to
> > `check_effective_target_mips_msa'.
> > (et-is-effective-target): Verify that EFFECTIVE_TARGETS exists
> > and if not, just check if the current compilation environment
> > supports the target feature requested.
> > (check_vect_support_and_set_flags): Pass `-mpaired-single',
> > `-mloongson-mmi', and `-mmsa' to the respective target feature
> > checks.
> 
> OK.

 I have committed it now, thanks for your review.

  Maciej


[committed] libstdc++: Replace std::isdigit and std::isxdigit in <format> [PR107817]

2022-11-22 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

These functions aren't usable in constant expressions. Provide our own
implementations, based on __from_chars_alnum_to_val from <charconv>.

libstdc++-v3/ChangeLog:

PR libstdc++/107817
* include/std/charconv (__from_chars_alnum_to_val): Add
constexpr for C++20.
* include/std/format (__is_digit, __is_xdigit): New functions.
(_Spec::_S_parse_width_or_precision): Use __is_digit.
(__formatter_fp::parse): Use __is_xdigit.
---
 libstdc++-v3/include/std/charconv |  2 +-
 libstdc++-v3/include/std/format   | 12 +---
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv
index 8f02395172f..8b2acc5bf8d 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -454,7 +454,7 @@ namespace __detail
   // If _DecOnly is false: if the character is an alphanumeric digit, then
   // return its corresponding base-36 value, otherwise return a value >= 127.
   template
-_GLIBCXX23_CONSTEXPR unsigned char
+_GLIBCXX20_CONSTEXPR unsigned char
 __from_chars_alnum_to_val(unsigned char __c)
 {
   if _GLIBCXX17_CONSTEXPR (_DecOnly)
diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 7ae58eb2416..23ffbdabed8 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -358,6 +358,12 @@ namespace __format
 size_t
 __int_from_arg(const basic_format_arg<_Context>& __arg);
 
+  constexpr bool __is_digit(char __c)
+  { return std::__detail::__from_chars_alnum_to_val(__c) < 10; }
+
+  constexpr bool __is_xdigit(char __c)
+  { return std::__detail::__from_chars_alnum_to_val(__c) < 16; }
+
   template
 struct _Spec
 {
@@ -469,7 +475,7 @@ namespace __format
  unsigned short& __val, bool& __arg_id,
  basic_format_parse_context<_CharT>& __pc)
   {
-   if (std::isdigit(*__first))
+   if (__format::__is_digit(*__first))
  {
auto [__v, __ptr] = __format::__parse_integer(__first, __last);
if (!__ptr)
@@ -1537,7 +1543,7 @@ namespace __format
 
  if (__trailing_zeros)
{
- if (!std::isxdigit(__s[0]))
+ if (!__format::__is_xdigit(__s[0]))
--__sigfigs;
  __z = __prec - __sigfigs;
}
@@ -1627,7 +1633,7 @@ namespace __format
{
  __fill_char = _CharT('0');
  // Write sign before zero filling.
- if (!std::isxdigit(__narrow_str[0]))
+ if (!__format::__is_xdigit(__narrow_str[0]))
{
  *__out++ = __str[0];
  __str.remove_prefix(1);
-- 
2.38.1
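
The approach in the patch above (map a character to its alphanumeric digit value, then compare against the base) can be sketched as standalone code. This is a simplified illustration, not the libstdc++ implementation: `demo::alnum_to_val`, `demo::is_digit` and `demo::is_xdigit` are hypothetical names, and the range comparisons assume an ASCII-compatible execution character set where the letters are contiguous. Unlike `std::isdigit` and `std::isxdigit`, these are usable in constant expressions and do not depend on the locale.

```cpp
#include <cassert>

namespace demo {

// Map an alphanumeric character to its base-36 digit value; anything else
// yields 127, which is larger than every valid digit value.
// Assumes 'a'..'z' and 'A'..'Z' are contiguous (true for ASCII).
constexpr unsigned char alnum_to_val(char c) {
    if (c >= '0' && c <= '9') return static_cast<unsigned char>(c - '0');
    if (c >= 'a' && c <= 'z') return static_cast<unsigned char>(c - 'a' + 10);
    if (c >= 'A' && c <= 'Z') return static_cast<unsigned char>(c - 'A' + 10);
    return 127;
}

// Decimal digit: value below 10.
constexpr bool is_digit(char c) { return alnum_to_val(c) < 10; }

// Hexadecimal digit: value below 16.
constexpr bool is_xdigit(char c) { return alnum_to_val(c) < 16; }

}  // namespace demo

static_assert(demo::is_digit('7'), "usable in a constant expression");
static_assert(demo::is_xdigit('F') && !demo::is_xdigit('g'), "hex range check");
```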



[committed] libstdc++: Add testcase for fs::path constraint recursion [PR106201]

2022-11-22 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/106201
* testsuite/27_io/filesystem/iterators/106201.cc: New test.
---
 .../testsuite/27_io/filesystem/iterators/106201.cc   | 12 
 1 file changed, 12 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc

diff --git a/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc b/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc
new file mode 100644
index 000..4a64e675816
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc
@@ -0,0 +1,12 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+// { dg-require-filesystem-ts "" }
+
+// PR libstdc++/106201 constraint recursion in path(Source const&) constructor.
+
+#include 
+#include 
+using I = std::counted_iterator;
+static_assert( std::swappable );
+using R = std::counted_iterator;
+static_assert( std::swappable );
-- 
2.38.1



Re: [PATCH 2/2] Add a new warning option -Wstrict-flex-arrays.

2022-11-22 Thread Kees Cook via Gcc-patches
On Tue, Nov 22, 2022 at 03:02:04PM +0000, Qing Zhao wrote:
> 
> 
> > On Nov 22, 2022, at 9:10 AM, Qing Zhao via Gcc-patches 
> >  wrote:
> > 
> > 
> > 
> >> On Nov 22, 2022, at 3:16 AM, Richard Biener  wrote:
> >> 
> >> On Mon, 21 Nov 2022, Qing Zhao wrote:
> >> 
> >>> 
> >>> 
>  On Nov 18, 2022, at 11:31 AM, Kees Cook  wrote:
>  
>  On Fri, Nov 18, 2022 at 03:19:07PM +, Qing Zhao wrote:
> > Hi, Richard,
> > 
> > Honestly, it's very hard for me to decide what's the best way to handle 
> > the interaction 
> > between -fstrict-flex-arrays=M and -Warray-bounds=N. 
> > 
> > Ideally,  -fstrict-flex-arrays=M should completely control the behavior 
> > of -Warray-bounds.
> > If possible, I prefer this solution.
> > 
> > However, -Warray-bounds is included in -Wall, and has been used 
> > extensively for a long time.
> > It's not safe to change its default behavior. 
>  
>  I prefer that -fstrict-flex-arrays controls -Warray-bounds. That
>  it is in -Wall is _good_ for this reason. :) No one is going to add
>  -fstrict-flex-arrays (at any level) without understanding what it does
>  and wanting those effects on -Warray-bounds.
> >>> 
> >>> 
> >>> The major difficulties in letting -fstrict-flex-arrays control 
> >>> -Warray-bounds were discussed in the following thread:
> >>> 
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604133.html
> >>> 
> >>> Please take a look at the discussion and let me know your opinion.
> >> 
> >> My opinion is now, after re-considering and with seeing your new 
> >> patch, that -Warray-bounds=2 should be changed to only add
> >> "the intermediate results of pointer arithmetic that may yield out of 
> >> bounds values" and that what it considers a flex array should now
> >> be controlled by -fstrict-flex-arrays only.
> >> 
> >> That is, I think, the only thing that's not confusing to users even
> >> if that implies a change from previous behavior that we should
> >> document by clarifying the -Warray-bounds documentation as well as
> >> by adding an entry to the Caveats section of gcc-13/changes.html
> >> 
> >> That also means that =2 will get _less_ warnings with GCC 13 when
> >> the user doesn't use -fstrict-flex-arrays as well.
> > 
> > Okay.  So, this is for -Warray-bounds=2.
> > 
> > For -Warray-bounds=1 -fstrict-flex-arrays=N, if N > 1, should
> > -fstrict-flex-arrays=N control -Warray-bounds=1?
> 
> More thinking on this. (I might have misunderstood a little in the previous
> email.)
> 
> If I understand correctly now, what you proposed was:
> 
> 1. The level of -Warray-bounds will NOT control how a trailing array is
> considered as a flex array member anymore.
> 2. Only the level of -fstrict-flex-arrays will control this.
> 3. Keep the current default behavior of -Warray-bounds on treating trailing
> arrays as flex array members (treating all [0], [1], and [] as flexible array
> members).
> 4. Update the documentation for -Warray-bounds to clarify this change,
> and also add an entry to the Caveats section describing this change to
> -Warray-bounds.
> 
> If the above is correct, yes, I like this change. Both the user interface and
> the internal implementation will be simpler and cleaner.
> 
> Let me know if you see any issue with my above understanding.
> 
> Thanks a lot.

FWIW, this matches what I think makes the most sense too.
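
To make the trailing-array cases concrete, here is a minimal C sketch (not
taken from any patch in this thread) of the three spellings that item 3 above
refers to. All struct and function names are made up for illustration, and
`data[0]` is a GNU extension. Only the C99 form is used at run time, so the
executable part stays within standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Three trailing-array spellings; all names here are hypothetical.  */
struct fam_c99  { size_t len; int data[];  };  /* C99 flexible array member */
struct fam_zero { size_t len; int data[0]; };  /* GNU zero-length array */
struct fam_one  { size_t len; int data[1]; };  /* pre-C99 "struct hack" */

/* Allocate a fam_c99 with room for N zero-initialized ints.
   Returns NULL on allocation failure.  */
struct fam_c99 *
fam_alloc (size_t n)
{
  struct fam_c99 *p = malloc (sizeof *p + n * sizeof p->data[0]);
  if (p)
    {
      p->len = n;
      for (size_t i = 0; i < n; i++)
        p->data[i] = 0;
    }
  return p;
}

/* Sum the LEN elements stored in the trailing array.  */
int
fam_sum (const struct fam_c99 *p)
{
  assert (p != NULL);
  int sum = 0;
  for (size_t i = 0; i < p->len; i++)
    sum += p->data[i];
  return sum;
}
```

Under the behavior summarized above, whether an access past the declared
bound of `struct fam_one` or `struct fam_zero` is treated as a flexible array
access would depend only on the -fstrict-flex-arrays level, not on the
-Warray-bounds level.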

-- 
Kees Cook


RE: [PATCH 35/35] arm: improve tests for vsetq_lane*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 35/35] arm: improve tests for vsetq_lane*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c: Likewise.
> ---
>  .../arm/mve/intrinsics/vsetq_lane_f16.c   | 36 +++--
>  .../arm/mve/intrinsics/vsetq_lane_f32.c   | 36 +++--
>  .../arm/mve/intrinsics/vsetq_lane_s16.c   | 24 ++--
>  .../arm/mve/intrinsics/vsetq_lane_s32.c   | 24 ++--
>  .../arm/mve/intrinsics/vsetq_lane_s64.c   | 27 ++---
>  .../arm/mve/intrinsics/vsetq_lane_s8.c| 24 ++--
>  .../arm/mve/intrinsics/vsetq_lane_u16.c   | 36 +++--
>  .../arm/mve/intrinsics/vsetq_lane_u32.c   | 36 +++--
>  .../arm/mve/intrinsics/vsetq_lane_u64.c   | 39 ---
>  .../arm/mve/intrinsics/vsetq_lane_u8.c| 36 +++--
>  10 files changed, 284 insertions(+), 34 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> index e03e9620528..b5c9f4d5eb8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> @@ -1,15 +1,45 @@
> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } 
> {""} } */
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmov.16 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +*/
>  float16x8_t
>  foo (float16_t a, float16x8_t b)
>  {
> -return vsetq_lane_f16 (a, b, 0);
> +  return vsetq_lane_f16 (a, b, 1);
>  }
> 

Hmm, for these tests we should be able to scan for more specific codegen as 
we're setting individual lanes, so we should be able to scan for lane 1 in the 
vmov instruction, though it may need to be flipped for big-endian.
Thanks,
Kyrill
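
As a rough illustration of the difference between matching any lane and
matching lane 1 only, here is a small host-side C sketch. It uses POSIX
<regex.h> rather than the Tcl regexps that check-function-bodies uses, so the
non-capturing `(?:...)` groups become plain groups, and the function name
`vmov16_sets_lane` is made up for this example. It is not part of the
testsuite.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Return 1 if ASM_LINE contains a vmov.16 that writes exactly lane LANE
   of a q register from a core register, 0 otherwise (also 0 if the
   pattern fails to compile).  Hypothetical helper, illustration only.  */
int
vmov16_sets_lane (const char *asm_line, int lane)
{
  char pattern[128];
  regex_t re;
  int rc;

  assert (asm_line != NULL);

  /* The lane index is spelled out instead of using \[[0-9]+\].  */
  snprintf (pattern, sizeof pattern,
            "vmov\\.16[ \t]+q[0-9]+\\[%d\\], (ip|fp|r[0-9]+)", lane);

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return 0;
  rc = regexec (&re, asm_line, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

On a big-endian target the expected lane index passed in would presumably be
the flipped one, as noted above.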

> -/* { dg-final { scan-assembler "vmov.16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmov.16 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +*/
> +float16x8_t
> +foo1 (float16_t a, float16x8_t b)
> +{
> +  return vsetq_lane (a, b, 1);
> +}
> +
> +/*
> +**foo2:
> +**   ...
> +**   vmov.16 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +*/
> +float16x8_t
> +foo2 (float16x8_t b)
> +{
> +  return vsetq_lane (1.1, b, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
> index 2b9f1a7e627..211083ce5d4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
> @@ -1,15 +1,45 @@
> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } 
> {""} } */
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmov.32 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +*/
>  float32x4_t
>  foo (float32_t a, float32x4_t b)
>  {
> -return vsetq_lane_f32 (a, b, 0);
> +  return vsetq_lane_f32 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vmov.32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmov.32 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +*/
> +float32x4_t
> +foo1 (float32_t a, float32x4_t b)
> +{
> +  return vsetq_lane (a, b, 1);
> +}
> +
> +/*
> +**foo2:
> +**   ...
> +**   vmov.32 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +*/
> +float32x4_t
> +foo2 (float32x4_t b)
> +{
> +  return vsetq_lane (1.1, b, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c
> 

RE: [PATCH 33/35] arm: improve tests and fix vrmlaldavhaq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 33/35] arm: improve tests and fix vrmlaldavhaq*
> 
> gcc/ChangeLog:
> 
>   * config/arm/mve.md (mve_vrmlaldavhq_v4si,
>   mve_vrmlaldavhaq_v4si): Fix spacing vs tabs.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c: Improve
> test.
>   * gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/mve.md |  4 +-
>  .../arm/mve/intrinsics/vrmlaldavhaq_p_s32.c   | 24 ++-
>  .../arm/mve/intrinsics/vrmlaldavhaq_p_u32.c   | 40 ++-
>  3 files changed, 62 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index d2ffae6a425..b5e6da4b133 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -2543,7 +2543,7 @@ (define_insn "mve_vrmlaldavhq_v4si"
>VRMLALDAVHQ))
>]
>"TARGET_HAVE_MVE"
> -  "vrmlaldavh.32 %Q0, %R0, %q1, %q2"
> +  "vrmlaldavh.32\t%Q0, %R0, %q1, %q2"
>[(set_attr "type" "mve_move")
>  ])
> 
> @@ -2649,7 +2649,7 @@ (define_insn "mve_vrmlaldavhaq_v4si"
>VRMLALDAVHAQ))
>]
>"TARGET_HAVE_MVE"
> -  "vrmlaldavha.32 %Q0, %R0, %q2, %q3"
> +  "vrmlaldavha.32\t%Q0, %R0, %q2, %q3"
>[(set_attr "type" "mve_move")
>  ])
> 
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c
> index 263d3509771..dec4a969dfe 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vrmlaldavhat.s32 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?: @.*|)
> +**   ...
> +*/
>  int64_t
>  foo (int64_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
>  {
>return vrmlaldavhaq_p_s32 (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vrmlaldavhat.s32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vrmlaldavhat.s32 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?: @.*|)
> +**   ...
> +*/
>  int64_t
>  foo1 (int64_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
>  {
>return vrmlaldavhaq_p (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vrmlaldavhat.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c
> index 83ab68c001b..f3c8bfd121c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c
> @@ -1,21 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vrmlaldavhat.u32 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?: @.*|)
> +**   ...
> +*/
>  uint64_t
>  foo (uint64_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
>  {
>return vrmlaldavhaq_p_u32 (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vrmlaldavhat.u32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vrmlaldavhat.u32 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?: @.*|)
> +**   ...
> +*/
>  uint64_t
>  foo1 (uint64_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
>  {
>return vrmlaldavhaq_p (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vrmlaldavhat.u32"  }  } */
> +/*
> +**foo2:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vrmlaldavhat.u32 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?: @.*|)
> +**   ...
> +*/
> +uint64_t
> +foo2 (uint32x4_t b, uint32x4_t c, mve_pred16_t p)
> +{
> +  return vrmlaldavhaq_p (1, b, c, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



RE: [PATCH 34/35] arm: improve tests for vrshlq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 34/35] arm: improve tests for vrshlq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c: Improve tests.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_x_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_x_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_x_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vrshlq_m_n_s16.c   | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_m_n_s32.c   | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_m_n_s8.c| 25 +++---
>  .../arm/mve/intrinsics/vrshlq_m_n_u16.c   | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_m_n_u32.c   | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_m_n_u8.c| 25 +++---
>  .../arm/mve/intrinsics/vrshlq_m_s16.c | 26 ---
>  .../arm/mve/intrinsics/vrshlq_m_s32.c | 26 ---
>  .../arm/mve/intrinsics/vrshlq_m_s8.c  | 26 ---
>  .../arm/mve/intrinsics/vrshlq_m_u16.c | 26 ---
>  .../arm/mve/intrinsics/vrshlq_m_u32.c | 26 ---
>  .../arm/mve/intrinsics/vrshlq_m_u8.c  | 26 ---
>  .../arm/mve/intrinsics/vrshlq_n_s16.c | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_n_s32.c | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_n_s8.c  | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_n_u16.c | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_n_u32.c | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_n_u8.c  | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_s16.c   | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_s32.c   | 16 ++--
>  .../gcc.target/arm/mve/intrinsics/vrshlq_s8.c | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_u16.c   | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_u32.c   | 16 ++--
>  .../gcc.target/arm/mve/intrinsics/vrshlq_u8.c | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_x_s16.c | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_x_s32.c | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_x_s8.c  | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_x_u16.c | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_x_u32.c | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_x_u8.c  | 25 +++---
>  30 files changed, 564 insertions(+), 84 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c
> index cf51de6aa9c..c7d1f3a5b1c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } 

RE: [PATCH 32/35] arm: improve tests for vqsubq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 32/35] arm: improve tests for vqsubq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_n_s8.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_n_u16.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_n_u32.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_n_u8.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_s16.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_s32.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_s8.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_u16.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_u32.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_u8.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_n_s16.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_n_s32.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_n_s8.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_n_u16.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_n_u32.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_n_u8.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_s16.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_s32.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_s8.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_u16.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_u32.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_u8.c:

Missing text.
Ok with ChangeLog fixed.
Kyrill

> ---
>  .../arm/mve/intrinsics/vqsubq_m_n_s16.c   | 26 ++--
>  .../arm/mve/intrinsics/vqsubq_m_n_s32.c   | 26 ++--
>  .../arm/mve/intrinsics/vqsubq_m_n_s8.c| 26 ++--
>  .../arm/mve/intrinsics/vqsubq_m_n_u16.c   | 42 +--
>  .../arm/mve/intrinsics/vqsubq_m_n_u32.c   | 42 +--
>  .../arm/mve/intrinsics/vqsubq_m_n_u8.c| 42 +--
>  .../arm/mve/intrinsics/vqsubq_m_s16.c | 26 ++--
>  .../arm/mve/intrinsics/vqsubq_m_s32.c | 26 ++--
>  .../arm/mve/intrinsics/vqsubq_m_s8.c  | 26 ++--
>  .../arm/mve/intrinsics/vqsubq_m_u16.c | 26 ++--
>  .../arm/mve/intrinsics/vqsubq_m_u32.c | 26 ++--
>  .../arm/mve/intrinsics/vqsubq_m_u8.c  | 26 ++--
>  .../arm/mve/intrinsics/vqsubq_n_s16.c | 16 ++-
>  .../arm/mve/intrinsics/vqsubq_n_s32.c | 16 ++-
>  .../arm/mve/intrinsics/vqsubq_n_s8.c  | 16 ++-
>  .../arm/mve/intrinsics/vqsubq_n_u16.c | 28 -
>  .../arm/mve/intrinsics/vqsubq_n_u32.c | 28 -
>  .../arm/mve/intrinsics/vqsubq_n_u8.c  | 28 -
>  .../arm/mve/intrinsics/vqsubq_s16.c   | 16 ++-
>  .../arm/mve/intrinsics/vqsubq_s32.c   | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vqsubq_s8.c | 16 ++-
>  .../arm/mve/intrinsics/vqsubq_u16.c   | 16 ++-
>  .../arm/mve/intrinsics/vqsubq_u32.c   | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vqsubq_u8.c | 16 ++-
>  24 files changed, 516 insertions(+), 72 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c
> index abcff4f0e3c..39b8089919d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqsubt.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>return vqsubq_m_n_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqsubt.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>return vqsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c
> 

RE: [PATCH 31/35] arm: improve tests for vqrdmlashq_m*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 31/35] arm: improve tests for vqrdmlashq_m*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c:
>   * gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c:
>   * gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c:

Missing ChangeLog entries.
Ok with that fixed.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vqrdmlashq_m_n_s16.c   | 34 ++-
>  .../arm/mve/intrinsics/vqrdmlashq_m_n_s32.c   | 34 ++-
>  .../arm/mve/intrinsics/vqrdmlashq_m_n_s8.c| 34 ++-
>  3 files changed, 78 insertions(+), 24 deletions(-)
> 
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c
> index 35b9618ca47..da4d724bb46 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqrdmlasht.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
> -foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
>  {
> -  return vqrdmlashq_m_n_s16 (a, b, c, p);
> +  return vqrdmlashq_m_n_s16 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlasht.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqrdmlasht.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
> -foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo1 (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
>  {
> -  return vqrdmlashq_m (a, b, c, p);
> +  return vqrdmlashq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlasht.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c
> index 8517835eb61..2430f1cb102 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqrdmlasht.s32  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int32x4_t
> -foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
>  {
> -  return vqrdmlashq_m_n_s32 (a, b, c, p);
> +  return vqrdmlashq_m_n_s32 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlasht.s32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqrdmlasht.s32  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int32x4_t
> -foo1 (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo1 (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
>  {
> -  return vqrdmlashq_m (a, b, c, p);
> +  return vqrdmlashq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlasht.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c
> index e42cc63fa74..30915b24e5e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include 

RE: [PATCH 30/35] arm: improve tests for vqrdmlahq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 30/35] arm: improve tests for vqrdmlahq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c: Improve
> test.
>   * gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vqrdmlahq_m_n_s16.c| 34 ++-
>  .../arm/mve/intrinsics/vqrdmlahq_m_n_s32.c| 34 ++-
>  .../arm/mve/intrinsics/vqrdmlahq_m_n_s8.c | 34 ++-
>  .../arm/mve/intrinsics/vqrdmlahq_n_s16.c  | 24 +
>  .../arm/mve/intrinsics/vqrdmlahq_n_s32.c  | 24 +
>  .../arm/mve/intrinsics/vqrdmlahq_n_s8.c   | 24 +
>  6 files changed, 132 insertions(+), 42 deletions(-)
> 
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c
> index 70c3fa0e9b1..07d689279ac 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqrdmlaht.s16   q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
> -foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
>  {
> -  return vqrdmlahq_m_n_s16 (a, b, c, p);
> +  return vqrdmlahq_m_n_s16 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlaht.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqrdmlaht.s16   q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
> -foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo1 (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
>  {
> -  return vqrdmlahq_m (a, b, c, p);
> +  return vqrdmlahq_m (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlaht.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c
> index 75ed9911276..3b02ca16038 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqrdmlaht.s32   q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int32x4_t
> -foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo (int32x4_t add, int32x4_t m1, int32_t m2, mve_pred16_t p)
>  {
> -  return vqrdmlahq_m_n_s32 (a, b, c, p);
> +  return vqrdmlahq_m_n_s32 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlaht.s32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqrdmlaht.s32   q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int32x4_t
> -foo1 (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo1 (int32x4_t add, int32x4_t m1, int32_t m2, mve_pred16_t p)
>  {
> -  return vqrdmlahq_m (a, b, c, p);
> +  return vqrdmlahq_m (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlaht.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s8.c
> index ddaea545f40..b661bdcb4cf 100644
> 

RE: [PATCH 29/35] arm: improve tests for vqdmul*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 29/35] arm: improve tests for vqdmul*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c: Improve
> tests.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmullbq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmullbq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmullbq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmullbq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmullbq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmullbq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulltq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulltq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulltq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulltq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulltq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulltq_s32.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vqdmulhq_m_n_s16.c | 26 ---
>  .../arm/mve/intrinsics/vqdmulhq_m_n_s32.c | 26 ---
>  .../arm/mve/intrinsics/vqdmulhq_m_n_s8.c  | 26 ---
>  .../arm/mve/intrinsics/vqdmulhq_m_s16.c   | 26 ---
>  .../arm/mve/intrinsics/vqdmulhq_m_s32.c   | 26 ---
>  .../arm/mve/intrinsics/vqdmulhq_m_s8.c| 26 ---
>  .../arm/mve/intrinsics/vqdmulhq_n_s16.c   | 16 ++--
>  .../arm/mve/intrinsics/vqdmulhq_n_s32.c   | 16 ++--
>  .../arm/mve/intrinsics/vqdmulhq_n_s8.c| 16 ++--
>  .../arm/mve/intrinsics/vqdmulhq_s16.c | 16 ++--
>  .../arm/mve/intrinsics/vqdmulhq_s32.c | 16 ++--
>  .../arm/mve/intrinsics/vqdmulhq_s8.c  | 16 ++--
>  .../arm/mve/intrinsics/vqdmullbq_m_n_s16.c| 26 ---
>  .../arm/mve/intrinsics/vqdmullbq_m_n_s32.c| 26 ---
>  .../arm/mve/intrinsics/vqdmullbq_m_s16.c  | 26 ---
>  .../arm/mve/intrinsics/vqdmullbq_m_s32.c  | 26 ---
>  .../arm/mve/intrinsics/vqdmullbq_n_s16.c  | 16 ++--
>  .../arm/mve/intrinsics/vqdmullbq_n_s32.c  | 16 ++--
>  .../arm/mve/intrinsics/vqdmullbq_s16.c| 16 ++--
>  .../arm/mve/intrinsics/vqdmullbq_s32.c| 16 ++--
>  .../arm/mve/intrinsics/vqdmulltq_m_n_s16.c| 26 ---
>  .../arm/mve/intrinsics/vqdmulltq_m_n_s32.c| 26 ---
>  .../arm/mve/intrinsics/vqdmulltq_m_s16.c  | 26 ---
>  .../arm/mve/intrinsics/vqdmulltq_m_s32.c  | 26 ---
>  .../arm/mve/intrinsics/vqdmulltq_n_s16.c  | 16 ++--
>  .../arm/mve/intrinsics/vqdmulltq_n_s32.c  | 16 ++--
>  .../arm/mve/intrinsics/vqdmulltq_s16.c| 16 ++--
>  .../arm/mve/intrinsics/vqdmulltq_s32.c| 16 ++--
>  28 files changed, 504 insertions(+), 84 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c
> index 57ab85eaf52..a5c1a106205 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr    p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqdmulht.s16    q[0-9]+, 

RE: [PATCH 28/35] arm: improve tests for vqdmlahq_m*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Andrea Corallo
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov; Richard Earnshaw; Andrea Corallo
> Subject: [PATCH 28/35] arm: improve tests for vqdmlahq_m*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlahq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlahq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlahq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c: Likewise.

Ok.
Thanks,
Kyrill


> ---
>  .../arm/mve/intrinsics/vqdmlahq_m_n_s16.c | 34 ++-
>  .../arm/mve/intrinsics/vqdmlahq_m_n_s32.c | 34 ++-
>  .../arm/mve/intrinsics/vqdmlahq_m_n_s8.c  | 34 ++-
>  .../arm/mve/intrinsics/vqdmlahq_n_s16.c   | 24 +
>  .../arm/mve/intrinsics/vqdmlahq_n_s32.c   | 24 +
>  .../arm/mve/intrinsics/vqdmlahq_n_s8.c| 24 +
>  .../arm/mve/intrinsics/vqdmlashq_m_n_s16.c| 34 ++-
>  .../arm/mve/intrinsics/vqdmlashq_m_n_s32.c| 34 ++-
>  .../arm/mve/intrinsics/vqdmlashq_m_n_s8.c | 34 ++-
>  .../arm/mve/intrinsics/vqdmlashq_n_s16.c  | 24 +
>  .../arm/mve/intrinsics/vqdmlashq_n_s32.c  | 24 +
>  .../arm/mve/intrinsics/vqdmlashq_n_s8.c   | 24 +
>  12 files changed, 264 insertions(+), 84 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c
> index d8c4f4bab8e..94d93874542 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr    p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqdmlaht.s16    q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
> -foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
>  {
> -  return vqdmlahq_m_n_s16 (a, b, c, p);
> +  return vqdmlahq_m_n_s16 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlaht.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr    p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqdmlaht.s16    q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
> -foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo1 (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
>  {
> -  return vqdmlahq_m (a, b, c, p);
> +  return vqdmlahq_m (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlaht.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c
> index 361f5d00bdf..a3dab7fa02e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr    p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqdmlaht.s32    q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int32x4_t
> -foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo (int32x4_t add, int32x4_t m1, int32_t m2, mve_pred16_t p)
>  {
> -  return vqdmlahq_m_n_s32 (a, b, c, p);
> +  return vqdmlahq_m_n_s32 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlaht.s32"  }  } */
> 
> +/*
> +**foo1:

RE: [PATCH 27/35] arm: improve tests for vqaddq_m*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Andrea Corallo
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov; Richard Earnshaw; Andrea Corallo
> Subject: [PATCH 27/35] arm: improve tests for vqaddq_m*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vqaddq_m_n_s16.c   | 26 ++--
>  .../arm/mve/intrinsics/vqaddq_m_n_s32.c   | 26 ++--
>  .../arm/mve/intrinsics/vqaddq_m_n_s8.c| 26 ++--
>  .../arm/mve/intrinsics/vqaddq_m_n_u16.c   | 42 +--
>  .../arm/mve/intrinsics/vqaddq_m_n_u32.c   | 42 +--
>  .../arm/mve/intrinsics/vqaddq_m_n_u8.c| 42 +--
>  .../arm/mve/intrinsics/vqaddq_m_s16.c | 26 ++--
>  .../arm/mve/intrinsics/vqaddq_m_s32.c | 26 ++--
>  .../arm/mve/intrinsics/vqaddq_m_s8.c  | 26 ++--
>  .../arm/mve/intrinsics/vqaddq_m_u16.c | 26 ++--
>  .../arm/mve/intrinsics/vqaddq_m_u32.c | 26 ++--
>  .../arm/mve/intrinsics/vqaddq_m_u8.c  | 26 ++--
>  .../arm/mve/intrinsics/vqaddq_n_s16.c | 16 ++-
>  .../arm/mve/intrinsics/vqaddq_n_s32.c | 16 ++-
>  .../arm/mve/intrinsics/vqaddq_n_s8.c  | 16 ++-
>  .../arm/mve/intrinsics/vqaddq_n_u16.c | 28 -
>  .../arm/mve/intrinsics/vqaddq_n_u32.c | 28 -
>  .../arm/mve/intrinsics/vqaddq_n_u8.c  | 28 -
>  .../arm/mve/intrinsics/vqaddq_s16.c   | 16 ++-
>  .../arm/mve/intrinsics/vqaddq_s32.c   | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vqaddq_s8.c | 16 ++-
>  .../arm/mve/intrinsics/vqaddq_u16.c   | 16 ++-
>  .../arm/mve/intrinsics/vqaddq_u32.c   | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vqaddq_u8.c | 16 ++-
>  24 files changed, 516 insertions(+), 72 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c
> index 65d3f770fe2..a659373d441 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr    p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqaddt.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vqaddq_m_n_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr    p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqaddt.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vqaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { 

RE: [PATCH 26/35] arm: improve tests for vmlasq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Andrea Corallo
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov; Richard Earnshaw; Andrea Corallo
> Subject: [PATCH 26/35] arm: improve tests for vmlasq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_m_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_n_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vmlasq_m_n_s16.c   | 34 ++---
>  .../arm/mve/intrinsics/vmlasq_m_n_s32.c   | 34 ++---
>  .../arm/mve/intrinsics/vmlasq_m_n_s8.c| 34 ++---
>  .../arm/mve/intrinsics/vmlasq_m_n_u16.c   | 50 ---
>  .../arm/mve/intrinsics/vmlasq_m_n_u32.c   | 50 ---
>  .../arm/mve/intrinsics/vmlasq_m_n_u8.c| 50 ---
>  .../arm/mve/intrinsics/vmlasq_n_s16.c | 24 ++---
>  .../arm/mve/intrinsics/vmlasq_n_s32.c | 24 ++---
>  .../arm/mve/intrinsics/vmlasq_n_s8.c  | 24 ++---
>  .../arm/mve/intrinsics/vmlasq_n_u16.c | 36 ++---
>  .../arm/mve/intrinsics/vmlasq_n_u32.c | 36 ++---
>  .../arm/mve/intrinsics/vmlasq_n_u8.c  | 36 ++---
>  12 files changed, 348 insertions(+), 84 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c
> index bf66e616ec7..af6e588adad 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr    p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmlast.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
> -foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m_n_s16 (a, b, c, p);
> +  return vmlasq_m_n_s16 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr    p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmlast.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
> -foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo1 (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m (a, b, c, p);
> +  return vmlasq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c
> index 53c21e2e5b6..9d0cc3076d9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr    p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmlast.s32  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int32x4_t
> -foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m_n_s32 (a, b, c, p);
> +  return vmlasq_m_n_s32 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.s32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr    p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: 

RE: [PATCH 25/35] arm: improve tests and fix vmlaldavaxq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Andrea Corallo
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov; Richard Earnshaw; Andrea Corallo
> Subject: [PATCH 25/35] arm: improve tests and fix vmlaldavaxq*
> 
> gcc/ChangeLog:
> 
>   * config/arm/mve.md (mve_vmlaldavaq_)
>   (mve_vmlaldavaxq_s, mve_vmlaldavaxq_p_):
> Fix
>   spacing vs tabs.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c: Improve
> tests.
>   * gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlaldavaxq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlaldavaxq_s32.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/mve.md |  6 ++--
>  .../arm/mve/intrinsics/vmlaldavaxq_p_s16.c| 32 +++
>  .../arm/mve/intrinsics/vmlaldavaxq_p_s32.c| 32 +++
>  .../arm/mve/intrinsics/vmlaldavaxq_s16.c  | 24 ++
>  .../arm/mve/intrinsics/vmlaldavaxq_s32.c  | 24 ++
>  5 files changed, 91 insertions(+), 27 deletions(-)
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 714dc6fc7ce..d2ffae6a425 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -4163,7 +4163,7 @@ (define_insn "mve_vmlaldavaq_"
>VMLALDAVAQ))
>]
>"TARGET_HAVE_MVE"
> -  "vmlaldava.%# %Q0, %R0, %q2, %q3"
> +  "vmlaldava.%#\t%Q0, %R0, %q2, %q3"
>[(set_attr "type" "mve_move")
>  ])
> 
> @@ -4179,7 +4179,7 @@ (define_insn "mve_vmlaldavaxq_s"
>VMLALDAVAXQ_S))
>]
>"TARGET_HAVE_MVE"
> -  "vmlaldavax.s%# %Q0, %R0, %q2, %q3"
> +  "vmlaldavax.s%#\t%Q0, %R0, %q2, %q3"
>[(set_attr "type" "mve_move")
>  ])
> 
> @@ -6126,7 +6126,7 @@ (define_insn
> "mve_vmlaldavaxq_p_"
>VMLALDAVAXQ_P))
>]
>"TARGET_HAVE_MVE"
> -  "vpst\;vmlaldavaxt.%# %Q0, %R0, %q2, %q3"
> +  "vpst\;vmlaldavaxt.%#\t%Q0, %R0, %q2, %q3"
>[(set_attr "type" "mve_move")
> (set_attr "length""8")])
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c
> index f33d3880236..87f0354a636 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr    p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmlaldavaxt.s16 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+,
> q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  int64_t
> -foo (int64_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
> +foo (int64_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
>  {
> -  return vmlaldavaxq_p_s16 (a, b, c, p);
> +  return vmlaldavaxq_p_s16 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmlaldavaxt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr    p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmlaldavaxt.s16 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+,
> q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  int64_t
> -foo1 (int64_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
> +foo1 (int64_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
>  {
> -  return vmlaldavaxq_p (a, b, c, p);
> +  return vmlaldavaxq_p (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmlaldavaxt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c
> index ab072a9850e..d26bf5b90af 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr    p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmlaldavaxt.s32 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+,
> q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  int64_t
> -foo (int64_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
> +foo (int64_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
>  {
> -  return vmlaldavaxq_p_s32 (a, b, c, p);
> +  return vmlaldavaxq_p_s32 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmlaldavaxt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr    p0, 

RE: [PATCH 24/35] arm: improve tests for vmladavaq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Andrea Corallo
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov; Richard Earnshaw; Andrea Corallo
> Subject: [PATCH 24/35] arm: improve tests for vmladavaq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c: Improve tests.
>   * gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaq_p_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaq_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaq_p_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaxq_p_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaxq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaxq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaxq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaxq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaxq_s8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vmladavaq_p_s16.c  | 33 ++---
>  .../arm/mve/intrinsics/vmladavaq_p_s32.c  | 33 ++---
>  .../arm/mve/intrinsics/vmladavaq_p_s8.c   | 33 ++---
>  .../arm/mve/intrinsics/vmladavaq_p_u16.c  | 49 ---
>  .../arm/mve/intrinsics/vmladavaq_p_u32.c  | 49 ---
>  .../arm/mve/intrinsics/vmladavaq_p_u8.c   | 49 ---
>  .../arm/mve/intrinsics/vmladavaxq_p_s16.c | 33 ++---
>  .../arm/mve/intrinsics/vmladavaxq_p_s32.c | 33 ++---
>  .../arm/mve/intrinsics/vmladavaxq_p_s8.c  | 33 ++---
>  .../arm/mve/intrinsics/vmladavaxq_s16.c   | 24 ++---
>  .../arm/mve/intrinsics/vmladavaxq_s32.c   | 24 ++---
>  .../arm/mve/intrinsics/vmladavaxq_s8.c| 24 ++---
>  12 files changed, 336 insertions(+), 81 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c
> index e458204c41b..f3e5eba3b08 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr    p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmladavat.s16   (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  int32_t
> -foo (int32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
> +foo (int32_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p_s16 (a, b, c, p);
> +  return vmladavaq_p_s16 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr    p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmladavat.s16   (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  int32_t
> -foo1 (int32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
> +foo1 (int32_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p (a, b, c, p);
> +  return vmladavaq_p (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.s16"  }  } */
> -/* { dg-final { scan-assembler "vmladavat.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c
> index e3544787adb..71f6957bfc5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr    p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmladavat.s32   (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  int32_t
> -foo (int32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
> +foo (int32_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p_s32 (a, b, c, p);
> +  return vmladavaq_p_s32 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.s32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr    p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmladavat.s32  

RE: [PATCH 22/35] arm: improve tests for vhsubq_m*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Andrea Corallo
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov; Richard Earnshaw; Andrea Corallo
> Subject: [PATCH 22/35] arm: improve tests for vhsubq_m*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_n_s16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vhsubq_m_n_s16.c   | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_m_n_s32.c   | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_m_n_s8.c| 26 ++--
>  .../arm/mve/intrinsics/vhsubq_m_n_u16.c   | 42 +--
>  .../arm/mve/intrinsics/vhsubq_m_n_u32.c   | 42 +--
>  .../arm/mve/intrinsics/vhsubq_m_n_u8.c| 42 +--
>  .../arm/mve/intrinsics/vhsubq_m_s16.c | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_m_s32.c | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_m_s8.c  | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_m_u16.c | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_m_u32.c | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_m_u8.c  | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_n_s16.c | 16 ++-
>  .../arm/mve/intrinsics/vhsubq_n_s32.c | 16 ++-
>  .../arm/mve/intrinsics/vhsubq_n_s8.c  | 16 ++-
>  .../arm/mve/intrinsics/vhsubq_n_u16.c | 28 -
>  .../arm/mve/intrinsics/vhsubq_n_u32.c | 28 -
>  .../arm/mve/intrinsics/vhsubq_n_u8.c  | 28 -
>  .../arm/mve/intrinsics/vhsubq_s16.c   | 16 ++-
>  .../arm/mve/intrinsics/vhsubq_s32.c   | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vhsubq_s8.c | 16 ++-
>  .../arm/mve/intrinsics/vhsubq_u16.c   | 16 ++-
>  .../arm/mve/intrinsics/vhsubq_u32.c   | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vhsubq_u8.c | 16 ++-
>  .../arm/mve/intrinsics/vhsubq_x_n_s16.c   | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_x_n_s32.c   | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_x_n_s8.c| 26 ++--
>  .../arm/mve/intrinsics/vhsubq_x_n_u16.c   | 42 +--
>  .../arm/mve/intrinsics/vhsubq_x_n_u32.c   | 42 +--
>  .../arm/mve/intrinsics/vhsubq_x_n_u8.c| 42 +--
>  .../arm/mve/intrinsics/vhsubq_x_s16.c | 25 +--
>  .../arm/mve/intrinsics/vhsubq_x_s32.c | 25 +--
>  .../arm/mve/intrinsics/vhsubq_x_s8.c  | 25 +--
>  .../arm/mve/intrinsics/vhsubq_x_u16.c | 25 +--
>  

RE: [PATCH 23/35] arm: improve tests for viwdupq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Andrea Corallo
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov; Richard Earnshaw; Andrea Corallo
> Subject: [PATCH 23/35] arm: improve tests for viwdupq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c: Improve tests.
>   * gcc.target/arm/mve/intrinsics/viwdupq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_wb_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_wb_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_wb_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_x_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_x_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_x_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/viwdupq_m_n_u16.c  | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_m_n_u32.c  | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_m_n_u8.c   | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_m_wb_u16.c | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_m_wb_u32.c | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_m_wb_u8.c  | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_n_u16.c| 32 ++--
>  .../arm/mve/intrinsics/viwdupq_n_u32.c| 32 ++--
>  .../arm/mve/intrinsics/viwdupq_n_u8.c | 28 ++-
>  .../arm/mve/intrinsics/viwdupq_wb_u16.c   | 36 ++---
>  .../arm/mve/intrinsics/viwdupq_wb_u32.c   | 36 ++---
>  .../arm/mve/intrinsics/viwdupq_wb_u8.c| 36 ++---
>  .../arm/mve/intrinsics/viwdupq_x_n_u16.c  | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_x_n_u32.c  | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_x_n_u8.c   | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_x_wb_u16.c | 50 ---
>  .../arm/mve/intrinsics/viwdupq_x_wb_u32.c | 50 ---
>  .../arm/mve/intrinsics/viwdupq_x_wb_u8.c  | 50 ---
>  18 files changed, 658 insertions(+), 106 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c
> index 0f999cc672b..67a2465f435 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   viwdupt.u16 q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?: @.*|)
> +**   ...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_m_n_u16 (inactive, a, b, 2, p);
> +  return viwdupq_m_n_u16 (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   viwdupt.u16 q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?: @.*|)
> +**   ...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_m (inactive, a, b, 2, p);
> +  return viwdupq_m (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
> +/*
> +**foo2:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   viwdupt.u16 q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?: @.*|)
> +**   ...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, mve_pred16_t p)
> +{
> +  return viwdupq_m (inactive, 1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> 

RE: [PATCH 20/35] arm: improve tests for vfmasq_m*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 20/35] arm: improve tests for vfmasq_m*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vfmasq_m_n_f16.c   | 50 ---
>  .../arm/mve/intrinsics/vfmasq_m_n_f32.c   | 50 ---
>  2 files changed, 84 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c
> index 06d2d114e46..03b376c9bbe 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vfmast.f16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  float16x8_t
> -foo (float16x8_t a, float16x8_t b, float16_t c, mve_pred16_t p)
> +foo (float16x8_t m1, float16x8_t m2, float16_t add, mve_pred16_t p)
>  {
> -  return vfmasq_m_n_f16 (a, b, c, p);
> +  return vfmasq_m_n_f16 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vfmast.f16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vfmast.f16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  float16x8_t
> -foo1 (float16x8_t a, float16x8_t b, float16_t c, mve_pred16_t p)
> +foo1 (float16x8_t m1, float16x8_t m2, float16_t add, mve_pred16_t p)
>  {
> -  return vfmasq_m (a, b, c, p);
> +  return vfmasq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vfmast.f16"  }  } */
> +/*
> +**foo2:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vfmast.f16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
> +float16x8_t
> +foo2 (float16x8_t m1, float16x8_t m2, mve_pred16_t p)
> +{
> +  return vfmasq_m (m1, m2, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c
> index bf1773d0eeb..ecf30ba9826 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vfmast.f32  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  float32x4_t
> -foo (float32x4_t a, float32x4_t b, float32_t c, mve_pred16_t p)
> +foo (float32x4_t m1, float32x4_t m2, float32_t add, mve_pred16_t p)
>  {
> -  return vfmasq_m_n_f32 (a, b, c, p);
> +  return vfmasq_m_n_f32 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vfmast.f32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vfmast.f32  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  float32x4_t
> -foo1 (float32x4_t a, float32x4_t b, float32_t c, mve_pred16_t p)
> +foo1 (float32x4_t m1, float32x4_t m2, float32_t add, mve_pred16_t p)
>  {
> -  return vfmasq_m (a, b, c, p);
> +  return vfmasq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vfmast.f32"  }  } */
> +/*
> +**foo2:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vfmast.f32  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
> +float32x4_t
> +foo2 (float32x4_t m1, float32x4_t m2, mve_pred16_t p)
> +{
> +  return vfmasq_m (m1, m2, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



RE: [PATCH 21/35] arm: improve tests for vhaddq_m*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 21/35] arm: improve tests for vhaddq_m*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_n_s16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vhaddq_m_n_s16.c   | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_m_n_s32.c   | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_m_n_s8.c| 26 ++--
>  .../arm/mve/intrinsics/vhaddq_m_n_u16.c   | 42 +--
>  .../arm/mve/intrinsics/vhaddq_m_n_u32.c   | 42 +--
>  .../arm/mve/intrinsics/vhaddq_m_n_u8.c| 42 +--
>  .../arm/mve/intrinsics/vhaddq_m_s16.c | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_m_s32.c | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_m_s8.c  | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_m_u16.c | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_m_u32.c | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_m_u8.c  | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_n_s16.c | 16 ++-
>  .../arm/mve/intrinsics/vhaddq_n_s32.c | 16 ++-
>  .../arm/mve/intrinsics/vhaddq_n_s8.c  | 16 ++-
>  .../arm/mve/intrinsics/vhaddq_n_u16.c | 28 -
>  .../arm/mve/intrinsics/vhaddq_n_u32.c | 28 -
>  .../arm/mve/intrinsics/vhaddq_n_u8.c  | 28 -
>  .../arm/mve/intrinsics/vhaddq_s16.c   | 16 ++-
>  .../arm/mve/intrinsics/vhaddq_s32.c   | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vhaddq_s8.c | 16 ++-
>  .../arm/mve/intrinsics/vhaddq_u16.c   | 16 ++-
>  .../arm/mve/intrinsics/vhaddq_u32.c   | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vhaddq_u8.c | 16 ++-
>  .../arm/mve/intrinsics/vhaddq_x_n_s16.c   | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_x_n_s32.c   | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_x_n_s8.c| 26 ++--
>  .../arm/mve/intrinsics/vhaddq_x_n_u16.c   | 42 +--
>  .../arm/mve/intrinsics/vhaddq_x_n_u32.c   | 42 +--
>  .../arm/mve/intrinsics/vhaddq_x_n_u8.c| 42 +--
>  .../arm/mve/intrinsics/vhaddq_x_s16.c | 25 +--
>  .../arm/mve/intrinsics/vhaddq_x_s32.c | 25 +--
>  .../arm/mve/intrinsics/vhaddq_x_s8.c  | 25 +--
>  .../arm/mve/intrinsics/vhaddq_x_u16.c | 25 +--
>  

RE: [PATCH 19/35] arm: improve tests and fix vsubq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 19/35] arm: improve tests and fix vsubq*
> 
> gcc/ChangeLog:
> 
>   * config/arm/mve.md (mve_vsubq_n_f): Fix spacing.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vsubq_f16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vsubq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/mve.md |  2 +-
>  .../gcc.target/arm/mve/intrinsics/vsubq_f16.c | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vsubq_f32.c | 16 ++-
>  .../arm/mve/intrinsics/vsubq_m_f16.c  | 26 --
>  .../arm/mve/intrinsics/vsubq_m_f32.c  | 26 --
>  .../arm/mve/intrinsics/vsubq_m_n_f16.c| 42 ++--
>  .../arm/mve/intrinsics/vsubq_m_n_f32.c| 42 ++--
>  .../arm/mve/intrinsics/vsubq_m_n_s16.c| 26 --
>  .../arm/mve/intrinsics/vsubq_m_n_s32.c| 26 --
>  .../arm/mve/intrinsics/vsubq_m_n_s8.c | 26 --
>  .../arm/mve/intrinsics/vsubq_m_n_u16.c| 42 ++--
>  .../arm/mve/intrinsics/vsubq_m_n_u32.c| 42 ++--
>  .../arm/mve/intrinsics/vsubq_m_n_u8.c | 42 ++--
>  .../arm/mve/intrinsics/vsubq_m_s16.c  | 25 --
>  .../arm/mve/intrinsics/vsubq_m_s32.c  | 25 --
>  .../arm/mve/intrinsics/vsubq_m_s8.c   | 25 --
>  .../arm/mve/intrinsics/vsubq_m_u16.c  | 25 --
>  .../arm/mve/intrinsics/vsubq_m_u32.c  | 25 --
>  .../arm/mve/intrinsics/vsubq_m_u8.c   | 25 --
>  .../arm/mve/intrinsics/vsubq_n_f16.c  | 28 ++-
>  .../arm/mve/intrinsics/vsubq_n_f32.c  | 28 ++-
>  .../arm/mve/intrinsics/vsubq_n_s16.c  | 17 +--
>  

RE: [PATCH 18/35] arm: improve tests for vmulq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 18/35] arm: improve tests for vmulq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vmulq_f16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vmulq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../gcc.target/arm/mve/intrinsics/vmulq_f16.c | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vmulq_f32.c | 16 ++-
>  .../arm/mve/intrinsics/vmulq_m_f16.c  | 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_f32.c  | 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_n_f16.c| 42 +--
>  .../arm/mve/intrinsics/vmulq_m_n_f32.c| 42 +--
>  .../arm/mve/intrinsics/vmulq_m_n_s16.c| 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_n_s32.c| 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_n_s8.c | 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_n_u16.c| 42 +--
>  .../arm/mve/intrinsics/vmulq_m_n_u32.c| 42 +--
>  .../arm/mve/intrinsics/vmulq_m_n_u8.c | 42 +--
>  .../arm/mve/intrinsics/vmulq_m_s16.c  | 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_s32.c  | 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_s8.c   | 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_u16.c  | 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_u32.c  | 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_u8.c   | 26 ++--
>  .../arm/mve/intrinsics/vmulq_n_f16.c  | 28 -
>  .../arm/mve/intrinsics/vmulq_n_f32.c  | 28 -
>  .../arm/mve/intrinsics/vmulq_n_s16.c  | 16 ++-
>  .../arm/mve/intrinsics/vmulq_n_s32.c  | 16 ++-
>  .../arm/mve/intrinsics/vmulq_n_s8.c   | 16 ++-
>  

RE: [PATCH 17/35] arm: improve tests and fix vadd*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 17/35] arm: improve tests and fix vadd*
> 
> gcc/ChangeLog:
> 
>   * config/arm/mve.md (mve_vaddlvq_p_v4si)
>   (mve_vaddq_n_, mve_vaddvaq_)
>   (mve_vaddlvaq_v4si, mve_vaddq_n_f)
>   (mve_vaddlvaq_p_v4si, mve_vaddq,
> mve_vaddq_f):
>   Fix spacing.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vaddlvaq_p_s32.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vaddlvaq_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddlvaq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddlvaq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddlvq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddlvq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_p_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_p_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_p_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_u8.c: Likewise.
>   * 

RE: [PATCH 16/35] arm: Add integer vector overloading of vsubq_x intrinsic

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Stam Markianos-Wright  wri...@arm.com>
> Subject: [PATCH 16/35] arm: Add integer vector overloading of vsubq_x
> intrinsic
> 
> From: Stam Markianos-Wright 
> 
> In the past we had only defined the vsubq_x generic overload of the
> vsubq_x_* intrinsics for float vector types.  This would cause them
> to fall back to the `__ARM_undef` failure state if they were called
> through the generic version.
> This patch simply adds these overloads.

Ok.
Thanks,
Kyrill

> 
> gcc/ChangeLog:
> 
> * config/arm/arm_mve.h (__arm_vsubq_x FP): New overloads.
>  (__arm_vsubq_x Integer): New.
> ---
>  gcc/config/arm/arm_mve.h | 28 
>  1 file changed, 28 insertions(+)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index f6b42dc3fab..09167ec118e 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -38259,6 +38259,18 @@ extern void *__ARM_undef;
>  #define __arm_vsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>__typeof(p2) __p2 = (p2); \
>_Generic( (int
> (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vsubq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vsubq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vsubq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
>int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vsubq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce(__p2, float16x8_t), p3), \
>int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vsubq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce(__p2, float32x4_t), p3), \
>int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vsubq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(p2, double), p3), \
> @@ -40223,6 +40235,22 @@ extern void *__ARM_undef;
>int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld4q_u16
> (__ARM_mve_coerce1(p0, uint16_t *)), \
>int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld4q_u32
> (__ARM_mve_coerce1(p0, uint32_t *
> 
> +#define __arm_vsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
> +  __typeof(p2) __p2 = (p2); \
> +  _Generic( (int
> (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int 

Re: [PATCH v4] LoongArch: Optimize immediate load.

2022-11-22 Thread Xi Ruoyao via Gcc-patches
On Tue, 2022-11-22 at 22:03 +0800, Xi Ruoyao via Gcc-patches wrote:
> While I still can't fully understand the immediate load issue and how
> this patch fixes it, I've tested this patch (alongside the prefetch
> instruction patch) with bootstrap-ubsan.  And the compiled result of
> imm-load1.c seems OK.

And it's doing the correct thing for the Glibc "improved generic string
functions" patch, producing some really tight loops now.

> 
> On Thu, 2022-11-17 at 17:59 +0800, Lulu Cheng wrote:
> > v1 -> v2:
> > 1. Change the code format.
> > 2. Fix bugs in the code.
> > 
> > v2 -> v3:
> > Modifying a code implementation of an undefined behavior.
> > 
> > v3 -> v4:
> > Move the part of the immediate number decomposition from expand pass
> > to split
> > pass.
> > 
> > Both regression tests and spec2006 passed.
> > 
> > The problem mentioned in the link, where the four immediate load
> > instructions were not moved out of the loop, has been fixed. Now, as
> > in the test case, the four immediate load instructions are generated
> > outside the loop.
> > (
> > https://sourceware.org/pipermail/libc-alpha/2022-September/142202.html
> > )
> > 
> > 
> > The loop2_invariant pass hoists instructions that do not change
> > inside a loop out of that loop, but some instructions will not meet
> > the hoisting conditions if the immediate is decomposed during the
> > expand pass, so the immediate decomposition is moved to the split
> > pass.
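
To make the "four immediate load instructions" concrete, here is a minimal sketch of how a 64-bit constant divides into the four bit fields that such a sequence would fill. The 52-63 field matches the METHOD_LU52I comment in the patch; the other field boundaries, the struct, and both function names are illustrative assumptions, and this is not the logic of loongarch_build_integer.

```c
#include <stdint.h>

/* Hypothetical view of a 64-bit immediate as four fields, one per
   load instruction in the longest sequence (assumed layout).  */
struct imm_parts
{
  uint64_t lo12;   /* bits 0-11  */
  uint64_t hi20;   /* bits 12-31 */
  uint64_t mid20;  /* bits 32-51 */
  uint64_t top12;  /* bits 52-63 */
};

/* Split VALUE into the four fields.  */
static struct imm_parts
split_imm64 (uint64_t value)
{
  struct imm_parts p;
  p.lo12 = value & 0xfff;
  p.hi20 = (value >> 12) & 0xfffff;
  p.mid20 = (value >> 32) & 0xfffff;
  p.top12 = (value >> 52) & 0xfff;
  return p;
}

/* Rebuild the original value from the four fields.  */
static uint64_t
join_imm64 (struct imm_parts p)
{
  return p.lo12 | (p.hi20 << 12) | (p.mid20 << 32) | (p.top12 << 52);
}
```

If the constant is kept whole until the split pass, the loop-invariant pass sees a single invariant set rather than a chain of dependent partial loads.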
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/loongarch.cc (enum
> > loongarch_load_imm_method):
> > Remove the member METHOD_INSV that is not currently used.
> > (struct loongarch_integer_op): Define a new member
> > curr_value,
> > that records the value of the number stored in the
> > destination
> > register immediately after the current instruction has run.
> > (loongarch_build_integer): Assign a value to the curr_value
> > member variable.
> > (loongarch_move_integer): Adds information for the immediate
> > load instruction.
> > * config/loongarch/loongarch.md (*movdi_32bit): Redefine as
> > define_insn_and_split.
> > (*movdi_64bit): Likewise.
> > (*movsi_internal): Likewise.
> > (*movhi_internal): Likewise.
> > * config/loongarch/predicates.md: Return true as long as it
> > is
> > CONST_INT, ensure
> > that the immediate number is not optimized by decomposition
> > during expand
> > optimization loop.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/loongarch/imm-load.c: New test.
> > * gcc.target/loongarch/imm-load1.c: New test.
> > ---
> >  gcc/config/loongarch/loongarch.cc | 62 ++--
> > --
> > -
> >  gcc/config/loongarch/loongarch.md | 44 +++--
> >  gcc/config/loongarch/predicates.md    |  2 +-
> >  gcc/testsuite/gcc.target/loongarch/imm-load.c | 10 +++
> >  .../gcc.target/loongarch/imm-load1.c  | 26 
> >  5 files changed, 110 insertions(+), 34 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/loongarch/imm-load.c
> >  create mode 100644 gcc/testsuite/gcc.target/loongarch/imm-load1.c
> > 
> > diff --git a/gcc/config/loongarch/loongarch.cc
> > b/gcc/config/loongarch/loongarch.cc
> > index 8ee32c90573..9e0d6c7c3ea 100644
> > --- a/gcc/config/loongarch/loongarch.cc
> > +++ b/gcc/config/loongarch/loongarch.cc
> > @@ -139,22 +139,21 @@ struct loongarch_address_info
> >  
> >     METHOD_LU52I:
> >   Load 52-63 bit of the immediate number.
> > -
> > -   METHOD_INSV:
> > - immediate like 0xfff0fxxx
> > -   */
> > +*/
> >  enum loongarch_load_imm_method
> >  {
> >    METHOD_NORMAL,
> >    METHOD_LU32I,
> > -  METHOD_LU52I,
> > -  METHOD_INSV
> > +  METHOD_LU52I
> >  };
> >  
> >  struct loongarch_integer_op
> >  {
> >    enum rtx_code code;
> >    HOST_WIDE_INT value;
> > +  /* Represent the result of the immediate count of the load
> > instruction at
> > + each step.  */
> > +  HOST_WIDE_INT curr_value;
> >    enum loongarch_load_imm_method method;
> >  };
> >  
> > @@ -1475,24 +1474,27 @@ loongarch_build_integer (struct
> > loongarch_integer_op *codes,
> >  {
> >    /* The value of the lower 32 bit be loaded with one
> > instruction.
> >  lu12i.w.  */
> > -  codes[0].code = UNKNOWN;
> > -  codes[0].method = METHOD_NORMAL;
> > -  codes[0].value = low_part;
> > +  codes[cost].code = UNKNOWN;
> > +  codes[cost].method = METHOD_NORMAL;
> > +  codes[cost].value = low_part;
> > +  codes[cost].curr_value = low_part;
> >    cost++;
> >  }
> >    else
> >  {
> >    /* lu12i.w + ior.  */
> > -  codes[0].code = UNKNOWN;
> > -  codes[0].method = METHOD_NORMAL;
> > -  codes[0].value = low_part & ~(IMM_REACH - 1);
> > +  codes[cost].code = UNKNOWN;
> > +  codes[cost].method = 

Re: [PATCH] aarch64: Fix test_dfp_17.c for big-endian [PR 107604]

2022-11-22 Thread Christophe Lyon via Gcc-patches




On 11/22/22 12:33, Richard Earnshaw wrote:



On 22/11/2022 11:21, Richard Sandiford wrote:

Richard Earnshaw via Gcc-patches  writes:

On 22/11/2022 09:01, Christophe Lyon via Gcc-patches wrote:

gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on
big-endian, because the _Decimal32 on-stack argument is not padded in
the same direction depending on endianness.

This patch fixes the testcase so that it expects the argument in the
right stack location, similarly to what other tests do in the same
directory.

gcc/testsuite/ChangeLog:

PR target/107604
* gcc.target/aarch64/aapcs64/test_dfp_17.c: Fix for big-endian.
---
   gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c | 4 
   1 file changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c

index 22dc462bf7c..3c45f715cf7 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
@@ -32,6 +32,10 @@ struct z b = { 9.0dd, 10.0dd, 11.0dd, 12.0dd };
 ANON(struct z, a, D1)
 ANON(struct z, b, STACK)
 ANON(int , 5, W0)
+#ifndef __AAPCS64_BIG_ENDIAN__
 ANON(_Decimal32, f1, STACK+32) /* Note: no promotion to 
_Decimal64.  */

+#else
+  ANON(_Decimal32, f1, STACK+36) /* Note: no promotion to 
_Decimal64.  */

+#endif
 LAST_ANON(_Decimal64, 0.5dd, STACK+40)
   #endif


Why would a Decimal32 change stack placement based on the endianness?
Isn't it a 4-byte object?


Yes, but PARM_BOUNDARY (64) sets a minimum alignment for all stack 
arguments.


Richard


Ah, OK.


Indeed, it was not immediately obvious to me either when looking at 
aarch64_layout_arg. aarch64_function_arg_padding comes into play, too.
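
A simplified model of the effect discussed above, assuming each stack argument occupies a slot of PARM_BOUNDARY / 8 bytes and that a small argument sits at the high-address end of its slot on big-endian. The helper name and parameters are hypothetical; the real decision is made in aarch64_layout_arg and aarch64_function_arg_padding and covers more cases than this sketch.

```c
#include <stddef.h>

/* Hypothetical helper: byte offset of an ARG_SIZE-byte argument whose
   stack slot starts at SLOT_OFFSET, with slots of SLOT_SIZE bytes.
   On big-endian the padding is assumed to come first.  */
static size_t
stack_arg_offset (size_t slot_offset, size_t arg_size,
                  size_t slot_size, int big_endian)
{
  size_t padding = arg_size < slot_size ? slot_size - arg_size : 0;
  return big_endian ? slot_offset + padding : slot_offset;
}
```

With an 8-byte slot at offset 32, a 4-byte _Decimal32 lands at 32 on little-endian and 36 on big-endian, which matches the STACK+32 and STACK+36 expectations in the test.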




I wonder if we should have a new macro in the tests, something like 
ANON_PADDED to describe this case and that works things out more 
automagically for big-endian.

Maybe, there are quite a few tests under aapcs64 which have a similar
#ifndef __AAPCS64_BIG_ENDIAN__



I notice the new ANON definition is not correctly indented.

R.


Re: PING^2 [PATCH] Adjust the symbol for SECTION_LINK_ORDER linked_to section [PR99889]

2022-11-22 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin"  writes:
> Hi Richard,
>
> Many thanks for your review comments!
>
 on 2022/8/24 16:17, Kewen.Lin via Gcc-patches wrote:
> Hi,
>
> As discussed in PR98125, -fpatchable-function-entry with
> SECTION_LINK_ORDER support doesn't work well on powerpc64
> ELFv1 because the filled "Symbol" in
>
>   .section name,"flags"o,@type,Symbol
>
> sits in .opd section instead of in the function_section
> like .text or named .text*.
>
> Since we already generate one label LPFE* which sits in
> function_section of current_function_decl, this patch is
> to reuse it as the symbol for the linked_to section.  It
> avoids the above ABI specific issue when using the symbol
> concluded from current_function_decl.
>
> Besides, with this support some previous workarounds for
> powerpc64 ELFv1 can be reverted.
>
> btw, rs6000_print_patchable_function_entry can be dropped
> but there is another rs6000 patch which needs this rs6000
> specific hook rs6000_print_patchable_function_entry, not
> sure which one gets landed first, so just leave it here.
>
> Bootstrapped and regtested on below:
>
>   1) powerpc64-linux-gnu P8 with default binutils 2.27
>  and latest binutils 2.39.
>   2) powerpc64le-linux-gnu P9 (default binutils 2.30).
>   3) powerpc64le-linux-gnu P10 (default binutils 2.30).
>   4) x86_64-redhat-linux with default binutils 2.30
>  and latest binutils 2.39.
>   5) aarch64-linux-gnu  with default binutils 2.30
>  and latest binutils 2.39.
>
>
> [snip...]
>
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index 4db8506b106..d4de6e164ee 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
> @@ -6906,11 +6906,16 @@ default_elf_asm_named_section (const char *name, 
> unsigned int flags,
>   fprintf (asm_out_file, ",%d", flags & SECTION_ENTSIZE);
>if (flags & SECTION_LINK_ORDER)
>   {
> -   tree id = DECL_ASSEMBLER_NAME (decl);
> -   ultimate_transparent_alias_target ();
> -   const char *name = IDENTIFIER_POINTER (id);
> -   name = targetm.strip_name_encoding (name);
> -   fprintf (asm_out_file, ",%s", name);
> +   /* For now, only section "__patchable_function_entries"
> +  adopts flag SECTION_LINK_ORDER, internal label LPFE*
> +  was emitted in default_print_patchable_function_entry,
> +  just place it here for linked_to section.  */
> +   gcc_assert (!strcmp (name, "__patchable_function_entries"));
>> 
>> I like the idea of removing the rs6000 workaround in favour of making the
>> target-independent code more robust.  But this seems a bit hackish.  What
>> would we do if SECTION_LINK_ORDER was used for something else in future?
>> 
>
> Good question!  I think it depends on how we can get the symbol for the
> linked_to section, if adopting the name of the decl will suffer the
> similar issue which this patch wants to fix, we have to reuse the label
> LPFE* or some kind of new artificial label in the related section; or
> we can just go with the name of the given decl, or something related to
> that decl.  Since we can't predict any future uses, I just placed an
> assertion here to ensure that we would revisit and adjust this part at
> that time.  Does it sound reasonable to you?

Yeah, I guess that's good enough.  If the old scheme ends up being
correct for some future use, we can make the new behaviour conditional
on __patchable_function_entries.
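
A minimal sketch of what "conditional on __patchable_function_entries" could look like, assuming the choice reduces to picking between the internal LPFE label and the decl's assembler name. The function and its parameters are hypothetical and are not part of varasm.cc.

```c
#include <string.h>

/* Hypothetical selector for the linked_to symbol of a
   SECTION_LINK_ORDER section: use the internal label for
   __patchable_function_entries, the decl name otherwise.  */
static const char *
link_order_symbol (const char *section_name, const char *decl_name,
                   const char *pfe_label)
{
  if (strcmp (section_name, "__patchable_function_entries") == 0)
    return pfe_label;
  return decl_name;
}
```

The patch as posted asserts on the section name instead, so any future user of SECTION_LINK_ORDER would trip the assertion and force this question to be revisited.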

So yeah, the patch LGTM, thanks.

Richard


RE: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-11-22 Thread Tamar Christina via Gcc-patches
Ping

> -Original Message-
> From: Gcc-patches  bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Tamar
> Christina via Gcc-patches
> Sent: Friday, November 11, 2022 2:40 PM
> To: Richard Sandiford 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: RE: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.
> 
> Hi,
> 
> 
> > This name might cause confusion with the SVE iterators, where FULL
> > means "every bit of the register is used".  How about something like
> > VMOVE instead?
> >
> > With this change, I guess VALL_F16 represents "The set of all modes
> > for which the vld1 intrinsics are provided" and VMOVE or whatever is
> > "All Advanced SIMD modes suitable for moving, loading, and storing".
> > That is, VMOVE extends VALL_F16 with modes that are not manifested via
> > intrinsics.
> >
> 
> Done.
> 
> > Where is the 2h used, and is it valid syntax in that context?
> >
> > Same for later instances of 2h.
> 
> They are, but they weren't meant to be in this patch.  They belong in a
> separate FP16 series that I won't get to finish for GCC 13 due to not being
> able to finish writing all the tests.  I have moved them to that patch
> series though.
> 
> While the addp patch series has been killed, this patch is still good 
> standalone
> and improves codegen as shown in the updated testcase.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-simd.md (*aarch64_simd_movv2hf): New.
>   (mov, movmisalign, aarch64_dup_lane,
>   aarch64_store_lane0, aarch64_simd_vec_set,
>   @aarch64_simd_vec_copy_lane, vec_set,
>   reduc__scal_, reduc__scal_,
>   aarch64_reduc__internal,
> aarch64_get_lane,
>   vec_init, vec_extract): Support V2HF.
>   (aarch64_simd_dupv2hf): New.
>   * config/aarch64/aarch64.cc (aarch64_classify_vector_mode):
>   Add E_V2HFmode.
>   * config/aarch64/iterators.md (VHSDF_P): New.
>   (V2F, VMOVE, nunits, Vtype, Vmtype, Vetype, stype, VEL,
>   Vel, q, vp): Add V2HF.
>   * config/arm/types.md (neon_fp_reduc_add_h): New.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/sve/slp_1.c: Update testcase.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aarch64-simd.md
> index
> f4152160084d6b6f34bd69f0ba6386c1ab50f77e..487a31010245accec28e779661
> e6c2d578fca4b7 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -19,10 +19,10 @@
>  ;; .
> 
>  (define_expand "mov"
> -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
> - (match_operand:VALL_F16 1 "general_operand"))]
> +  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
> + (match_operand:VMOVE 1 "general_operand"))]
>"TARGET_SIMD"
> -  "
> +{
>/* Force the operand into a register if it is not an
>   immediate whose use can be replaced with xzr.
>   If the mode is 16 bytes wide, then we will be doing @@ -46,12 +46,11 @@
> (define_expand "mov"
>aarch64_expand_vector_init (operands[0], operands[1]);
>DONE;
>  }
> -  "
> -)
> +})
> 
>  (define_expand "movmisalign"
> -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
> -(match_operand:VALL_F16 1 "general_operand"))]
> +  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
> +(match_operand:VMOVE 1 "general_operand"))]
>"TARGET_SIMD && !STRICT_ALIGNMENT"
>  {
>/* This pattern is not permitted to fail during expansion: if both 
> arguments
> @@ -73,6 +72,16 @@ (define_insn "aarch64_simd_dup"
>[(set_attr "type" "neon_dup, neon_from_gp")]
>  )
> 
> +(define_insn "aarch64_simd_dupv2hf"
> +  [(set (match_operand:V2HF 0 "register_operand" "=w")
> + (vec_duplicate:V2HF
> +   (match_operand:HF 1 "register_operand" "0")))]
> +  "TARGET_SIMD"
> +  "@
> +   sli\\t%d0, %d1, 16"
> +  [(set_attr "type" "neon_shift_imm")]
> +)
> +
>  (define_insn "aarch64_simd_dup"
>[(set (match_operand:VDQF_F16 0 "register_operand" "=w,w")
>   (vec_duplicate:VDQF_F16
> @@ -85,10 +94,10 @@ (define_insn "aarch64_simd_dup"
>  )
> 
>  (define_insn "aarch64_dup_lane"
> -  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
> - (vec_duplicate:VALL_F16
> +  [(set (match_operand:VMOVE 0 "register_operand" "=w")
> + (vec_duplicate:VMOVE
> (vec_select:
> - (match_operand:VALL_F16 1 "register_operand" "w")
> + (match_operand:VMOVE 1 "register_operand" "w")
>   (parallel [(match_operand:SI 2 "immediate_operand" "i")])
>)))]
>"TARGET_SIMD"
> @@ -142,6 +151,29 @@ (define_insn
> "*aarch64_simd_mov"
>mov_reg, neon_move")]
>  )
> 
> +(define_insn "*aarch64_simd_movv2hf"
> +  [(set (match_operand:V2HF 0 "nonimmediate_operand"
> + "=w, m,  m,  w, ?r, ?w, ?r, w, w")
> + 

Re: [PATCH] RISC-V: Add the Zihpm and Zicntr extensions

2022-11-22 Thread Palmer Dabbelt

On Tue, 22 Nov 2022 07:20:15 PST (-0800), jeffreya...@gmail.com wrote:


On 11/20/22 18:36, Kito Cheng wrote:

So the idea here is just to define the extension so that it gets defined
in the ISA strings and passed through to the assembler, right?

That will also define arch test macro:

https://github.com/riscv-non-isa/riscv-c-api-doc/blob/master/riscv-c-api.md#architecture-extension-test-macro


Sorry I should have been clearer and included the test macro(s) as well.

So a better summary would be that while it doesn't change the codegen
behavior in the compiler, it does provide the mechanisms to pass along
isa strings to other tools such as the assembler and signal via the test
macros that this extension is available.


IMO the important bit here is that we're not adding any compatibility 
flags, like we did when fence.i was removed from the ISA.  That's fine 
as long as we never remove these instructions from the base ISA in the 
software, but that's what's suggested by Andrew in the post.



If so I think that it meets Andrew's requirements and at least some of
those issues raised by Jim.   But I'm not sure it can address your
concern WRT consistency.  In fact, I don't really see a way to address
that concern with option #2 which Andrew seems to think is the only
reasonable path forward from an RVI standpoint.


I'm at a loss for next steps, particularly as the newbie in this world.


It's a super weird one, but there's a bunch of cases in RISC-V where 
we're told to just ignore words in the ISA manual.  Definitely a trap 
for users (and we already had some Linux folks get bit by the counter 
changes here), but that's just how RISC-V works.


Re: [PATCH] RISC-V: Add the Zihpm and Zicntr extensions

2022-11-22 Thread Jeff Law via Gcc-patches



On 11/20/22 18:36, Kito Cheng wrote:

So the idea here is just to define the extension so that it gets defined
in the ISA strings and passed through to the assembler, right?

That will also define arch test macro:

https://github.com/riscv-non-isa/riscv-c-api-doc/blob/master/riscv-c-api.md#architecture-extension-test-macro


Sorry I should have been clearer and included the test macro(s) as well.

So a better summary would be that while it doesn't change the codegen 
behavior in the compiler, it does provide the mechanisms to pass along 
isa strings to other tools such as the assembler and signal via the test 
macros that this extension is available.
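
For illustration, user code could probe the test macros roughly as follows. This assumes the macros follow the `__riscv_<ext>` naming scheme from the linked C API document, i.e. `__riscv_zicntr` and `__riscv_zihpm`; the function name and the bitmask encoding are made up for the sketch.

```c
/* Hypothetical probe: bit 0 is set when the Zicntr test macro is
   defined, bit 1 when the Zihpm test macro is defined.  On targets
   that define neither macro the result is 0.  */
static int
riscv_counter_extensions (void)
{
  int mask = 0;
#ifdef __riscv_zicntr
  mask |= 1;
#endif
#ifdef __riscv_zihpm
  mask |= 2;
#endif
  return mask;
}
```

Nothing here changes code generation; it only shows how software can detect what was requested in the ISA string.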



If so I think that it meets Andrew's requirements and at least some of 
those issues raised by Jim.   But I'm not sure it can address your 
concern WRT consistency.  In fact, I don't really see a way to address 
that concern with option #2 which Andrew seems to think is the only 
reasonable path forward from an RVI standpoint.



I'm at a loss for next steps, particularly as the newbie in this world.


jeff




Re: [PATCH] c++: Fix up -fcontract* options

2022-11-22 Thread Jason Merrill via Gcc-patches

On 11/21/22 18:00, Jakub Jelinek wrote:

Hi!

I've noticed
+FAIL: compiler driver --help=c++ option(s): "^ +-.*[^:.]\$" absent from output: "  
-fcontract-build-level=[off|default|audit] Specify max contract level to generate runtime checks 
for"
error, this is due to missing dot at the end of the description.

The second part of the first hunk should fix that, but while at it,
I find it weird that some options don't have RejectNegative, yet
for options that accept an argument a negative option looks weird
and isn't really handled.


OK.


Though, shall we have those [on|off] options at all?
Those are inconsistent with all other boolean options gcc has.
Every other boolean option is -fwhatever for it being on
and -fno-whatever for it being off, shouldn't the options be
without arguments and accept negatives (-fcontract-assumption-mode
vs. -fno-contract-assumption-mode etc.)?


True, but I think let's leave them alone for now, they'll probably all 
be replaced as the feature specification evolves.



2022-11-21  Jakub Jelinek  

* c.opt (fcontract-assumption-mode=, fcontract-continuation-mode=,
fcontract-role=, fcontract-semantic=): Add RejectNegative.
(fcontract-build-level=): Terminate description with dot.

--- gcc/c-family/c.opt.jj   2022-11-19 09:21:14.31706 +0100
+++ gcc/c-family/c.opt  2022-11-21 23:51:55.605736499 +0100
@@ -1692,12 +1692,12 @@ EnumValue
  Enum(on_off) String(on) Value(1)
  
  fcontract-assumption-mode=

-C++ Joined
+C++ Joined RejectNegative
  -fcontract-assumption-mode=[on|off]   Enable or disable treating axiom level 
contracts as assumptions (default on).
  
  fcontract-build-level=

  C++ Joined RejectNegative
--fcontract-build-level=[off|default|audit] Specify max contract level to 
generate runtime checks for
+-fcontract-build-level=[off|default|audit] Specify max contract level to 
generate runtime checks for.
  
  fcontract-strict-declarations=

  C++ Var(flag_contract_strict_declarations) Enum(on_off) Joined Init(0) 
RejectNegative
@@ -1708,15 +1708,15 @@ C++ Var(flag_contract_mode) Enum(on_off)
  -fcontract-mode=[on|off]  Enable or disable all contract facilities 
(default on).
  
  fcontract-continuation-mode=

-C++ Joined
+C++ Joined RejectNegative
  -fcontract-continuation-mode=[on|off] Enable or disable contract continuation 
mode (default off).
  
  fcontract-role=

-C++ Joined
+C++ Joined RejectNegative
  -fcontract-role=:Specify the semantics for all 
levels in a role (default, review), or a custom contract role with given semantics (ex: 
opt:assume,assume,assume)
  
  fcontract-semantic=

-C++ Joined
+C++ Joined RejectNegative
  -fcontract-semantic=:Specify the concrete semantics for 
level
  
  fcoroutines


Jakub





Re: [PATCH 2/2] Add a new warning option -Wstrict-flex-arrays.

2022-11-22 Thread Qing Zhao via Gcc-patches



> On Nov 22, 2022, at 9:10 AM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> 
> 
>> On Nov 22, 2022, at 3:16 AM, Richard Biener  wrote:
>> 
>> On Mon, 21 Nov 2022, Qing Zhao wrote:
>> 
>>> 
>>> 
 On Nov 18, 2022, at 11:31 AM, Kees Cook  wrote:
 
 On Fri, Nov 18, 2022 at 03:19:07PM +, Qing Zhao wrote:
> Hi, Richard,
> 
> Honestly, it's very hard for me to decide what's the best way to handle 
> the interaction 
> between -fstrict-flex-array=M and -Warray-bounds=N. 
> 
> Ideally,  -fstrict-flex-array=M should completely control the behavior of 
> -Warray-bounds.
> If possible, I prefer this solution.
> 
> However, -Warray-bounds is included in -Wall, and has been used 
> extensively for a long time.
> It's not safe to change its default behavior. 
 
 I prefer that -fstrict-flex-arrays controls -Warray-bounds. That
 it is in -Wall is _good_ for this reason. :) No one is going to add
 -fstrict-flex-arrays (at any level) without understanding what it does
 and wanting those effects on -Warray-bounds.
>>> 
>>> 
>>> The major difficulties in letting -fstrict-flex-arrays control
>>> -Warray-bounds were discussed in the following thread:
>>> 
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604133.html
>>> 
>>> Please take a look at the discussion and let me know your opinion.
>> 
>> My opinion is now, after re-considering and with seeing your new 
>> patch, that -Warray-bounds=2 should be changed to only add
>> "the intermediate results of pointer arithmetic that may yield out of 
>> bounds values" and that what it considers a flex array should now
>> be controlled by -fstrict-flex-arrays only.
>> 
>> That is, I think, the only thing that's not confusing to users even
>> if that implies a change from previous behavior that we should
>> document by clarifying the -Warray-bounds documentation as well as
>> by adding an entry to the Caveats section of gcc-13/changes.html
>> 
>> That also means that =2 will get _less_ warnings with GCC 13 when
>> the user doesn't use -fstrict-flex-arrays as well.
> 
> Okay.  So, this is for -Warray-bounds=2.
> 
> For -Warray-bounds=1 -fstrict-flex-array=N, if N > 1, should 
> -fstrict-flex-array=N control -Warray-bounds=1?

More thinking on this. (I might have misunderstood a little in the previous
email.)

If I understand correctly now, what you proposed was:

1. The level of -Warray-bounds will NOT control how a trailing array is 
considered as a flex array member anymore. 
2. Only the level of -fstrict-flex-arrays will control this;
3. Keep the current default  behavior of -Warray-bounds on treating trailing 
arrays as flex array member (treating all [0],[1], and [] as flexible array 
members). 
4. Update the documentation for -Warray-bounds to clarify this change, and 
also add an entry to the Caveats section describing the change to -Warray-bounds.
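
To make the three trailing-array forms concrete, here is a small sketch. The struct and function names are invented for illustration, and the level summary in the comments reflects my reading of the -fstrict-flex-arrays documentation rather than anything stated in this thread.

```c
#include <stddef.h>
#include <stdlib.h>

/* [] : C99 flexible array member, treated as one at every level.  */
struct trailing_flex { int n; int data[]; };

/* [1]: pre-C99 idiom, treated as a flexible array member only at
   -fstrict-flex-arrays levels 0 and 1.  */
struct trailing_one { int n; int data[1]; };

#ifdef __GNUC__
/* [0]: GNU extension, treated as a flexible array member at levels
   0, 1 and 2.  */
struct trailing_zero { int n; int data[0]; };
#endif

/* Allocate a trailing_flex with room for COUNT elements and fill
   data[i] with i.  Returns NULL on bad input or allocation failure.  */
static struct trailing_flex *
make_flex (int count)
{
  if (count < 0)
    return NULL;
  struct trailing_flex *p
    = malloc (sizeof *p + (size_t) count * sizeof p->data[0]);
  if (!p)
    return NULL;
  p->n = count;
  for (int i = 0; i < count; i++)
    p->data[i] = i;
  return p;
}
```

Under the proposal, whether an access such as `data[3]` on a `trailing_one` object is diagnosed would depend only on the -fstrict-flex-arrays level, not on the -Warray-bounds level.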

If the above is correct, then yes, I like this change. Both the user interface
and the internal implementation will be simpler and cleaner. 

Let me know if you see any issue with my above understanding.

Thanks a lot.

Qing

> 
> Qing
> 
>> 
>> Richard.
>> 
>> -- 
>> Richard Biener 
>> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
>> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
>> HRB 36809 (AG Nuernberg)



Re: [PATCH] 8/19 modula2 front end: libgm2 contents

2022-11-22 Thread Gaius Mulley via Gcc-patches
Richard Biener  writes:

> On Mon, Oct 10, 2022 at 5:35 PM Gaius Mulley via Gcc-patches
>  wrote:
>>
>>
>>
>> This patch set consists of the libgm2 makefile, autoconf sources
>> necessary to build the libm2pim, libm2iso, libm2min, libm2cor
>> and libm2log.
>
> This looks OK.

Thanks!

> I suppose it was also tested building a cross-compiler?

Yes, it builds a cross-compiler toolchain targeting aarch64, hosted and
built on amd64 GNU/Linux.

> Can we get some up-to-date status on the build and support status for the
> list of primary and secondary platforms we list on
> https://gcc.gnu.org/gcc-13/criteria.html?

Primary platform summary


aarch64-none-linux-gnu          bootstrapped 6 reg failures
arm-linux-gnueabi               (still building)
i686-pc-linux-gnu               bootstrapped 7 reg failures
powerpc64-unknown-linux-gnu     bootstrapped 12 reg failures
powerpc64le-unknown-linux-gnu   bootstrapped 18 reg failures
sparc-sun-solaris2.11           (still building)
x86_64-pc-linux-gnu             bootstrapped 6 reg failures
                                (tumbleweed and bullseye)


There are six regression test failures common to all platforms (one
test failing with 6 option permutations; it has a reasonably obvious
fix and will be purged soon).


i586-unknown-freebsd            fails at:

ctype_members.cc:137:3: error: redefinition of ‘bool 
std::ctype::do_is(std::ctype_base::mask, char_type) const’
  137 |   ctype::
  |   ^~
In file included from 
/home/gaius/GM2/graft-combine/build-devel-modula2-enabled/i586-unknown-freebsd13.0/libstdc++-v3/include/bits/locale_facets.h:1546,
 from 
/home/gaius/GM2/graft-combine/build-devel-modula2-enabled/i586-unknown-freebsd13.0/libstdc++-v3/include/locale:42,
 from ctype_members.cc:31:


regards,
Gaius


Re: [PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs

2022-11-22 Thread Jeff Law via Gcc-patches



On 11/22/22 04:08, Richard Biener via Gcc-patches wrote:

On Tue, 22 Nov 2022, Richard Sandiford wrote:


Tamar Christina  writes:

-Original Message-
From: Richard Biener 
Sent: Tuesday, November 22, 2022 10:59 AM
To: Richard Sandiford 
Cc: Tamar Christina via Gcc-patches ; Tamar
Christina ; Richard Biener
; nd 
Subject: Re: [PATCH 1/8]middle-end: Recognize scalar reductions from
bitfields and array_refs

On Tue, 22 Nov 2022, Richard Sandiford wrote:


Tamar Christina via Gcc-patches  writes:

So it's not easily possible within the current infrastructure.  But
it does look like ARM might eventually benefit from something like STV
on x86?

I'm not sure.  The problem with trying to do this in RTL is that
you'd have to be able to decide from two pseudos whether they come
from extracts that are sequential. When coming in from a hard
register that's easy yes.  When coming in from a load, or any other
operation that produces pseudos that becomes harder.

Yeah.

Just in case anyone reading the above is tempted to implement STV for
AArch64: I think it would set a bad precedent if we had a
paste-&-adjust version of the x86 pass.  AFAIK, the target
capabilities and constraints are mostly modelled correctly using
existing mechanisms, so I don't think there's anything particularly
target-specific about the process of forcing things to be on the general or
SIMD/FP side.

So if we did have an STV-ish thing for AArch64, I think it should be a
target-independent pass that uses hooks and recog, even if the pass is
initially enabled for AArch64 only.

Agreed - maybe some of the x86 code can be leveraged, but of course the
cost modeling is the most difficult to get right - IIRC the x86 backend resorts
to backend specific tuning flags rather than trying to get rtx_cost or insn_cost
"correct" here.


(FWIW, on the patch itself, I tend to agree that this is really an SLP
optimisation.  If the vectoriser fails to see the benefit, or if it
fails to handle more complex cases, then it would be good to try to
fix that.)

Also agreed - but costing is hard ;)

I guess, I still disagree here but I've clearly been out-Richard.  The problem 
is still
that this is just basic codegen.  I still don't think it requires -O2 to be 
usable.

So I guess the only correct implementation is to use an STV-like patch.  But 
given
that this is already the second attempt, first RTL one was rejected by Richard,
second GIMPLE one was rejected by Richi I'd like to get an agreement on this STV
thing before I waste months more..

I don't think this in itself is a good motivation for STV.  My comment
above was more about the idea of STV for AArch64 in general (since it
had been raised).

Personally I still think the reduction should be generated in gimple.

I agree, and the proper place to generate the reduction is in SLP.


Sorry to have sent things astray with my earlier ACK.  It looked 
reasonable to me.


jeff



[pushed] c++: don't use strchrnul [PR107781]

2022-11-22 Thread Jason Merrill via Gcc-patches
As Jonathan suggested.

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

The contracts implementation was using strchrnul, which is a glibc
extension, so bootstrap broke on non-glibc targets.  Use C89 strcspn
instead.
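
As a standalone sketch of the replacement (mirroring the hunk in this patch, but using int instead of bool so it compiles as plain C89-style code): strcspn returns the length of the initial segment containing no ':' characters, which is the same distance strchrnul (s, ':') - s would give, including when no ':' is present.

```c
#include <string.h>

/* Compare the parts of ROLE and NAME that precede the first ':'
   (or the whole string when there is no ':').  Returns nonzero
   when they are equal.  */
static int
role_name_equal (const char *role, const char *name)
{
  size_t role_len = strcspn (role, ":");
  size_t name_len = strcspn (name, ":");
  if (role_len != name_len)
    return 0;
  return strncmp (role, name, role_len) == 0;
}
```

For example, "default" compares equal to "default:assume,assume,assume", while "def" does not compare equal to "default".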

PR c++/107781

gcc/cp/ChangeLog:

* contracts.cc (role_name_equal): Use strcspn instead
of strchrnul.
---
 gcc/cp/contracts.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc
index f3afcc62ba0..a9097016768 100644
--- a/gcc/cp/contracts.cc
+++ b/gcc/cp/contracts.cc
@@ -210,8 +210,8 @@ lookup_concrete_semantic (const char *name)
 static bool
 role_name_equal (const char *role, const char *name)
 {
-  size_t role_len = strchrnul (role, ':') - role;
-  size_t name_len = strchrnul (name, ':') - name;
+  size_t role_len = strcspn (role, ":");
+  size_t name_len = strcspn (name, ":");
   if (role_len != name_len)
 return false;
   return strncmp (role, name, role_len) == 0;

base-commit: 4eb3a48698b2ca43967a4e7e7cfc0408192e85b2
-- 
2.31.1



Re: [PATCH] ipa-cp: Do not be too optimistic about self-recursive edges (PR 107661)

2022-11-22 Thread Jan Hubicka via Gcc-patches
> Hi,
> 
> PR 107661 shows that function push_agg_values_for_index_from_edge
> should not attempt to optimize self-recursive call graph edges when
> called from cgraph_edge_brings_all_agg_vals_for_node.  Unlike when
> being called from find_aggregate_values_for_callers_subset, we cannot
> expect that any cloning for constants would lead to the edge leading
> from a new clone to the same new clone, in this case it would only be
> redirected to a new callee.
> 
> Fixed by adding a parameter to push_agg_values_from_edge whether being
> optimistic about self-recursive edges is possible.
> 
> Bootstrapped, LTO-bootstrapped and tested on x86_64-linux.  OK for
> trunk?
OK,
thanks!
Honza
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2022-11-22  Martin Jambor  
> 
>   PR ipa/107661
>   * ipa-cp.cc (push_agg_values_from_edge): New parameter
>   optimize_self_recursion, use it to decide whether to pass interim to
>   the helper function.
>   (find_aggregate_values_for_callers_subset): Pass true in the new
>   parameter of push_agg_values_from_edge.
>   (cgraph_edge_brings_all_agg_vals_for_node): Pass false in the new
>   parameter of push_agg_values_from_edge.
> 
> gcc/testsuite/ChangeLog:
> 
> 2022-11-22  Martin Jambor  
> 
>   PR ipa/107661
>   * g++.dg/ipa/pr107661.C: New test.
> ---
>  gcc/ipa-cp.cc   | 18 +++-
>  gcc/testsuite/g++.dg/ipa/pr107661.C | 45 +
>  2 files changed, 56 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ipa/pr107661.C
> 
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index d2bcd5e5e69..f0feb4beb8f 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -5752,14 +5752,16 @@ push_agg_values_for_index_from_edge (struct 
> cgraph_edge *cs, int index,
> description of ultimate callee of CS or the one it was cloned from (the
> summary where lattices are).  If INTERIM is non-NULL, it contains the
> current interim state of collected aggregate values which can be used to
> -   compute values passed over self-recursive edges and to skip values which
> -   clearly will not be part of intersection with INTERIM.  */
> +   compute values passed over self-recursive edges (if 
> OPTIMIZE_SELF_RECURSION
> +   is true) and to skip values which clearly will not be part of intersection
> +   with INTERIM.  */
>  
>  static void
>  push_agg_values_from_edge (struct cgraph_edge *cs,
>  ipa_node_params *dest_info,
>  vec *res,
> -const ipa_argagg_value_list *interim)
> +const ipa_argagg_value_list *interim,
> +bool optimize_self_recursion)
>  {
>ipa_edge_args *args = ipa_edge_args_sum->get (cs);
>if (!args)
> @@ -5785,7 +5787,9 @@ push_agg_values_from_edge (struct cgraph_edge *cs,
>ipcp_param_lattices *plats = ipa_get_parm_lattices (dest_info, index);
>if (plats->aggs_bottom)
>   continue;
> -  push_agg_values_for_index_from_edge (cs, index, res, interim);
> +  push_agg_values_for_index_from_edge (cs, index, res,
> +optimize_self_recursion ? interim
> +: NULL);
>  }
>  }
>  
> @@ -5804,7 +5808,7 @@ find_aggregate_values_for_callers_subset (struct 
> cgraph_node *node,
>/* gather_edges_for_value puts a non-recursive call into the first element 
> of
>   callers if it can.  */
>auto_vec interim;
> -  push_agg_values_from_edge (callers[0], dest_info, , NULL);
> +  push_agg_values_from_edge (callers[0], dest_info, , NULL, true);
>  
>unsigned valid_entries = interim.length ();
>if (!valid_entries)
> @@ -5815,7 +5819,7 @@ find_aggregate_values_for_callers_subset (struct 
> cgraph_node *node,
>  {
>auto_vec last;
>ipa_argagg_value_list avs ();
> -  push_agg_values_from_edge (callers[i], dest_info, , );
> +  push_agg_values_from_edge (callers[i], dest_info, , , true);
>  
>valid_entries = intersect_argaggs_with (interim, last);
>if (!valid_entries)
> @@ -5882,7 +5886,7 @@ cgraph_edge_brings_all_agg_vals_for_node (struct 
> cgraph_edge *cs,
>ipa_node_params *dest_info = ipa_node_params_sum->get (node);
>gcc_checking_assert (dest_info->ipcp_orig_node);
>dest_info = ipa_node_params_sum->get (dest_info->ipcp_orig_node);
> -  push_agg_values_from_edge (cs, dest_info, _values, );
> +  push_agg_values_from_edge (cs, dest_info, _values, , false);
>const ipa_argagg_value_list avl (_values);
>return avl.superset_of_p (existing);
>  }
> diff --git a/gcc/testsuite/g++.dg/ipa/pr107661.C 
> b/gcc/testsuite/g++.dg/ipa/pr107661.C
> new file mode 100644
> index 000..cc6f8538dbf
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ipa/pr107661.C
> @@ -0,0 +1,45 @@
> +/* { dg-do run  { target c++11 } } */
> +/* { dg-options "-O1 -fipa-cp -fipa-cp-clone" } */
> +
> +struct R {} 

Re: [PATCH] AArch64: Add fma_reassoc_width [PR107413]

2022-11-22 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra  writes:
> Hi Richard,
>
>> I guess an obvious question is: if 1 (rather than 2) was the right value
>> for cores with 2 FMA pipes, why is 4 the right value for cores with 4 FMA
>> pipes?  It would be good to clarify how, conceptually, the core property
>> should map to the fma_reassoc_width value.
>
> 1 turns off reassociation so that FMAs get properly formed. After
> reassociation far fewer FMAs get formed so we end up with more FLOPS
> which means slower execution. It's a significant slowdown on cores that
> are not wide, have only 1 or 2 FP pipes and may have high FP latencies.
> So we turn it off by default on all older cores.
>
>> It sounds from the existing comment like the main motivation for returning 1
>> was to encourage more FMAs to be formed, rather than to prevent FMAs from
>> being reassociated.  Is that no longer an issue?  Or is the point that,
>> with more FMA pipes, lower FMA formation is a price worth paying for
>> the better parallelism we get when FMAs can be formed?
>
> Exactly. A wide CPU can deal with the extra instructions, so the loss
> from fewer FMAs ends up lower than the speedup from the extra
> parallelism. Having more FMAs will be even faster of course.

Thanks.  It would be good to put this in a comment somewhere, perhaps above
the fma_reassoc_width field.  It isn't obvious from the patch as posted,
and changing the existing comment drops the previous hint about what
was going on.

>
>> Does this code ever see opc == FMA?
>
> No, that's the problem, reassociation ignores the fact that we actually
> want FMAs.

Yeah, but I was wondering if later code would sometimes query this
hook for existing FMAs, even if that code wasn't the focus of the patch.
Once we add the distinction between FMAs and other ops, it seemed natural
to test for existing FMAs.

But of course, FMA is an rtl code rather than a tree code (oops), so that
was never going to happen.

> A smart reassociation pass could form more FMAs while also increasing
> parallelism, but the way it currently works always results in fewer FMAs.

Yeah, as Richard said, that seems the right long-term fix.
It would also avoid the hack of treating PLUS_EXPR as a signal
of an FMA, which has the drawback of assuming (for 2-FMA cores)
that plain addition never benefits from reassociation in its own right.

Still, I guess the hackiness is pre-existing and the patch is removing
the hackiness for some cores, so from that point of view it's a strict
improvement over the status quo.  And it's too late in the GCC 13
cycle to do FMA reassociation properly.  So I'm OK with the patch
in principle, but could you post an update with more commentary?

Thanks,
Richard
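
To make the trade-off discussed in this thread concrete, here is a
hand-written sketch (not compiler output; the function names are made up
for illustration) of a four-term dot product.  The first version is what
a reassociation width of 1 preserves: one serial chain in which every
addition can be fused with a multiply.  The second is the kind of shape
a width of 2 produces: two independent chains joined by a plain add,
which needs one more operation but has a shorter critical path.

```cpp
#include <cassert>
#include <cmath>

// Width 1: a single serial chain.  1 multiply + 3 FMAs = 4 operations,
// but each operation has to wait for the previous one.
double
dot4_serial (const double *a, const double *b)
{
  assert (a != nullptr && b != nullptr);
  double acc = a[0] * b[0];
  acc = std::fma (a[1], b[1], acc);
  acc = std::fma (a[2], b[2], acc);
  acc = std::fma (a[3], b[3], acc);
  return acc;
}

// Width 2: two independent chains.  2 multiplies + 2 FMAs + 1 add
// = 5 operations, but the two halves can execute in parallel.
double
dot4_reassociated (const double *a, const double *b)
{
  assert (a != nullptr && b != nullptr);
  double lo = std::fma (a[1], b[1], a[0] * b[0]);
  double hi = std::fma (a[3], b[3], a[2] * b[2]);
  return lo + hi;
}
```

On a core with few FP pipes the extra operation is a net loss; on a wide
core the shorter dependency chain can more than pay for it.  In general
the two orderings may round differently; they agree exactly only when
all intermediate values are representable.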


[PATCH] ipa-cp: Do not be too optimistic about self-recursive edges (PR 107661)

2022-11-22 Thread Martin Jambor
Hi,

PR 107661 shows that function push_agg_values_for_index_from_edge
should not attempt to optimize self-recursive call graph edges when
called from cgraph_edge_brings_all_agg_vals_for_node.  Unlike when
being called from find_aggregate_values_for_callers_subset, we cannot
expect that any cloning for constants would lead to the edge leading
from a new clone to the same new clone; in this case it would only be
redirected to a new callee.

Fixed by adding a parameter to push_agg_values_from_edge that says
whether being optimistic about self-recursive edges is possible.

Bootstrapped, LTO-bootstrapped and tested on x86_64-linux.  OK for
trunk?

Thanks,

Martin


gcc/ChangeLog:

2022-11-22  Martin Jambor  

PR ipa/107661
* ipa-cp.cc (push_agg_values_from_edge): New parameter
optimize_self_recursion, use it to decide whether to pass interim to
the helper function.
(find_aggregate_values_for_callers_subset): Pass true in the new
parameter of push_agg_values_from_edge.
(cgraph_edge_brings_all_agg_vals_for_node): Pass false in the new
parameter of push_agg_values_from_edge.

gcc/testsuite/ChangeLog:

2022-11-22  Martin Jambor  

PR ipa/107661
* g++.dg/ipa/pr107661.C: New test.
---
 gcc/ipa-cp.cc   | 18 +++-
 gcc/testsuite/g++.dg/ipa/pr107661.C | 45 +
 2 files changed, 56 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr107661.C

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index d2bcd5e5e69..f0feb4beb8f 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -5752,14 +5752,16 @@ push_agg_values_for_index_from_edge (struct cgraph_edge 
*cs, int index,
description of ultimate callee of CS or the one it was cloned from (the
summary where lattices are).  If INTERIM is non-NULL, it contains the
current interim state of collected aggregate values which can be used to
-   compute values passed over self-recursive edges and to skip values which
-   clearly will not be part of intersection with INTERIM.  */
+   compute values passed over self-recursive edges (if OPTIMIZE_SELF_RECURSION
+   is true) and to skip values which clearly will not be part of intersection
+   with INTERIM.  */
 
 static void
 push_agg_values_from_edge (struct cgraph_edge *cs,
   ipa_node_params *dest_info,
   vec *res,
-  const ipa_argagg_value_list *interim)
+  const ipa_argagg_value_list *interim,
+  bool optimize_self_recursion)
 {
   ipa_edge_args *args = ipa_edge_args_sum->get (cs);
   if (!args)
@@ -5785,7 +5787,9 @@ push_agg_values_from_edge (struct cgraph_edge *cs,
   ipcp_param_lattices *plats = ipa_get_parm_lattices (dest_info, index);
   if (plats->aggs_bottom)
continue;
-  push_agg_values_for_index_from_edge (cs, index, res, interim);
+  push_agg_values_for_index_from_edge (cs, index, res,
+  optimize_self_recursion ? interim
+  : NULL);
 }
 }
 
@@ -5804,7 +5808,7 @@ find_aggregate_values_for_callers_subset (struct 
cgraph_node *node,
   /* gather_edges_for_value puts a non-recursive call into the first element of
  callers if it can.  */
   auto_vec interim;
-  push_agg_values_from_edge (callers[0], dest_info, , NULL);
+  push_agg_values_from_edge (callers[0], dest_info, , NULL, true);
 
   unsigned valid_entries = interim.length ();
   if (!valid_entries)
@@ -5815,7 +5819,7 @@ find_aggregate_values_for_callers_subset (struct 
cgraph_node *node,
 {
   auto_vec last;
   ipa_argagg_value_list avs ();
-  push_agg_values_from_edge (callers[i], dest_info, , );
+  push_agg_values_from_edge (callers[i], dest_info, , , true);
 
   valid_entries = intersect_argaggs_with (interim, last);
   if (!valid_entries)
@@ -5882,7 +5886,7 @@ cgraph_edge_brings_all_agg_vals_for_node (struct 
cgraph_edge *cs,
   ipa_node_params *dest_info = ipa_node_params_sum->get (node);
   gcc_checking_assert (dest_info->ipcp_orig_node);
   dest_info = ipa_node_params_sum->get (dest_info->ipcp_orig_node);
-  push_agg_values_from_edge (cs, dest_info, _values, );
+  push_agg_values_from_edge (cs, dest_info, _values, , false);
   const ipa_argagg_value_list avl (_values);
   return avl.superset_of_p (existing);
 }
diff --git a/gcc/testsuite/g++.dg/ipa/pr107661.C 
b/gcc/testsuite/g++.dg/ipa/pr107661.C
new file mode 100644
index 000..cc6f8538dbf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr107661.C
@@ -0,0 +1,45 @@
+/* { dg-do run  { target c++11 } } */
+/* { dg-options "-O1 -fipa-cp -fipa-cp-clone" } */
+
+struct R {} RGood;
+struct L {} LBad;
+
+volatile int vi;
+static void __attribute__((noipa)) L_run(void) { vi = 0; __builtin_abort (); }
+static void callback_fn_L(void) { vi = 1; L_run(); }
+static void callback_fn_R(void) { vi = 2; }
+
+struct 

Re: [PATCH 2/2] Add a new warning option -Wstrict-flex-arrays.

2022-11-22 Thread Qing Zhao via Gcc-patches



> On Nov 22, 2022, at 3:16 AM, Richard Biener  wrote:
> 
> On Mon, 21 Nov 2022, Qing Zhao wrote:
> 
>> 
>> 
>>> On Nov 18, 2022, at 11:31 AM, Kees Cook  wrote:
>>> 
>>> On Fri, Nov 18, 2022 at 03:19:07PM +, Qing Zhao wrote:
>>>> Hi, Richard,
>>>> 
>>>> Honestly, it's very hard for me to decide what's the best way to handle
>>>> the interaction between -fstrict-flex-arrays=M and -Warray-bounds=N.
>>>> 
>>>> Ideally, -fstrict-flex-arrays=M should completely control the behavior of
>>>> -Warray-bounds.
>>>> If possible, I prefer this solution.
>>>> 
>>>> However, -Warray-bounds is included in -Wall, and has been used
>>>> extensively for a long time.
>>>> It's not safe to change its default behavior.
>>> 
>>> I prefer that -fstrict-flex-arrays controls -Warray-bounds. That
>>> it is in -Wall is _good_ for this reason. :) No one is going to add
>>> -fstrict-flex-arrays (at any level) without understanding what it does
>>> and wanting those effects on -Warray-bounds.
>> 
>> 
>> The major difficulties in letting -fstrict-flex-arrays control
>> -Warray-bounds were discussed in the following thread:
>> 
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604133.html
>> 
>> Please take a look at the discussion and let me know your opinion.
> 
> My opinion is now, after re-considering and with seeing your new 
> patch, that -Warray-bounds=2 should be changed to only add
> "the intermediate results of pointer arithmetic that may yield out of 
> bounds values" and that what it considers a flex array should now
> be controlled by -fstrict-flex-arrays only.
> 
> That is, I think, the only thing that's not confusing to users even
> if that implies a change from previous behavior that we should
> document by clarifying the -Warray-bounds documentation as well as
> by adding an entry to the Caveats section of gcc-13/changes.html
> 
> That also means that =2 will get _less_ warnings with GCC 13 when
> the user doesn't use -fstrict-flex-arrays as well.

Okay.  So, this is for -Warray-bounds=2.

For -Warray-bounds=1 -fstrict-flex-arrays=N, if N > 1, should
-fstrict-flex-arrays=N control -Warray-bounds=1?

Qing

> 
> Richard.
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)



Re: [PATCH v4] LoongArch: Optimize immediate load.

2022-11-22 Thread Xi Ruoyao via Gcc-patches
While I still can't fully understand the immediate load issue and how
this patch fixes it, I've tested this patch (alongside the prefetch
instruction patch) with bootstrap-ubsan.  And the compiled result of
imm-load1.c seems OK.
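
For anyone else following along, here is a minimal model of a full
64-bit immediate load as a lu12i.w + ori + lu32i.d + lu52i.d sequence,
recording the register value after each step in the spirit of the
curr_value field described in the ChangeLog below.  This is only a
sketch: the names are made up, the instruction semantics are as I read
them in the LoongArch reference manual, and unlike
loongarch_build_integer it always emits all four steps instead of
skipping the redundant ones.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low BITS bits of V to 64 bits (BITS < 64).
static int64_t
sext (uint64_t v, unsigned bits)
{
  assert (bits > 0 && bits < 64);
  uint64_t m = uint64_t (1) << (bits - 1);
  v &= (uint64_t (1) << bits) - 1;
  return int64_t ((v ^ m) - m);
}

// curr[i] is the (hypothetical) destination register right after step i.
struct imm_steps
{
  int64_t curr[4];
};

imm_steps
load_imm64 (int64_t value)
{
  uint64_t v = uint64_t (value);
  imm_steps s;
  // lu12i.w: set bits 12..31, sign-extend from bit 31, clear bits 0..11.
  uint64_t r = uint64_t (sext (v & 0xfffff000ULL, 32));
  s.curr[0] = int64_t (r);
  // ori: OR in the zero-extended low 12 bits.
  r |= v & 0xfffULL;
  s.curr[1] = int64_t (r);
  // lu32i.d: set bits 32..51, sign-extend from bit 51, keep bits 0..31.
  r = (uint64_t (sext (v >> 32, 20)) << 32) | (r & 0xffffffffULL);
  s.curr[2] = int64_t (r);
  // lu52i.d: set bits 52..63, keep bits 0..51.
  r = (v & 0xfff0000000000000ULL) | (r & 0x000fffffffffffffULL);
  s.curr[3] = int64_t (r);
  return s;
}
```

The last entry always equals the requested constant; the earlier entries
show why some steps can be dropped when the sign extension already
produces the right upper bits.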

On Thu, 2022-11-17 at 17:59 +0800, Lulu Cheng wrote:
> v1 -> v2:
> 1. Change the code format.
> 2. Fix bugs in the code.
> 
> v2 -> v3:
> Modifying a code implementation of an undefined behavior.
> 
> v3 -> v4:
> Move the part of the immediate number decomposition from expand pass
> to split
> pass.
> 
> Both regression tests and spec2006 passed.
> 
> The problem mentioned in the link below is that the four immediate
> load instructions are not moved out of the loop.  This is now fixed:
> as in the test case, the four immediate load instructions are
> generated outside the loop.
> (https://sourceware.org/pipermail/libc-alpha/2022-September/142202.html)
> 
> 
> The loop2_invariant pass hoists loop-invariant instructions out of the
> loop, but some instructions do not meet its hoisting conditions if the
> immediate is decomposed during the expand pass.  The immediate
> decomposition is therefore moved to the split pass.
> 
> gcc/ChangeLog:
> 
> * config/loongarch/loongarch.cc (enum
> loongarch_load_imm_method):
> Remove the member METHOD_INSV that is not currently used.
> (struct loongarch_integer_op): Define a new member curr_value,
> that records the value of the number stored in the destination
> register immediately after the current instruction has run.
> (loongarch_build_integer): Assign a value to the curr_value
> member variable.
> (loongarch_move_integer): Adds information for the immediate
> load instruction.
> * config/loongarch/loongarch.md (*movdi_32bit): Redefine as
> define_insn_and_split.
> (*movdi_64bit): Likewise.
> (*movsi_internal): Likewise.
> (*movhi_internal): Likewise.
> * config/loongarch/predicates.md: Return true as long as it is
> CONST_INT, ensure
> that the immediate number is not optimized by decomposition
> during expand
> optimization loop.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/loongarch/imm-load.c: New test.
> * gcc.target/loongarch/imm-load1.c: New test.
> ---
>  gcc/config/loongarch/loongarch.cc | 62 ++
> -
>  gcc/config/loongarch/loongarch.md | 44 +++--
>  gcc/config/loongarch/predicates.md    |  2 +-
>  gcc/testsuite/gcc.target/loongarch/imm-load.c | 10 +++
>  .../gcc.target/loongarch/imm-load1.c  | 26 
>  5 files changed, 110 insertions(+), 34 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/imm-load.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/imm-load1.c
> 
> diff --git a/gcc/config/loongarch/loongarch.cc
> b/gcc/config/loongarch/loongarch.cc
> index 8ee32c90573..9e0d6c7c3ea 100644
> --- a/gcc/config/loongarch/loongarch.cc
> +++ b/gcc/config/loongarch/loongarch.cc
> @@ -139,22 +139,21 @@ struct loongarch_address_info
>  
>     METHOD_LU52I:
>   Load 52-63 bit of the immediate number.
> -
> -   METHOD_INSV:
> - immediate like 0xfff0fxxx
> -   */
> +*/
>  enum loongarch_load_imm_method
>  {
>    METHOD_NORMAL,
>    METHOD_LU32I,
> -  METHOD_LU52I,
> -  METHOD_INSV
> +  METHOD_LU52I
>  };
>  
>  struct loongarch_integer_op
>  {
>    enum rtx_code code;
>    HOST_WIDE_INT value;
> +  /* Represent the result of the immediate count of the load
> instruction at
> + each step.  */
> +  HOST_WIDE_INT curr_value;
>    enum loongarch_load_imm_method method;
>  };
>  
> @@ -1475,24 +1474,27 @@ loongarch_build_integer (struct
> loongarch_integer_op *codes,
>  {
>    /* The value of the lower 32 bit be loaded with one
> instruction.
>  lu12i.w.  */
> -  codes[0].code = UNKNOWN;
> -  codes[0].method = METHOD_NORMAL;
> -  codes[0].value = low_part;
> +  codes[cost].code = UNKNOWN;
> +  codes[cost].method = METHOD_NORMAL;
> +  codes[cost].value = low_part;
> +  codes[cost].curr_value = low_part;
>    cost++;
>  }
>    else
>  {
>    /* lu12i.w + ior.  */
> -  codes[0].code = UNKNOWN;
> -  codes[0].method = METHOD_NORMAL;
> -  codes[0].value = low_part & ~(IMM_REACH - 1);
> +  codes[cost].code = UNKNOWN;
> +  codes[cost].method = METHOD_NORMAL;
> +  codes[cost].value = low_part & ~(IMM_REACH - 1);
> +  codes[cost].curr_value = codes[cost].value;
>    cost++;
>    HOST_WIDE_INT iorv = low_part & (IMM_REACH - 1);
>    if (iorv != 0)
> {
> - codes[1].code = IOR;
> - codes[1].method = METHOD_NORMAL;
> - codes[1].value = iorv;
> + codes[cost].code = IOR;
> + codes[cost].method = METHOD_NORMAL;
> + codes[cost].value = iorv;
> + 

Re: [PATCH] aarch64: Fix test_dfp_17.c for big-endian [PR 107604]

2022-11-22 Thread Richard Earnshaw via Gcc-patches




On 22/11/2022 13:09, Christophe Lyon wrote:



On 11/22/22 12:33, Richard Earnshaw wrote:



On 22/11/2022 11:21, Richard Sandiford wrote:

Richard Earnshaw via Gcc-patches  writes:

On 22/11/2022 09:01, Christophe Lyon via Gcc-patches wrote:
gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on 
big-endian, because the _Decimal32 on-stack argument is not

padded in the same direction depending on endianness.

This patch fixes the testcase so that it expects the argument
in the right stack location, similarly to what other tests do
in the same directory.

gcc/testsuite/ChangeLog:

PR target/107604
* gcc.target/aarch64/aapcs64/test_dfp_17.c: Fix for big-endian.
---
 gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c | 4
 1 file changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
index 22dc462bf7c..3c45f715cf7 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
@@ -32,6 +32,10 @@ struct z b = { 9.0dd, 10.0dd, 11.0dd, 12.0dd };
   ANON(struct z, a, D1)
   ANON(struct z, b, STACK)
   ANON(int , 5, W0)
+#ifndef __AAPCS64_BIG_ENDIAN__
   ANON(_Decimal32, f1, STACK+32) /* Note: no promotion to _Decimal64.  */
+#else
+  ANON(_Decimal32, f1, STACK+36) /* Note: no promotion to _Decimal64.  */
+#endif
   LAST_ANON(_Decimal64, 0.5dd, STACK+40)
 #endif


Why would a Decimal32 change stack placement based on the
endianness? Isn't it a 4-byte object?


Yes, but PARM_BOUNDARY (64) sets a minimum alignment for all stack
 arguments.

Richard


Ah, OK.
Indeed, it was not immediately obvious to me either, when looking at 
aarch64_layout_arg. aarch64_function_arg_padding comes into play, too.
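
The arithmetic behind the STACK+32 / STACK+36 difference can be sketched
as follows.  This is a deliberately simplified model with made-up names:
it only captures "the slot is rounded up to the minimum alignment, and
the value sits at the low end on little-endian and at the high end on
big-endian".  It does not try to reproduce
aarch64_function_arg_padding, which also takes the argument's type into
account.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset at which an argument of OBJ_SIZE bytes is found when its
// slot starts at SLOT_OFFSET and is rounded up to a multiple of
// SLOT_ALIGN bytes (PARM_BOUNDARY / 8, i.e. 8 in the case above).
std::size_t
stack_arg_offset (std::size_t slot_offset, std::size_t obj_size,
                  std::size_t slot_align, bool big_endian)
{
  assert (slot_align != 0 && slot_offset % slot_align == 0);
  std::size_t slot_size
    = (obj_size + slot_align - 1) / slot_align * slot_align;
  // Little-endian: value at the start of the slot.
  // Big-endian: value pushed to the end of the slot.
  return slot_offset + (big_endian ? slot_size - obj_size : 0);
}
```

With a 4-byte _Decimal32 in the 8-byte slot at offset 32 this gives 32
for little-endian and 36 for big-endian, while the 8-byte _Decimal64
that follows stays at offset 40 either way.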




I wonder if we should have a new macro in the tests, something like 
ANON_PADDED to describe this case and that works things out more 
automagically for big-endian.

Maybe. There are many other tests under aapcs64/ which have a similar
#ifndef __AAPCS64_BIG_ENDIAN__



Yes, it could be used to clean all those up as well.




I notice the new ANON definition is not correctly indented.

It looks OK on my side (2 spaces).


Never mind then, it must be a quirk of how the diff is displayed.


Thanks,

Christophe



R.


Re: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-22 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Tuesday, November 15, 2022 11:34 AM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
>> ; nd ; Marcus Shawcroft
>> 
>> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>> 
>> Tamar Christina  writes:
>> >> -Original Message-
>> >> From: Richard Sandiford 
>> >> Sent: Tuesday, November 15, 2022 11:15 AM
>> >> To: Tamar Christina 
>> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
>> >> ; nd ; Marcus Shawcroft
>> >> 
>> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>> >>
>> >> Tamar Christina  writes:
>> >> >> -Original Message-
>> >> >> From: Richard Sandiford 
>> >> >> Sent: Tuesday, November 15, 2022 10:51 AM
>> >> >> To: Tamar Christina 
>> >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
>> >> >> ; nd ; Marcus
>> Shawcroft
>> >> >> 
>> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>> >> >>
>> >> >> Tamar Christina  writes:
>> >> >> >> -Original Message-
>> >> >> >> From: Richard Sandiford 
>> >> >> >> Sent: Tuesday, November 15, 2022 10:36 AM
>> >> >> >> To: Tamar Christina 
>> >> >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
>> >> >> >> ; nd ; Marcus
>> >> Shawcroft
>> >> >> >> 
>> >> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>> >> >> >>
>> >> >> >> Tamar Christina  writes:
>> >> >> >> > Hello,
>> >> >> >> >
>> >> >> >> > Ping and updated patch.
>> >> >> >> >
>> >> >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no
>> issues.
>> >> >> >> >
>> >> >> >> > Ok for master?
>> >> >> >> >
>> >> >> >> > Thanks,
>> >> >> >> > Tamar
>> >> >> >> >
>> >> >> >> > gcc/ChangeLog:
>> >> >> >> >
>> >> >> >> > * config/aarch64/aarch64.md (*tb1):
>> >> >> >> > Rename
>> >> to...
>> >> >> >> > (*tb1): ... this.
>> >> >> >> > (tbranch4): New.
>> >> >> >> >
>> >> >> >> > gcc/testsuite/ChangeLog:
>> >> >> >> >
>> >> >> >> > * gcc.target/aarch64/tbz_1.c: New test.
>> >> >> >> >
>> >> >> >> > --- inline copy of patch ---
>> >> >> >> >
>> >> >> >> > diff --git a/gcc/config/aarch64/aarch64.md
>> >> >> >> > b/gcc/config/aarch64/aarch64.md index
>> >> >> >> >
>> >> >> >>
>> >> >>
>> >>
>> 2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf..d7684c93fba5b717d568e1a4fd
>> >> >> >> 71
>> >> >> >> > 2bde55c7c72e 100644
>> >> >> >> > --- a/gcc/config/aarch64/aarch64.md
>> >> >> >> > +++ b/gcc/config/aarch64/aarch64.md
>> >> >> >> > @@ -943,12 +943,29 @@ (define_insn "*cb1"
>> >> >> >> >   (const_int 1)))]
>> >> >> >> >  )
>> >> >> >> >
>> >> >> >> > -(define_insn "*tb1"
>> >> >> >> > +(define_expand "tbranch4"
>> >> >> >> >[(set (pc) (if_then_else
>> >> >> >> > - (EQL (zero_extract:DI (match_operand:GPI 0
>> >> "register_operand"
>> >> >> >> "r")
>> >> >> >> > -   (const_int 1)
>> >> >> >> > -   (match_operand 1
>> >> >> >> > - 
>> >> >> >> > "aarch64_simd_shift_imm_" "n"))
>> >> >> >> > +   (match_operator 0 "aarch64_comparison_operator"
>> >> >> >> > +[(match_operand:ALLI 1 "register_operand")
>> >> >> >> > + (match_operand:ALLI 2
>> >> >> >> "aarch64_simd_shift_imm_")])
>> >> >> >> > +   (label_ref (match_operand 3 "" ""))
>> >> >> >> > +   (pc)))]
>> >> >> >> > +  "optimize > 0"
>> >> >> >>
>> >> >> >> Why's the pattern conditional on optimize?  Seems a valid
>> >> >> >> choice at -O0
>> >> >> too.
>> >> >> >>
>> >> >> >
>> >> >> > Hi,
>> >> >> >
>> >> >> > I had explained the reason why in the original patch, just
>> >> >> > didn't repeat it in
>> >> >> the ping:
>> >> >> >
>> >> >> > Instead of emitting the instruction directly I've chosen to
>> >> >> > expand the pattern using a zero extract and generating the
>> >> >> > existing pattern for comparisons for two
>> >> >> > reasons:
>> >> >> >
>> >> >> >   1. Allows for CSE of the actual comparison.
>> >> >> >   2. It looks like the code in expand makes the label as unused
>> >> >> > and removed
>> >> >> it
>> >> >> >  if it doesn't see a separate reference to it.
>> >> >> >
>> >> >> > Because of this expansion though I disable the pattern at -O0
>> >> >> > since we
>> >> >> have no combine in that case so we'd end up with worse code.  I
>> >> >> did try emitting the pattern directly, but as mentioned in no#2
>> >> >> expand would then kill the label.
>> >> >> >
>> >> >> > Basically I emit the pattern directly, immediately during expand
>> >> >> > the label is
>> >> >> marked as dead for some weird reason.
>> >> >>
>> >> >> Isn't #2 a bug though?  It seems like something we should fix
>> >> >> rather than work around.
>> >> >
>> >> > Yes it's a bug ☹ ok if I'm going to fix that bug then do I need to
>> >> > split the optabs still? Isn't the problem atm that I need the split?
>> >> > If I'm emitting the instruction directly then 

[PATCH] Remove follow_assert_exprs from overflow_comparison.

2022-11-22 Thread Aldy Hernandez via Gcc-patches
OK pending tests?

gcc/ChangeLog:

* tree-vrp.cc (overflow_comparison_p_1): Remove follow_assert_exprs.
(overflow_comparison_p): Remove use_equiv_p.
* tree-vrp.h (overflow_comparison_p): Same.
* vr-values.cc (vrp_evaluate_conditional_warnv_with_ops): Remove
use_equiv_p argument to overflow_comparison_p.
---
 gcc/tree-vrp.cc  | 40 
 gcc/tree-vrp.h   |  2 +-
 gcc/vr-values.cc |  2 +-
 3 files changed, 6 insertions(+), 38 deletions(-)

diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index d29941d0f2d..3846dc1d849 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -679,7 +679,7 @@ range_fold_unary_expr (value_range *vr,
 
 static bool
 overflow_comparison_p_1 (enum tree_code code, tree op0, tree op1,
-bool follow_assert_exprs, bool reversed, tree *new_cst)
+bool reversed, tree *new_cst)
 {
   /* See if this is a relational operation between two SSA_NAMES with
  unsigned, overflow wrapping values.  If so, check it more deeply.  */
@@ -693,19 +693,6 @@ overflow_comparison_p_1 (enum tree_code code, tree op0, 
tree op1,
 {
   gimple *op1_def = SSA_NAME_DEF_STMT (op1);
 
-  /* If requested, follow any ASSERT_EXPRs backwards for OP1.  */
-  if (follow_assert_exprs)
-   {
- while (gimple_assign_single_p (op1_def)
-&& TREE_CODE (gimple_assign_rhs1 (op1_def)) == ASSERT_EXPR)
-   {
- op1 = TREE_OPERAND (gimple_assign_rhs1 (op1_def), 0);
- if (TREE_CODE (op1) != SSA_NAME)
-   break;
- op1_def = SSA_NAME_DEF_STMT (op1);
-   }
-   }
-
   /* Now look at the defining statement of OP1 to see if it adds
 or subtracts a nonzero constant from another operand.  */
   if (op1_def
@@ -716,24 +703,6 @@ overflow_comparison_p_1 (enum tree_code code, tree op0, 
tree op1,
{
  tree target = gimple_assign_rhs1 (op1_def);
 
- /* If requested, follow ASSERT_EXPRs backwards for op0 looking
-for one where TARGET appears on the RHS.  */
- if (follow_assert_exprs)
-   {
- /* Now see if that "other operand" is op0, following the chain
-of ASSERT_EXPRs if necessary.  */
- gimple *op0_def = SSA_NAME_DEF_STMT (op0);
- while (op0 != target
-&& gimple_assign_single_p (op0_def)
-&& TREE_CODE (gimple_assign_rhs1 (op0_def)) == ASSERT_EXPR)
-   {
- op0 = TREE_OPERAND (gimple_assign_rhs1 (op0_def), 0);
- if (TREE_CODE (op0) != SSA_NAME)
-   break;
- op0_def = SSA_NAME_DEF_STMT (op0);
-   }
-   }
-
  /* If we did not find our target SSA_NAME, then this is not
 an overflow test.  */
  if (op0 != target)
@@ -764,13 +733,12 @@ overflow_comparison_p_1 (enum tree_code code, tree op0, 
tree op1,
the alternate range representation is often useful within VRP.  */
 
 bool
-overflow_comparison_p (tree_code code, tree name, tree val,
-  bool use_equiv_p, tree *new_cst)
+overflow_comparison_p (tree_code code, tree name, tree val, tree *new_cst)
 {
-  if (overflow_comparison_p_1 (code, name, val, use_equiv_p, false, new_cst))
+  if (overflow_comparison_p_1 (code, name, val, false, new_cst))
 return true;
   return overflow_comparison_p_1 (swap_tree_comparison (code), val, name,
- use_equiv_p, true, new_cst);
+ true, new_cst);
 }
 
 /* Handle
diff --git a/gcc/tree-vrp.h b/gcc/tree-vrp.h
index 07630b5b1ca..127909604f0 100644
--- a/gcc/tree-vrp.h
+++ b/gcc/tree-vrp.h
@@ -39,7 +39,7 @@ extern enum value_range_kind intersect_range_with_nonzero_bits
 extern bool find_case_label_range (gswitch *, tree, tree, size_t *, size_t *);
 extern tree find_case_label_range (gswitch *, const irange *vr);
 extern bool find_case_label_index (gswitch *, size_t, tree, size_t *);
-extern bool overflow_comparison_p (tree_code, tree, tree, bool, tree *);
+extern bool overflow_comparison_p (tree_code, tree, tree, tree *);
 extern void maybe_set_nonzero_bits (edge, tree);
 
 #endif /* GCC_TREE_VRP_H */
diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index 0347c29b216..b0dd30260ae 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -837,7 +837,7 @@ 
simplify_using_ranges::vrp_evaluate_conditional_warnv_with_ops
  occurs when the chosen argument is zero and does not occur if the
  chosen argument is not zero.  */
   tree x;
-  if (overflow_comparison_p (code, op0, op1, use_equiv_p, ))
+  if (overflow_comparison_p (code, op0, op1, ))
 {
   wide_int max = wi::max_value (TYPE_PRECISION (TREE_TYPE (op0)), 
UNSIGNED);
   /* B = A - 1; if (A < B) -> B = A - 1; if (A == 0)
-- 
2.38.1
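
For reference, the rewrite mentioned in the comment of the last hunk
(B = A - 1; if (A < B) becomes if (A == 0)) relies on unsigned
wraparound: A - 1 is larger than A only when A is zero.  A minimal
exhaustive check over 8-bit values, with a made-up function name:

```cpp
#include <cassert>
#include <cstdint>

// Checks for every 8-bit unsigned A, with B = A - 1 (wrapping), that
// A < B holds exactly when A == 0.  Returns how many values of A made
// the comparison true.
unsigned
count_wrapping_cases ()
{
  unsigned hits = 0;
  for (unsigned i = 0; i <= UINT8_MAX; ++i)
    {
      uint8_t a = static_cast<uint8_t> (i);
      // Cast back to uint8_t so the subtraction wraps at 8 bits instead
      // of being evaluated in the promoted int type.
      uint8_t b = static_cast<uint8_t> (a - 1);
      assert ((a < b) == (a == 0));
      if (a < b)
        ++hits;
    }
  return hits;
}
```

Only A == 0 satisfies the comparison, which is why the overflow test can
be replaced by a test against a constant.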



[PATCH] Remove use_equiv_p in vr-values.cc

2022-11-22 Thread Aldy Hernandez via Gcc-patches
With no equivalences, the use_equiv_p argument in various methods in
simplify_using_ranges is always false.  This means we can remove all
calls to compare_names, along with the function.

OK pending tests?

gcc/ChangeLog:

* vr-values.cc (simplify_using_ranges::compare_names): Remove.
(vrp_evaluate_conditional_warnv_with_ops): Remove call to
compare_names.
(simplify_using_ranges::vrp_visit_cond_stmt): Remove use_equiv_p
argument to vrp_evaluate_conditional_warnv_with_ops.
* vr-values.h (class simplify_using_ranges): Remove
compare_names.
Remove use_equiv_p to vrp_evaluate_conditional_warnv_with_ops.
---
 gcc/vr-values.cc | 127 +--
 gcc/vr-values.h  |   4 +-
 2 files changed, 3 insertions(+), 128 deletions(-)

diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index b0dd30260ae..1dbd9e47085 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -667,124 +667,6 @@ simplify_using_ranges::compare_name_with_value
   return retval;
 }
 
-/* Given a comparison code COMP and names N1 and N2, compare all the
-   ranges equivalent to N1 against all the ranges equivalent to N2
-   to determine the value of N1 COMP N2.  Return the same value
-   returned by compare_ranges.  Set *STRICT_OVERFLOW_P to indicate
-   whether we relied on undefined signed overflow in the comparison.  */
-
-
-tree
-simplify_using_ranges::compare_names (enum tree_code comp, tree n1, tree n2,
- bool *strict_overflow_p, gimple *s)
-{
-  /* ?? These bitmaps are NULL as there are no longer any equivalences
- available in the value_range*.  */
-  bitmap e1 = NULL;
-  bitmap e2 = NULL;
-
-  /* Use the fake bitmaps if e1 or e2 are not available.  */
-  static bitmap s_e1 = NULL, s_e2 = NULL;
-  static bitmap_obstack *s_obstack = NULL;
-  if (s_obstack == NULL)
-{
-  s_obstack = XNEW (bitmap_obstack);
-  bitmap_obstack_initialize (s_obstack);
-  s_e1 = BITMAP_ALLOC (s_obstack);
-  s_e2 = BITMAP_ALLOC (s_obstack);
-}
-  if (e1 == NULL)
-e1 = s_e1;
-  if (e2 == NULL)
-e2 = s_e2;
-
-  /* Add N1 and N2 to their own set of equivalences to avoid
- duplicating the body of the loop just to check N1 and N2
- ranges.  */
-  bitmap_set_bit (e1, SSA_NAME_VERSION (n1));
-  bitmap_set_bit (e2, SSA_NAME_VERSION (n2));
-
-  /* If the equivalence sets have a common intersection, then the two
- names can be compared without checking their ranges.  */
-  if (bitmap_intersect_p (e1, e2))
-{
-  bitmap_clear_bit (e1, SSA_NAME_VERSION (n1));
-  bitmap_clear_bit (e2, SSA_NAME_VERSION (n2));
-
-  return (comp == EQ_EXPR || comp == GE_EXPR || comp == LE_EXPR)
-? boolean_true_node
-: boolean_false_node;
-}
-
-  /* Start at -1.  Set it to 0 if we do a comparison without relying
- on overflow, or 1 if all comparisons rely on overflow.  */
-  int used_strict_overflow = -1;
-
-  /* Otherwise, compare all the equivalent ranges.  First, add N1 and
- N2 to their own set of equivalences to avoid duplicating the body
- of the loop just to check N1 and N2 ranges.  */
-  bitmap_iterator bi1;
-  unsigned i1;
-  EXECUTE_IF_SET_IN_BITMAP (e1, 0, i1, bi1)
-{
-  if (!ssa_name (i1))
-   continue;
-
-  value_range tem_vr1;
-  const value_range *vr1 = get_vr_for_comparison (i1, &tem_vr1, s);
-
-  tree t = NULL_TREE, retval = NULL_TREE;
-  bitmap_iterator bi2;
-  unsigned i2;
-  EXECUTE_IF_SET_IN_BITMAP (e2, 0, i2, bi2)
-   {
- if (!ssa_name (i2))
-   continue;
-
- bool sop = false;
-
- value_range tem_vr2;
- const value_range *vr2 = get_vr_for_comparison (i2, &tem_vr2, s);
-
- t = compare_ranges (comp, vr1, vr2, &sop);
- if (t)
-   {
- /* If we get different answers from different members
-of the equivalence set this check must be in a dead
-code region.  Folding it to a trap representation
-would be correct here.  For now just return don't-know.  */
- if (retval != NULL && t != retval)
-   {
- bitmap_clear_bit (e1, SSA_NAME_VERSION (n1));
- bitmap_clear_bit (e2, SSA_NAME_VERSION (n2));
- return NULL_TREE;
-   }
- retval = t;
-
- if (!sop)
-   used_strict_overflow = 0;
- else if (used_strict_overflow < 0)
-   used_strict_overflow = 1;
-   }
-   }
-
-  if (retval)
-   {
- bitmap_clear_bit (e1, SSA_NAME_VERSION (n1));
- bitmap_clear_bit (e2, SSA_NAME_VERSION (n2));
- if (used_strict_overflow > 0)
-   *strict_overflow_p = true;
- return retval;
-   }
-}
-
-  /* None of the equivalent ranges are useful in computing this
- comparison.  */
-  bitmap_clear_bit (e1, SSA_NAME_VERSION (n1));
-  

[PATCH] Remove ASSERT_EXPR.

2022-11-22 Thread Aldy Hernandez via Gcc-patches
This removes all uses of ASSERT_EXPR except the internal one in ipa-*.

OK pending tests?

gcc/ChangeLog:

* doc/gimple.texi: Remove ASSERT_EXPR references.
* fold-const.cc (tree_expr_nonzero_warnv_p): Same.
(fold_binary_loc): Same.
(tree_expr_nonnegative_warnv_p): Same.
* gimple-array-bounds.cc (get_base_decl): Same.
* gimple-pretty-print.cc (dump_unary_rhs): Same.
* gimple.cc (get_gimple_rhs_num_ops): Same.
* pointer-query.cc (handle_ssa_name): Same.
* tree-cfg.cc (verify_gimple_assign_single): Same.
* tree-pretty-print.cc (dump_generic_node): Same.
* tree-scalar-evolution.cc (scev_dfs::follow_ssa_edge_expr): Same.
(interpret_rhs_expr): Same.
* tree-ssa-operands.cc (operands_scanner::get_expr_operands): Same.
* tree-ssa-propagate.cc
(substitute_and_fold_dom_walker::before_dom_children): Same.
* tree-ssa-threadedge.cc: Same.
* tree-vrp.cc (overflow_comparison_p): Same.
* tree.def (ASSERT_EXPR): Add note.
* tree.h (ASSERT_EXPR_VAR): Remove.
(ASSERT_EXPR_COND): Remove.
* vr-values.cc (simplify_using_ranges::vrp_visit_cond_stmt):
Remove comment.
---
 gcc/doc/gimple.texi  |  3 +--
 gcc/fold-const.cc|  6 -
 gcc/gimple-array-bounds.cc   |  9 +---
 gcc/gimple-pretty-print.cc   |  1 -
 gcc/gimple.cc|  1 -
 gcc/pointer-query.cc |  6 -
 gcc/tree-cfg.cc  | 11 -
 gcc/tree-pretty-print.cc |  8 ---
 gcc/tree-scalar-evolution.cc | 15 -
 gcc/tree-ssa-operands.cc |  1 -
 gcc/tree-ssa-propagate.cc|  5 +
 gcc/tree-ssa-threadedge.cc   |  6 ++---
 gcc/tree-vrp.cc  |  7 +++---
 gcc/tree.def |  5 -
 gcc/tree.h   |  4 
 gcc/vr-values.cc | 43 
 16 files changed, 13 insertions(+), 118 deletions(-)

diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 7832fa6ff90..a4263922887 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -682,8 +682,7 @@ more than two slots on the RHS.  For instance, a @code{COND_EXPR}
 expression of the form @code{(a op b) ? x : y} could be flattened
 out on the operand vector using 4 slots, but it would also
 require additional processing to distinguish @code{c = a op b}
-from @code{c = a op b ? x : y}.  Something similar occurs with
-@code{ASSERT_EXPR}.   In time, these special case tree
+from @code{c = a op b ? x : y}.  In time, these special case tree
 expressions should be flattened into the operand vector.
 @end itemize
 
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index b89cac91cae..114258fa182 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10751,7 +10751,6 @@ tree_expr_nonzero_warnv_p (tree t, bool *strict_overflow_p)
 case COND_EXPR:
 case CONSTRUCTOR:
 case OBJ_TYPE_REF:
-case ASSERT_EXPR:
 case ADDR_EXPR:
 case WITH_SIZE_EXPR:
 case SSA_NAME:
@@ -12618,10 +12617,6 @@ fold_binary_loc (location_t loc, enum tree_code code, tree type,
 : fold_convert_loc (loc, type, arg1);
   return tem;
 
-case ASSERT_EXPR:
-  /* An ASSERT_EXPR should never be passed to fold_binary.  */
-  gcc_unreachable ();
-
 default:
   return NULL_TREE;
 } /* switch (code) */
@@ -15117,7 +15112,6 @@ tree_expr_nonnegative_warnv_p (tree t, bool *strict_overflow_p, int depth)
 case COND_EXPR:
 case CONSTRUCTOR:
 case OBJ_TYPE_REF:
-case ASSERT_EXPR:
 case ADDR_EXPR:
 case WITH_SIZE_EXPR:
 case SSA_NAME:
diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
index 1eafd3fd3e1..eae49ab3910 100644
--- a/gcc/gimple-array-bounds.cc
+++ b/gcc/gimple-array-bounds.cc
@@ -75,14 +75,7 @@ get_base_decl (tree ref)
   if (gimple_assign_single_p (def))
{
  base = gimple_assign_rhs1 (def);
- if (TREE_CODE (base) != ASSERT_EXPR)
-   return base;
-
- base = TREE_OPERAND (base, 0);
- if (TREE_CODE (base) != SSA_NAME)
-   return base;
-
- continue;
+ return base;
}
 
   if (!gimple_nop_p (def))
diff --git a/gcc/gimple-pretty-print.cc b/gcc/gimple-pretty-print.cc
index 7ec079f15c6..af704257633 100644
--- a/gcc/gimple-pretty-print.cc
+++ b/gcc/gimple-pretty-print.cc
@@ -339,7 +339,6 @@ dump_unary_rhs (pretty_printer *buffer, const gassign *gs, int spc,
   switch (rhs_code)
 {
 case VIEW_CONVERT_EXPR:
-case ASSERT_EXPR:
   dump_generic_node (buffer, rhs, spc, flags, false);
   break;
 
diff --git a/gcc/gimple.cc b/gcc/gimple.cc
index 6c23dd77609..dd054e16453 100644
--- a/gcc/gimple.cc
+++ b/gcc/gimple.cc
@@ -2408,7 +2408,6 @@ get_gimple_rhs_num_ops (enum tree_code code)
   || (SYM) == BIT_INSERT_EXPR) ? GIMPLE_TERNARY_RHS
\
: ((SYM) == CONSTRUCTOR 

RE: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-22 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, November 15, 2022 11:34 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> ; nd ; Marcus Shawcroft
> 
> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> 
> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Tuesday, November 15, 2022 11:15 AM
> >> To: Tamar Christina 
> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> >> ; nd ; Marcus Shawcroft
> >> 
> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >>
> >> Tamar Christina  writes:
> >> >> -Original Message-
> >> >> From: Richard Sandiford 
> >> >> Sent: Tuesday, November 15, 2022 10:51 AM
> >> >> To: Tamar Christina 
> >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> >> >> ; nd ; Marcus
> Shawcroft
> >> >> 
> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >> >>
> >> >> Tamar Christina  writes:
> >> >> >> -Original Message-
> >> >> >> From: Richard Sandiford 
> >> >> >> Sent: Tuesday, November 15, 2022 10:36 AM
> >> >> >> To: Tamar Christina 
> >> >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> >> >> >> ; nd ; Marcus
> >> Shawcroft
> >> >> >> 
> >> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >> >> >>
> >> >> >> Tamar Christina  writes:
> >> >> >> > Hello,
> >> >> >> >
> >> >> >> > Ping and updated patch.
> >> >> >> >
> >> >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no
> issues.
> >> >> >> >
> >> >> >> > Ok for master?
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> > Tamar
> >> >> >> >
> >> >> >> > gcc/ChangeLog:
> >> >> >> >
> >> >> >> > * config/aarch64/aarch64.md (*tb1):
> >> >> >> > Rename
> >> to...
> >> >> >> > (*tb1): ... this.
> >> >> >> > (tbranch4): New.
> >> >> >> >
> >> >> >> > gcc/testsuite/ChangeLog:
> >> >> >> >
> >> >> >> > * gcc.target/aarch64/tbz_1.c: New test.
> >> >> >> >
> >> >> >> > --- inline copy of patch ---
> >> >> >> >
> >> >> >> > diff --git a/gcc/config/aarch64/aarch64.md
> >> >> >> > b/gcc/config/aarch64/aarch64.md index
> >> >> >> >
> >> >> >>
> >> >>
> >>
> 2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf..d7684c93fba5b717d568e1a4fd
> >> >> >> 71
> >> >> >> > 2bde55c7c72e 100644
> >> >> >> > --- a/gcc/config/aarch64/aarch64.md
> >> >> >> > +++ b/gcc/config/aarch64/aarch64.md
> >> >> >> > @@ -943,12 +943,29 @@ (define_insn "*cb1"
> >> >> >> >   (const_int 1)))]
> >> >> >> >  )
> >> >> >> >
> >> >> >> > -(define_insn "*tb1"
> >> >> >> > +(define_expand "tbranch4"
> >> >> >> >[(set (pc) (if_then_else
> >> >> >> > - (EQL (zero_extract:DI (match_operand:GPI 0
> >> "register_operand"
> >> >> >> "r")
> >> >> >> > -   (const_int 1)
> >> >> >> > -   (match_operand 1
> >> >> >> > - 
> >> >> >> > "aarch64_simd_shift_imm_" "n"))
> >> >> >> > +   (match_operator 0 "aarch64_comparison_operator"
> >> >> >> > +[(match_operand:ALLI 1 "register_operand")
> >> >> >> > + (match_operand:ALLI 2
> >> >> >> "aarch64_simd_shift_imm_")])
> >> >> >> > +   (label_ref (match_operand 3 "" ""))
> >> >> >> > +   (pc)))]
> >> >> >> > +  "optimize > 0"
> >> >> >>
> >> >> >> Why's the pattern conditional on optimize?  Seems a valid
> >> >> >> choice at -O0 too.
> >> >> >>
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > I had explained the reason why in the original patch, just
> >> >> > didn't repeat it in the ping:
> >> >> >
> >> >> > Instead of emitting the instruction directly I've chosen to
> >> >> > expand the pattern using a zero extract and generating the
> >> >> > existing pattern for comparisons for two
> >> >> > reasons:
> >> >> >
> >> >> >   1. Allows for CSE of the actual comparison.
> >> >> >   2. It looks like the code in expand marks the label as unused
> >> >> >  and removes it if it doesn't see a separate reference to it.
> >> >> >
> >> >> > Because of this expansion though I disable the pattern at -O0
> >> >> > since we have no combine in that case so we'd end up with worse
> >> >> > code.  I did try emitting the pattern directly, but as mentioned
> >> >> > in #2 expand would then kill the label.
> >> >> >
> >> >> > Basically, if I emit the pattern directly, the label is marked as
> >> >> > dead immediately during expand for some weird reason.
> >> >>
> >> >> Isn't #2 a bug though?  It seems like something we should fix
> >> >> rather than work around.
> >> >
> >> > Yes it's a bug ☹ ok if I'm going to fix that bug then do I need to
> >> > split the optabs still? Isn't the problem atm that I need the split?
> >> > If I'm emitting the instruction directly then the recog pattern for
> >> > it can just be (eq (vec_extract x 1) 0) which is the correct semantics?
> >>
> >> What rtx does the code that uses the optab pass for 

Re: [PATCH 2/2] Fortran: add attribute target_clones

2022-11-22 Thread Mikael Morin

Le 21/11/2022 à 23:26, Bernhard Reutner-Fischer a écrit :

On Mon, 21 Nov 2022 20:13:40 +0100
Mikael Morin  wrote:


Hello,

Le 09/11/2022 à 20:02, Bernhard Reutner-Fischer via Fortran a écrit :

Hi!


(...)

+  if (allow_multiple && gfc_match_char (')') != MATCH_YES)
+{
+  gfc_error ("expected ')' at %C");
+  return NULL_TREE;
+}
+
+  return attr_args;
+}

I'm not sure this function needs to do all the parsing manually.
I would rather use gfc_match_actual_arglist, or maybe implement the
function as a wrapper around it.
What is allowed here?  Are non-literal constants allowed, for example
parameter variables?  Is line continuation supported ?


Line continuation is supported, I think.
Parameter variables supposedly are not, or should not be, supported. Why would
you do that in the context of an attribute target decl?
Either way, if the ME does not find such an fndecl, it will complain
and ignore the attribute.
I don't understand non-literal constants in this context.
This very attribute applies to decls, so the existing code supposedly
matches a comma separated list of identifiers. The usual dollar-ok
caveats apply.

No, my comment and my questions were about your function, which, as I 
understand it, matches the arguments to the attribute: it matches open 
and closing parenthesis, double quotes, etc.

Matching of decl names comes after that.
I asked the question about non-literal constant (and the other as well), 
because I saw it as a possible reason to not reuse the existing parsing 
functions.



As to gfc_match_actual_arglist, probably.
target_clones has
+  { "target_clones",  1, -1, true, false, false, false,
+ dummy, NULL },
with tree-core.h struct attribute_spec, so
name, min=1, max=unbounded, decl_required=yes, ...ignore...

hence applies to functions and subroutines and the like. It does take an
unbounded list of strings, isa1, isa2, isa4, default. We could add
"default" unless seen, but i'd rather want it spelled out by the user
for the user is supposed to know what she's doing, as in c or c++.
The ME has code to sanity-check the attributes, including conflicting
(ME) attributes.

The reason why I contemplated a separate parser was that for stuff
like regparm or sseregparm, you would want to require a single number
for the equivalent of

__attribute__((regparm(3),stdcall))

which you would provide in 2 separate !GCC$ attributes i assume.

Well, the check could as easily be enforced after parsing with the 
existing parsing functions.




Nothing (bad) to say about the rest, but there is enough to change with
the above comments.


Yes, many thanks for your comments.
I think there is no other non-intrusive way to pass the data through the
frontend. So for an acceptable way this means touching quite some spots
for every single ME attribute anybody would like to add in the future.


I'm not sure I understand.  Please let's just add what is necessary for 
this attribute, not more.




Re: [PATCH] aarch64: Fix test_dfp_17.c for big-endian [PR 107604]

2022-11-22 Thread Christophe Lyon via Gcc-patches




On 11/22/22 12:33, Richard Earnshaw wrote:



On 22/11/2022 11:21, Richard Sandiford wrote:

Richard Earnshaw via Gcc-patches  writes:

On 22/11/2022 09:01, Christophe Lyon via Gcc-patches wrote:
gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on
big-endian, because the _Decimal32 on-stack argument is not
padded in the same direction depending on endianness.

This patch fixes the testcase so that it expects the argument
in the right stack location, similarly to what other tests do
in the same directory.

gcc/testsuite/ChangeLog:

PR target/107604
* gcc.target/aarch64/aapcs64/test_dfp_17.c: Fix for big-endian.
---
 gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
index 22dc462bf7c..3c45f715cf7 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
@@ -32,6 +32,10 @@ struct z b = { 9.0dd, 10.0dd, 11.0dd, 12.0dd };
   ANON(struct z, a, D1)
   ANON(struct z, b, STACK)
   ANON(int , 5, W0)
+#ifndef __AAPCS64_BIG_ENDIAN__
   ANON(_Decimal32, f1, STACK+32) /* Note: no promotion to _Decimal64.  */
+#else
+  ANON(_Decimal32, f1, STACK+36) /* Note: no promotion to _Decimal64.  */
+#endif
   LAST_ANON(_Decimal64, 0.5dd, STACK+40)
 #endif


Why would a Decimal32 change stack placement based on the
endianness? Isn't it a 4-byte object?


Yes, but PARM_BOUNDARY (64) sets a minimum alignment for all stack
 arguments.

Richard


Ah, OK.
Indeed, it was not immediately obvious to me either, when looking at 
aarch64_layout_arg. aarch64_function_arg_padding comes into play, too.
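
To make the PARM_BOUNDARY point concrete, here is a minimal sketch in plain
C (not GCC code) of where a small argument could land inside its stack slot.
The helper name stack_arg_offset and its parameters are hypothetical, and it
assumes the argument is placed at the low-addressed end of the slot on
little-endian and at the high-addressed end on big-endian, which is what the
STACK+32 / STACK+36 expectations in the test suggest.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC or the testsuite.
   Returns the byte offset at which an argument of SIZE bytes is expected
   inside a stack slot of SLOT_SIZE bytes that starts at SLOT_START.
   Assumption: the argument sits at the low-addressed end of the slot on
   little-endian and at the high-addressed end on big-endian.  */
size_t
stack_arg_offset (size_t slot_start, size_t size, size_t slot_size,
                  int big_endian)
{
  /* An argument larger than its slot is outside this toy model.  */
  assert (size > 0 && size <= slot_size);
  return big_endian ? slot_start + (slot_size - size) : slot_start;
}
```

With an 8-byte slot (PARM_BOUNDARY of 64 bits) starting at STACK+32, a 4-byte
_Decimal32 gives 32 for little-endian and 36 for big-endian, while the
following 8-byte _Decimal64 stays at STACK+40 either way.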




I wonder if we should have a new macro in the tests, something like 
ANON_PADDED to describe this case and that works things out more 
automagically for big-endian.

Maybe. There are many other tests under aapcs64/ which have a similar
#ifndef __AAPCS64_BIG_ENDIAN__



I notice the new ANON definition is not correctly indented.

It looks OK on my side (2 spaces).

Thanks,

Christophe



R.


Re: [PATCH] d: respect --enable-link-mutex configure option

2022-11-22 Thread Iain Buclaw via Gcc-patches
Excerpts from Martin Liška's message of November 22, 2022 10:41 am:
> I noticed the option is ignored because @DO_LINK_MUTEX@
> is not defined in d/Make-lang.in.
> 
> Tested locally before and after the patch.
> 
> Ready to be installed?
> Thanks,
> Martin
> 

Fine on my end.  Thanks!

Iain.


[COMMITTED] ada: Accept aspects Global and Depends on abstract subprograms

2022-11-22 Thread Marc Poulhiès via Gcc-patches
From: Piotr Trojanek 

Aspects Global and Depends are now allowed on abstract subprograms
(as substitutes for Global'Class and Depends'Class).

This patch implements the recently modified rules SPARK RM 6.1.2(2-3).
The behavior for Contract_Cases and aspects on null subprograms stays
as it was.

gcc/ada/

* sem_prag.adb (Analyze_Depends_Global): Accept aspects on
abstract subprograms.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_prag.adb | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb
index f2c1a3f0e6e..0a91518cff9 100644
--- a/gcc/ada/sem_prag.adb
+++ b/gcc/ada/sem_prag.adb
@@ -4549,6 +4549,11 @@ package body Sem_Prag is
  elsif Nkind (Subp_Decl) = N_Single_Task_Declaration then
 null;
 
+ --  Abstract subprogram declaration
+
+ elsif Nkind (Subp_Decl) = N_Abstract_Subprogram_Declaration then
+null;
+
  --  Subprogram body acts as spec
 
  elsif Nkind (Subp_Decl) = N_Subprogram_Body
-- 
2.34.1



[COMMITTED] ada: Disable checking of Elab_Spec procedures in CodePeer_Mode

2022-11-22 Thread Marc Poulhiès via Gcc-patches
From: Ghjuvan Lacambre 

This commit re-enables the Validate_Subprogram_Calls check that had been
disabled in a previous commit and has said check skip over Elab_Spec
procedures in CodePeer_Mode.

gcc/ada/

* frontend.adb (Frontend): Re-enable Validate_Subprogram_Calls.
* exp_ch6.adb (Check_BIP_Actuals): When in CodePeer mode, do not
attempt to validate procedures coming from an
Elab_Spec/Elab_Body/Elab_Subp_Body procedure.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch6.adb  | 17 +
 gcc/ada/frontend.adb |  2 +-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb
index a5dee38c55f..237a19d1327 100644
--- a/gcc/ada/exp_ch6.adb
+++ b/gcc/ada/exp_ch6.adb
@@ -1115,6 +1115,23 @@ package body Exp_Ch6 is
 | N_Function_Call
 | N_Procedure_Call_Statement);
 
+  --  In CodePeer_Mode, the tree for `'Elab_Spec` procedures will be
+  --  malformed because GNAT does not perform the usual expansion that
+  --  results in the importation of external elaboration procedure symbols.
+  --  This is expected: the CodePeer backend has special handling for this
+  --  malformed tree.
+  --  Thus, we do not need to check the tree (and in fact can't, because
+  --  it's malformed).
+
+  if CodePeer_Mode
+and then Nkind (Name (Subp_Call)) = N_Attribute_Reference
+and then Attribute_Name (Name (Subp_Call)) in Name_Elab_Spec
+| Name_Elab_Body
+| Name_Elab_Subp_Body
+  then
+ return True;
+  end if;
+
   Formal := First_Formal_With_Extras (Subp_Id);
   Actual := First_Actual (Subp_Call);
 
diff --git a/gcc/ada/frontend.adb b/gcc/ada/frontend.adb
index bc3da30b0cf..033ecf3b7be 100644
--- a/gcc/ada/frontend.adb
+++ b/gcc/ada/frontend.adb
@@ -531,7 +531,7 @@ begin
--  formals). It is invoked using pragma Debug to avoid adding any cost
--  when the compiler is built with assertions disabled.
 
-   if not Debug_Flag_Underscore_XX and then not CodePeer_Mode then
+   if not Debug_Flag_Underscore_XX then
   pragma Debug (Exp_Ch6.Validate_Subprogram_Calls (Cunit (Main_Unit)));
end if;
 
-- 
2.34.1



[COMMITTED] ada: Adjust number of errors when removing warning in dead code

2022-11-22 Thread Marc Poulhiès via Gcc-patches
From: Piotr Trojanek 

When a warning about a runtime exception is emitted for code in a
generic instance, we add continuation warnings "in instantiation ..."
and only the original message increases the total number of errors.

When removing these messages, e.g. after detecting that the code inside
the generic instance is dead, we must decrease the total number of errors,
as otherwise the compiler exit status might stop gnatmake or gprbuild.
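
The bookkeeping described above can be sketched as a small standalone C
model. This is only an illustration of the counting rule, not the GNAT
implementation: the names toy_msg, toy_errors, toy_emit and toy_remove are
hypothetical, and the model assumes that only a non-continuation message
escalated to an error touches the counter.

```c
#include <stdbool.h>

/* Toy stand-ins for an error table entry and the global counters.  */
struct toy_msg
{
  bool warn_runtime_raise;  /* warning about a runtime exception */
  bool msg_cont;            /* continuation ("in instantiation ...") */
  bool deleted;
};

struct toy_errors
{
  int total_errors;
  bool runtime_warnings_as_errors;
};

/* True if this message is the one that bumps the error counter.  */
static bool
counts_as_error (const struct toy_errors *st, const struct toy_msg *m)
{
  return m->warn_runtime_raise && !m->msg_cont
         && st->runtime_warnings_as_errors;
}

/* Emitting: only the starting message increases the counter.  */
void
toy_emit (struct toy_errors *st, const struct toy_msg *m)
{
  if (counts_as_error (st, m))
    st->total_errors++;
}

/* Removing: undo exactly what toy_emit did, once per message.  */
void
toy_remove (struct toy_errors *st, struct toy_msg *m)
{
  if (m->deleted)
    return;
  m->deleted = true;
  if (counts_as_error (st, m))
    st->total_errors--;
}
```

In this model, emitting one starting message plus two continuations leaves
the counter at 1, and removing all three brings it back to 0, so dead code
in an instance no longer affects the exit status.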

gcc/ada/

* errout.adb (To_Be_Removed): Decrease total number of errors when
removing a warning that has been escalated into error.
* erroutc.adb (dmsg): Print Warn_Runtime_Raise flag.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/errout.adb  | 11 +++
 gcc/ada/erroutc.adb | 35 ++-
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb
index afa30674fa3..b30e8b51d15 100644
--- a/gcc/ada/errout.adb
+++ b/gcc/ada/errout.adb
@@ -3351,6 +3351,17 @@ package body Errout is
   Warning_Info_Messages := Warning_Info_Messages - 1;
end if;
 
+   --  When warning about a runtime exception has been escalated
+   --  into error, the starting message has increased the total
+   --  errors counter, so here we decrease this counter.
+
+   if Errors.Table (E).Warn_Runtime_Raise
+ and then not Errors.Table (E).Msg_Cont
+ and then Warning_Mode = Treat_Run_Time_Warnings_As_Errors
+   then
+  Total_Errors_Detected := Total_Errors_Detected - 1;
+   end if;
+
return True;
 
 --  No removal required
diff --git a/gcc/ada/erroutc.adb b/gcc/ada/erroutc.adb
index 7766c972730..d40c668be8a 100644
--- a/gcc/ada/erroutc.adb
+++ b/gcc/ada/erroutc.adb
@@ -312,32 +312,33 @@ package body Erroutc is
 
begin
   w ("Dumping error message, Id = ", Int (Id));
-  w ("  Text = ", E.Text.all);
-  w ("  Next = ", Int (E.Next));
-  w ("  Prev = ", Int (E.Prev));
-  w ("  Sfile= ", Int (E.Sfile));
+  w ("  Text   = ", E.Text.all);
+  w ("  Next   = ", Int (E.Next));
+  w ("  Prev   = ", Int (E.Prev));
+  w ("  Sfile  = ", Int (E.Sfile));
 
   Write_Str
-("  Sptr = ");
+("  Sptr   = ");
   Write_Location (E.Sptr.Ptr);  --  ??? Do not write the full span for now
   Write_Eol;
 
   Write_Str
-("  Optr = ");
+("  Optr   = ");
   Write_Location (E.Optr.Ptr);
   Write_Eol;
 
-  w ("  Line = ", Int (E.Line));
-  w ("  Col  = ", Int (E.Col));
-  w ("  Warn = ", E.Warn);
-  w ("  Warn_Err = ", E.Warn_Err);
-  w ("  Warn_Chr = '" & E.Warn_Chr & ''');
-  w ("  Style= ", E.Style);
-  w ("  Serious  = ", E.Serious);
-  w ("  Uncond   = ", E.Uncond);
-  w ("  Msg_Cont = ", E.Msg_Cont);
-  w ("  Deleted  = ", E.Deleted);
-  w ("  Node = ", Int (E.Node));
+  w ("  Line   = ", Int (E.Line));
+  w ("  Col= ", Int (E.Col));
+  w ("  Warn   = ", E.Warn);
+  w ("  Warn_Err   = ", E.Warn_Err);
+  w ("  Warn_Runtime_Raise = ", E.Warn_Runtime_Raise);
+  w ("  Warn_Chr   = '" & E.Warn_Chr & ''');
+  w ("  Style  = ", E.Style);
+  w ("  Serious= ", E.Serious);
+  w ("  Uncond = ", E.Uncond);
+  w ("  Msg_Cont   = ", E.Msg_Cont);
+  w ("  Deleted= ", E.Deleted);
+  w ("  Node   = ", Int (E.Node));
 
   Write_Eol;
end dmsg;
-- 
2.34.1



[COMMITTED] ada: Fix formatting glitches in Make_Tag_Assignment

2022-11-22 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

gcc/ada/

* exp_ch3.adb (Make_Tag_Assignment): Fix formatting glitches.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch3.adb | 40 +---
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
index 7b194bb9816..2661a3ff9f6 100644
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -11769,37 +11769,39 @@ package body Exp_Ch3 is
 
function Make_Tag_Assignment (N : Node_Id) return Node_Id is
   Loc  : constant Source_Ptr := Sloc (N);
-  Def_If   : constant Entity_Id := Defining_Identifier (N);
-  Expr : constant Node_Id := Expression (N);
-  Typ  : constant Entity_Id := Etype (Def_If);
-  Full_Typ : constant Entity_Id := Underlying_Type (Typ);
+  Def_If   : constant Entity_Id  := Defining_Identifier (N);
+  Expr : constant Node_Id:= Expression (N);
+  Typ  : constant Entity_Id  := Etype (Def_If);
+  Full_Typ : constant Entity_Id  := Underlying_Type (Typ);
+
   New_Ref  : Node_Id;
 
begin
-  --  This expansion activity is called during analysis.
+  --  This expansion activity is called during analysis
 
   if Is_Tagged_Type (Typ)
-   and then not Is_Class_Wide_Type (Typ)
-   and then not Is_CPP_Class (Typ)
-   and then Tagged_Type_Expansion
-   and then Nkind (Expr) /= N_Aggregate
-   and then (Nkind (Expr) /= N_Qualified_Expression
-  or else Nkind (Expression (Expr)) /= N_Aggregate)
+and then not Is_Class_Wide_Type (Typ)
+and then not Is_CPP_Class (Typ)
+and then Tagged_Type_Expansion
+and then Nkind (Expr) /= N_Aggregate
+and then (Nkind (Expr) /= N_Qualified_Expression
+   or else Nkind (Expression (Expr)) /= N_Aggregate)
   then
  New_Ref :=
Make_Selected_Component (Loc,
-  Prefix=> New_Occurrence_Of (Def_If, Loc),
-  Selector_Name =>
-New_Occurrence_Of (First_Tag_Component (Full_Typ), Loc));
+ Prefix=> New_Occurrence_Of (Def_If, Loc),
+ Selector_Name =>
+   New_Occurrence_Of (First_Tag_Component (Full_Typ), Loc));
+
  Set_Assignment_OK (New_Ref);
 
  return
Make_Assignment_Statement (Loc,
-  Name   => New_Ref,
-  Expression =>
-Unchecked_Convert_To (RTE (RE_Tag),
-  New_Occurrence_Of (Node
-  (First_Elmt (Access_Disp_Table (Full_Typ))), Loc)));
+ Name   => New_Ref,
+ Expression =>
+   Unchecked_Convert_To (RTE (RE_Tag),
+ New_Occurrence_Of
+   (Node (First_Elmt (Access_Disp_Table (Full_Typ))), Loc)));
   else
  return Empty;
   end if;
-- 
2.34.1



[COMMITTED] ada: Fix recent assertion failure on GPR2

2022-11-22 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

It's the compiler trying to load the nonexistent body of a generic package
when trying to inline a call to an expression function of this package that
has a pre or post-condition (hence the need for -gnata to trigger the ICE).

gcc/ada/

* contracts.adb (Build_Subprogram_Contract_Wrapper): Do not fiddle
with the Was_Expression_Function flag. Move a few lines around.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/contracts.adb | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/gcc/ada/contracts.adb b/gcc/ada/contracts.adb
index fef3d24870f..6f474eb2944 100644
--- a/gcc/ada/contracts.adb
+++ b/gcc/ada/contracts.adb
@@ -1691,6 +1691,10 @@ package body Contracts is
   Set_Debug_Info_Needed  (Wrapper_Id);
   Set_Wrapped_Statements (Subp_Id, Wrapper_Id);
 
+  Set_Has_Pragma_Inline (Wrapper_Id, Has_Pragma_Inline (Subp_Id));
+  Set_Has_Pragma_Inline_Always
+(Wrapper_Id, Has_Pragma_Inline_Always (Subp_Id));
+
   --  Create specification and declaration for the wrapper
 
   if No (Ret_Type) or else Ret_Type = Standard_Void_Type then
@@ -1719,20 +1723,6 @@ package body Contracts is
 Make_Handled_Sequence_Of_Statements (Loc,
   End_Label  => Make_Identifier (Loc, Chars (Wrapper_Id;
 
-  --  Move certain flags which are relevant to the body
-
-  --  Wouldn't a better way be to perform some sort of copy of Body_Decl
-  --  for Wrapper_Body be less error-prone ???
-
-  if Was_Expression_Function (Body_Decl) then
- Set_Was_Expression_Function (Body_Decl, False);
- Set_Was_Expression_Function (Wrapper_Body);
-  end if;
-
-  Set_Has_Pragma_Inline (Wrapper_Id, Has_Pragma_Inline (Subp_Id));
-  Set_Has_Pragma_Inline_Always
-(Wrapper_Id, Has_Pragma_Inline_Always (Subp_Id));
-
   --  Prepend a call to the wrapper when the subprogram is a procedure
 
   if No (Ret_Type) or else Ret_Type = Standard_Void_Type then
-- 
2.34.1



[PATCH] c: Propagate erroneous types to declaration specifiers [PR107805]

2022-11-22 Thread Florian Weimer via Gcc-patches
Without this change, finish_declspecs cannot tell whether there
was an erroneous type specified, or no type at all.  This may result
in additional diagnostics for implicit ints, or missing diagnostics
for multiple types.
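
As a rough illustration of why the erroneous type needs to be recorded,
here is a toy model in standalone C. It does not use GCC's tree types: the
enum toy_type and the functions toy_add_type and toy_finish are hypothetical
stand-ins for the type slot in the specifiers, declspecs_add_type and
finish_declspecs.

```c
/* Toy stand-ins; not the real GCC data structures.  */
enum toy_type { TOY_NONE, TOY_ERROR, TOY_INT, TOY_CHAR };

struct toy_declspecs
{
  enum toy_type type;
  int diagnostics;  /* number of diagnostics issued so far */
};

/* Record a type specifier.  An erroneous type is stored as well, so
   later code can see that some type was written.  */
void
toy_add_type (struct toy_declspecs *specs, enum toy_type t)
{
  if (specs->type != TOY_NONE)
    {
      specs->diagnostics++;  /* "two or more data types" */
      return;
    }
  specs->type = t;
}

/* Finish the specifiers: default to int, with a diagnostic, only if
   no type was written at all.  */
void
toy_finish (struct toy_declspecs *specs)
{
  if (specs->type == TOY_NONE)
    {
      specs->diagnostics++;  /* implicit int */
      specs->type = TOY_INT;
    }
}
```

In this model an erroneous type followed by nothing else produces no extra
diagnostic, while an erroneous type followed by char is still reported as
two data types. If toy_add_type dropped TOY_ERROR instead of storing it,
both cases would be indistinguishable from "no type at all".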

PR c/107805

gcc/c/
* c-decl.cc (declspecs_add_type): Propagate error_mark_node
from type to specs.

gcc/testsuite/
* gcc.dg/pr107805-1.c: New test.
* gcc.dg/pr107805-2.c: Likewise.

---
Note regarding testing: I bootstrapped with c,c++,lto on x86-64
(non-multilib) and diffed these .sum files:

gcc/testsuite/gcc/gcc.sum
gcc/testsuite/g++/g++.sum
x86_64-pc-linux-gnu/libgomp/testsuite/libgomp.sum
x86_64-pc-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum
x86_64-pc-linux-gnu/libatomic/testsuite/libatomic.sum
x86_64-pc-linux-gnu/libitm/testsuite/libitm.sum

Apart from timestamps, the only difference I get is this change:

--- ./gcc/testsuite/gcc/gcc.sum 2022-11-22 05:45:33.813264761 -0500
+++ /tmp/b/build/./gcc/testsuite/gcc/gcc.sum 2022-11-22 06:39:10.667590185 -0500
@@ -83303,6 +83303,11 @@
 PASS: gcc.dg/pr107618.c  (test for bogus messages, line 9)
 PASS: gcc.dg/pr107618.c (test for excess errors)
 PASS: gcc.dg/pr107686.c (test for excess errors)
+PASS: gcc.dg/pr107805-1.c  (test for errors, line 3)
+PASS: gcc.dg/pr107805-1.c (test for excess errors)
+PASS: gcc.dg/pr107805-2.c  (test for errors, line 3)
+PASS: gcc.dg/pr107805-2.c  (test for errors, line 4)
+PASS: gcc.dg/pr107805-2.c (test for excess errors)
 PASS: gcc.dg/pr11459-1.c (test for excess errors)
 PASS: gcc.dg/pr11492.c  (test for bogus messages, line 8)
 PASS: gcc.dg/pr11492.c (test for excess errors)
@@ -190486,7 +190491,7 @@
 
=== gcc Summary ===
 
-# of expected passes   185932
+# of expected passes   185937
 # of unexpected failures   99
 # of unexpected successes  20
 # of expected failures 1484

So I think this means there are no test suite regressions.

Thanks,
Florian

 gcc/c/c-decl.cc   | 6 ++
 gcc/testsuite/gcc.dg/pr107805-1.c | 5 +
 gcc/testsuite/gcc.dg/pr107805-2.c | 4 
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 098e475f65d..4adb89e4aaf 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -12243,11 +12243,9 @@ declspecs_add_type (location_t loc, struct c_declspecs *specs,
 error_at (loc, "two or more data types in declaration specifiers");
   else if (TREE_CODE (type) == TYPE_DECL)
 {
-  if (TREE_TYPE (type) == error_mark_node)
-   ; /* Allow the type to default to int to avoid cascading errors.  */
-  else
+  specs->type = TREE_TYPE (type);
+  if (TREE_TYPE (type) != error_mark_node)
{
- specs->type = TREE_TYPE (type);
  specs->decl_attr = DECL_ATTRIBUTES (type);
  specs->typedef_p = true;
  specs->explicit_signed_p = C_TYPEDEF_EXPLICITLY_SIGNED (type);
diff --git a/gcc/testsuite/gcc.dg/pr107805-1.c b/gcc/testsuite/gcc.dg/pr107805-1.c
new file mode 100644
index 000..559b6a5586e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr107805-1.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+typedef int t;
+typedef struct { double a; int b; } t; /* { dg-error "conflicting types" } */
+t x; /* No warning here.  */
+
diff --git a/gcc/testsuite/gcc.dg/pr107805-2.c b/gcc/testsuite/gcc.dg/pr107805-2.c
new file mode 100644
index 000..fa5fa4ce273
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr107805-2.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+typedef int t;
+typedef struct { double a; int b; } t; /* { dg-error "conflicting types" } */
+t char x; /* { dg-error "two or more data types" } */

base-commit: e4faee8d02ec5d65bf418612f7181823eb08c078



Re: [PATCH] maintainer-scripts/gcc_release: compress xz in parallel

2022-11-22 Thread Richard Sandiford via Gcc-patches
Sam James via Gcc-patches  writes:
>> On 8 Nov 2022, at 07:14, Sam James  wrote:
>> 
>> 1. This should speed up decompression for folks, as parallel xz
>>   creates a different archive which can be decompressed in parallel.
>> 
>>   Note that this different method is enabled by default in a new
>>   xz release coming shortly anyway (>= 5.3.3_alpha1).
>> 
>>   I build GCC regularly from the weekly snapshots
>>   and so the decompression time adds up.
>> 
>> 2. It should speed up compression on the webserver a bit.
>> 
>>   Note that -T0 won't be the default in the new xz release,
>>   only the parallel compression mode (which enables parallel
>>   decompression).
>> 
>>   -T0 detects the number of cores available.
>> 
>>   So, if a different number of threads is preferred, it's fine
>>   to set e.g. -T2, etc.
>> 
>> Signed-off-by: Sam James 
>> ---
>> maintainer-scripts/gcc_release | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> 
>
> Given no disagreements, anyone fancy pushing
> this in time for Sunday evening for the next 13
> snapshot? ;)

I didn't see an explicit ACK or NACK, but it looks good to me.  I'll push
tomorrow if there are no objections before then.

Thanks,
Richard


Re: [PATCH 1/2] symtab: also change RTL decl name

2022-11-22 Thread Jan Hubicka via Gcc-patches
> On Mon, 21 Nov 2022 20:02:49 +0100
> Jan Hubicka  wrote:
> 
> > > Hi Honza, Ping.
> > > Regtests cleanly for c,fortran,c++,ada,d,go,lto,objc,obj-c++
> > > Ok?
> > > I'd need this for attribute target_clones for the Fortran FE.  
> > Sorry for delay here.
> > > >  void
> > > > @@ -303,6 +301,10 @@ symbol_table::change_decl_assembler_name (tree decl, tree name)
> > > > warning (0, "%qD renamed after being referenced in assembly", decl);
> > > >  
> > > >SET_DECL_ASSEMBLER_NAME (decl, name);
> > > > +  /* Set the new name in rtl.  */
> > > > +  if (DECL_RTL_SET_P (decl))
> > > > +   XSTR (XEXP (DECL_RTL (decl), 0), 0) = IDENTIFIER_POINTER (name);
> > 
> > I am not quite sure how safe this is.  We generally produce DECL_RTL
> > when we produce assembly file.  So if DECL_RTL is set then we probably
> > already output the original function name and it is too late to change
> > it.
> 
> AFAICS we make_decl_rtl in the fortran FE in trans_function_start.

I see, it may be a relic of something that is no longer necessary.  Can
you see why one needs DECL_RTL so early?
> 
> > 
> > Also RTL is shared so changing it in-place is going to rewrite all the
> > existing RTL expressions using it.
> > 
> > Why is DECL_RTL produced for the function you want to rename?
> 
> I think the fortran FE sets it quite early when lowering a function.
> Later, when the ME creates the target_clones, it wants to rename the
> initial function to initial_fun.default for the default target.
> That's where the change_decl_assembler_name is called (only on the
> decl).
> But nobody changes the RTL name, so the ifunc (which should be the
> initial, unchanged name) is properly emitted but
> assemble_start_function uses the same, unchanged, initial fnname it
> later obtains by get_fnname_from_decl which fetches the (wrong) initial
> name where it should use the .default target name.
> See
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605081.html
> 
> I'm open to other suggestions to make this work in a different way, of
> course. Maybe we're missing some magic somewhere that might share the
> name between the fndecl and the RTL XSTR so the RTL is magically
> updated by that single SET_DECL_ASSEMBLER_NAME in
> change_decl_assembler_name? But I didn't quite see where that'd be?

I think we should start by understanding why the Fortran FE produces
DECL_RTL early.  It was written before the symbol table code emerged, so
it may simply be an oversight I made while converting the FE to the
symbol table.

Honza
> 
> thanks,
> 
> > Honza
> > > > +
> > > >if (alias)
> > > > {
> > > >   IDENTIFIER_TRANSPARENT_ALIAS (name) = 1;  
> > >   
> 


Re: [PATCH 1/2] Fortran: Cleanup struct ext_attr_t

2022-11-22 Thread Mikael Morin

On 21/11/2022 at 21:34, Bernhard Reutner-Fischer wrote:

On Mon, 21 Nov 2022 12:08:20 +0100
Mikael Morin  wrote:


* gfortran.h (struct ext_attr_t): Remove middle_end_name.
* trans-decl.cc (add_attributes_to_decl): Move building
tree_list to ...
* decl.cc (gfc_match_gcc_attributes): ... here. Add the attribute to
the tree_list for the middle end.
   

I prefer to not do any middle-end stuff at parsing time, so I would
rather not do this change.
Not OK.


Ok, that means we should filter out those bits that we don't want to
write to the module (right?). We've plenty of bits left, more than Dave
Love would want to have added, I hope, so that should not be much of a
concern.

I didn't think of modules.  Yes, that means we have to store (in memory) 
the attribute we have parsed, and we can filter-out the attributes at 
the time the attributes are written to the module.  I don't think it is 
strictly necessary (for flatten, at least) though.



What that table really wants to say is whether or not this attribute
should be passed to the ME. Would it be acceptable to remove these
duplicate strings and just have a bool/char/int that is true if it
should be lowered (in trans-decl, as before)? But now I admit it's just
bikeshedding and we can as well leave it alone for now. It was just a
thought.


Yes, that would be acceptable.


Re: [PATCH] aarch64: Fix test_dfp_17.c for big-endian [PR 107604]

2022-11-22 Thread Richard Earnshaw via Gcc-patches




On 22/11/2022 11:21, Richard Sandiford wrote:

Richard Earnshaw via Gcc-patches  writes:

On 22/11/2022 09:01, Christophe Lyon via Gcc-patches wrote:

gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on
big-endian, because the _Decimal32 on-stack argument is not padded in
the same direction depending on endianness.

This patch fixes the testcase so that it expects the argument in the
right stack location, similarly to what other tests do in the same
directory.

gcc/testsuite/ChangeLog:

PR target/107604
* gcc.target/aarch64/aapcs64/test_dfp_17.c: Fix for big-endian.
---
   gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c | 4 
   1 file changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
index 22dc462bf7c..3c45f715cf7 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
@@ -32,6 +32,10 @@ struct z b = { 9.0dd, 10.0dd, 11.0dd, 12.0dd };
 ANON(struct z, a, D1)
 ANON(struct z, b, STACK)
 ANON(int , 5, W0)
+#ifndef __AAPCS64_BIG_ENDIAN__
 ANON(_Decimal32, f1, STACK+32) /* Note: no promotion to _Decimal64.  */
+#else
+  ANON(_Decimal32, f1, STACK+36) /* Note: no promotion to _Decimal64.  */
+#endif
 LAST_ANON(_Decimal64, 0.5dd, STACK+40)
   #endif


Why would a Decimal32 change stack placement based on the endianness?
Isn't it a 4-byte object?


Yes, but PARM_BOUNDARY (64) sets a minimum alignment for all stack arguments.

Richard


Ah, OK.

I wonder if we should have a new macro in the tests, something like 
ANON_PADDED to describe this case and that works things out more 
automagically for big-endian.


I notice the new ANON definition is not correctly indented.

R.


Re: [PATCH] Fix wrong array type conversion with different storage order

2022-11-22 Thread Richard Biener via Gcc-patches
On Tue, Nov 22, 2022 at 12:06 PM Eric Botcazou via Gcc-patches
 wrote:
>
> Hi,
>
> when two arrays of scalars have a different storage order in Ada, the
> front-end makes sure that the conversion is performed component-wise
> so that each component can be reversed.  So it's a little bit
> counterproductive that the ldist pass performs the opposite transformation
> and synthesizes a memcpy/memmove in this case.
>
> Tested on x86-64/Linux, OK for the mainline?

OK for trunk and branches.

Richard.

>
> 2022-11-22  Eric Botcazou  
>
> * tree-loop-distribution.cc (loop_distribution::classify_builtin_ldst):
> Bail out if source and destination do not have the same storage order.
>
>
> 2022-11-22  Eric Botcazou  
>
> * gnat.dg/sso18.adb: New test.
>
> --
> Eric Botcazou
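
The point about component-wise conversion can be illustrated with a small standalone C model. This is only a sketch under stated assumptions, not code from the patch or from tree-loop-distribution.cc: the helper names (bswap32, convert_reverse_order, demo_differs_from_memcpy) are hypothetical, and the byte reversal that GCC derives from the types' storage order is spelled out here by hand.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of one 32-bit scalar.  */
uint32_t
bswap32 (uint32_t x)
{
  return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8)
	 | ((x & 0x00ff0000u) >> 8) | ((x & 0xff000000u) >> 24);
}

/* Model of the conversion between two arrays whose scalar storage
   orders differ: every component has to be reversed individually.  */
void
convert_reverse_order (uint32_t *dst, const uint32_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = bswap32 (src[i]);
}

/* Return 1 if replacing the loop by a block copy would give a
   different result for a sample input, 0 otherwise.  */
int
demo_differs_from_memcpy (void)
{
  const uint32_t src[2] = { 0x11223344u, 0x55667788u };
  uint32_t by_loop[2];
  uint32_t by_memcpy[2];

  convert_reverse_order (by_loop, src, 2);
  memcpy (by_memcpy, src, sizeof src);	/* What a memcpy synthesis would do.  */

  return memcmp (by_loop, by_memcpy, sizeof src) != 0;
}
```

In this model the block copy keeps the source bytes unchanged while the loop reverses each element, so the two results differ. That is why turning such a loop into memcpy/memmove is not a valid transformation when the storage orders differ.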


Re: [PATCH] aarch64: Fix test_dfp_17.c for big-endian [PR 107604]

2022-11-22 Thread Richard Sandiford via Gcc-patches
Richard Earnshaw via Gcc-patches  writes:
> On 22/11/2022 09:01, Christophe Lyon via Gcc-patches wrote:
>> gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on
>> big-endian, because the _Decimal32 on-stack argument is not padded in
>> the same direction depending on endianness.
>> 
>> This patch fixes the testcase so that it expects the argument in the
>> right stack location, similarly to what other tests do in the same
>> directory.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  PR target/107604
>>  * gcc.target/aarch64/aapcs64/test_dfp_17.c: Fix for big-endian.
>> ---
>>   gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c | 4 
>>   1 file changed, 4 insertions(+)
>> 
>> diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
>> index 22dc462bf7c..3c45f715cf7 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
>> @@ -32,6 +32,10 @@ struct z b = { 9.0dd, 10.0dd, 11.0dd, 12.0dd };
>> ANON(struct z, a, D1)
>> ANON(struct z, b, STACK)
>> ANON(int , 5, W0)
>> +#ifndef __AAPCS64_BIG_ENDIAN__
>> ANON(_Decimal32, f1, STACK+32) /* Note: no promotion to _Decimal64.  */
>> +#else
>> +  ANON(_Decimal32, f1, STACK+36) /* Note: no promotion to _Decimal64.  */
>> +#endif
>> LAST_ANON(_Decimal64, 0.5dd, STACK+40)
>>   #endif
>
> Why would a Decimal32 change stack placement based on the endianness? 
> Isn't it a 4-byte object?

Yes, but PARM_BOUNDARY (64) sets a minimum alignment for all stack arguments.

Richard
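
The effect of the minimum slot alignment on the expected offsets can be sketched in C. This is a simplified model written to match the numbers in the test, not the AAPCS64 rules or GCC's argument-layout code; the function names and the assumption that big-endian pads at the low-address end of the slot are illustrative only.

```c
#include <stddef.h>

/* Round SIZE up to a multiple of ALIGN (ALIGN must be a power of two).  */
size_t
round_up (size_t size, size_t align)
{
  return (size + align - 1) & ~(align - 1);
}

/* Offset of the argument's data, given the offset of its stack slot.
   The slot is at least PARM_BOUNDARY_BYTES wide (8 for a 64-bit
   PARM_BOUNDARY).  Assumption: little-endian places the data at the
   start of the slot, big-endian places it after the padding.  */
size_t
arg_data_offset (size_t slot_offset, size_t arg_size,
		 size_t parm_boundary_bytes, int big_endian)
{
  size_t slot_size = round_up (arg_size, parm_boundary_bytes);
  size_t padding = slot_size - arg_size;
  return slot_offset + (big_endian ? padding : 0);
}
```

With a slot at offset 32, a 4-byte _Decimal32 and an 8-byte boundary, the model gives 32 for little-endian and 36 for big-endian, which are the STACK+32 and STACK+36 values in the patch. An 8-byte _Decimal64 has no padding, so its offset is the same for both.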


Re: [PATCH RFA(configure)] c++: provide strchrnul on targets without it [PR107781]

2022-11-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 22, 2022 at 09:41:24AM +0100, Jakub Jelinek via Gcc-patches wrote:
> On Mon, Nov 21, 2022 at 06:31:47PM -0500, Jason Merrill via Gcc-patches wrote:
> > Tested x86_64-pc-linux-gnu, and also manually changing the
> > HAVE_DECL_STRCHRNUL flag.  OK for trunk?
> > 
> > -- 8< --
> > 
> > The Contracts implementation uses strchrnul, which is a glibc extension, so
> > bootstrap broke on non-glibc targets.  I considered unconditionally using a
> > local definition, but I guess we might as well use the libc version if it
> > exists.
> > 
> > PR c++/107781
> > 
> > gcc/cp/ChangeLog:
> > 
> > * contracts.cc (strchrnul): Define if needed.
> > 
> > gcc/ChangeLog:
> > 
> > * configure.ac: Check for strchrnul.
> > * config.in, configure: Regenerate.
> 
> Normally we'd add such a local definition to libiberty, shouldn't we do it
> in this case too?

Or use strcspn as Jonathan posted in the PR, at least glibc will handle
it as strchrnul (start, reject[0]) - start early in the strcspn
implementation.

Jakub
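
For reference, the strcspn approach mentioned above could look roughly like the following. This is a minimal sketch, not the code from the patch or from the PR; the name my_strchrnul is hypothetical and chosen so it does not clash with a libc that already provides strchrnul.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the first occurrence of C in S, or to the
   terminating NUL of S if C does not occur.  */
char *
my_strchrnul (const char *s, int c)
{
  /* strcspn takes a NUL-terminated set of characters to stop at,
     so build a one-character set.  For C == 0 the set is empty and
     strcspn returns strlen (S), which is the desired result.  */
  const char reject[2] = { (char) c, '\0' };
  return (char *) s + strcspn (s, reject);
}
```

Only standard C library functions are used, so this form does not depend on the glibc extension being available.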



Re: [PATCH] aarch64: Fix test_dfp_17.c for big-endian [PR 107604]

2022-11-22 Thread Richard Earnshaw via Gcc-patches




On 22/11/2022 09:01, Christophe Lyon via Gcc-patches wrote:

gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on
big-endian, because the _Decimal32 on-stack argument is not padded in
the same direction depending on endianness.

This patch fixes the testcase so that it expects the argument in the
right stack location, similarly to what other tests do in the same
directory.

gcc/testsuite/ChangeLog:

PR target/107604
* gcc.target/aarch64/aapcs64/test_dfp_17.c: Fix for big-endian.
---
  gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
index 22dc462bf7c..3c45f715cf7 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
@@ -32,6 +32,10 @@ struct z b = { 9.0dd, 10.0dd, 11.0dd, 12.0dd };
ANON(struct z, a, D1)
ANON(struct z, b, STACK)
ANON(int , 5, W0)
+#ifndef __AAPCS64_BIG_ENDIAN__
ANON(_Decimal32, f1, STACK+32) /* Note: no promotion to _Decimal64.  */
+#else
+  ANON(_Decimal32, f1, STACK+36) /* Note: no promotion to _Decimal64.  */
+#endif
LAST_ANON(_Decimal64, 0.5dd, STACK+40)
  #endif


Why would a Decimal32 change stack placement based on the endianness? 
Isn't it a 4-byte object?


Re: [PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs

2022-11-22 Thread Richard Biener via Gcc-patches
On Tue, 22 Nov 2022, Richard Sandiford wrote:

> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Biener 
> >> Sent: Tuesday, November 22, 2022 10:59 AM
> >> To: Richard Sandiford 
> >> Cc: Tamar Christina via Gcc-patches ; Tamar
> >> Christina ; Richard Biener
> >> ; nd 
> >> Subject: Re: [PATCH 1/8]middle-end: Recognize scalar reductions from
> >> bitfields and array_refs
> >>
> >> On Tue, 22 Nov 2022, Richard Sandiford wrote:
> >>
> >> > Tamar Christina via Gcc-patches  writes:
> >> > >> So it's not easily possible within the current infrastructure.  But
> >> > >> it does look like ARM might eventually benefit from something like STV
> >> > >> on x86?
> >> > >>
> >> > >
> >> > > I'm not sure.  The problem with trying to do this in RTL is that
> >> > > you'd have to be able to decide from two pseudos whether they come
> >> > > from extracts that are sequential. When coming in from a hard
> >> > > register that's easy, yes.  When coming in from a load, or any other
> >> > > operation that produces pseudos, that becomes harder.
> >> >
> >> > Yeah.
> >> >
> >> > Just in case anyone reading the above is tempted to implement STV for
> >> > AArch64: I think it would set a bad precedent if we had a
> >> > paste-&-adjust version of the x86 pass.  AFAIK, the target
> >> > capabilities and constraints are mostly modelled correctly using
> >> > existing mechanisms, so I don't think there's anything particularly
> >> > target-specific about the process of forcing things to be on the
> >> > general or SIMD/FP side.
> >> >
> >> > So if we did have an STV-ish thing for AArch64, I think it should be a
> >> > target-independent pass that uses hooks and recog, even if the pass is
> >> > initially enabled for AArch64 only.
> >>
> >> Agreed - maybe some of the x86 code can be leveraged, but of course the
> >> cost modeling is the most difficult to get right - IIRC the x86 backend
> >> resorts to backend specific tuning flags rather than trying to get
> >> rtx_cost or insn_cost "correct" here.
> >>
> >> > (FWIW, on the patch itself, I tend to agree that this is really an SLP
> >> > optimisation.  If the vectoriser fails to see the benefit, or if it
> >> > fails to handle more complex cases, then it would be good to try to
> >> > fix that.)
> >>
> >> Also agreed - but costing is hard ;)
> >
> > I guess, I still disagree here but I've clearly been out-Richard.  The
> > problem is still that this is just basic codegen.  I still don't think it
> > requires -O2 to be usable.
> >
> > So I guess the only correct implementation is to use an STV-like patch.
> > But given that this is already the second attempt, first RTL one was
> > rejected by Richard, second GIMPLE one was rejected by Richi I'd like to
> > get an agreement on this STV thing before I waste months more..
> 
> I don't think this in itself is a good motivation for STV.  My comment
> above was more about the idea of STV for AArch64 in general (since it
> had been raised).
> 
> Personally I still think the reduction should be generated in gimple.

I agree, and the proper place to generate the reduction is in SLP.

Richard.

