Re: [PATCH] RISC-V: Fix ICE in non-canonical march parsing

2023-11-15 Thread Kito Cheng
On Thu, Nov 16, 2023 at 2:32 AM Patrick O'Neill  wrote:
>
> Does relax mean no longer enforcing the canonical order of extensions?

Yes, we discussed that a long time ago, but we just didn't have
enough people to move that forward:

https://github.com/riscv-non-isa/riscv-toolchain-conventions/pull/14

> Patrick
>
> On 11/14/23 17:52, Kito Cheng wrote:
>
> LGTM, and BTW... I am thinking we could relax the canonical order
> during parsing. Do you have the interest and time to work on that
> item?
>
> On Wed, Nov 15, 2023 at 9:35 AM Patrick O'Neill  wrote:
>
> Passing in a base extension in non-canonical order (i, e, g) causes GCC
> to ICE:
> xgcc: error: '-march=rv64ge': ISA string is not in canonical order. 'e'
> xgcc: internal compiler error: in add, at common/config/riscv/riscv-common.cc:671
> ...
>
> This is fixed by skipping to the next extension when a non-canonical
> order is detected.
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc
> (riscv_subset_list::parse_std_ext): Emit an error and skip to
> the next extension when a non-canonical ordering is detected.
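The "emit an error and skip to the next extension" behaviour can be sketched
roughly as below. This is only an illustration, not GCC's
riscv_subset_list::parse_std_ext: the function name, its parameters and the
order string passed in by the caller are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the GCC parser.  Walks the single-letter
   extensions in EXTS and copies those that respect the order given by
   CANONICAL into ACCEPTED.  A letter that is unknown, repeated, or out
   of order is counted and skipped, so parsing continues instead of
   aborting.  Returns the number of skipped letters.  */
int
parse_std_exts (const char *exts, const char *canonical,
                char *accepted, size_t accepted_size)
{
  assert (exts != NULL && canonical != NULL && accepted != NULL);
  assert (accepted_size > 0);

  const char *next_allowed = canonical; /* Earliest position still legal.  */
  size_t n = 0;
  int rejected = 0;

  for (const char *p = exts; *p != '\0'; p++)
    {
      const char *pos = strchr (next_allowed, *p);
      if (pos == NULL)
        {
          /* Not found at or after the current position: report it
             and move on to the next letter.  */
          rejected++;
          continue;
        }
      if (n + 1 < accepted_size)
        accepted[n++] = *p;
      next_allowed = pos + 1;
    }
  accepted[n] = '\0';
  return rejected;
}
```

With an assumed order string of "mafdqlcbkjtpvnh", the input "fmd" yields one
skipped letter ('m') and the accepted string "fd".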


Re: [PATCH] RISC-V: Change unaligned fast/slow/avoid macros to misaligned [PR111557]

2023-11-15 Thread Kito Cheng
ohhh, thanks for fixing that, LGTM!

On Thu, Nov 16, 2023 at 7:31 AM Edwin Lu  wrote:
>
> Fix __riscv_unaligned_fast/slow/avoid macro name to
> __riscv_misaligned_fast/slow/avoid to be consistent with the RISC-V API Spec
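As an illustration of how user code might consume the renamed macros, here is
a minimal sketch. The helper load_le32 is hypothetical and not part of the
patch; it only assumes that the compiler predefines at most one of
__riscv_misaligned_fast/slow/avoid, and it falls back to byte accesses when
none of them is defined (for example on a non-RISC-V host).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit little-endian value from an
   address that may be misaligned.  */
uint32_t
load_le32 (const void *p)
{
  assert (p != NULL);
#if defined(__riscv_misaligned_fast)
  /* Misaligned accesses are cheap here, so a plain copy is fine; the
     compiler may turn it into a single word load.  RISC-V is
     little-endian, so no byte swap is needed.  */
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
#else
  /* __riscv_misaligned_slow, __riscv_misaligned_avoid, or no macro at
     all: assemble the value from individual bytes.  */
  const unsigned char *b = (const unsigned char *) p;
  return (uint32_t) b[0]
         | ((uint32_t) b[1] << 8)
         | ((uint32_t) b[2] << 16)
         | ((uint32_t) b[3] << 24);
#endif
}
```

Code that still tests the old __riscv_unaligned_* names would silently take
the fallback path after this change, which is why the tests in the patch are
updated together with the macro definitions.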
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): update macro name
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/attribute-1.c: update macro name
> * gcc.target/riscv/attribute-4.c: ditto
> * gcc.target/riscv/attribute-5.c: ditto
> * gcc.target/riscv/predef-align-1.c: ditto
> * gcc.target/riscv/predef-align-2.c: ditto
> * gcc.target/riscv/predef-align-3.c: ditto
> * gcc.target/riscv/predef-align-4.c: ditto
> * gcc.target/riscv/predef-align-5.c: ditto
> * gcc.target/riscv/predef-align-6.c: ditto
>
> Signed-off-by: Edwin Lu 
> ---
>  gcc/config/riscv/riscv-c.cc |  6 +++---
>  gcc/testsuite/gcc.target/riscv/attribute-1.c| 10 +-
>  gcc/testsuite/gcc.target/riscv/attribute-4.c|  8 
>  gcc/testsuite/gcc.target/riscv/attribute-5.c| 10 +-
>  gcc/testsuite/gcc.target/riscv/predef-align-1.c | 10 +-
>  gcc/testsuite/gcc.target/riscv/predef-align-2.c |  8 
>  gcc/testsuite/gcc.target/riscv/predef-align-3.c | 10 +-
>  gcc/testsuite/gcc.target/riscv/predef-align-4.c | 10 +-
>  gcc/testsuite/gcc.target/riscv/predef-align-5.c |  8 
>  gcc/testsuite/gcc.target/riscv/predef-align-6.c | 10 +-
>  10 files changed, 45 insertions(+), 45 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
> index b7f9ba204f7..dd1bd0596fc 100644
> --- a/gcc/config/riscv/riscv-c.cc
> +++ b/gcc/config/riscv/riscv-c.cc
> @@ -109,11 +109,11 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
>  }
>
>if (riscv_user_wants_strict_align)
> -builtin_define_with_int_value ("__riscv_unaligned_avoid", 1);
> +builtin_define_with_int_value ("__riscv_misaligned_avoid", 1);
>else if (riscv_slow_unaligned_access_p)
> -builtin_define_with_int_value ("__riscv_unaligned_slow", 1);
> +builtin_define_with_int_value ("__riscv_misaligned_slow", 1);
>else
> -builtin_define_with_int_value ("__riscv_unaligned_fast", 1);
> +builtin_define_with_int_value ("__riscv_misaligned_fast", 1);
>
>if (TARGET_MIN_VLEN != 0)
>  builtin_define_with_int_value ("__riscv_v_min_vlen", TARGET_MIN_VLEN);
> diff --git a/gcc/testsuite/gcc.target/riscv/attribute-1.c 
> b/gcc/testsuite/gcc.target/riscv/attribute-1.c
> index abfb0b498e0..a39efb3e6ff 100644
> --- a/gcc/testsuite/gcc.target/riscv/attribute-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/attribute-1.c
> @@ -4,13 +4,13 @@ int foo()
>  {
>
>  /* In absence of -m[no-]strict-align, default mcpu is currently
> -   set to rocket.  rocket has slow_unaligned_access=true.  */
> -#if !defined(__riscv_unaligned_slow)
> -#error "__riscv_unaligned_slow is not set"
> +   set to rocket.  rocket has slow_misaligned_access=true.  */
> +#if !defined(__riscv_misaligned_slow)
> +#error "__riscv_misaligned_slow is not set"
>  #endif
>
> -#if defined(__riscv_unaligned_avoid) || defined(__riscv_unaligned_fast)
> -#error "__riscv_unaligned_avoid or __riscv_unaligned_fast is unexpectedly 
> set"
> +#if defined(__riscv_misaligned_avoid) || defined(__riscv_misaligned_fast)
> +#error "__riscv_misaligned_avoid or __riscv_misaligned_fast is unexpectedly 
> set"
>  #endif
>
>  return 0;
> diff --git a/gcc/testsuite/gcc.target/riscv/attribute-4.c 
> b/gcc/testsuite/gcc.target/riscv/attribute-4.c
> index 545f87cb899..a5a95042a31 100644
> --- a/gcc/testsuite/gcc.target/riscv/attribute-4.c
> +++ b/gcc/testsuite/gcc.target/riscv/attribute-4.c
> @@ -3,12 +3,12 @@
>  int foo()
>  {
>
> -#if !defined(__riscv_unaligned_avoid)
> -#error "__riscv_unaligned_avoid is not set"
> +#if !defined(__riscv_misaligned_avoid)
> +#error "__riscv_misaligned_avoid is not set"
>  #endif
>
> -#if defined(__riscv_unaligned_fast) || defined(__riscv_unaligned_slow)
> -#error "__riscv_unaligned_fast or __riscv_unaligned_slow is unexpectedly set"
> +#if defined(__riscv_misaligned_fast) || defined(__riscv_misaligned_slow)
> +#error "__riscv_misaligned_fast or __riscv_misaligned_slow is unexpectedly 
> set"
>  #endif
>
>return 0;
> diff --git a/gcc/testsuite/gcc.target/riscv/attribute-5.c 
> b/gcc/testsuite/gcc.target/riscv/attribute-5.c
> index 753043c31e9..ad1a1811fa3 100644
> --- a/gcc/testsuite/gcc.target/riscv/attribute-5.c
> +++ b/gcc/testsuite/gcc.target/riscv/attribute-5.c
> @@ -3,13 +3,13 @@
>  int foo()
>  {
>
> -/* Default mcpu is rocket which has slow_unaligned_access=true.  */
> -#if !defined(__riscv_unaligned_slow)
> -#error "__riscv_unaligned_slow is not set"
> +/* Default mcpu is rocket which has slow_misaligned_access=true.  */
> +#if !defined(__riscv_misaligned_slow)
> +#error "__riscv_misaligned_slow is not set"
>  #endif
>
> -#if defined(__riscv_unaligned_avoid) || 

[PATCH v2] LoongArch: Add code generation support for call36 function calls.

2023-11-15 Thread Lulu Cheng
When compiling with '-mcmodel=medium', the function call is made through
'pcaddu18i+jirl' if binutils supports call36, otherwise the
native implementation 'pcalau12i+jirl' is used.
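
To make the 'pcaddu18i+jirl' pair more concrete, the sketch below splits a
PC-relative offset into the two immediates. It is not code from the patch.
It assumes the usual instruction semantics: pcaddu18i adds a sign-extended
20-bit immediate shifted left by 18 bits to the PC, and jirl adds a
sign-extended 16-bit immediate shifted left by 2 bits. The struct and
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two immediates.  */
struct call36_parts
{
  int32_t hi20; /* Immediate for pcaddu18i.  */
  int32_t lo16; /* Immediate for jirl.  */
};

/* Split OFFSET (target minus PC) into the two immediates.  Returns 1 on
   success and 0 if OFFSET is not 4-byte aligned or is out of range.  */
int
split_call36 (int64_t offset, struct call36_parts *out)
{
  assert (out != NULL);

  if (offset % 4 != 0)
    return 0;
  /* Coarse guard so the arithmetic below cannot overflow; the precise
     range check follows.  */
  if (offset < -(INT64_C (1) << 40) || offset > (INT64_C (1) << 40))
    return 0;

  /* Sign-extend the low 18 bits; jirl covers these.  */
  int64_t lo18 = (int64_t) ((uint64_t) offset & 0x3ffff);
  if (lo18 >= 0x20000)
    lo18 -= 0x40000;

  /* What remains is an exact multiple of 2^18; pcaddu18i covers it.  */
  int64_t hi = (offset - lo18) / 0x40000;
  if (hi < -0x80000 || hi > 0x7ffff)
    return 0;

  out->hi20 = (int32_t) hi;
  out->lo16 = (int32_t) (lo18 / 4);
  return 1;
}
```

Under these assumptions the pair reaches roughly +/-128 GiB around the call
site, since hi20 * 2^18 + lo16 * 4 reconstructs the original offset.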

gcc/ChangeLog:

* config.in: Regenerate.
* config/loongarch/loongarch-opts.h (HAVE_AS_SUPPORT_CALL36): Define 
macro.
* config/loongarch/loongarch.cc (loongarch_legitimize_call_address):
If binutils supports call36, the function call is not split over expand.
* config/loongarch/loongarch.md: Add call36 generation code.
* config/loongarch/predicates.md: Likewise.
* configure: Regenerate.
* configure.ac: Check whether binutils supports call36.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/func-call-medium-5.c: If the assembler supports 
call36,
the test is abandoned.
* gcc.target/loongarch/func-call-medium-6.c: Likewise.
* gcc.target/loongarch/func-call-medium-7.c: Likewise.
* gcc.target/loongarch/func-call-medium-8.c: Likewise.
* lib/target-supports.exp: Added a function to see if the assembler 
supports
the call36 relocation.
* gcc.target/loongarch/func-call-medium-call36-1.c: New test.
* gcc.target/loongarch/func-call-medium-call36.c: New test.

Co-authored-by: Xi Ruoyao 

---
v1 -> v2:
  1. Add '(clobber (reg:P 12))' instead of '-fno-ipa-ra' in sibcall 
implementation.
  2. Add test cases.

---
 gcc/config.in |   6 +
 gcc/config/loongarch/loongarch-opts.h |   4 +
 gcc/config/loongarch/loongarch.cc |  12 +-
 gcc/config/loongarch/loongarch.md | 171 +++---
 gcc/config/loongarch/predicates.md|   7 +-
 gcc/configure |  32 
 gcc/configure.ac  |   6 +
 .../gcc.target/loongarch/func-call-medium-5.c |   1 +
 .../gcc.target/loongarch/func-call-medium-6.c |   1 +
 .../gcc.target/loongarch/func-call-medium-7.c |   1 +
 .../gcc.target/loongarch/func-call-medium-8.c |   1 +
 .../loongarch/func-call-medium-call36-1.c |  21 +++
 .../loongarch/func-call-medium-call36.c   |  32 
 gcc/testsuite/lib/target-supports.exp |   9 +
 14 files changed, 268 insertions(+), 36 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/loongarch/func-call-medium-call36-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-medium-call36.c

diff --git a/gcc/config.in b/gcc/config.in
index 866f9fff101..e100c20dcd0 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -781,6 +781,12 @@
 #endif
 
 
+/* Define if your assembler supports call36 relocation. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_SUPPORT_CALL36
+#endif
+
+
 /* Define if your assembler and linker support thread-local storage. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_TLS
diff --git a/gcc/config/loongarch/loongarch-opts.h 
b/gcc/config/loongarch/loongarch-opts.h
index 8de41bbc4f7..6dd309aad96 100644
--- a/gcc/config/loongarch/loongarch-opts.h
+++ b/gcc/config/loongarch/loongarch-opts.h
@@ -97,6 +97,10 @@ loongarch_update_gcc_opt_status (struct loongarch_target 
*target,
 #define HAVE_AS_EXPLICIT_RELOCS 0
 #endif
 
+#ifndef HAVE_AS_SUPPORT_CALL36
+#define HAVE_AS_SUPPORT_CALL36 0
+#endif
+
 #ifndef HAVE_AS_MRELAX_OPTION
 #define HAVE_AS_MRELAX_OPTION 0
 #endif
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 738911661d7..0bd416255be 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3006,12 +3006,16 @@ loongarch_legitimize_call_address (rtx addr)
 
   enum loongarch_symbol_type symbol_type = loongarch_classify_symbol (addr);
 
-  /* Split function call insn 'bl sym' or 'bl %plt(sym)' to :
- pcalau12i $rd, %pc_hi20(sym)
- jr $rd, %pc_lo12(sym).  */
+  /* If add the compilation option '-cmodel=medium', and the assembler does
+ not support call36.  The following sequence of instructions will be
+ used for the function call:
+   pcalau12i $rd, %pc_hi20(sym)
+   jr $rd, %pc_lo12(sym)
+  */
 
   if (TARGET_CMODEL_MEDIUM
-  && TARGET_EXPLICIT_RELOCS
+  && !HAVE_AS_SUPPORT_CALL36
+  && (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
   && (SYMBOL_REF_P (addr) || LABEL_REF_P (addr))
   && (symbol_type == SYMBOL_PCREL
  || (symbol_type == SYMBOL_GOT_DISP && flag_plt)))
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 22814a3679c..f0b6ae3e2a2 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -3274,7 +3274,13 @@ (define_expand "sibcall"
XEXP (target, 1),
operands[1]));
   else
-emit_call_insn (gen_sibcall_internal (target, operands[1]));
+{
+  rtx call = emit_call_insn (gen_sibcall_internal (target, operands[1]));
+
+  if (TARGET_CMODEL_MEDIUM && !REG_P (target))
+   

Re: [PATCH] i386: Fix mov imm,%rax; mov %rdi,%rdx; mulx %rax -> mov imm,%rdx; mulx %rdi peephole2 [PR112526]

2023-11-15 Thread Uros Bizjak
On Thu, Nov 16, 2023 at 8:16 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following testcase is miscompiled on x86_64 since PR110551 r14-4968
> commit.  That commit added 2 peephole2s, one for
> mov imm,%rXX; mov %rYY,%rax; mulq %rXX -> mov imm,%rax; mulq %rYY
> which I believe is ok, and another one for
> mov imm,%rXX; mov %rYY,%rdx; mulx %rXX, %rZZ, %rWW -> mov imm,%rdx; mulx 
> %rYY, %rZZ, %rWW
> which is wrong.  Both peephole2s verify that %rXX above is dead at
> the end of the pattern, by checking if %rXX is either one of the
> registers overwritten in the multiplication (%rdx:%rax in the first
> case, the 2 destination registers of mulx in the latter case), because
> we no longer set %rXX to that immediate (we set %rax resp. %rdx to it
> instead) when the peephole2 replaces it.  But, we also need to ensure
> that the other register previously set to the value of %rYY and newly
> to imm isn't used after the multiplication, and neither of the peephole2s
> does that.  Now, for the first one (at least assuming in the % pattern
> the matching operand (i.e. hardcoded %rax resp. %rdx) after RA will always go
> first) I think it is always the case, because operands[2] if it must be %rax
> register will be overwritten by mulq writing to %rdx:%rax.  But in the
> second case, there is no reason why %rdx couldn't be used after the pattern,
> and if it is (like in the testcase), we can't make those changes.
> So, the patch checks similarly to operands[0] that operands[2] (which ought
> to be %rdx if RA puts the % match_dup operand first and nothing swaps it
> afterwards) is either the same register as one of the destination registers
> of mulx or dies at the end of the multiplication.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2023-11-16  Jakub Jelinek  
>
> PR target/112526
> * config/i386/i386.md
> (mov imm,%rax; mov %rdi,%rdx; mulx %rax -> mov imm,%rdx; mulx %rdi):
> Verify in define_peephole2 that operands[2] dies or is overwritten
> at the end of multiplication.
>
> * gcc.target/i386/bmi2-pr112526.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386.md.jj  2023-11-14 21:38:38.667046713 +0100
> +++ gcc/config/i386/i386.md 2023-11-15 17:03:28.308048728 +0100
> @@ -9918,7 +9918,10 @@ (define_peephole2
> && REGNO (operands[0]) != REGNO (operands[3])
> && (REGNO (operands[0]) == REGNO (operands[4])
> || REGNO (operands[0]) == REGNO (operands[5])
> -   || peep2_reg_dead_p (3, operands[0]))"
> +   || peep2_reg_dead_p (3, operands[0]))
> +   && (REGNO (operands[2]) == REGNO (operands[4])
> +   || REGNO (operands[2]) == REGNO (operands[5])
> +   || peep2_reg_dead_p (3, operands[2]))"
>[(set (match_dup 2) (match_dup 1))
> (parallel [(set (match_dup 4)
>(mult:DWIH (match_dup 2) (match_dup 3)))
> --- gcc/testsuite/gcc.target/i386/bmi2-pr112526.c.jj2023-11-15 
> 16:58:02.230380183 +0100
> +++ gcc/testsuite/gcc.target/i386/bmi2-pr112526.c   2023-11-15 
> 17:02:22.478942259 +0100
> @@ -0,0 +1,27 @@
> +/* PR target/112526 */
> +/* { dg-do run { target { bmi2 && int128 } } } */
> +/* { dg-options "-O2 -mbmi2" } */
> +
> +#include "bmi2-check.h"
> +
> +__attribute__((noipa)) void
> +foo (unsigned long x, unsigned __int128 *y, unsigned long z, unsigned long 
> *w)
> +{
> +  register unsigned long a __asm ("%r10") = x + z;
> +  register unsigned __int128 b __asm ("%r8") = ((unsigned __int128) a) * 
> 257342423UL;
> +  asm volatile ("" : "+r" (b));
> +  asm volatile ("" : "+d" (a));
> +  *y = b;
> +  *w = a;
> +}
> +
> +static void
> +bmi2_test ()
> +{
> +  unsigned __int128 y;
> +  unsigned long w;
> +  foo (10268318293806702989UL, &y, 4702524958196331333UL, &w);
> +  if (y != ((((unsigned __int128) 0xc72d2c9UL) << 64) | 0x9586adfdc95b225eUL)
> +  || w != 14970843252003034322UL)
> +abort ();
> +}
>
> Jakub
>


Re: [PATCH] LoongArch: Increase cost of vector aligned store/load.

2023-11-15 Thread WANG Xuerui

On 11/16/23 14:17, Jiahao Xu wrote:

Based on SPEC2017 performance evaluation results, making them equal to the
cost of unaligned store/load to avoid odd alignment peeling is better.


Paraphrasing a bit to shorten the subject of the sentence:

"it's better to make them equal to ... so as to avoid odd-alignment peeling"



gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_builtin_vectorization_cost): Adjust.

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 738911661d7..d05743bec87 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3893,11 +3893,9 @@ loongarch_builtin_vectorization_cost (enum 
vect_cost_for_stmt type_of_cost,
case scalar_stmt:
case scalar_load:
case vector_stmt:
-  case vector_load:
case vec_to_scalar:
case scalar_to_vec:
case scalar_store:
-  case vector_store:
return 1;
  
case vec_promote_demote:

@@ -3905,6 +3903,8 @@ loongarch_builtin_vectorization_cost (enum 
vect_cost_for_stmt type_of_cost,
return LASX_SUPPORTED_MODE_P (mode)
  && !LSX_SUPPORTED_MODE_P (mode) ? 2 : 1;
  
+  case vector_load:

+  case vector_store:
case unaligned_load:
case unaligned_store:
return 2;


Re: [PATCH] VECT: Add MASK_LEN_STRIDED_LOAD/MASK_LEN_STRIDED_STORE into loop vectorizer

2023-11-15 Thread juzhe.zh...@rivai.ai
Just finished x86 bootstrap and regtest with no regressions.
Also tested on aarch64 with no regressions.



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-11-14 11:46
To: gcc-patches
CC: richard.sandiford; rguenther; Juzhe-Zhong
Subject: [PATCH] VECT: Add MASK_LEN_STRIDED_LOAD/MASK_LEN_STRIDED_STORE into 
loop vectorizer
This patch supports generating MASK_LEN_STRIDED_LOAD/MASK_LEN_STRIDED_STORE IR
for memory accesses with an invariant stride.
 
It's a special optimization for targets like RVV.
 
RVV has both indexed load/store and strided load/store.
 
In RVV, the gather/scatter and strided optabs are always available at the same time.
 
E.g. 
void foo (int *__restrict a,
int * __restrict b, int n, int *__restrict indice)
{
for (int i = 0; i < n; i++)
  a[indice[i]] = b[indice[i]] + a[i];
}
 
For code like this, RVV uses indexed load/store instructions to implement gather/scatter.
 
E.g.
 
void foo (int *__restrict a,
int * __restrict b, int n, int m)
{
for (int i = 0; i < n; i++)
  a[i] = b[i * m] + a[i];
}
 
For code like this, RVV uses strided load/store instructions.
 
We only need to support the direct mask_len_strided_xxx optabs for an invariant stride.
 
gcc/ChangeLog:
 
* tree-vect-stmts.cc (vect_get_strided_load_store_ops): Add 
MASK_LEN_STRIDED_LOAD/MASK_LEN_STRIDED_STORE.
(vectorizable_store): Ditto.
(vectorizable_load): Ditto.
 
---
gcc/tree-vect-stmts.cc | 47 ++
1 file changed, 38 insertions(+), 9 deletions(-)
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ee89f47c468..9c65b688510 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2863,6 +2863,17 @@ vect_get_strided_load_store_ops (stmt_vec_info stmt_info,
   *dataref_bump = cse_and_gimplify_to_preheader (loop_vinfo, bump);
 }
+  /* Target supports strided load/store use DR_STEP as stride for VEC_OFFSET
+ directly instead of build VEC_OFFSET with VEC_SERIES.  */
+  internal_fn ifn
+= DR_IS_READ (dr) ? IFN_MASK_LEN_STRIDED_LOAD : IFN_MASK_LEN_STRIDED_STORE;
+  if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED))
+{
+  *vec_offset = cse_and_gimplify_to_preheader (loop_vinfo,
+unshare_expr (DR_STEP (dr)));
+  return;
+}
+
   /* The offset given in GS_INFO can have pointer type, so use the element
  type of the vector instead.  */
   tree offset_type = TREE_TYPE (gs_info->offset_vectype);
@@ -9012,10 +9023,20 @@ vectorizable_store (vec_info *vinfo,
  gcall *call;
  if (final_len && final_mask)
- call = gimple_build_call_internal
-  (IFN_MASK_LEN_SCATTER_STORE, 7, dataref_ptr,
-   vec_offset, scale, vec_oprnd, final_mask,
-   final_len, bias);
+ {
+   if (VECTOR_TYPE_P (TREE_TYPE (vec_offset)))
+ call = gimple_build_call_internal (
+   IFN_MASK_LEN_SCATTER_STORE, 7, dataref_ptr,
+   vec_offset, scale, vec_oprnd, final_mask, final_len,
+   bias);
+   else
+ /* non-vector type offset means that target prefers to
+use MASK_LEN_STRIDED_STORE instead of
+MASK_LEN_GATHER_STORE with direct stride argument. */
+ call = gimple_build_call_internal (
+   IFN_MASK_LEN_STRIDED_STORE, 6, dataref_ptr,
+   vec_offset, vec_oprnd, final_mask, final_len, bias);
+ }
  else if (final_mask)
call = gimple_build_call_internal
 (IFN_MASK_SCATTER_STORE, 5, dataref_ptr,
@@ -10956,11 +10977,19 @@ vectorizable_load (vec_info *vinfo,
  gcall *call;
  if (final_len && final_mask)
- call
-   = gimple_build_call_internal (IFN_MASK_LEN_GATHER_LOAD, 7,
- dataref_ptr, vec_offset,
- scale, zero, final_mask,
- final_len, bias);
+ {
+   if (VECTOR_TYPE_P (TREE_TYPE (vec_offset)))
+ call = gimple_build_call_internal (
+   IFN_MASK_LEN_GATHER_LOAD, 7, dataref_ptr, vec_offset,
+   scale, zero, final_mask, final_len, bias);
+   else
+ /* non-vector type offset means that target prefers to
+use MASK_LEN_STRIDED_LOAD instead of
+MASK_LEN_GATHER_LOAD with direct stride argument.  */
+ call = gimple_build_call_internal (
+   IFN_MASK_LEN_STRIDED_LOAD, 6, dataref_ptr, vec_offset,
+   zero, final_mask, final_len, bias);
+ }
  else if (final_mask)
call = gimple_build_call_internal (IFN_MASK_GATHER_LOAD, 5,
   dataref_ptr, vec_offset,
-- 
2.36.3
 


Re: [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN

2023-11-15 Thread juzhe.zh...@rivai.ai
Update: the CI test run has just finished.

Tested on aarch64 QEMU with no regressions.



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-11-14 11:39
To: gcc-patches
CC: richard.sandiford; rguenther; Juzhe-Zhong
Subject: [PATCH] DOC/IFN/OPTAB: Add 
mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN
This patch adds mask_len_strided_load/mask_len_strided_store.
 
The documentation has already been reviewed.
 
This patch adds OPTAB/IFN support as follows:
 
1. strided load
GIMPLE level:
 
v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias)
 
is expanded (by internal-fn.cc) into:
 
v = mask_len_strided_load (ptr, stride, mask, len, bias)
 
2. strided store
 
GIMPLE level:
 
MASK_LEN_STRIDED_STORE (ptr, stride, v, mask, len, bias)
 
is expanded (by internal-fn.cc) into:
 
mask_len_strided_store (ptr, stride, v, mask, len, bias)
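 
A scalar reference model may help when reading the two forms above. The
sketch below is only an illustration under stated assumptions: the stride is
taken to be in bytes, elements are 32-bit integers, the mask is one byte per
element, at most len + bias elements are processed, and a masked-off loaded
element becomes zero as in the proposed md.texi wording. Elements past
len + bias are left untouched, which is a choice made for this model only.
The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of
   v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias).  */
void
model_mask_len_strided_load (int32_t *v, size_t nunits, const void *ptr,
                             ptrdiff_t stride, const unsigned char *mask,
                             size_t len, ptrdiff_t bias)
{
  assert (v != NULL && ptr != NULL && mask != NULL);
  size_t active = (size_t) ((ptrdiff_t) len + bias);

  for (size_t i = 0; i < nunits && i < active; i++)
    {
      if (mask[i])
        /* Element i comes from address ptr + i * stride.  */
        memcpy (&v[i], (const char *) ptr + (ptrdiff_t) i * stride,
                sizeof v[i]);
      else
        v[i] = 0;
    }
}

/* Hypothetical scalar model of
   MASK_LEN_STRIDED_STORE (ptr, stride, v, mask, len, bias).  */
void
model_mask_len_strided_store (void *ptr, ptrdiff_t stride, const int32_t *v,
                              size_t nunits, const unsigned char *mask,
                              size_t len, ptrdiff_t bias)
{
  assert (ptr != NULL && v != NULL && mask != NULL);
  size_t active = (size_t) ((ptrdiff_t) len + bias);

  for (size_t i = 0; i < nunits && i < active; i++)
    if (mask[i])
      /* Only active, unmasked elements reach memory.  */
      memcpy ((char *) ptr + (ptrdiff_t) i * stride, &v[i], sizeof v[i]);
}
```

The point of the strided form is visible in the signatures: a single scalar
stride replaces the offset vector that a gather or scatter would need.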
 
Bootstrapped and regression-tested on x86 with no regressions.
 
Ok for trunk ?
gcc/ChangeLog:
 
* doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
* internal-fn.cc (strided_load_direct): Ditto.
(strided_store_direct): Ditto.
(expand_strided_store_optab_fn): Ditto.
(expand_strided_load_optab_fn): Ditto.
(direct_strided_load_optab_supported_p): Ditto.
(direct_strided_store_optab_supported_p): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
* internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
(MASK_LEN_STRIDED_STORE): Ditto.
* optabs.def (OPTAB_D): Ditto.
 
---
gcc/doc/md.texi | 27 +++
gcc/internal-fn.cc  | 63 +
gcc/internal-fn.def |  6 +
gcc/optabs.def  |  2 ++
4 files changed, 98 insertions(+)
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5d86152e5dd..5dc76a1183c 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,6 +5094,20 @@ Bit @var{i} of the mask is set if element @var{i} of the 
result should
be loaded from memory and clear if element @var{i} of the result should be 
undefined.
Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
+@cindex @code{mask_len_strided_load@var{m}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}}
+Load several separate memory locations into a destination vector of mode 
@var{m}.
+Operand 0 is a destination vector of mode @var{m}.
+Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
+operand 3 is mask operand, operand 4 is length operand and operand 5 is bias 
operand.
+The instruction can be seen as a special case of 
@code{mask_len_gather_load@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and 
operand 2 as step.
+For each element index i load address is operand 1 + @var{i} * operand 2.
+Similar to mask_len_load, the instruction loads at most (operand 4 + operand 
5) elements from memory.
+Element @var{i} of the mask (operand 3) is set if element @var{i} of the 
result should
+be loaded from memory and clear if element @var{i} of the result should be 
zero.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
@cindex @code{scatter_store@var{m}@var{n}} instruction pattern
@item @samp{scatter_store@var{m}@var{n}}
Store a vector of mode @var{m} into several distinct memory locations.
@@ -5131,6 +5145,19 @@ at most (operand 6 + operand 7) elements of (operand 4) 
to memory.
Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
stored.
Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
+@cindex @code{mask_len_strided_store@var{m}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}}
+Store a vector of mode m into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
+Operand 2 is the vector of values that should be stored, which is of mode 
@var{m}.
+operand 3 is mask operand, operand 4 is length operand and operand 5 is bias 
operand.
+The instruction can be seen as a special case of 
@code{mask_len_scatter_store@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and 
operand 1 as step.
+For each element index i store address is operand 0 + @var{i} * operand 1.
+Similar to mask_len_store, the instruction stores at most (operand 4 + operand 
5) elements of mask (operand 3) to memory.
+Element @var{i} of the mask is set if element @var{i} of (operand 3) should be 
stored.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
@cindex @code{vec_set@var{m}} instruction pattern
@item @samp{vec_set@var{m}}
Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 5a998e794ad..bfb307684a9 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -164,6 +164,7 @@ init_internal_fns ()
#define load_lanes_direct { -1, -1, false }
#define mask_load_lanes_direct { -1, -1, false }
#define gather_load_direct { 3, 1, false }
+#define strided_load_direct { -1, -1, false }

[PATCH] i386: Fix mov imm,%rax; mov %rdi,%rdx; mulx %rax -> mov imm,%rdx; mulx %rdi peephole2 [PR112526]

2023-11-15 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled on x86_64 since PR110551 r14-4968
commit.  That commit added 2 peephole2s, one for
mov imm,%rXX; mov %rYY,%rax; mulq %rXX -> mov imm,%rax; mulq %rYY
which I believe is ok, and another one for
mov imm,%rXX; mov %rYY,%rdx; mulx %rXX, %rZZ, %rWW -> mov imm,%rdx; mulx %rYY, 
%rZZ, %rWW
which is wrong.  Both peephole2s verify that %rXX above is dead at
the end of the pattern, by checking if %rXX is either one of the
registers overwritten in the multiplication (%rdx:%rax in the first
case, the 2 destination registers of mulx in the latter case), because
we no longer set %rXX to that immediate (we set %rax resp. %rdx to it
instead) when the peephole2 replaces it.  But, we also need to ensure
that the other register previously set to the value of %rYY and newly
to imm isn't used after the multiplication, and neither of the peephole2s
does that.  Now, for the first one (at least assuming in the % pattern
the matching operand (i.e. hardcoded %rax resp. %rdx) after RA will always go
first) I think it is always the case, because operands[2] if it must be %rax
register will be overwritten by mulq writing to %rdx:%rax.  But in the
second case, there is no reason why %rdx couldn't be used after the pattern,
and if it is (like in the testcase), we can't make those changes.
So, the patch checks similarly to operands[0] that operands[2] (which ought
to be %rdx if RA puts the % match_dup operand first and nothing swaps it
afterwards) is either the same register as one of the destination registers
of mulx or dies at the end of the multiplication.
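
The liveness part of the condition can be modelled in isolation as below.
This is a simplified sketch, not the define_peephole2 condition itself: the
struct, its field names and the function are hypothetical, and the remaining
checks of the real pattern are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one peephole2 match.  */
struct mulx_match
{
  unsigned op0;        /* Register that received the immediate (%rXX).  */
  unsigned op2;        /* Register that received %rYY (expected %rdx).  */
  unsigned op4;        /* First destination register of mulx.  */
  unsigned op5;        /* Second destination register of mulx.  */
  bool op0_dead_after; /* Models peep2_reg_dead_p (3, operands[0]).  */
  bool op2_dead_after; /* Models peep2_reg_dead_p (3, operands[2]).  */
};

/* Return true if the rewrite may be applied as far as liveness goes.  */
bool
mulx_peephole_ok (const struct mulx_match *m)
{
  assert (m != NULL);

  /* Existing check: the old immediate register is overwritten by the
     multiplication or dead afterwards.  */
  bool op0_ok = m->op0 == m->op4 || m->op0 == m->op5 || m->op0_dead_after;

  /* Check added by the patch: the register that now receives the
     immediate must not carry the %rYY value past the multiplication.  */
  bool op2_ok = m->op2 == m->op4 || m->op2 == m->op5 || m->op2_dead_after;

  return op0_ok && op2_ok;
}
```

In the miscompiled case the second test is the one that fails: %rdx is
neither a mulx destination nor dead, so the rewrite has to be rejected.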

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-11-16  Jakub Jelinek  

PR target/112526
* config/i386/i386.md
(mov imm,%rax; mov %rdi,%rdx; mulx %rax -> mov imm,%rdx; mulx %rdi):
Verify in define_peephole2 that operands[2] dies or is overwritten
at the end of multiplication.

* gcc.target/i386/bmi2-pr112526.c: New test.

--- gcc/config/i386/i386.md.jj  2023-11-14 21:38:38.667046713 +0100
+++ gcc/config/i386/i386.md 2023-11-15 17:03:28.308048728 +0100
@@ -9918,7 +9918,10 @@ (define_peephole2
&& REGNO (operands[0]) != REGNO (operands[3])
&& (REGNO (operands[0]) == REGNO (operands[4])
|| REGNO (operands[0]) == REGNO (operands[5])
-   || peep2_reg_dead_p (3, operands[0]))"
+   || peep2_reg_dead_p (3, operands[0]))
+   && (REGNO (operands[2]) == REGNO (operands[4])
+   || REGNO (operands[2]) == REGNO (operands[5])
+   || peep2_reg_dead_p (3, operands[2]))"
   [(set (match_dup 2) (match_dup 1))
(parallel [(set (match_dup 4)
   (mult:DWIH (match_dup 2) (match_dup 3)))
--- gcc/testsuite/gcc.target/i386/bmi2-pr112526.c.jj2023-11-15 
16:58:02.230380183 +0100
+++ gcc/testsuite/gcc.target/i386/bmi2-pr112526.c   2023-11-15 
17:02:22.478942259 +0100
@@ -0,0 +1,27 @@
+/* PR target/112526 */
+/* { dg-do run { target { bmi2 && int128 } } } */
+/* { dg-options "-O2 -mbmi2" } */
+
+#include "bmi2-check.h"
+
+__attribute__((noipa)) void
+foo (unsigned long x, unsigned __int128 *y, unsigned long z, unsigned long *w)
+{
+  register unsigned long a __asm ("%r10") = x + z;
+  register unsigned __int128 b __asm ("%r8") = ((unsigned __int128) a) * 
257342423UL;
+  asm volatile ("" : "+r" (b));
+  asm volatile ("" : "+d" (a));
+  *y = b;
+  *w = a;
+}
+
+static void
+bmi2_test ()
+{
+  unsigned __int128 y;
+  unsigned long w;
+  foo (10268318293806702989UL, &y, 4702524958196331333UL, &w);
+  if (y != ((((unsigned __int128) 0xc72d2c9UL) << 64) | 0x9586adfdc95b225eUL)
+  || w != 14970843252003034322UL)
+abort ();
+}

Jakub



Re: [PATCH] tree-optimization: Add register pressure heuristics

2023-11-15 Thread Richard Biener
On Thu, Nov 16, 2023 at 7:12 AM Ajit Agarwal  wrote:
>
> Hello Richard:
>
> With the decision making below, performance is on par with the trunk
> changes and better than trunk for the FP and INT SPEC 2017 benchmarks.
>
> int best_bb_liveout_cnt
> = bitmap_count_bits (>liveout[best_bb->index]);
> int early_bb_liveout_cnt
> = bitmap_count_bits (>liveout[early_bb->index]);
> int early_livein_cnt
> = bitmap_count_bits (>livein[early_bb->index]);
>
> /* High register pressure region is the region where there are live-in of
>  early blocks that has been modified by the early block. If there are
>  modification of the variables in best block that are live-in in early
>  block that are live-out of best block.  */
>   bool live_in_rgn = (early_livein_cnt != 0
>   && early_bb_liveout_cnt <= early_livein_cnt);
>
>   bool high_reg_pressure_rgn = false;
>
>   if (live_in_rgn)
> high_reg_pressure_rgn
>   = (best_bb_liveout_cnt <= early_bb_liveout_cnt);
>   else
> high_reg_pressure_rgn
>   = (best_bb_liveout_cnt != 0 && best_bb_liveout_cnt <= 
> early_bb_liveout_cnt);
>
>   high_reg_pressure_rgn
> = (high_reg_pressure_rgn) && !(best_bb->count >= early_bb->count));
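
The decision logic quoted above can be restated as a self-contained predicate,
which makes it easier to try out on concrete numbers. This is a sketch only:
the function name is hypothetical, the bitmap counts are passed in as plain
integers, and the profile counts are modelled as long values rather than
GCC's own profile count type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical restatement of the quoted heuristic.  */
bool
high_reg_pressure_region (int best_bb_liveout_cnt, int early_bb_liveout_cnt,
                          int early_livein_cnt,
                          long best_bb_count, long early_bb_count)
{
  assert (best_bb_liveout_cnt >= 0 && early_bb_liveout_cnt >= 0
          && early_livein_cnt >= 0);

  bool live_in_rgn = (early_livein_cnt != 0
                      && early_bb_liveout_cnt <= early_livein_cnt);

  bool high;
  if (live_in_rgn)
    high = best_bb_liveout_cnt <= early_bb_liveout_cnt;
  else
    high = (best_bb_liveout_cnt != 0
            && best_bb_liveout_cnt <= early_bb_liveout_cnt);

  /* The profile data is applied last: the result only holds when
     best_bb executes less often than early_bb.  */
  return high && !(best_bb_count >= early_bb_count);
}
```

For example, counts of 2 (best live-out), 3 (early live-out) and 4 (early
live-in) with best_bb executed 10 times and early_bb 20 times give true;
raising the best_bb count to 30 flips the result to false.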
>
> I have included the profile data without any sinking threshold or multiplying
> by 100.
> This fixes the error-prone code you mentioned.
>
> It takes both register pressure and profile data into account in a better way.
>
> Please let me know if this is okay to submit.
>
> I will send the patch accordingly.

Please send an updated patch.

Richard.

> Thanks & Regards
> Ajit
>
> On 03/11/23 8:24 pm, Ajit Agarwal wrote:
> > Hello Richard:
> >
> >
> > On 03/11/23 7:06 pm, Richard Biener wrote:
> >> On Fri, Nov 3, 2023 at 11:20 AM Ajit Agarwal  
> >> wrote:
> >>>
> >>> Hello Richard:
> >>>
> >>> On 03/11/23 12:51 pm, Richard Biener wrote:
>  On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal  
>  wrote:
> >
> > Hello All:
> >
> >> [...]
> >
> > High register pressure region is the region where there are live-in of
> > early blocks that has been modified by the early block. If there are
> > modification of the variables in best block that are live-in in early
> > block that are live-out of best block.
> 
>  ?!  Parse error.
> 
> >>>
> >>> I didn't understand what you meant here. Please suggest.
> >>
> >> I can't even guess what that paragraph means.  It fails at a
> >> parsing level already, I can't even start to reason about what
> >> the sentences mean.
> >
> > Sorry for that I will modify.
> >
> >>
> > Bootstrapped and regtested on powerpc64-linux-gnu.
> 
>  What's the effect on code generation?
> 
>  Note that live is a quadratic problem while sinking was not.  You
>  are effectively making the pass unfit for -O1.
> 
>  You are computing "liveness" on GIMPLE where within EBBs there
>  isn't really any particular order of stmts, so it's kind of a garbage
>  heuristic.  Likewise you are not computing the effect that moving
>  a stmt has on liveness as far as I can see but you are just identifying
>  some odd metrics (I don't really understand them) to rank blocks,
>  not even taking the register file size into account.
> >>>
> >>>
> >>> if the live out of best_bb  <= live out of early_bb, that shows
> >>> that there are modification in best_bb.
> >>
> >> Hm?  Do you maybe want to say that if live_out (bb) < live_in (bb)
> >> then some variables die during the execution of bb?
> >
> > live_out (bb) < live_in(bb) means in bb there may be KILL (Variables)
> > and there are more GEN (Variables).
> >
> >   Otherwise,
> >> if live_out (early) > live_out (best) then somewhere on the path
> >> from early to best some variables die.
> >>
> >
> > If live_out (early) > live_out (best) means there are more GEN (Variables)
> > between path from early to best.
> >
> >
> >>> Then it's
> >>> safer to move statements in best_bb as there are lesser interfering
> >>> live variables in best_bb.
> >>
> >> consider a stmt
> >>
> >>  a = b + c;
> >>
> >> where b and c die at the definition of a.  Then moving the stmt
> >> down from early_bb means you increase live_out (early_bb) by
> >> one.  So why's that "safer" then?  Of course live_out (best_bb)
> >> also increases by two then.
> >>
> >
> > If b and c die at the definition of a and generates a live_in(early_bb)
> > would be live_out(early_bb) - 2 + 1.
> >
> > the moving the stmt from early_bb down to best_bb increases 
> > live_out(early_bb)
> > by one and live_out (best_bb) depends on the LIVEIN(for all successors of 
> > best_bb)
> > which may be same even if we move down.
> >
> > There are chances that live_out (best_bb) greater if for all successors of
> > best_bb there are more GEN ( variables). If live_out (best_bb) is less
> > means there more KILL (Variables) in successors of best_bb.
> >
> > With my heuristics live_out (best_bb ) > live_out (early_bb) 

Re: [PATCH] slp: Fix handling of IFN_CLZ/CTZ [PR112536]

2023-11-15 Thread Richard Biener
On Thu, 16 Nov 2023, Jakub Jelinek wrote:

> Hi!
> 
> We ICE on the following testcase now that IFN_C[LT]Z calls can have one or
> two arguments (where two means the result is well defined at zero).
> The following patch makes us create child node only for the first argument
> and compatible_calls_p ensures the other argument is the same, which
> at least according to the testcase seems sufficient because of vect
> patterns.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2023-11-16  Jakub Jelinek  
> 
>   PR tree-optimization/112536
>   * tree-vect-slp.cc (arg0_map): New variable.
>   (vect_get_operand_map): For IFN_CLZ or IFN_CTZ, return arg0_map.
> 
>   * gcc.dg/pr112536.c: New test.
> 
> --- gcc/tree-vect-slp.cc.jj   2023-11-11 08:52:20.896838494 +0100
> +++ gcc/tree-vect-slp.cc  2023-11-15 10:30:57.606329777 +0100
> @@ -505,6 +505,7 @@ static const int cond_expr_maps[3][5] =
>{ 4, -2, -1, 1, 2 },
>{ 4, -1, -2, 2, 1 }
>  };
> +static const int arg0_map[] = { 1, 0 };
>  static const int arg1_map[] = { 1, 1 };
>  static const int arg2_map[] = { 1, 2 };
>  static const int arg1_arg4_map[] = { 2, 1, 4 };
> @@ -580,6 +581,10 @@ vect_get_operand_map (const gimple *stmt
>   return nullptr;
>   }
>  
> +   case IFN_CLZ:
> +   case IFN_CTZ:
> + return arg0_map;
> +
> default:
>   break;
> }
> --- gcc/testsuite/gcc.dg/pr112536.c.jj2023-11-15 10:37:44.316580909 
> +0100
> +++ gcc/testsuite/gcc.dg/pr112536.c   2023-11-15 10:37:19.464932191 +0100
> @@ -0,0 +1,58 @@
> +/* PR tree-optimization/112536 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-ipa-icf" } */
> +/* { dg-additional-options "-mlzcnt -mavx512cd -mavx512vl" { target { 
> i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-assembler-times "\tvplzcntd\t" 3 { target { i?86-*-* 
> x86_64-*-* } } } } */
> +
> +unsigned a[12];
> +
> +void
> +foo (void)
> +{
> +  int i = a[0];
> +  int j = a[1];
> +  int k = a[2];
> +  int l = a[3];
> +  int e = i ? __builtin_clz (i) : __SIZEOF_INT__ * __CHAR_BIT__;
> +  int f = j ? __builtin_clz (j) : __SIZEOF_INT__ * __CHAR_BIT__;
> +  int g = k ? __builtin_clz (k) : __SIZEOF_INT__ * __CHAR_BIT__;
> +  int h = l ? __builtin_clz (l) : __SIZEOF_INT__ * __CHAR_BIT__;
> +  a[0] = e;
> +  a[1] = f;
> +  a[2] = g;
> +  a[3] = h;
> +}
> +
> +void
> +bar (void)
> +{
> +  int i = a[4];
> +  int j = a[5];
> +  int k = a[6];
> +  int l = a[7];
> +  int e = i ? __builtin_clz (i) : __SIZEOF_INT__ * __CHAR_BIT__;
> +  int f = __builtin_clz (j);
> +  int g = __builtin_clz (k);
> +  int h = l ? __builtin_clz (l) : __SIZEOF_INT__ * __CHAR_BIT__;
> +  a[4] = e;
> +  a[5] = f;
> +  a[6] = g;
> +  a[7] = h;
> +}
> +
> +void
> +baz (void)
> +{
> +  int i = a[8];
> +  int j = a[9];
> +  int k = a[10];
> +  int l = a[11];
> +  int e = __builtin_clz (i);
> +  int f = j ? __builtin_clz (j) : __SIZEOF_INT__ * __CHAR_BIT__;
> +  int g = __builtin_clz (k);
> +  int h = l ? __builtin_clz (l) : __SIZEOF_INT__ * __CHAR_BIT__;
> +  a[8] = e;
> +  a[9] = f;
> +  a[10] = g;
> +  a[11] = h;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] tree-optimization/112282 - wrong-code with ifcvt hoisting

2023-11-15 Thread Richard Biener
On Thu, 16 Nov 2023, Dimitar Dimitrov wrote:

> On Wed, Nov 15, 2023 at 12:11:50PM +, Richard Biener wrote:
> > The following avoids hoisting of invariants from conditionally
> > executed parts of an if-converted loop.  That now makes a difference
> > since we perform bitfield lowering even when we do not actually
> > if-convert the loop.  if-conversion deals with resetting flow-sensitive
> > info when necessary already.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> > 
> > PR tree-optimization/112282
> > * tree-if-conv.cc (ifcvt_hoist_invariants): Only hoist from
> > the loop header.
> > 
> > * gcc.dg/torture/pr112282.c: New testcase.
> > ---
> >  gcc/testsuite/gcc.dg/torture/pr112282.c | 132 
> >  gcc/tree-if-conv.cc |  44 
> >  2 files changed, 153 insertions(+), 23 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/torture/pr112282.c
> > 
> > diff --git a/gcc/testsuite/gcc.dg/torture/pr112282.c 
> > b/gcc/testsuite/gcc.dg/torture/pr112282.c
> > new file mode 100644
> > index 000..23e0ed64b82
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/torture/pr112282.c
> > @@ -0,0 +1,132 @@
> > +/* { dg-do run } */
> > +
> > +int printf(const char *, ...);
> > +void __assert_fail();
> 
> This function is glibc-only. Thus the test fails on newlib targets with:
> 
>   FAIL: gcc.dg/torture/pr112282.c   -O0  (test for excess errors)
>   Excess errors:
>   pr112282.c:(.text+0x1944): undefined reference to `__assert_fail'
>   pr112282.c:(.text+0x2480): undefined reference to `__assert_fail'
> 
> Perhaps __builtin_abort should be used instead?

Ah.  We need an abort that isn't noreturn to reproduce the problem
though.

I verified the following still reproduces the original problem.

Pushed.

From 6cb42f8398d10e5161c7e412a7706d2e31bae708 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Thu, 16 Nov 2023 08:03:55 +0100
Subject: [PATCH] tree-optimization/112282 - fix testcase
To: gcc-patches@gcc.gnu.org

Avoid requiring a glibc specific symbol.

PR tree-optimization/112282
* gcc.dg/torture/pr112282.c: Do not use __assert_fail.
---
 gcc/testsuite/gcc.dg/torture/pr112282.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr112282.c 
b/gcc/testsuite/gcc.dg/torture/pr112282.c
index 23e0ed64b82..6190b90cf66 100644
--- a/gcc/testsuite/gcc.dg/torture/pr112282.c
+++ b/gcc/testsuite/gcc.dg/torture/pr112282.c
@@ -1,7 +1,12 @@
 /* { dg-do run } */
 
 int printf(const char *, ...);
-void __assert_fail();
+void abort ();
+/* We need an abort that isn't noreturn.  */
+void __attribute__((noipa)) my_abort ()
+{
+  abort ();
+}
 int a, g, h, i, v, w = 2, x, y, ab, ac, ad, ae, af, ag;
 static int f, j, m, n, p, r, u, aa;
 struct b {
@@ -72,7 +77,7 @@ ak:
 ah.e = l.c % q.d;
 q.c = au.e;
 if ((q.d && q.c) || ah.e)
-  __assert_fail();
+  my_abort ();
 q.c = 0;
 if (au.d > m || ah.e)
   w = au.c | (n & ah.c);
@@ -93,7 +98,7 @@ ak:
   if (ah.d)
 o.c = l.c & o.c & q.c;
   if (q.d)
-__assert_fail();
+my_abort ();
   printf("", an);
   printf("", q);
   printf("", au);
-- 
2.35.3



Re: [committed] i386: Return CCmode from ix86_cc_mode for unknown RTX code [PR112494]

2023-11-15 Thread Uros Bizjak
On Tue, Nov 14, 2023 at 6:51 PM Jakub Jelinek  wrote:
>
> On Mon, Nov 13, 2023 at 10:49:23PM +0100, Uros Bizjak wrote:
> > Combine wants to combine following instructions into an insn that can
> > perform both an (arithmetic) operation and set the condition code.  During
> > the conversion a new RTX is created, and combine passes the RTX code of the
> > innermost RTX expression of the CC use insn in which CC reg is used to
> > SELECT_CC_MODE, to determine the new mode of the comparison:
> >
> > Trying 5 -> 8:
> > 5: r98:DI=0xd7
> > 8: flags:CCZ=cmp(r98:DI,0)
> >   REG_EQUAL cmp(0xd7,0)
> > Failed to match this instruction:
> > (parallel [
> > (set (reg:CC 17 flags)
> > (compare:CC (const_int 215 [0xd7])
> > (const_int 0 [0])))
> > (set (reg/v:DI 98 [ flags ])
> > (const_int 215 [0xd7]))
> > ])
> >
> > where:
> >
> > (insn 5 2 6 2 (set (reg/v:DI 98 [ flags ])
> > (const_int 215 [0xd7])) "pr112494.c":8:8 84 {*movdi_internal}
> >  (nil))
> >
> > (insn 8 7 11 2 (set (reg:CCZ 17 flags)
> > (compare:CCZ (reg/v:DI 98 [ flags ])
> > (const_int 0 [0]))) "pr112494.c":11:9 8 {*cmpdi_ccno_1}
> >  (expr_list:REG_EQUAL (compare:CCZ (const_int 215 [0xd7])
> > (const_int 0 [0]))
> > (nil)))
> >
> > x86_cc_mode (AKA SELECT_CC_MODE) is not prepared to handle random RTX
> > codes and triggers gcc_unreachable() when SET RTX code is passed to it.
> > The patch removes gcc_unreachable() and returns CCmode for unknown
> > RTX codes, so combine can try various combinations involving CC reg
> > without triggering ICE.
> >
> > Please note that x86 MOV instructions do not set flags, so the above
> > combination is not recognized as a valid x86 instruction.
> >
> > PR target/112494
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386.cc (ix86_cc_mode) [default]: Return CCmode.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr112494.c: New test.
> >
> > Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> For me the test ICEs with RTL checking on both x86_64-linux and i686-linux.
> pr112494.c:17:1: internal compiler error: RTL check: expected elt 0 type 'e' 
> or 'u', have 'E' (rtx unspec) in try_combine, at combine.cc:3237
> This is on
> 3236  /* Just replace the CC reg with a new mode.  */
> 3237  SUBST (XEXP (*cc_use_loc, 0), newpat_dest);
> 3238  undobuf.other_insn = cc_use_insn;
> in combine.cc, where *cc_use_loc is
> (unspec:DI [
> (reg:CC 17 flags)
> ] UNSPEC_PUSHFL)
> on which XEXP (guess combine assumes CC must be used inside of a
> comparison).

Can you please file a bug report, so the bug can be properly
triaged? I don't think there is anything wrong with the new patterns; at
least the documentation doesn't say that the CC reg can't be used in
unspecs and other non-compare RTXes.

Thanks,
Uros.


[PATCH] slp: Fix handling of IFN_CLZ/CTZ [PR112536]

2023-11-15 Thread Jakub Jelinek
Hi!

We ICE on the following testcase now that IFN_C[LT]Z calls can have one or
two arguments (where two means the result is well defined at zero).
The following patch makes us create child node only for the first argument
and compatible_calls_p ensures the other argument is the same, which
at least according to the testcase seems sufficient because of vect
patterns.
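
For readers unfamiliar with the idiom, here is a minimal standalone sketch
(not part of the patch) of the guarded source pattern the testcase below
uses: __builtin_clz is undefined for a zero argument, so the guard supplies
the value at zero explicitly.  The helper name clz_or_width is hypothetical.

```c
#include <limits.h>

/* Hypothetical helper, not from the patch: count leading zeros with a
   defined result (the bit width of unsigned int) when x is zero.  */
static int
clz_or_width (unsigned x)
{
  return x ? __builtin_clz (x) : (int) (sizeof (unsigned) * CHAR_BIT);
}
```

The testcase mixes guarded and unguarded calls in one group of four stores,
which is the situation the extra argument has to be checked for.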

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-11-16  Jakub Jelinek  

PR tree-optimization/112536
* tree-vect-slp.cc (arg0_map): New variable.
(vect_get_operand_map): For IFN_CLZ or IFN_CTZ, return arg0_map.

* gcc.dg/pr112536.c: New test.

--- gcc/tree-vect-slp.cc.jj 2023-11-11 08:52:20.896838494 +0100
+++ gcc/tree-vect-slp.cc2023-11-15 10:30:57.606329777 +0100
@@ -505,6 +505,7 @@ static const int cond_expr_maps[3][5] =
   { 4, -2, -1, 1, 2 },
   { 4, -1, -2, 2, 1 }
 };
+static const int arg0_map[] = { 1, 0 };
 static const int arg1_map[] = { 1, 1 };
 static const int arg2_map[] = { 1, 2 };
 static const int arg1_arg4_map[] = { 2, 1, 4 };
@@ -580,6 +581,10 @@ vect_get_operand_map (const gimple *stmt
return nullptr;
}
 
+ case IFN_CLZ:
+ case IFN_CTZ:
+   return arg0_map;
+
  default:
break;
  }
--- gcc/testsuite/gcc.dg/pr112536.c.jj  2023-11-15 10:37:44.316580909 +0100
+++ gcc/testsuite/gcc.dg/pr112536.c 2023-11-15 10:37:19.464932191 +0100
@@ -0,0 +1,58 @@
+/* PR tree-optimization/112536 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-ipa-icf" } */
+/* { dg-additional-options "-mlzcnt -mavx512cd -mavx512vl" { target { i?86-*-* 
x86_64-*-* } } } */
+/* { dg-final { scan-assembler-times "\tvplzcntd\t" 3 { target { i?86-*-* 
x86_64-*-* } } } } */
+
+unsigned a[12];
+
+void
+foo (void)
+{
+  int i = a[0];
+  int j = a[1];
+  int k = a[2];
+  int l = a[3];
+  int e = i ? __builtin_clz (i) : __SIZEOF_INT__ * __CHAR_BIT__;
+  int f = j ? __builtin_clz (j) : __SIZEOF_INT__ * __CHAR_BIT__;
+  int g = k ? __builtin_clz (k) : __SIZEOF_INT__ * __CHAR_BIT__;
+  int h = l ? __builtin_clz (l) : __SIZEOF_INT__ * __CHAR_BIT__;
+  a[0] = e;
+  a[1] = f;
+  a[2] = g;
+  a[3] = h;
+}
+
+void
+bar (void)
+{
+  int i = a[4];
+  int j = a[5];
+  int k = a[6];
+  int l = a[7];
+  int e = i ? __builtin_clz (i) : __SIZEOF_INT__ * __CHAR_BIT__;
+  int f = __builtin_clz (j);
+  int g = __builtin_clz (k);
+  int h = l ? __builtin_clz (l) : __SIZEOF_INT__ * __CHAR_BIT__;
+  a[4] = e;
+  a[5] = f;
+  a[6] = g;
+  a[7] = h;
+}
+
+void
+baz (void)
+{
+  int i = a[8];
+  int j = a[9];
+  int k = a[10];
+  int l = a[11];
+  int e = __builtin_clz (i);
+  int f = j ? __builtin_clz (j) : __SIZEOF_INT__ * __CHAR_BIT__;
+  int g = __builtin_clz (k);
+  int h = l ? __builtin_clz (l) : __SIZEOF_INT__ * __CHAR_BIT__;
+  a[8] = e;
+  a[9] = f;
+  a[10] = g;
+  a[11] = h;
+}

Jakub



Re: [PATCH]middle-end: skip checking loop exits if loop malformed [PR111878]

2023-11-15 Thread Richard Biener
On Wed, 15 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> Before my refactoring if the loop->latch was incorrect then find_loop_location
> skipped checking the edges and would eventually return a dummy location.
> 
> It turns out that a loop can have
> loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS) but also not have a latch
> in which case get_loop_exit_edges traps.
> 
> This restores the old behavior.
> 
> Bootstrapped Regtested on x86_64-pc-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/111878
>   * tree-vect-loop-manip.cc (find_loop_location): Skip edges check if
>   latch incorrect.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/111878
>   * gcc.dg/graphite/pr111878.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/gcc.dg/graphite/pr111878.c 
> b/gcc/testsuite/gcc.dg/graphite/pr111878.c
> new file mode 100644
> index 
> ..6722910062e43c827e94c53b43f106af1848852a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/graphite/pr111878.c
> @@ -0,0 +1,19 @@
> +/* { dg-options "-O3 -fgraphite-identity -fsave-optimization-record" } */
> +
> +int long_c2i_ltmp;
> +int *long_c2i_cont;
> +
> +void
> +long_c2i (long utmp, int i)
> +{
> +  int neg = 1;
> +  switch (long_c2i_cont[0])
> +case 0:
> +neg = 0;
> +  for (; i; i++)
> +if (neg)
> +  utmp |= long_c2i_cont[i] ^ 5;
> +else
> +  utmp |= long_c2i_cont[i];
> +  long_c2i_ltmp = utmp;
> +}
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 
> b9161274ce401a7307f3e61ad23aa036701190d7..ff188840c1762d0b5fb6655cb93b5a8662b31343
>  100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1792,7 +1792,8 @@ find_loop_location (class loop *loop)
>if (!loop)

The testcase asks for the location of the root of the loop tree.
I think it's more sensible to handle this case explicitly by adding

  /* For the root of the loop tree return the function location.  */
  if (!loop_outer (loop))
return dump_user_location_t::from_function_decl (cfun->decl);

OK with that change.

Richard.

>  return dump_user_location_t ();
>  
> -  if (loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS))
> +  if (loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS)
> +  && loop->latch != EXIT_BLOCK_PTR_FOR_FN (cfun))
>  {
>/* We only care about the loop location, so use any exit with location
>information.  */
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] VECT: Clear LOOP_VINFO_USING_SELECT_VL_P when loop is not partial vectorized

2023-11-15 Thread Richard Biener
On Thu, 16 Nov 2023, Juzhe-Zhong wrote:

> This patch fixes ICE:
> https://godbolt.org/z/z8T6o6qov
> 
> : In function 'b':
> :2:6: error: missing definition
> 2 | void b() {
>   |  ^
> for SSA_NAME: loop_len_8 in statement:
> _1 = -loop_len_8;
> during GIMPLE pass: vect
> :2:6: internal compiler error: verify_ssa failed
> 0x7f1b56331082 __libc_start_main
>   ???:0
> Please submit a full bug report, with preprocessed source (by using 
> -freport-bug).
> Please include the complete backtrace with any bug report.
> See  for instructions.
> Compiler returned: 1
> 
> The root cause is we generate such IR in vectorization:
> 
>   _1 = -loop_len_8;
>   vect_cst__11 = {_1, _1};
>   _18 = vect_vec_iv_.6_14 + vect_cst__11;
> 
> loop_len_8 is an uninitialized value.
> 
> The IR _18 = vect_vec_iv_.6_14 + vect_cst__11; is generated because we are
> adding the induction variable to the result of SELECT_VL instead of VF.
> 
> The code is:
> 
>   else if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
> {
>   /* When we're using loop_len produced by SELEC_VL, the non-final
>iterations are not always processing VF elements.  So vectorize
>induction variable instead of
> 
>  _21 = vect_vec_iv_.6_22 + { VF, ... };
> 
>We should generate:
> 
>  _35 = .SELECT_VL (ivtmp_33, VF);
>  vect_cst__22 = [vec_duplicate_expr] _35;
>  _21 = vect_vec_iv_.6_22 + vect_cst__22;  */
>   gcc_assert (!slp_node);
>   gimple_seq seq = NULL;
>   vec_loop_lens *lens = _VINFO_LENS (loop_vinfo);
>   tree len = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 0);
>   expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr),
>unshare_expr (len)),
>  , true, NULL_TREE);
>   new_name = gimple_build (, MULT_EXPR, TREE_TYPE (step_expr), expr,
>  step_expr);
>   gsi_insert_seq_before (, seq, GSI_SAME_STMT);
>   step_iv_si = 
> }
> 
> LOOP_VINFO_USING_SELECT_VL_P is set before loop vectorization analysis, so at
> that point we don't know whether partial vectorization will be used, yet the
> induction variable handling relies on SELECT_VL_P being true.
> 
> So set SELECT_VL_P to false when the loop is not partially vectorized.

OK.

>   PR middle-end/112554
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling): Clear 
> SELECT_VL_P for non-partial vectorization.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/autovec/pr112554.c: New test.
> 
> ---
>  .../gcc.target/riscv/rvv/autovec/pr112554.c | 12 
>  gcc/tree-vect-loop.cc   | 13 +
>  2 files changed, 25 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112554.c
> 
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112554.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112554.c
> new file mode 100644
> index 000..4afa7c2b15c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112554.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 
> --param=riscv-autovec-preference=fixed-vlmax" } */
> +
> +int a;
> +void b() {
> +  unsigned long c = 18446744073709551612UL;
> +d:
> +  --c;
> +  a ^= c;
> +  if (c)
> +goto d;
> +}
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index fb8d999ee6b..3f59139cb01 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -2657,6 +2657,19 @@ vect_determine_partial_vectors_and_peeling 
> (loop_vec_info loop_vinfo)
>  = (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
> && need_peeling_or_partial_vectors_p);
>  
> +  /* We set LOOP_VINFO_USING_SELECT_VL_P as true before loop vectorization
> + analysis that we don't know whether the loop is vectorized by partial
> + vectors (More details see tree-vect-loop-manip.cc).
> +
> + However, SELECT_VL vectorizaton style should only applied on partial
> + vectorization since SELECT_VL is the GIMPLE IR that calculates the
> + number of elements to be process for each iteration.
> +
> + After loop vectorization analysis, Clear LOOP_VINFO_USING_SELECT_VL_P
> + if it is not partial vectorized loop.  */
> +  if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> +LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = false;
> +
>return opt_result::success ();
>  }
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] tree-optimization/112282 - wrong-code with ifcvt hoisting

2023-11-15 Thread Dimitar Dimitrov
On Wed, Nov 15, 2023 at 12:11:50PM +, Richard Biener wrote:
> The following avoids hoisting of invariants from conditionally
> executed parts of an if-converted loop.  That now makes a difference
> since we perform bitfield lowering even when we do not actually
> if-convert the loop.  if-conversion deals with resetting flow-sensitive
> info when necessary already.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> 
>   PR tree-optimization/112282
>   * tree-if-conv.cc (ifcvt_hoist_invariants): Only hoist from
>   the loop header.
> 
>   * gcc.dg/torture/pr112282.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/torture/pr112282.c | 132 
>  gcc/tree-if-conv.cc |  44 
>  2 files changed, 153 insertions(+), 23 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr112282.c
> 
> diff --git a/gcc/testsuite/gcc.dg/torture/pr112282.c 
> b/gcc/testsuite/gcc.dg/torture/pr112282.c
> new file mode 100644
> index 000..23e0ed64b82
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr112282.c
> @@ -0,0 +1,132 @@
> +/* { dg-do run } */
> +
> +int printf(const char *, ...);
> +void __assert_fail();

This function is glibc-only. Thus the test fails on newlib targets with:

  FAIL: gcc.dg/torture/pr112282.c   -O0  (test for excess errors)
  Excess errors:
  pr112282.c:(.text+0x1944): undefined reference to `__assert_fail'
  pr112282.c:(.text+0x2480): undefined reference to `__assert_fail'

Perhaps __builtin_abort should be used instead?

Regards,
Dimitar


[PATCH] LoongArch: Increase cost of vector aligned store/load.

2023-11-15 Thread Jiahao Xu
Based on SPEC2017 performance evaluation results, it is better to make the
cost of aligned vector store/load equal to the cost of unaligned store/load,
which avoids odd alignment peeling.
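
As a rough illustration only, the adjusted cost model can be sketched in
standalone C as follows.  The enum and function names are hypothetical
stand-ins, not the GCC ones, and the real hook handles more cases.

```c
/* Hypothetical, simplified stand-ins for the vectorizer cost kinds;
   the real enum in GCC has more members.  */
enum cost_kind
{
  COST_SCALAR_STMT,
  COST_VECTOR_LOAD,
  COST_VECTOR_STORE,
  COST_UNALIGNED_LOAD,
  COST_UNALIGNED_STORE
};

/* Sketch of the adjusted model: aligned vector accesses cost the same
   as unaligned ones, so peeling for alignment gains nothing.  */
static int
sketch_vectorization_cost (enum cost_kind kind)
{
  switch (kind)
    {
    case COST_VECTOR_LOAD:
    case COST_VECTOR_STORE:
    case COST_UNALIGNED_LOAD:
    case COST_UNALIGNED_STORE:
      return 2;
    default:
      return 1;
    }
}
```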

gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_builtin_vectorization_cost): Adjust.

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 738911661d7..d05743bec87 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3893,11 +3893,9 @@ loongarch_builtin_vectorization_cost (enum 
vect_cost_for_stmt type_of_cost,
   case scalar_stmt:
   case scalar_load:
   case vector_stmt:
-  case vector_load:
   case vec_to_scalar:
   case scalar_to_vec:
   case scalar_store:
-  case vector_store:
return 1;
 
   case vec_promote_demote:
@@ -3905,6 +3903,8 @@ loongarch_builtin_vectorization_cost (enum 
vect_cost_for_stmt type_of_cost,
return LASX_SUPPORTED_MODE_P (mode)
  && !LSX_SUPPORTED_MODE_P (mode) ? 2 : 1;
 
+  case vector_load:
+  case vector_store:
   case unaligned_load:
   case unaligned_store:
return 2;
-- 
2.20.1



[PATCH] aarch64: Add support for Ampere-1B (-mcpu=ampere1b) CPU

2023-11-15 Thread Philipp Tomsich
This patch adds initial support for the Ampere-1B core.

The Ampere-1B core implements ARMv8.7 with the following (compiler
visible) extensions:
 - CSSC (Common Short Sequence Compression instructions),
 - MTE (Memory Tagging Extension)
 - SM3/SM4

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere-1b
* config/aarch64/aarch64-cost-tables.h: Add ampere1b_extra_costs
* config/aarch64/aarch64.cc: Add ampere1b_prefetch_tune and
ampere1b_advsimd_vector_costs
* config/aarch64/aarch64-tune.md: Regenerate
* doc/invoke.texi: Document -mcpu=ampere1b

Signed-off-by: Philipp Tomsich 
---

 gcc/config/aarch64/aarch64-cores.def |   1 +
 gcc/config/aarch64/aarch64-cost-tables.h | 107 +++
 gcc/config/aarch64/aarch64-tune.md   |   2 +-
 gcc/config/aarch64/aarch64.cc|  89 +++
 gcc/doc/invoke.texi  |   2 +-
 5 files changed, 199 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index eae40b29df6..19dfb133d29 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -74,6 +74,7 @@ AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  V8A,  
(CRC, CRYPTO), thu
 /* Ampere Computing ('\xC0') cores. */
 AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RNG, AES, SHA3), 
ampere1, 0xC0, 0xac3, -1)
 AARCH64_CORE("ampere1a", ampere1a, cortexa57, V8_6A, (F16, RNG, AES, SHA3, 
SM4, MEMTAG), ampere1a, 0xC0, 0xac4, -1)
+AARCH64_CORE("ampere1b", ampere1b, cortexa57, V8_7A, (F16, RNG, AES, SHA3, 
SM4, MEMTAG, CSSC), ampere1b, 0xC0, 0xac5, -1)
 /* Do not swap around "emag" and "xgene1",
this order is required to handle variant correctly. */
 AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO), emag, 
0x50, 0x000, 3)
diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
b/gcc/config/aarch64/aarch64-cost-tables.h
index 0cb638f3a13..4c8da7f119b 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -882,4 +882,111 @@ const struct cpu_cost_table ampere1a_extra_costs =
   }
 };
 
+const struct cpu_cost_table ampere1b_extra_costs =
+{
+  /* ALU */
+  {
+0, /* arith.  */
+0, /* logical.  */
+0, /* shift.  */
+COSTS_N_INSNS (1), /* shift_reg.  */
+0, /* arith_shift.  */
+COSTS_N_INSNS (1), /* arith_shift_reg.  */
+0, /* log_shift.  */
+COSTS_N_INSNS (1), /* log_shift_reg.  */
+0, /* extend.  */
+COSTS_N_INSNS (1), /* extend_arith.  */
+0, /* bfi.  */
+0, /* bfx.  */
+0, /* clz.  */
+0, /* rev.  */
+0, /* non_exec.  */
+true   /* non_exec_costs_exec.  */
+  },
+  {
+/* MULT SImode */
+{
+  COSTS_N_INSNS (2),   /* simple.  */
+  COSTS_N_INSNS (2),   /* flag_setting.  */
+  COSTS_N_INSNS (2),   /* extend.  */
+  COSTS_N_INSNS (3),   /* add.  */
+  COSTS_N_INSNS (3),   /* extend_add.  */
+  COSTS_N_INSNS (12)   /* idiv.  */
+},
+/* MULT DImode */
+{
+  COSTS_N_INSNS (2),   /* simple.  */
+  0,   /* flag_setting (N/A).  */
+  COSTS_N_INSNS (2),   /* extend.  */
+  COSTS_N_INSNS (3),   /* add.  */
+  COSTS_N_INSNS (3),   /* extend_add.  */
+  COSTS_N_INSNS (18)   /* idiv.  */
+}
+  },
+  /* LD/ST */
+  {
+COSTS_N_INSNS (2), /* load.  */
+COSTS_N_INSNS (2), /* load_sign_extend.  */
+0, /* ldrd (n/a).  */
+0, /* ldm_1st.  */
+0, /* ldm_regs_per_insn_1st.  */
+0, /* ldm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (3), /* loadf.  */
+COSTS_N_INSNS (3), /* loadd.  */
+COSTS_N_INSNS (3), /* load_unaligned.  */
+0, /* store.  */
+0, /* strd.  */
+0, /* stm_1st.  */
+0, /* stm_regs_per_insn_1st.  */
+0, /* stm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (1), /* storef.  */
+COSTS_N_INSNS (1), /* stored.  */
+COSTS_N_INSNS (1), /* store_unaligned.  */
+COSTS_N_INSNS (3), /* loadv.  */
+COSTS_N_INSNS (3)  /* storev.  */
+  },
+  {
+/* FP SFmode */
+{
+  COSTS_N_INSNS (18),  /* div.  */
+  COSTS_N_INSNS (3),   /* mult.  */
+  COSTS_N_INSNS (3),   /* mult_addsub.  */
+  COSTS_N_INSNS (3),   /* fma.  */
+  COSTS_N_INSNS (2),   /* addsub.  */
+  COSTS_N_INSNS (1),   /* fpconst.  */
+  COSTS_N_INSNS (2),   /* neg.  */
+  

[PATCH] aarch64: costs: update for TARGET_CSSC

2023-11-15 Thread Philipp Tomsich
With the addition of CSSC (Common Short Sequence Compression)
instructions, a number of idioms match to single instructions (e.g.,
abs) that previously expanded to multi-instruction sequences.

This recognizes (some of) those idioms that are now misclassified and
returns a cost of a single instruction.
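
To illustrate why CTZ costs two instructions without CSSC, here is a small
standalone sketch (not part of the patch) of the "bit-reversal + clz"
equivalence.  The helper names are hypothetical, and the portable bit
reversal only stands in for what would be a single instruction.

```c
#include <stdint.h>

/* Reverse the bits of a 32-bit value (portable stand-in for a
   bit-reversal instruction).  */
static uint32_t
bit_reverse32 (uint32_t x)
{
  x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
  x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
  x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
  x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);
  return (x >> 16) | (x << 16);
}

/* Count trailing zeros as clz (bit_reverse (x)); x must be nonzero.  */
static int
ctz_via_clz (uint32_t x)
{
  return __builtin_clz (bit_reverse32 (x));
}
```

With CSSC the same result comes from one instruction, which is what the
reduced cost below is meant to reflect.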

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_rtx_costs): Support
idioms matching to CSSC instructions, if target CSSC is
present

Signed-off-by: Philipp Tomsich 
---

 gcc/config/aarch64/aarch64.cc | 34 --
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 800a8b0e110..d89c94519e9 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -14431,10 +14431,17 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int 
outer ATTRIBUTE_UNUSED,
   return false;
 
 case CTZ:
-  *cost = COSTS_N_INSNS (2);
+  if (!TARGET_CSSC)
+   {
+ /* Will be split to a bit-reversal + clz */
+ *cost = COSTS_N_INSNS (2);
+
+ if (speed)
+   *cost += extra_cost->alu.clz + extra_cost->alu.rev;
+   }
+  else
+   *cost = COSTS_N_INSNS (1);
 
-  if (speed)
-   *cost += extra_cost->alu.clz + extra_cost->alu.rev;
   return false;
 
 case COMPARE:
@@ -15373,12 +15380,17 @@ cost_plus:
}
   else
{
- /* Integer ABS will either be split to
-two arithmetic instructions, or will be an ABS
-(scalar), which we don't model.  */
- *cost = COSTS_N_INSNS (2);
- if (speed)
-   *cost += 2 * extra_cost->alu.arith;
+ if (!TARGET_CSSC)
+   {
+ /* Integer ABS will either be split to
+two arithmetic instructions, or will be an ABS
+(scalar), which we don't model.  */
+ *cost = COSTS_N_INSNS (2);
+ if (speed)
+   *cost += 2 * extra_cost->alu.arith;
+   }
+ else
+   *cost = COSTS_N_INSNS (1);
}
   return false;
 
@@ -15388,13 +15400,15 @@ cost_plus:
{
  if (VECTOR_MODE_P (mode))
*cost += extra_cost->vect.alu;
- else
+ else if (GET_MODE_CLASS (mode) == MODE_FLOAT)
{
  /* FMAXNM/FMINNM/FMAX/FMIN.
 TODO: This may not be accurate for all implementations, but
 we do not model this in the cost tables.  */
  *cost += extra_cost->fp[mode == DFmode].addsub;
}
+ else if (TARGET_CSSC)
+   *cost = COSTS_N_INSNS (1);
}
   return false;
 
-- 
2.34.1



Re: [PATCH] tree-optimization: Add register pressure heuristics

2023-11-15 Thread Ajit Agarwal
Hello Richard:

With the decision making below I get performance on par with the trunk
changes and better than trunk for the FP and INT SPEC 2017 benchmarks.

int best_bb_liveout_cnt
= bitmap_count_bits (>liveout[best_bb->index]);
int early_bb_liveout_cnt
= bitmap_count_bits (>liveout[early_bb->index]);
int early_livein_cnt
= bitmap_count_bits (>livein[early_bb->index]);
  
/* High register pressure region is the region where there are live-in of
 early blocks that has been modified by the early block. If there are
 modification of the variables in best block that are live-in in early
 block that are live-out of best block.  */
  bool live_in_rgn = (early_livein_cnt != 0
  && early_bb_liveout_cnt <= early_livein_cnt);

  bool high_reg_pressure_rgn = false;

  if (live_in_rgn)
high_reg_pressure_rgn
  = (best_bb_liveout_cnt <= early_bb_liveout_cnt);
  else
high_reg_pressure_rgn
  = (best_bb_liveout_cnt != 0 && best_bb_liveout_cnt <= 
early_bb_liveout_cnt);

  high_reg_pressure_rgn
= (high_reg_pressure_rgn && !(best_bb->count >= early_bb->count));

I have included the profile data without any sinking threshold or
multiplication by 100.
This fixes the error-prone code you mentioned.

This combines the register pressure and the profile data in a better way.
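
The decision above can be sketched as a standalone C function, purely for
illustration.  The function name and the plain integer parameters are
assumptions; in the pass the counts come from the live-in/live-out bitmaps
and the basic block profile counts.

```c
#include <stdbool.h>

/* Hypothetical standalone version of the decision above.  The live-in,
   live-out and execution counts are passed as plain integers instead of
   being read from bitmaps and basic blocks.  */
static bool
high_reg_pressure_region (int best_liveout, int early_liveout,
                          int early_livein, long best_count,
                          long early_count)
{
  bool live_in_rgn = early_livein != 0 && early_liveout <= early_livein;
  bool high;

  if (live_in_rgn)
    high = best_liveout <= early_liveout;
  else
    high = best_liveout != 0 && best_liveout <= early_liveout;

  /* Keep the verdict only when best_bb executes less often than
     early_bb.  */
  return high && !(best_count >= early_count);
}
```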

Please let me know if this is okay to submit.

I will send the patch accordingly.

Thanks & Regards
Ajit

On 03/11/23 8:24 pm, Ajit Agarwal wrote:
> Hello Richard:
> 
> 
> On 03/11/23 7:06 pm, Richard Biener wrote:
>> On Fri, Nov 3, 2023 at 11:20 AM Ajit Agarwal  wrote:
>>>
>>> Hello Richard:
>>>
>>> On 03/11/23 12:51 pm, Richard Biener wrote:
>>>> On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal  wrote:
>>>>>
>>>>> Hello All:
>>>>>
>>>>>> [...]
>>>>>
>>>>> High register pressure region is the region where there are live-in of
>>>>> early blocks that has been modified by the early block. If there are
>>>>> modification of the variables in best block that are live-in in early
>>>>> block that are live-out of best block.
>>>>
>>>> ?!  Parse error.

>>>
>>> I didnt understand what you meant here. Please suggest.
>>
>> I can't even guess what that paragraph means.  It fails at a
>> parsing level already, I can't even start to reason about what
>> the sentences mean.
> 
> Sorry for that I will modify.
> 
>>
>>>>> Bootstrapped and regtested on powerpc64-linux-gnu.
>>>>
>>>> What's the effect on code generation?
>>>>
>>>> Note that live is a quadratic problem while sinking was not.  You
>>>> are effectively making the pass unfit for -O1.
>>>>
>>>> You are computing "liveness" on GIMPLE where within EBBs there
>>>> isn't really any particular order of stmts, so it's kind of a garbage
>>>> heuristic.  Likewise you are not computing the effect that moving
>>>> a stmt has on liveness as far as I can see but you are just identifying
>>>> some odd metrics (I don't really understand them) to rank blocks,
>>>> not even taking the register file size into account.
>>>
>>>
>>> if the live out of best_bb  <= live out of early_bb, that shows
>>> that there are modification in best_bb.
>>
>> Hm?  Do you maybe want to say that if live_out (bb) < live_in (bb)
>> then some variables die during the execution of bb?
> 
> live_out (bb) < live_in(bb) means in bb there may be KILL (Variables)
> and there are more GEN (Variables).
> 
>   Otherwise,
>> if live_out (early) > live_out (best) then somewhere on the path
>> from early to best some variables die.
>>
> 
> If live_out (early) > live_out (best) means there are more GEN (Variables)
> between path from early to best.
> 
> 
>>> Then it's
>>> safer to move statements in best_bb as there are lesser interfering
>>> live variables in best_bb.
>>
>> consider a stmt
>>
>>  a = b + c;
>>
>> where b and c die at the definition of a.  Then moving the stmt
>> down from early_bb means you increase live_out (early_bb) by
>> one.  So why's that "safer" then?  Of course live_out (best_bb)
>> also increases by two then.
>>
> 
> If b and c die at the definition of a and generates a live_in(early_bb)
> would be live_out(early_bb) - 2 + 1.
> 
> the moving the stmt from early_bb down to best_bb increases live_out(early_bb)
> by one and live_out (best_bb) depends on the LIVEIN(for all successors of 
> best_bb)
> which may be same even if we move down.
> 
> There are chances that live_out (best_bb) greater if for all successors of 
> best_bb there are more GEN ( variables). If live_out (best_bb) is less
> means there more KILL (Variables) in successors of best_bb.
> 
> With my heuristics live_out (best_bb ) > live_out (early_bb) then we dont do
> code motion as there are chances of more interfering live ranges. If 
> liveout(best_bb)
> <= liveout (early_bb) then we do code motion as there is there are more 
> KILL(for
> all successors of best_bb) and there is less chance of interfering live 
> ranges.
> 
> With moving down above stmt from early_bb to best_bb increases 
> 

[PATCH] Fix ICE of unrecognizable insn.

2023-11-15 Thread liuhongt
The newly added splitter will generate

(insn 58 56 59 2 (set (reg:V4HI 20 xmm0 [129])
(vec_duplicate:V4HI (reg:HI 22 xmm2 [123]))) "testcase.c":16:21 -1

But we only have

(define_insn "*vec_dupv4hi"
  [(set (match_operand:V4HI 0 "register_operand" "=y,Yw")
(vec_duplicate:V4HI
  (truncate:HI
(match_operand:SI 1 "register_operand" "0,Yw"))))]

The patch adds patterns for V4HI and V2HI.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.
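
As a rough illustration (this is not the testcase from the PR), broadcasting
one 16-bit scalar into a 64-bit vector is the kind of source-level operation
that RTL represents as vec_duplicate:V4HI. The sketch below uses the GNU
vector extension; the type and function names are made up here, and whether a
given compiler actually selects the new pattern for it depends on the options
and the surrounding code.

```c
/* Four 16-bit lanes in 8 bytes, i.e. the V4HI machine mode on x86.  */
typedef short v4hi __attribute__ ((vector_size (8)));

/* Broadcast X into every lane.  */
v4hi
broadcast_v4hi (short x)
{
  return (v4hi) { x, x, x, x };
}

/* Read one lane back; N is masked to 0..3 to stay in range.  */
short
lane_v4hi (v4hi v, int n)
{
  return v[n & 3];
}
```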

gcc/ChangeLog:

PR target/112532
* config/i386/mmx.md (*vec_dup): Extend for V4HI and
V2HI.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112532.c: New test.
---
 gcc/config/i386/mmx.md                   |  8 ++++----
 gcc/testsuite/gcc.target/i386/pr112532.c | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr112532.c

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index a3d08bb9d3b..e4b89160fc0 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -5277,8 +5277,8 @@ (define_insn "*vec_dupv4hi"
(set_attr "mode" "DI,TI")])
 
 (define_insn "*vec_dup"
-  [(set (match_operand:V4F_64 0 "register_operand" "=Yw")
-   (vec_duplicate:V4F_64
+  [(set (match_operand:V4FI_64 0 "register_operand" "=Yw")
+   (vec_duplicate:V4FI_64
  (match_operand: 1 "register_operand" "Yw")))]
   "TARGET_MMX_WITH_SSE"
   "%vpshuflw\t{$0, %1, %0|%0, %1, 0}"
@@ -5869,8 +5869,8 @@ (define_insn "*vec_dupv2hi"
(set_attr "mode" "TI")])
 
 (define_insn "*vec_dup"
-  [(set (match_operand:V2F_32 0 "register_operand" "=Yw")
-   (vec_duplicate:V2F_32
+  [(set (match_operand:V2FI_32 0 "register_operand" "=Yw")
+   (vec_duplicate:V2FI_32
  (match_operand: 1 "register_operand" "Yw")))]
   "TARGET_SSE2"
   "%vpshuflw\t{$0, %1, %0|%0, %1, 0}"
diff --git a/gcc/testsuite/gcc.target/i386/pr112532.c b/gcc/testsuite/gcc.target/i386/pr112532.c
new file mode 100644
index 00000000000..690f1d9670d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112532.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-msse4 -O2" } */
+
+typedef char __attribute__((__vector_size__(2))) v16u8;
+typedef int __attribute__((__vector_size__(8))) v64u8;
+typedef unsigned short __attribute__((__vector_size__(2))) v16u16;
+typedef unsigned short __attribute__((__vector_size__(8))) v64u16;
+v64u16 foo0_v64u16_0;
+int __attribute__((__vector_size__(4 * sizeof(int)))) foo0_v128u32_0;
+__attribute__((__vector_size__(8 * sizeof(short)))) unsigned short foo0_v128u16_0;
+v16u16 foo0_v16u16_0;
+v16u8 foo0() {
+  v16u16 v16u16_1 = __builtin_shufflevector(__builtin_shufflevector(__builtin_convertvector(foo0_v128u32_0, v64u16),foo0_v16u16_0, 1, 4, 2, 0, 0, 2, 2, 2),foo0_v16u16_0, 7);
+  foo0_v64u16_0 -= (short)v16u16_1;
+  v64u16 v64u16_3 = __builtin_shufflevector(v16u16_1, __builtin_shufflevector((v16u16){}, foo0_v128u16_0, 7, 0), 0, 1, 2, 2);
+  return (union {v16u8 b;})
+{((union {
+  v64u8 a;
+  int b;
+})(v64u8)v64u16_3).b}.b + (v16u8)v16u16_1;
+}
-- 
2.31.1



Re: [PATCH v2] RISC-V: Implement TLS Descriptors.

2023-11-15 Thread Tatsuyuki Ishi
> On Nov 16, 2023, at 14:33, Fangrui Song  wrote:
> 
> On Wed, Nov 15, 2023 at 9:23 PM Jeff Law  wrote:
>> 
>> 
>> 
>> On 11/15/23 18:51, Tatsuyuki Ishi wrote:
>>>> On Nov 16, 2023, at 10:07, Jeff Law  wrote:
>> 
>>> 
>>> Based on what I have read in the AArch64 backend, there are two ways to
>>> do this: introduce a custom calling convention, or put in a RTX insn
>>> that covers the whole sequence. Ideally we should do the first, but then
>>> there’s the label issue and it’s quite a bit more complicated. So I’m
>>> sticking with this for now.
>> As I said, I think we're OK here.  We can always revamp as we get
>> experience with the implementation -- I don't think any of the stuff
>> we're talking about is an ABI change, they're just implementation details.
>> 
>>> 
>>> Sorry for all the delay on this. My progress has been (and still)
>>> blocked on supporting relaxation of TLSDESC in binutils (turns out you
>>> can’t run static binaries without relaxing it first). But that doesn’t
>>> seem exactly easy to do either, because relaxation that involves GOT
>>> elimination isn’t something we have in the RISC-V backend.
>> Note that binutils is due for another release in the next month or two.
>> It'd certainly be helpful to have any issues there resolved in time for
>> that release.
>> 
>>> 
>>> I’ll try to send a new version of this patch and get this unblocked on
>>> GCC side first.
>> Sounds good.  We can always guard its use behind a feature test for GAS
>> support.
>> 
>> Jeff
> 
> Agreed.
> 
> 
> Tatsuyuki, could you also add some tests? For example
> 
> // end of https://maskray.me/blog/2021-02-14-all-about-thread-local-storage
> __thread int tls0;
> extern __thread int tls1;
> int foo() { return ++tls0 + ++tls1; }
> static __thread int tls2, tls3;
> int bar() { return ++tls2 + ++tls3; }
> 
> I have used this to check rtld and linker behavior. I think we need
> some `scan-assembler`.
> To make it a runnable test, some assembler feature check may be
> needed. Perhaps Jeff can make some suggestion or contribute code!
> 

I believe there’s existing platform-generic TLS coverage in 
gcc/testsuite/gcc.dg/torture/tls. GCC's test suite seems pretty sparse, but a 
lot more testing is done by glibc’s testsuite (which is also where I found the 
static TLS relaxation issue).

Tatsuyuki.

> 
> -- 
> 宋方睿



Re: [PATCH v2] RISC-V: Implement TLS Descriptors.

2023-11-15 Thread Jeff Law




On 11/15/23 22:33, Fangrui Song wrote:
> I have used this to check rtld and linker behavior. I think we need
> some `scan-assembler`.
> To make it a runnable test, some assembler feature check may be
> needed. Perhaps Jeff can make some suggestion or contribute code!

TLS isn't really on my radar yet.  I've got about a million things to do
on the scalar and vector optimization fronts.  The TLS stuff is just one
of the lingering items from our Tuesday patchwork sync meetings --
getting it wrapped up gets it off the list of things to look at every
week ;-)



jeff


Re: [PATCH v2] RISC-V: Implement TLS Descriptors.

2023-11-15 Thread Fangrui Song
On Wed, Nov 15, 2023 at 9:23 PM Jeff Law  wrote:
>
>
>
> On 11/15/23 18:51, Tatsuyuki Ishi wrote:
> >> On Nov 16, 2023, at 10:07, Jeff Law  wrote:
>
> >
> > Based on what I have read in the AArch64 backend, there are two ways to
> > do this: introduce a custom calling convention, or put in a RTX insn
> > that covers the whole sequence. Ideally we should do the first, but then
> > there’s the label issue and it’s quite a bit more complicated. So I’m
> > sticking with this for now.
> As I said, I think we're OK here.  We can always revamp as we get
> experience with the implementation -- I don't think any of the stuff
> we're talking about is an ABI change, they're just implementation details.
>
> >
> > Sorry for all the delay on this. My progress has been (and still)
> > blocked on supporting relaxation of TLSDESC in binutils (turns out you
> > can’t run static binaries without relaxing it first). But that doesn’t
> > seem exactly easy to do either, because relaxation that involves GOT
> > elimination isn’t something we have in the RISC-V backend.
> Note that binutils is due for another release in the next month or two.
> It'd certainly be helpful to have any issues there resolved in time for
> that release.
>
> >
> > I’ll try to send a new version of this patch and get this unblocked on
> > GCC side first.
> Sounds good.  We can always guard its use behind a feature test for GAS
> support.
>
> Jeff

Agreed.


Tatsuyuki, could you also add some tests? For example

// end of https://maskray.me/blog/2021-02-14-all-about-thread-local-storage
__thread int tls0;
extern __thread int tls1;
int foo() { return ++tls0 + ++tls1; }
static __thread int tls2, tls3;
int bar() { return ++tls2 + ++tls3; }

I have used this to check rtld and linker behavior. I think we need
some `scan-assembler`.
To make it a runnable test, some assembler feature check may be
needed. Perhaps Jeff can make some suggestion or contribute code!
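
A compile-only test along these lines might look like the sketch below. It is
a guess at the shape, not a reviewed test: the -mtls-dialect=desc option and
the %tlsdesc_hi operator come from the patch under discussion, while the
exact option set and the scan-assembler pattern are assumptions that would
need checking against the assembly GCC really emits.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -fpic -mtls-dialect=desc" } */

__thread int tls0;
extern __thread int tls1;
/* Defined here only so that this sketch also links on its own; a real
   test would probably leave tls1 as a pure extern.  */
__thread int tls1;
int foo (void) { return ++tls0 + ++tls1; }
static __thread int tls2, tls3;
int bar (void) { return ++tls2 + ++tls3; }

/* Assumed pattern: matches the %tlsdesc_hi(...) operand of the auipc.  */
/* { dg-final { scan-assembler "tlsdesc_hi" } } */
```

A runnable variant would additionally need the assembler and linker feature
checks mentioned above.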




-- 
宋方睿


Re: [PATCH v2] RISC-V: Implement TLS Descriptors.

2023-11-15 Thread Jeff Law




On 11/15/23 18:51, Tatsuyuki Ishi wrote:
>> On Nov 16, 2023, at 10:07, Jeff Law  wrote:
>
> Based on what I have read in the AArch64 backend, there are two ways to
> do this: introduce a custom calling convention, or put in a RTX insn
> that covers the whole sequence. Ideally we should do the first, but then
> there’s the label issue and it’s quite a bit more complicated. So I’m
> sticking with this for now.

As I said, I think we're OK here.  We can always revamp as we get
experience with the implementation -- I don't think any of the stuff
we're talking about is an ABI change, they're just implementation details.

> Sorry for all the delay on this. My progress has been (and still)
> blocked on supporting relaxation of TLSDESC in binutils (turns out you
> can’t run static binaries without relaxing it first). But that doesn’t
> seem exactly easy to do either, because relaxation that involves GOT
> elimination isn’t something we have in the RISC-V backend.

Note that binutils is due for another release in the next month or two.
It'd certainly be helpful to have any issues there resolved in time for
that release.

> I’ll try to send a new version of this patch and get this unblocked on
> GCC side first.

Sounds good.  We can always guard its use behind a feature test for GAS
support.


Jeff


Re: [PATCH v2] RISC-V: Implement TLS Descriptors.

2023-11-15 Thread Jeff Law




On 11/15/23 18:39, Tatsuyuki Ishi wrote:
> As mentioned in the commit message, the use of relaxation-only labels
> does not seem well supported in current GCC. Creating a label seems to
> force a basic block and I’m not sure how we can avoid it.
>
> If there’s a better way to implement this I’m happy to adopt.

In general, yes creating a label in the IL is going to create a new
block.  But you could emit the label as part of the auipc so that it
doesn't really show up in the IL.  Then you have to make sure the other
insns reference the right label, which is certainly do-able.  There's
also linker relaxation to worry about.


jeff


Re: [PATCH v2] RISC-V: Implement TLS Descriptors.

2023-11-15 Thread Jeff Law




On 11/15/23 18:17, Fangrui Song wrote:
> It seems that x86-64 supports non-adjacent code sequence. Writing the
> pattern this way does not allow interleaving, but I assume
> interleaving doesn't enable much.

It's of marginal benefit.  We could always split them before scheduling
if it turned out to matter -- but given nobody's using this stuff yet,
my inclination is to wait on that kind of micro-optimization.


jeff


Re: [PATCH] Reduce false positives for -Wnonnull for VLA parameters [PR98541]

2023-11-15 Thread Hans-Peter Nilsson
> From: Martin Uecker 
> Date: Tue, 07 Nov 2023 06:56:25 +0100

> Am Montag, dem 06.11.2023 um 21:01 -0700 schrieb Jeff Law:
> > 
> > On 11/6/23 20:58, Hans-Peter Nilsson wrote:
> > > This patch caused a testsuite regression: there's now an
> > > "excess error" failure for gcc.dg/Wnonnull-4.c for 32-bit
> > > targets (and 64-bit targets testing with a "-m32" option)
> > > after your r14-5115-g6e9ee44d96e5.  It's logged as PR112419.
> > It caused failures for just about every target ;(  Presumably it worked 
> > on x86_64...
> 
> I do not think this is a true regression
> just a problem with the test on 32-bit which somehow surfaced
> due to the change.
> 
> The excess error is:
> 
> FAIL: gcc.dg/Wnonnull-4.c (test for excess errors)
> Excess errors:
> /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.dg/Wnonnull-4.c:144:3:
>  warning: 'fda_n_5' specified size 4294967256 exceeds maximum object size
> 2147483647 [-Wstringop-overflow=]
> 
> I think the warning was suppressed before due to the other (nonnull)
> warning which I removed in this case.
> 
> I think the simple fix might be to turn off -Wstringop-overflow.

No, that triggers many of the dg-warnings that are tested.

(I didn't pay attention to the actual warning messages and
tried to pursue that at first.)

Maybe it's best to actually expect the warning, like
so.

Maintainers of 16-bit targets will have to address their
concerns separately.  For example, they may choose to not
run the test at all.
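
For reference, the size in the warning is what 32-bit wraparound would give
for a double[n][5] array. The sketch below reproduces the arithmetic under
two assumptions that are not checked against the test source here: that the
bound passed as max is INT_MAX, and that sizeof (double) is 8. The function
name is made up for illustration.

```c
#include <stdint.h>

/* Byte size of a double[n][5] array as it would come out in a 32-bit
   size type.  Assumes sizeof (double) == 8, as on the usual ILP32
   targets; the multiplication wraps modulo 2^32.  */
uint32_t
fda_n_5_bytes_ilp32 (uint32_t n)
{
  return n * 5u * 8u;
}
```

With n = 2147483647 this yields 4294967256, i.e. 2^32 - 40, which is above
the 2147483647 maximum object size quoted in the warning.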

Ok to commit?

Subject: [PATCH] gcc.dg/Wnonnull-4.c: Handle new overflow warning for 32-bit 
targets [PR112419]

PR testsuite/112419
* gcc.dg/Wnonnull-4.c (test_fda_n_5): Expect warning for exceeding
maximum object size for 32-bit targets.
---
 gcc/testsuite/gcc.dg/Wnonnull-4.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/Wnonnull-4.c 
b/gcc/testsuite/gcc.dg/Wnonnull-4.c
index 1f14fbba45df..d63e76da70a2 100644
--- a/gcc/testsuite/gcc.dg/Wnonnull-4.c
+++ b/gcc/testsuite/gcc.dg/Wnonnull-4.c
@@ -142,6 +142,7 @@ void test_fda_n_5 (int r_m1)
   T (  1);  // { dg-bogus "argument 2 of variable length array 'double\\\[n]\\\[5]' is null but the corresponding bound argument 1 value is 1" }
   T (  9);  // { dg-bogus "argument 2 of variable length array 'double\\\[n]\\\[5]' is null but the corresponding bound argument 1 value is 9" }
   T (max);  // { dg-bogus "argument 2 of variable length array 'double\\\[n]\\\[5]' is null but the corresponding bound argument 1 value is \\d+" }
+// { dg-warning "size 4294967256 exceeds maximum object size" "" { target ilp32 } .-1 }
 }
 
 
-- 
2.30.2



[PATCH] VECT: Clear LOOP_VINFO_USING_SELECT_VL_P when loop is not partial vectorized

2023-11-15 Thread Juzhe-Zhong
This patch fixes ICE:
https://godbolt.org/z/z8T6o6qov

<source>: In function 'b':
<source>:2:6: error: missing definition
    2 | void b() {
      |      ^
for SSA_NAME: loop_len_8 in statement:
_1 = -loop_len_8;
during GIMPLE pass: vect
<source>:2:6: internal compiler error: verify_ssa failed
0x7f1b56331082 __libc_start_main
???:0
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
Compiler returned: 1

The root cause is that we generate the following IR during vectorization:

  _1 = -loop_len_8;
  vect_cst__11 = {_1, _1};
  _18 = vect_vec_iv_.6_14 + vect_cst__11;

loop_len_8 is an uninitialized value.

The IR _18 = vect_vec_iv_.6_14 + vect_cst__11; is generated because we are
advancing the induction variable by the result of SELECT_VL instead of by VF.

The code is:

  else if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
{
  /* When we're using loop_len produced by SELEC_VL, the non-final
 iterations are not always processing VF elements.  So vectorize
 induction variable instead of

   _21 = vect_vec_iv_.6_22 + { VF, ... };

 We should generate:

   _35 = .SELECT_VL (ivtmp_33, VF);
   vect_cst__22 = [vec_duplicate_expr] _35;
   _21 = vect_vec_iv_.6_22 + vect_cst__22;  */
  gcc_assert (!slp_node);
  gimple_seq seq = NULL;
  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
  tree len = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 0);
  expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr),
 unshare_expr (len)),
   &seq, true, NULL_TREE);
  new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr), expr,
   step_expr);
  gsi_insert_seq_before (&si, seq, GSI_SAME_STMT);
  step_iv_si = &si;
}

LOOP_VINFO_USING_SELECT_VL_P is set before loop vectorization analysis, when
we don't yet know whether the loop will be partially vectorized, but the
induction variable code above is taken whenever SELECT_VL_P is true.

So set SELECT_VL_P to false when the loop is not partially vectorized.
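
The difference between the two induction variable updates can be seen in a
small scalar model. This is only an illustration with made-up names, not
vectorizer code: len stands in for the value .SELECT_VL would return, and it
is modelled as min (remaining, VF), which is just one legal choice.

```c
/* Scalar model of lane 0 of a vectorized induction variable over N
   elements with vectorization factor VF (which must be nonzero).  */
unsigned long
iv_after_loop (unsigned long n, unsigned long vf, unsigned long step,
               int use_select_vl)
{
  unsigned long iv = 0;
  unsigned long remaining = n;

  while (remaining > 0)
    {
      /* Stand-in for .SELECT_VL (remaining, VF).  */
      unsigned long len = remaining < vf ? remaining : vf;
      /* With SELECT_VL the IV advances by the elements actually
         processed; otherwise it advances by a full VF.  */
      iv += (use_select_vl ? len : vf) * step;
      remaining -= len;
    }
  return iv;
}
```

When every iteration processes exactly VF elements, as in a loop that is not
partially vectorized, both updates agree and no loop length is needed, which
matches the intent of clearing the flag in that case.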

PR middle-end/112554

gcc/ChangeLog:

* tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling): Clear 
SELECT_VL_P for non-partial vectorization.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112554.c: New test.

---
 .../gcc.target/riscv/rvv/autovec/pr112554.c | 12 ++++++++++++
 gcc/tree-vect-loop.cc                       | 13 +++++++++++++
 2 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112554.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112554.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112554.c
new file mode 100644
index 00000000000..4afa7c2b15c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112554.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 
--param=riscv-autovec-preference=fixed-vlmax" } */
+
+int a;
+void b() {
+  unsigned long c = 18446744073709551612UL;
+d:
+  --c;
+  a ^= c;
+  if (c)
+goto d;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index fb8d999ee6b..3f59139cb01 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2657,6 +2657,19 @@ vect_determine_partial_vectors_and_peeling 
(loop_vec_info loop_vinfo)
 = (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
&& need_peeling_or_partial_vectors_p);
 
+  /* We set LOOP_VINFO_USING_SELECT_VL_P as true before loop vectorization
+ analysis that we don't know whether the loop is vectorized by partial
+ vectors (More details see tree-vect-loop-manip.cc).
+
+ However, SELECT_VL vectorizaton style should only applied on partial
+ vectorization since SELECT_VL is the GIMPLE IR that calculates the
+ number of elements to be process for each iteration.
+
+ After loop vectorization analysis, Clear LOOP_VINFO_USING_SELECT_VL_P
+ if it is not partial vectorized loop.  */
+  if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
+LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = false;
+
   return opt_result::success ();
 }
 
-- 
2.36.3



Re: [PATCH] [i386] APX: Fix EGPR usage in several patterns.

2023-11-15 Thread Hongtao Liu
On Wed, Nov 15, 2023 at 5:43 PM Hongyu Wang  wrote:
>
> Hi,
>
> For vextract/insert{if}128 they cannot adopt EGPR in their memory operand, all
> related pattern should be adjusted to disable EGPR usage on them.
> Also fix a wrong gpr16 attr for insertps.
>
> Bootstrapped/regtested on x86-64-pc-linux-gnu{-m32,}
>
> Ok for master?
Ok.
>
> gcc/ChangeLog:
>
> * config/i386/sse.md (vec_extract_hi_): Add noavx512vl
> alternative with attr addr gpr16 and "jm" constraint.
> (vec_extract_hi_): Likewise for SF vector modes.
> (@vec_extract_hi_): Likewise.
> (*vec_extractv2ti): Likewise.
> (vec_set_hi_): Likewise.
> * config/i386/mmx.md (@sse4_1_insertps_): Correct gpr16 attr for
> each alternative.
> ---
>  gcc/config/i386/mmx.md |  2 +-
>  gcc/config/i386/sse.md | 32 
>  2 files changed, 21 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index a3d08bb9d3b..355538749d1 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -1215,7 +1215,7 @@ (define_insn "@sse4_1_insertps_"
>  }
>  }
>[(set_attr "isa" "noavx,noavx,avx")
> -   (set_attr "addr" "*,*,gpr16")
> +   (set_attr "addr" "gpr16,gpr16,*")
> (set_attr "type" "sselog")
> (set_attr "prefix_data16" "1,1,*")
> (set_attr "prefix_extra" "1")
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index c502582102e..472c2190f89 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -12049,9 +12049,9 @@ (define_insn "vec_extract_hi__mask"
> (set_attr "mode" "")])
>
>  (define_insn "vec_extract_hi_"
> -  [(set (match_operand: 0 "nonimmediate_operand" "=vm")
> +  [(set (match_operand: 0 "nonimmediate_operand" "=xjm,vm")
> (vec_select:
> - (match_operand:VI8F_256 1 "register_operand" "v")
> + (match_operand:VI8F_256 1 "register_operand" "x,v")
>   (parallel [(const_int 2) (const_int 3)])))]
>"TARGET_AVX"
>  {
> @@ -12065,7 +12065,9 @@ (define_insn "vec_extract_hi_"
>else
>  return "vextract\t{$0x1, %1, %0|%0, %1, 0x1}";
>  }
> -  [(set_attr "type" "sselog1")
> +  [(set_attr "isa" "noavx512vl,avx512vl")
> +   (set_attr "addr" "gpr16,*")
> +   (set_attr "type" "sselog1")
> (set_attr "prefix_extra" "1")
> (set_attr "length_immediate" "1")
> (set_attr "prefix" "vex")
> @@ -12132,7 +12134,7 @@ (define_insn "vec_extract_hi__mask"
> (set_attr "mode" "")])
>
>  (define_insn "vec_extract_hi_"
> -  [(set (match_operand: 0 "nonimmediate_operand" "=xm, vm")
> +  [(set (match_operand: 0 "nonimmediate_operand" "=xjm, vm")
> (vec_select:
>   (match_operand:VI4F_256 1 "register_operand" "x, v")
>   (parallel [(const_int 4) (const_int 5)
> @@ -12141,7 +12143,8 @@ (define_insn "vec_extract_hi_"
>"@
>  vextract\t{$0x1, %1, %0|%0, %1, 0x1}
>  vextract32x4\t{$0x1, %1, %0|%0, %1, 0x1}"
> -  [(set_attr "isa" "*, avx512vl")
> +  [(set_attr "isa" "noavx512vl, avx512vl")
> +   (set_attr "addr" "gpr16,*")
> (set_attr "prefix" "vex, evex")
> (set_attr "type" "sselog1")
> (set_attr "length_immediate" "1")
> @@ -12222,7 +12225,7 @@ (define_insn_and_split "@vec_extract_lo_"
>"operands[1] = gen_lowpart (mode, operands[1]);")
>
>  (define_insn "@vec_extract_hi_"
> -  [(set (match_operand: 0 "nonimmediate_operand" "=xm,vm")
> +  [(set (match_operand: 0 "nonimmediate_operand" "=xjm,vm")
> (vec_select:
>   (match_operand:V16_256 1 "register_operand" "x,v")
>   (parallel [(const_int 8) (const_int 9)
> @@ -12236,7 +12239,8 @@ (define_insn "@vec_extract_hi_"
>[(set_attr "type" "sselog1")
> (set_attr "prefix_extra" "1")
> (set_attr "length_immediate" "1")
> -   (set_attr "isa" "*,avx512vl")
> +   (set_attr "isa" "noavx512vl,avx512vl")
> +   (set_attr "addr" "gpr16,*")
> (set_attr "prefix" "vex,evex")
> (set_attr "mode" "OI")])
>
> @@ -20465,7 +20469,7 @@ (define_split
>  })
>
>  (define_insn "*vec_extractv2ti"
> -  [(set (match_operand:TI 0 "nonimmediate_operand" "=xm,vm")
> +  [(set (match_operand:TI 0 "nonimmediate_operand" "=xjm,vm")
> (vec_select:TI
>   (match_operand:V2TI 1 "register_operand" "x,v")
>   (parallel
> @@ -20477,6 +20481,8 @@ (define_insn "*vec_extractv2ti"
>[(set_attr "type" "sselog")
> (set_attr "prefix_extra" "1")
> (set_attr "length_immediate" "1")
> +   (set_attr "isa" "noavx512vl,avx512vl")
> +   (set_attr "addr" "gpr16,*")
> (set_attr "prefix" "vex,evex")
> (set_attr "mode" "OI")])
>
> @@ -27556,12 +27562,12 @@ (define_insn "vec_set_lo_"
> (set_attr "mode" "")])
>
>  (define_insn "vec_set_hi_"
> -  [(set (match_operand:VI8F_256 0 "register_operand" "=v")
> +  [(set (match_operand:VI8F_256 0 "register_operand" "=x,v")
> (vec_concat:VI8F_256
>   (vec_select:
> -   (match_operand:VI8F_256 1 "register_operand" "v")
> +   (match_operand:VI8F_256 

Re: [PATCH v2] RISC-V: Implement TLS Descriptors.

2023-11-15 Thread Tatsuyuki Ishi
> On Nov 16, 2023, at 10:07, Jeff Law  wrote:
> 
> 
> 
> On 9/8/23 04:49, Tatsuyuki Ishi via Gcc-patches wrote:
>> This implements TLS Descriptors (TLSDESC) as specified in [1].
>> In TLSDESC instruction sequence, the first instruction relocates against
>> the target TLS variable, while subsequent instructions relocates against
>> the address of the first. Such usage of labels are not well-supported
>> within GCC. Due to this, the 4-instruction sequence is implemented as a
>> single RTX insn.
>> The default remains to be the traditional TLS model, but can be configured
>> with --with_tls={trad,desc}. The choice can be revisited once toolchain
>> and libc support ships.
>> [1]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373.
>> gcc/Changelog:
>> * config/riscv/riscv.opt: Add -mtls-dialect to configure TLS flavor.
>> * config.gcc: Add --with_tls configuration option to change the default
>> TLS flavor.
>> * config/riscv/riscv.h: Add TARGET_TLSDESC determined from
>> -mtls-dialect and with_tls defaults.
>> * config/riscv/riscv-opts.h: Define enum riscv_tls_type for the two TLS
>> flavors.
>> * config/riscv/riscv-protos.h: Define SYMBOL_TLSDESC symbol type.
>> * config/riscv/riscv.md: Add instruction sequence for TLSDESC.
>> * config/riscv/riscv.cc (riscv_symbol_insns): Add instruction sequence
>> length data for TLSDESC.
>> (riscv_legitimize_tls_address): Add lowering of TLSDESC.
>> ---
> 
>> @@ -4694,6 +4696,17 @@ case "${target}" in
>>  ;;
>>  esac
>>  fi
>> +# Handle --with-tls.
>> +case "$with_tls" in
>> +"" \
>> +| trad | desc)
>> +# OK
>> +;;
>> +*)
>> +echo "Unknown TLS method used in --with-tls=$with_tls" 1>&2
>> +exit 1
>> +;;
>> +esac
> Is there a reason why this isn't formatted like the other cases?

Sorry, this was an oversight. I’ll fix it in the next version.

> 
>> @@ -1869,6 +1870,24 @@
>>[(set_attr "got" "load")
>> (set_attr "mode" "")])
>>  +(define_insn "@tlsdesc"
>> +  [(set (reg:P A0_REGNUM)
>> +(unspec:P
>> +[(match_operand:P 0 "symbolic_operand" "")
>> + (match_operand:P 1 "const_int_operand")]
>> +UNSPEC_TLSDESC))
>> +   (clobber (reg:SI T0_REGNUM))]
>> +  "TARGET_TLSDESC"
>> +  {
>> +return ".LT%1: auipc\ta0, %%tlsdesc_hi(%0)\;"
>> +   "\tt0,%%tlsdesc_load_lo(.LT%1)(a0)\;"
>> +   "addi\ta0,a0,%%tlsdesc_add_lo(.LT%1)\;"
>> +   "jalr\tt0,t0,%%tlsdesc_call(.LT%1)";
>> +  }
>> +  [(set_attr "type" "multi")
>> +   (set_attr "length" "16")
>> +   (set_attr "mode" "")])
> Hmm, I would be a bit worried about explicitly using $a0 here.  That's 
> generally frowned upon, but probably unavoidable in this case since this is a 
> call under the hood.

Based on what I have read in the AArch64 backend, there are two ways to do 
this: introduce a custom calling convention, or put in a RTX insn that covers 
the whole sequence. Ideally we should do the first, but then there’s the label 
issue and it’s quite a bit more complicated. So I’m sticking with this for now.

> This needs changes to invoke.texi since it introduces new options.  I don't 
> think it has to be anything terribly verbose.  A one liner is probably 
> sufficient and I wouldn't be surprised if other ports have suitable text we 
> could copy.

Ack.

> So overall if Kito's OK, then I am with the trivial doc change and perhaps 
> the formatting fix in config.guess.

Sorry for all the delay on this. My progress has been (and still is) blocked on
supporting relaxation of TLSDESC in binutils (it turns out you can’t run static
binaries without relaxing it first). But that doesn’t seem exactly easy to do
either, because relaxation that involves GOT elimination isn’t something we
have in the RISC-V backend.

I’ll try to send a new version of this patch and get this unblocked on the GCC
side first.
Presumably this still needs the associated gas and ld support in place, so let
me know if you want to merge this soon. I will ask on the binutils list whether
they could accept the basic part of the implementation without relaxation first.

Thanks,
Tatsuyuki. 

> jeff



Re: [PATCH v2] RISC-V: Implement TLS Descriptors.

2023-11-15 Thread Tatsuyuki Ishi
> On Nov 16, 2023, at 10:17, Fangrui Song  wrote:
> 
> On Mon, Oct 2, 2023 at 7:10 AM Kito Cheng  wrote:
>> 
>> Just one nit and one more comment for doc:
>> 
>> Could you add some doc something like that? mostly I grab from other
>> target, so you can just included in the patch.
>> 
>> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
>> index 31f2234640f..39396668da2 100644
>> --- a/gcc/doc/install.texi
>> +++ b/gcc/doc/install.texi
>> @@ -1174,6 +1174,9 @@ Specify the default TLS dialect, for systems
>> were there is a choice.
>> For ARM targets, possible values for @var{dialect} are @code{gnu} or
>> @code{gnu2}, which select between the original GNU dialect and the GNU TLS
>> descriptor-based dialect.
>> +For RISC-V targets, possible values for @var{dialect} are @code{trad} or
>> +@code{desc}, which select between the traditional GNU dialect and the GNU 
>> TLS
>> +descriptor-based dialect.
>> 
>> @item --enable-multiarch
>> Specify whether to enable or disable multiarch support.  The default is
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 4085fc90907..459e266d426 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -1239,7 +1239,8 @@ See RS/6000 and PowerPC Options.
>> -minline-atomics  -mno-inline-atomics
>> -minline-strlen  -mno-inline-strlen
>> -minline-strcmp  -mno-inline-strcmp
>> --minline-strncmp  -mno-inline-strncmp}
>> +-minline-strncmp  -mno-inline-strncmp
>> +-mtls-dialect=desc  -mtls-dialect=trad}
>> 
>> @emph{RL78 Options}
>> @gccoptlist{-msim  -mmul=none  -mmul=g13  -mmul=g14  -mallregs
>> @@ -29538,6 +29539,17 @@ which register to use as base register for
>> reading the canary,
>> and from what offset from that base register. There is no default
>> register or offset as this is entirely for use within the Linux
>> kernel.
>> +
>> +@opindex mtls-dialect=desc
>> +@item -mtls-dialect=desc
>> +Use TLS descriptors as the thread-local storage mechanism for dynamic 
>> accesses
>> +of TLS variables.  This is the default.
>> +
>> +@opindex mtls-dialect=trad
>> +@item -mtls-dialect=traditional
> 
> -mtls-dialect=trad.
> aarch64-linux-gnu-gcc doesn't support -mtls-dialect=traditional
> 
>> +Use traditional TLS as the thread-local storage mechanism for dynamic 
>> accesses
>> +of TLS variables.
>> +
>> @end table
> 
> This is the default :)
> 
> I am happy that we change the default like AArch64, but probably not
> now when linker support is not widely available yet.
> 
> I cannot comment on the code side as I am not familiar with GCC internals.
> 
>> @node RL78 Options
>> 
>> 
>> 
>> 
>>> +(define_insn "@tlsdesc"
>>> +  [(set (reg:P A0_REGNUM)
>>> +   (unspec:P
>>> +   [(match_operand:P 0 "symbolic_operand" "")
>>> +(match_operand:P 1 "const_int_operand")]
>>> +   UNSPEC_TLSDESC))
>>> +   (clobber (reg:SI T0_REGNUM))]
>> 
>> P rather than SI here.
>> 
>>> +  "TARGET_TLSDESC"
>>> +  {
>>> +return ".LT%1: auipc\ta0, %%tlsdesc_hi(%0)\;"
>>> +   "\tt0,%%tlsdesc_load_lo(.LT%1)(a0)\;"
>>> +   "addi\ta0,a0,%%tlsdesc_add_lo(.LT%1)\;"
>>> +   "jalr\tt0,t0,%%tlsdesc_call(.LT%1)";
>>> +  }
>>> +  [(set_attr "type" "multi")
>>> +   (set_attr "length" "16")
>>> +   (set_attr "mode" "")])
>>> +
>>> (define_insn "auipc"
>>>   [(set (match_operand:P   0 "register_operand" "=r")
>>>(unspec:P
> 
> It seems that x86-64 supports non-adjacent code sequence. Writing the
> pattern this way does not allow interleaving, but I assume
> interleaving doesn't enable much.
> https://reviews.llvm.org/D114416

As mentioned in the commit message, the use of relaxation-only labels does not 
seem well supported in current GCC. Creating a label seems to force a basic 
block and I’m not sure how we can avoid it.

If there’s a better way to implement this, I’m happy to adopt it.

Tatsuyuki.

> 
> -- 
> 宋方睿



Re: [PATCH v2] LoongArch: Remove redundant barrier instructions before LL-SC loops

2023-11-15 Thread chenglulu



On 2023/11/15 at 7:38 PM, Xi Ruoyao wrote:

Pushed r14-5486.

/* snip */


* gcc.target/loongarch/cas-acquire.c: New test.

This test fails with GCC 12/13 on LA664, and it indicates a correctness
issue.  May I backport this patch to 12/13 as well?


I think we can backport.

Thanks!



Re: [PATCH v2] RISC-V: Implement TLS Descriptors.

2023-11-15 Thread Fangrui Song
On Mon, Oct 2, 2023 at 7:10 AM Kito Cheng  wrote:
>
> Just one nit and one more comment for doc:
>
> Could you add some documentation, something like the following? I mostly
> grabbed it from other targets, so you can just include it in the patch.
>
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index 31f2234640f..39396668da2 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -1174,6 +1174,9 @@ Specify the default TLS dialect, for systems
> were there is a choice.
> For ARM targets, possible values for @var{dialect} are @code{gnu} or
> @code{gnu2}, which select between the original GNU dialect and the GNU TLS
> descriptor-based dialect.
> +For RISC-V targets, possible values for @var{dialect} are @code{trad} or
> +@code{desc}, which select between the traditional GNU dialect and the GNU TLS
> +descriptor-based dialect.
>
> @item --enable-multiarch
> Specify whether to enable or disable multiarch support.  The default is
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 4085fc90907..459e266d426 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1239,7 +1239,8 @@ See RS/6000 and PowerPC Options.
> -minline-atomics  -mno-inline-atomics
> -minline-strlen  -mno-inline-strlen
> -minline-strcmp  -mno-inline-strcmp
> --minline-strncmp  -mno-inline-strncmp}
> +-minline-strncmp  -mno-inline-strncmp
> +-mtls-dialect=desc  -mtls-dialect=trad}
>
> @emph{RL78 Options}
> @gccoptlist{-msim  -mmul=none  -mmul=g13  -mmul=g14  -mallregs
> @@ -29538,6 +29539,17 @@ which register to use as base register for
> reading the canary,
> and from what offset from that base register. There is no default
> register or offset as this is entirely for use within the Linux
> kernel.
> +
> +@opindex mtls-dialect=desc
> +@item -mtls-dialect=desc
> +Use TLS descriptors as the thread-local storage mechanism for dynamic accesses
> +of TLS variables.  This is the default.
> +
> +@opindex mtls-dialect=trad
> +@item -mtls-dialect=traditional

-mtls-dialect=trad.
aarch64-linux-gnu-gcc doesn't support -mtls-dialect=traditional

> +Use traditional TLS as the thread-local storage mechanism for dynamic accesses
> +of TLS variables.
> +
> @end table

This is the default :)

I am happy that we change the default like AArch64, but probably not
now when linker support is not widely available yet.

I cannot comment on the code side as I am not familiar with GCC internals.

> @node RL78 Options
>
>
>
>
> > +(define_insn "@tlsdesc"
> > +  [(set (reg:P A0_REGNUM)
> > +   (unspec:P
> > +   [(match_operand:P 0 "symbolic_operand" "")
> > +(match_operand:P 1 "const_int_operand")]
> > +   UNSPEC_TLSDESC))
> > +   (clobber (reg:SI T0_REGNUM))]
>
> P rather than SI here.
>
> > +  "TARGET_TLSDESC"
> > +  {
> > +return ".LT%1: auipc\ta0, %%tlsdesc_hi(%0)\;"
> > +   "\tt0,%%tlsdesc_load_lo(.LT%1)(a0)\;"
> > +   "addi\ta0,a0,%%tlsdesc_add_lo(.LT%1)\;"
> > +   "jalr\tt0,t0,%%tlsdesc_call(.LT%1)";
> > +  }
> > +  [(set_attr "type" "multi")
> > +   (set_attr "length" "16")
> > +   (set_attr "mode" "")])
> > +
> >  (define_insn "auipc"
> >[(set (match_operand:P   0 "register_operand" "=r")
> > (unspec:P

It seems that x86-64 supports non-adjacent code sequence. Writing the
pattern this way does not allow interleaving, but I assume
interleaving doesn't enable much.
https://reviews.llvm.org/D114416


-- 
宋方睿


Re: [PATCH v2] RISC-V: Implement TLS Descriptors.

2023-11-15 Thread Jeff Law




On 9/8/23 04:49, Tatsuyuki Ishi via Gcc-patches wrote:

This implements TLS Descriptors (TLSDESC) as specified in [1].

In the TLSDESC instruction sequence, the first instruction relocates against
the target TLS variable, while subsequent instructions relocate against
the address of the first. Such use of labels is not well supported
within GCC. Because of this, the 4-instruction sequence is implemented as a
single RTX insn.

The default remains the traditional TLS model, but it can be configured
with --with-tls={trad,desc}. The choice can be revisited once toolchain
and libc support ships.

[1]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373.
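
For context, here is a minimal sketch of the kind of source this option affects. The names are hypothetical and not from the patch. Assuming the file is built as position-independent code for a shared library, an access to a thread-local variable like the one below would normally go through the general-dynamic sequence with a call to __tls_get_addr; with -mtls-dialect=desc on a toolchain carrying this patch, it should instead be lowered to the 4-instruction TLSDESC sequence described above.

```cpp
// Hypothetical example; request_count and bump_request_count are
// illustrative names, not part of the patch.
// Each thread gets its own copy of this variable.
thread_local int request_count = 0;

// Increment the calling thread's counter and return the new value.
// The address of request_count has to be resolved per thread, which is
// the access that the selected TLS dialect controls.
int bump_request_count() {
    return ++request_count;
}
```

The source itself is the same for both dialects; only the generated access sequence and the relocations differ.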

gcc/ChangeLog:
 * config/riscv/riscv.opt: Add -mtls-dialect to configure TLS flavor.
 * config.gcc: Add --with-tls configuration option to change the default
 TLS flavor.
 * config/riscv/riscv.h: Add TARGET_TLSDESC determined from
 -mtls-dialect and with_tls defaults.
 * config/riscv/riscv-opts.h: Define enum riscv_tls_type for the two TLS
 flavors.
 * config/riscv/riscv-protos.h: Define SYMBOL_TLSDESC symbol type.
 * config/riscv/riscv.md: Add instruction sequence for TLSDESC.
 * config/riscv/riscv.cc (riscv_symbol_insns): Add instruction sequence
 length data for TLSDESC.
 (riscv_legitimize_tls_address): Add lowering of TLSDESC.
---



@@ -4694,6 +4696,17 @@ case "${target}" in
;;
esac
fi
+   # Handle --with-tls.
+   case "$with_tls" in
+"" \
+| trad | desc)
+# OK
+;;
+*)
+echo "Unknown TLS method used in --with-tls=$with_tls" 1>&2
+exit 1
+;;
+esac

Is there a reason why this isn't formatted like the other cases?





@@ -1869,6 +1870,24 @@
[(set_attr "got" "load")
 (set_attr "mode" "")])
  
+(define_insn "@tlsdesc"

+  [(set (reg:P A0_REGNUM)
+   (unspec:P
+   [(match_operand:P 0 "symbolic_operand" "")
+(match_operand:P 1 "const_int_operand")]
+   UNSPEC_TLSDESC))
+   (clobber (reg:SI T0_REGNUM))]
+  "TARGET_TLSDESC"
+  {
+return ".LT%1: auipc\ta0, %%tlsdesc_hi(%0)\;"
+   "\tt0,%%tlsdesc_load_lo(.LT%1)(a0)\;"
+   "addi\ta0,a0,%%tlsdesc_add_lo(.LT%1)\;"
+   "jalr\tt0,t0,%%tlsdesc_call(.LT%1)";
+  }
+  [(set_attr "type" "multi")
+   (set_attr "length" "16")
+   (set_attr "mode" "")])
Hmm, I would be a bit worried about explicitly using $a0 here.  That's 
generally frowned upon, but probably unavoidable in this case since this 
is a call under the hood.



This needs changes to invoke.texi since it introduces new options.  I 
don't think it has to be anything terribly verbose.  A one liner is 
probably sufficient and I wouldn't be surprised if other ports have 
suitable text we could copy.


So overall, if Kito's OK with it, then so am I, given the trivial doc change 
and perhaps the formatting fix in config.gcc.


jeff


[PING] [PATCH] gfortran: Rely on dg-do-what-default to avoid running pr85853.f90, pr107254.f90 and vect-alias-check-1.F90 on non-vector targets

2023-11-15 Thread Patrick O'Neill

Ping.

Testsuite fixup similar to:
https://inbox.sourceware.org/gcc-patches/974e9e5e-8f07-46dd-b9b9-db8aa4685...@gmail.com/T/#t
https://inbox.sourceware.org/gcc-patches/7e78cd70-70c9-41b1-8a98-6977a1034...@rivosinc.com/T/#t

Patrick

On Thu, Nov 2, 2023 at 12:09 PM Patrick O'Neill wrote:


Testcases in gfortran.dg/vect/vect.exp rely on
check_vect_support_and_set_flags to set dg-do-what-default and avoid
running vector tests on non-vector targets. The three testcases in this
patch overwrite the default with dg-do run which causes issues
for non-vector targets.

Removing the dg-do run directive resolves this issue for non-vector
targets (while still running the tests on vector targets).

gcc/testsuite/ChangeLog:

* gfortran.dg/vect/pr107254.f90: Remove dg-do run directive.
* gfortran.dg/vect/pr85853.f90: Ditto.
* gfortran.dg/vect/vect-alias-check-1.F90: Ditto.

Signed-off-by: Patrick O'Neill 
---
Tested using rv64gc & rv64gcv to make sure the testcases compile/run
as expected.

These files haven't been changed in a long time so I'm not sure why (or
if) this hasn't been run into by other people before.
---
 gcc/testsuite/gfortran.dg/vect/pr107254.f90   | 2 --
 gcc/testsuite/gfortran.dg/vect/pr85853.f90| 1 -
 gcc/testsuite/gfortran.dg/vect/vect-alias-check-1.F90 | 1 -
 3 files changed, 4 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/vect/pr107254.f90 
b/gcc/testsuite/gfortran.dg/vect/pr107254.f90
index 85bcb5f3fa2..adce6bedc30 100644
--- a/gcc/testsuite/gfortran.dg/vect/pr107254.f90
+++ b/gcc/testsuite/gfortran.dg/vect/pr107254.f90
@@ -1,5 +1,3 @@
-! { dg-do run }
-
 subroutine dlartg( f, g, s, r )
   implicit none
   double precision :: f, g, r, s
diff --git a/gcc/testsuite/gfortran.dg/vect/pr85853.f90 
b/gcc/testsuite/gfortran.dg/vect/pr85853.f90
index 68f4a004324..4c0e3b81a09 100644
--- a/gcc/testsuite/gfortran.dg/vect/pr85853.f90
+++ b/gcc/testsuite/gfortran.dg/vect/pr85853.f90
@@ -1,5 +1,4 @@
 ! Taken from execute/where_2.f90, but with special flags.
-! { dg-do run }
 ! { dg-additional-options "-fno-tree-loop-vectorize" }

 ! Program to test the WHERE constructs
diff --git a/gcc/testsuite/gfortran.dg/vect/vect-alias-check-1.F90 
b/gcc/testsuite/gfortran.dg/vect/vect-alias-check-1.F90
index 3014ff9f3b6..85ae9b151e3 100644
--- a/gcc/testsuite/gfortran.dg/vect/vect-alias-check-1.F90
+++ b/gcc/testsuite/gfortran.dg/vect/vect-alias-check-1.F90
@@ -1,4 +1,3 @@
-! { dg-do run }
 ! { dg-additional-options "-fno-inline" }

 #define N 200
--
2.34.1



Re: [PATCH v2] RISC-V: Implement target attribute

2023-11-15 Thread Christoph Müllner
On Tue, Nov 14, 2023 at 3:15 PM Kito Cheng  wrote:
>
> This implements the target attribute proposed in [1], which allows the
> user to specify local settings on a per-function basis.
>
> The syntax of the target attribute is `__attribute__((target("<ATTR-STRING>")))`.
>
> The syntax of `<ATTR-STRING>` is described below:
> ```
> ATTR-STRING := ATTR-STRING ';' ATTR
>  | ATTR
>
> ATTR:= ARCH-ATTR
>  | CPU-ATTR
>  | TUNE-ATTR
>
> ARCH-ATTR   := 'arch=' EXTENSIONS-OR-FULLARCH
>
> EXTENSIONS-OR-FULLARCH := <EXTENSIONS>
> | <FULLARCHSTR>
>
> EXTENSIONS := <EXTENSION> ',' <EXTENSIONS>
> | <EXTENSION>
>
> FULLARCHSTR:= 
>
> EXTENSION  := <OP> <EXTENSION-NAME> <VERSION>
>
> OP := '+'
>
> VERSION:= [0-9]+ 'p' [0-9]+
> | [1-9][0-9]*
> |
>
> EXTENSION-NAME := Naming rule is defined in RISC-V ISA manual
>
> CPU-ATTR:= 'cpu=' 
> TUNE-ATTR   := 'tune=' 
> ```
>
> Changes since v1:
> - Use std::unique_ptr rather than alloca to prevent memory issue.
> - Error rather than warning when attribute duplicated.
>
> [1] https://github.com/riscv-non-isa/riscv-c-api-doc/pull/35
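
As a rough illustration of the top level of the grammar quoted above, the sketch below splits an attribute string on ';' and sorts each piece into arch=, cpu= or tune=, rejecting duplicates as the v2 notes describe. It is not the parser from riscv-target-attr.cc; TargetAttr, set_once and parse_target_attr are names invented for this example, and it does not validate the extension list, CPU name or tune name themselves.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical result type: one slot per attribute kind in the grammar.
struct TargetAttr {
    std::string arch, cpu, tune;
};

// Store `value` into `slot`, rejecting an empty value or a second
// occurrence of the same key (duplicates are an error, not a warning).
static void set_once(std::string &slot, const std::string &key,
                     const std::string &value) {
    if (value.empty())
        throw std::invalid_argument("empty value for '" + key + "'");
    if (!slot.empty())
        throw std::invalid_argument("duplicated attribute '" + key + "'");
    slot = value;
}

// Parse ATTR (';' ATTR)* where each ATTR is arch=..., cpu=... or tune=...
TargetAttr parse_target_attr(const std::string &attr_string) {
    TargetAttr result;
    std::size_t pos = 0;
    for (;;) {
        const std::size_t end = attr_string.find(';', pos);
        const std::string item = attr_string.substr(
            pos, end == std::string::npos ? std::string::npos : end - pos);

        const std::size_t eq = item.find('=');
        if (eq == std::string::npos)
            throw std::invalid_argument("missing '=' in '" + item + "'");
        const std::string key = item.substr(0, eq);
        const std::string value = item.substr(eq + 1);

        if (key == "arch")
            set_once(result.arch, key, value);
        else if (key == "cpu")
            set_once(result.cpu, key, value);
        else if (key == "tune")
            set_once(result.tune, key, value);
        else
            throw std::invalid_argument("unknown attribute '" + key + "'");

        if (end == std::string::npos)
            break;          // last item consumed
        pos = end + 1;      // continue after the ';'
    }
    return result;
}
```

For example, parse_target_attr("arch=+zba,+zbb;tune=rocket") fills arch and tune and leaves cpu empty, while "arch=+zba;arch=+zbb" throws because arch appears twice. The extension and tune names here are only sample inputs.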

I've reviewed with a focus on the utilized backend hooks and macros.

Reviewed-by: Christoph Müllner 

Note that the changelog below has quite a few empty entries.

>
> gcc/ChangeLog:
>
> * config.gcc (riscv): Add riscv-target-attr.o.
> * config/riscv/riscv-protos.h (riscv_declare_function_size) New.
> (riscv_option_valid_attribute_p): New.
> (riscv_override_options_internal): New.
> (struct riscv_tune_info): New.
> (riscv_parse_tune): New.
> * config/riscv/riscv-target-attr.cc
> (class riscv_target_attr_parser): New.
> (struct riscv_attribute_info): New.
> (riscv_attributes): New.
> (riscv_target_attr_parser::parse_arch):
> (riscv_target_attr_parser::handle_arch):
> (riscv_target_attr_parser::handle_cpu):
> (riscv_target_attr_parser::handle_tune):
> (riscv_target_attr_parser::update_settings):
> (riscv_process_one_target_attr):
> (num_occurences_in_str):
> (riscv_process_target_attr):
> (riscv_option_valid_attribute_p):
> * config/riscv/riscv.cc: Include target-globals.h and
> riscv-subset.h.
> (struct riscv_tune_info): Move to riscv-protos.h.
> (get_tune_str):
> (riscv_parse_tune):
> (riscv_declare_function_size):
> (riscv_option_override): Build target_option_default_node and
> target_option_current_node.
> (riscv_save_restore_target_globals):
> (riscv_option_restore):
> (riscv_previous_fndecl):
> (riscv_set_current_function): Apply the target attribute.
> (TARGET_OPTION_RESTORE): Define.
> (TARGET_OPTION_VALID_ATTRIBUTE_P): Ditto.
> * config/riscv/riscv.h (SWITCHABLE_TARGET): Define to 1.
> (ASM_DECLARE_FUNCTION_SIZE) Define.
> * config/riscv/riscv.opt (mtune=): Add Save attribute.
> (mcpu=): Ditto.
> (mcmodel=): Ditto.
> * config/riscv/t-riscv: Add build rule for riscv-target-attr.o
> * doc/extend.texi: Add doc for target attribute.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/target-attr-01.c: New.
> * gcc.target/riscv/target-attr-02.c: Ditto.
> * gcc.target/riscv/target-attr-03.c: Ditto.
> * gcc.target/riscv/target-attr-04.c: Ditto.
> * gcc.target/riscv/target-attr-05.c: Ditto.
> * gcc.target/riscv/target-attr-06.c: Ditto.
> * gcc.target/riscv/target-attr-07.c: Ditto.
> * gcc.target/riscv/target-attr-bad-01.c: Ditto.
> * gcc.target/riscv/target-attr-bad-02.c: Ditto.
> * gcc.target/riscv/target-attr-bad-03.c: Ditto.
> * gcc.target/riscv/target-attr-bad-04.c: Ditto.
> * gcc.target/riscv/target-attr-bad-05.c: Ditto.
> * gcc.target/riscv/target-attr-bad-06.c: Ditto.
> * gcc.target/riscv/target-attr-bad-07.c: Ditto.
> * gcc.target/riscv/target-attr-bad-08.c: Ditto.
> * gcc.target/riscv/target-attr-bad-09.c: Ditto.
> * gcc.target/riscv/target-attr-bad-10.c: Ditto.
> ---
>  gcc/config.gcc|   2 +-
>  gcc/config/riscv/riscv-protos.h   |  21 +
>  gcc/config/riscv/riscv-target-attr.cc | 395 ++
>  gcc/config/riscv/riscv.cc | 192 +++--
>  gcc/config/riscv/riscv.h  |   6 +
>  gcc/config/riscv/riscv.opt|   6 +-
>  gcc/config/riscv/t-riscv  |   5 +
>  gcc/doc/extend.texi   |  58 +++
>  .../gcc.target/riscv/target-attr-01.c |  31 ++
>  .../gcc.target/riscv/target-attr-02.c |  31 ++
>  .../gcc.target/riscv/target-attr-03.c |  26 ++
>  .../gcc.target/riscv/target-attr-04.c |  28 ++
>  

[PATCH] rs6000: Disassemble opaque modes using subregs to allow optimizations [PR109116]

2023-11-15 Thread Peter Bergner
PR109116 exposes an issue where using unspecs to access each vector component
of an opaque mode variable leads to unneeded register copies, because our rtl
optimizers cannot handle unspecs.  Instead, use subregs to access each vector
component of the opaque mode variable, which our optimizers know how to handle.
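
To illustrate the offset arithmetic the subreg approach relies on: the component index is multiplied by the size of one V16QI vector (16 bytes) to get the byte offset of that component inside the wider opaque value. The sketch below is a plain C++ model of that indexing, not GCC code; kVectorBytes, component_offset and extract_component are names made up for the example, and the memcpy only stands in for how a subreg names one slice of a wider value.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Size in bytes of one 128-bit vector component (V16QI).
constexpr std::size_t kVectorBytes = 16;

// Byte offset of component `index` inside a wider opaque value:
// 0 or 16 for a vector pair, 0/16/32/48 for an accumulator.
constexpr std::size_t component_offset(std::size_t index) {
    return index * kVectorBytes;
}

// Copy one 16-byte component out of a 32-byte pair.  This models
// "the slice at byte offset component_offset(index)"; the caller must
// pass index 0 or 1.
std::array<std::uint8_t, kVectorBytes>
extract_component(const std::array<std::uint8_t, 32> &pair,
                  std::size_t index) {
    std::array<std::uint8_t, kVectorBytes> out{};
    std::memcpy(out.data(), pair.data() + component_offset(index),
                kVectorBytes);
    return out;
}
```

Because the slice is described purely by a mode and a byte offset, generic optimizers can reason about it, which is not the case for an opaque unspec.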

I did not include a test case with the patch, since writing a test case that
attempts to ensure we don't emit unneeded register copies is nearly impossible
since those copies can still be generated for reasons other than the causes
in this patch.  I have verified that this patch does improve code generation
for some unit tests and our AI libraries team has confirmed that performance
of their tests improved when using this patch.

This passed bootstrap and regtesting with no regressions on powerpc64le-linux
and powerpc64-linux.  Ok for trunk?

Peter


gcc/
PR target/109116
* config/rs6000/mma.md (vsx_disassemble_pair): Expand into a vector
register sized subreg.
* config/rs6000/mma.md (*vsx_disassemble_pair): Delete.
(mma_disassemble_acc): Expand into a vector register sized subreg.
(*mma_disassemble_acc): Delete.
* config/rs6000/rs6000.cc (rs6000_modes_tieable_p): Allow vector modes
to tie with OOmode.

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 575751d477e..2ca405469e2 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -398,29 +398,8 @@ (define_expand "vsx_disassemble_pair"
(match_operand 2 "const_0_to_1_operand")]
   "TARGET_MMA"
 {
-  rtx src;
-  int regoff = INTVAL (operands[2]);
-  src = gen_rtx_UNSPEC (V16QImode,
-   gen_rtvec (2, operands[1], GEN_INT (regoff)),
-   UNSPEC_MMA_EXTRACT);
-  emit_move_insn (operands[0], src);
-  DONE;
-})
-
-(define_insn_and_split "*vsx_disassemble_pair"
-  [(set (match_operand:V16QI 0 "mma_disassemble_output_operand" "=mwa")
-   (unspec:V16QI [(match_operand:OO 1 "vsx_register_operand" "wa")
- (match_operand 2 "const_0_to_1_operand")]
- UNSPEC_MMA_EXTRACT))]
-  "TARGET_MMA
-   && vsx_register_operand (operands[1], OOmode)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-{
-  int reg = REGNO (operands[1]);
-  int regoff = INTVAL (operands[2]);
-  rtx src = gen_rtx_REG (V16QImode, reg + regoff);
+  int regoff = INTVAL (operands[2]) * GET_MODE_SIZE (V16QImode);
+  rtx src = simplify_gen_subreg (V16QImode, operands[1], OOmode, regoff);
   emit_move_insn (operands[0], src);
   DONE;
 })
@@ -472,29 +451,8 @@ (define_expand "mma_disassemble_acc"
(match_operand 2 "const_0_to_3_operand")]
   "TARGET_MMA"
 {
-  rtx src;
-  int regoff = INTVAL (operands[2]);
-  src = gen_rtx_UNSPEC (V16QImode,
-   gen_rtvec (2, operands[1], GEN_INT (regoff)),
-   UNSPEC_MMA_EXTRACT);
-  emit_move_insn (operands[0], src);
-  DONE;
-})
-
-(define_insn_and_split "*mma_disassemble_acc"
-  [(set (match_operand:V16QI 0 "mma_disassemble_output_operand" "=mwa")
-   (unspec:V16QI [(match_operand:XO 1 "fpr_reg_operand" "d")
- (match_operand 2 "const_0_to_3_operand")]
- UNSPEC_MMA_EXTRACT))]
-  "TARGET_MMA
-   && fpr_reg_operand (operands[1], XOmode)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-{
-  int reg = REGNO (operands[1]);
-  int regoff = INTVAL (operands[2]);
-  rtx src = gen_rtx_REG (V16QImode, reg + regoff);
+  int regoff = INTVAL (operands[2]) * GET_MODE_SIZE (V16QImode);
+  rtx src = simplify_gen_subreg (V16QImode, operands[1], XOmode, regoff);
   emit_move_insn (operands[0], src);
   DONE;
 })
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 5f56c3ed85b..f2efa46c147 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1964,9 +1964,12 @@ rs6000_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)
 static bool
 rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
 {
-  if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode
-  || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode)
-return mode1 == mode2;
+   if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode
+   || mode2 == PTImode || mode2 == XOmode)
+ return mode1 == mode2;
+ 
+  if (mode2 == OOmode)
+return ALTIVEC_OR_VSX_VECTOR_MODE (mode1);
 
   if (ALTIVEC_OR_VSX_VECTOR_MODE (mode1))
 return ALTIVEC_OR_VSX_VECTOR_MODE (mode2);


Re: [PATCH 0/3] Option handling: add documentation URLs

2023-11-15 Thread Joseph Myers
On Wed, 15 Nov 2023, David Malcolm wrote:

> As mentioned, I'm currently investigating capturing per-language option
> URLs (to address Iain's and Marc's comments about D and Ada); if I get
> that working, I may need to add a similar note for adding a new
> frontend.
> 
> Hope the overall approach seems reasonable.

Yes, the approach seems reasonable.

I suppose a difficulty with per-language URLs is that a given option has a 
single OPT_* enumeration value; the diagnostic calls don't say whether 
it's being used from a front end or the middle end (though maybe there's 
not much overlap between the two) - though some option handling already 
distinguishes based on what language is being compiled (e.g. 
LangEnabledBy).  For per-architecture URLs you don't have this issue 
because only one architecture is built into GCC at a time.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Fix crash in libcc1

2023-11-15 Thread Jeff Law




On 11/14/23 22:30, Tom Tromey wrote:

The gdb tests of the libcc1 plugin have been failing lately.  I
tracked this down to a crash trying to access an enum's underlying
type.  This patch fixes the crash by setting this type.

libcc1/ChangeLog

* libcc1plugin.cc (plugin_build_enum_type): Set
ENUM_UNDERLYING_TYPE.

OK
jeff


Re: [PATCH 0/4] gcov: Improve -fprofile-update=atomic

2023-11-15 Thread Jeff Law




On 11/14/23 15:08, Sebastian Huber wrote:

Sebastian Huber (4):
   gcov: Remove TARGET_GCOV_TYPE_SIZE target hook
   Add TARGET_HAVE_LIBATOMIC
   gcov: Add gen_counter_update()
   gcov: Improve -fprofile-update=atomic

  gcc/c-family/c-cppbuiltin.cc |   4 +-
  gcc/config/rtems.h   |   2 +
  gcc/config/sparc/rtemself.h  |   2 -
  gcc/config/sparc/sparc.cc|  11 --
  gcc/coverage.cc  |   2 +-
  gcc/doc/invoke.texi  |  19 ++-
  gcc/doc/tm.texi  |  16 +--
  gcc/doc/tm.texi.in   |   4 +-
  gcc/target.def   |  20 ++-
  gcc/targhooks.cc |   7 --
  gcc/targhooks.h  |   2 -
  gcc/tree-profile.cc  | 232 +++
  libgcc/libgcov.h |  16 +--
  13 files changed, 197 insertions(+), 140 deletions(-)

This series as a whole is OK with the targetm.have_atomic instead of 
TARGET_HAVE_LIBATOMIC fix you mentioned after posting the series.


jeff


[PATCH] RISC-V: Change unaligned fast/slow/avoid macros to misaligned [PR111557]

2023-11-15 Thread Edwin Lu
Rename the __riscv_unaligned_fast/slow/avoid macros to
__riscv_misaligned_fast/slow/avoid to be consistent with the RISC-V API Spec.

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): update macro name

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-1.c: update macro name
* gcc.target/riscv/attribute-4.c: ditto
* gcc.target/riscv/attribute-5.c: ditto
* gcc.target/riscv/predef-align-1.c: ditto
* gcc.target/riscv/predef-align-2.c: ditto
* gcc.target/riscv/predef-align-3.c: ditto
* gcc.target/riscv/predef-align-4.c: ditto
* gcc.target/riscv/predef-align-5.c: ditto
* gcc.target/riscv/predef-align-6.c: ditto

Signed-off-by: Edwin Lu 
---
 gcc/config/riscv/riscv-c.cc |  6 +++---
 gcc/testsuite/gcc.target/riscv/attribute-1.c| 10 +-
 gcc/testsuite/gcc.target/riscv/attribute-4.c|  8 
 gcc/testsuite/gcc.target/riscv/attribute-5.c| 10 +-
 gcc/testsuite/gcc.target/riscv/predef-align-1.c | 10 +-
 gcc/testsuite/gcc.target/riscv/predef-align-2.c |  8 
 gcc/testsuite/gcc.target/riscv/predef-align-3.c | 10 +-
 gcc/testsuite/gcc.target/riscv/predef-align-4.c | 10 +-
 gcc/testsuite/gcc.target/riscv/predef-align-5.c |  8 
 gcc/testsuite/gcc.target/riscv/predef-align-6.c | 10 +-
 10 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index b7f9ba204f7..dd1bd0596fc 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -109,11 +109,11 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
 }
 
   if (riscv_user_wants_strict_align)
-builtin_define_with_int_value ("__riscv_unaligned_avoid", 1);
+builtin_define_with_int_value ("__riscv_misaligned_avoid", 1);
   else if (riscv_slow_unaligned_access_p)
-builtin_define_with_int_value ("__riscv_unaligned_slow", 1);
+builtin_define_with_int_value ("__riscv_misaligned_slow", 1);
   else
-builtin_define_with_int_value ("__riscv_unaligned_fast", 1);
+builtin_define_with_int_value ("__riscv_misaligned_fast", 1);
 
   if (TARGET_MIN_VLEN != 0)
 builtin_define_with_int_value ("__riscv_v_min_vlen", TARGET_MIN_VLEN);
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-1.c 
b/gcc/testsuite/gcc.target/riscv/attribute-1.c
index abfb0b498e0..a39efb3e6ff 100644
--- a/gcc/testsuite/gcc.target/riscv/attribute-1.c
+++ b/gcc/testsuite/gcc.target/riscv/attribute-1.c
@@ -4,13 +4,13 @@ int foo()
 {
 
 /* In absence of -m[no-]strict-align, default mcpu is currently 
-   set to rocket.  rocket has slow_unaligned_access=true.  */
-#if !defined(__riscv_unaligned_slow)
-#error "__riscv_unaligned_slow is not set"
+   set to rocket.  rocket has slow_misaligned_access=true.  */
+#if !defined(__riscv_misaligned_slow)
+#error "__riscv_misaligned_slow is not set"
 #endif
 
-#if defined(__riscv_unaligned_avoid) || defined(__riscv_unaligned_fast)
-#error "__riscv_unaligned_avoid or __riscv_unaligned_fast is unexpectedly set"
+#if defined(__riscv_misaligned_avoid) || defined(__riscv_misaligned_fast)
+#error "__riscv_misaligned_avoid or __riscv_misaligned_fast is unexpectedly set"
 #endif
 
 return 0;
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-4.c 
b/gcc/testsuite/gcc.target/riscv/attribute-4.c
index 545f87cb899..a5a95042a31 100644
--- a/gcc/testsuite/gcc.target/riscv/attribute-4.c
+++ b/gcc/testsuite/gcc.target/riscv/attribute-4.c
@@ -3,12 +3,12 @@
 int foo()
 {
 
-#if !defined(__riscv_unaligned_avoid)
-#error "__riscv_unaligned_avoid is not set"
+#if !defined(__riscv_misaligned_avoid)
+#error "__riscv_misaligned_avoid is not set"
 #endif
 
-#if defined(__riscv_unaligned_fast) || defined(__riscv_unaligned_slow)
-#error "__riscv_unaligned_fast or __riscv_unaligned_slow is unexpectedly set"
+#if defined(__riscv_misaligned_fast) || defined(__riscv_misaligned_slow)
+#error "__riscv_misaligned_fast or __riscv_misaligned_slow is unexpectedly set"
 #endif
 
   return 0;
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-5.c 
b/gcc/testsuite/gcc.target/riscv/attribute-5.c
index 753043c31e9..ad1a1811fa3 100644
--- a/gcc/testsuite/gcc.target/riscv/attribute-5.c
+++ b/gcc/testsuite/gcc.target/riscv/attribute-5.c
@@ -3,13 +3,13 @@
 int foo()
 {
 
-/* Default mcpu is rocket which has slow_unaligned_access=true.  */
-#if !defined(__riscv_unaligned_slow)
-#error "__riscv_unaligned_slow is not set"
+/* Default mcpu is rocket which has slow_misaligned_access=true.  */
+#if !defined(__riscv_misaligned_slow)
+#error "__riscv_misaligned_slow is not set"
 #endif
 
-#if defined(__riscv_unaligned_avoid) || defined(__riscv_unaligned_fast)
-#error "__riscv_unaligned_avoid or __riscv_unaligned_fast is unexpectedly set"
+#if defined(__riscv_misaligned_avoid) || defined(__riscv_misaligned_fast)
+#error "__riscv_misaligned_avoid or __riscv_misaligned_fast is unexpectedly set"
 #endif
 
 return 0;
diff --git 

Re: [PATCH] c++: constantness of call to function pointer [PR111703]

2023-11-15 Thread Jason Merrill

On 11/15/23 13:03, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk/13/12 (to match the PR107939 / r13-6525-ge09bc034d1b4d6 backports)?

-- >8 --

potential_constant_expression for a CALL_EXPR to a non-overload tests
FUNCTION_POINTER_TYPE_P on the callee rather than on the type of the
callee, which means we always pass want_rval=any when recursing and so
may fail to properly treat a non-constant function pointer callee as such.
Fixing this turns out to further work around the PR111703 issue.
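
For readers less familiar with the distinction involved, here is a small standalone sketch (not taken from the patch or its tests; twice, const_fp, runtime_fp and call_runtime are illustrative names). A call through a constexpr function pointer can be a constant expression, while a call through an ordinary, mutable function pointer cannot, which is the case the new constexpr4.C test expects to be diagnosed.

```cpp
constexpr int twice(int x) { return 2 * x; }

// A constexpr function pointer: its target is fixed at compile time,
// so calling through it is usable in a constant expression.
constexpr int (*const_fp)(int) = &twice;
static_assert(const_fp(21) == 42, "call through constexpr pointer");

// A plain function pointer: its value may change at run time, so
//   static_assert(runtime_fp(21) == 42, "");
// would be rejected as a non-constant condition.
int (*runtime_fp)(int) = &twice;

// Calling through the mutable pointer is fine at run time.
int call_runtime(int x) { return runtime_fp(x); }
```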

PR c++/111703
PR c++/107939

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1) <case CALL_EXPR>:
Fix FUNCTION_POINTER_TYPE_P test.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-fn8.C: Extend test.
* g++.dg/diagnostic/constexpr4.C: New test.
---
  gcc/cp/constexpr.cc  | 4 +++-
  gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C| 2 ++
  gcc/testsuite/g++.dg/diagnostic/constexpr4.C | 9 +
  3 files changed, 14 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/constexpr4.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 8a6b210144a..5ecc30117a1 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -9547,7 +9547,9 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict, bool now,
  }
else if (fun)
{
-   if (RECUR (fun, FUNCTION_POINTER_TYPE_P (fun) ? rval : any))
+   if (RECUR (fun, (TREE_TYPE (fun)
+&& FUNCTION_POINTER_TYPE_P (TREE_TYPE (fun))
+? rval : any)))


We might break this ?: out into a variable?  OK either way.


  /* Might end up being a constant function pointer.  But it
 could also be a function object with constexpr op(), so
 we pass 'any' so that the underlying VAR_DECL is deemed
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C
index 3f63a5b28d7..c63d26c931d 100644
--- a/gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C
@@ -15,10 +15,12 @@ struct P {
  };
  
  void (*f)(P);

+P (*h)(P);
  
  template

  constexpr bool g() {
P x;
f(x); // { dg-bogus "from here" }
+  f(h(x)); // { dg-bogus "from here" }
return true;
  }
diff --git a/gcc/testsuite/g++.dg/diagnostic/constexpr4.C 
b/gcc/testsuite/g++.dg/diagnostic/constexpr4.C
new file mode 100644
index 000..f971f533b08
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/constexpr4.C
@@ -0,0 +1,9 @@
+// Verify we diagnose a call to a non-constant function pointer ahead of time.
+// { dg-do compile { target c++11 } }
+
+int (*f)(int);
+
+template<int N>
+void g() {
+  static_assert(f(N) == 0, ""); // { dg-error "non-constant|'f' is not usable" }
+}




Re: [PATCH] RISC-V: Save/restore ra register correctly [PR112478]

2023-11-15 Thread Christoph Müllner
On Tue, Nov 14, 2023 at 3:15 PM Kito Cheng  wrote:
>
> We now make ra a fixed register, but we still need to save/restore it in
> the prologue/epilogue if it has been used.

So before 71f906498ada9 $ra was neither a fixed nor a used register.
Therefore, riscv_save_reg_p returned true in the first test (not global reg,
not used_or_fixed, and ever_live_p).
After this commit, this does not happen anymore, because the test for
not used_or_fixed fails and we don't test for ever_live_p in the following.
And this patch restores this behavior.

Reviewed-by: Christoph Müllner 
Tested-by: Christoph Müllner 

>
> gcc/ChangeLog:
>
> PR target/112478
> * config/riscv/riscv.cc (riscv_save_return_addr_reg_p): Check ra
> is ever lived.
>
> gcc/testsuite/gcc/ChangeLog:
>
> PR target/112478
> * riscv/pr112478.c: New.
> ---
>  gcc/config/riscv/riscv.cc | 4 
>  gcc/testsuite/gcc.target/riscv/pr112478.c | 8 
>  2 files changed, 12 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr112478.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index ecee7eb4727..f09c4066903 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -5802,6 +5802,10 @@ riscv_save_return_addr_reg_p (void)
>if (riscv_far_jump_used_p ())
>  return true;
>
> +  /* We need to save it if anyone has used that.  */
> +  if (df_regs_ever_live_p (RETURN_ADDR_REGNUM))
> +return true;
> +
>/* Need not to use ra for leaf when frame pointer is turned off by
>   option whatever the omit-leaf-frame's value.  */
>if (frame_pointer_needed && crtl->is_leaf
> diff --git a/gcc/testsuite/gcc.target/riscv/pr112478.c 
> b/gcc/testsuite/gcc.target/riscv/pr112478.c
> new file mode 100644
> index 000..0bbde20b71b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr112478.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-ffat-lto-objects" } */
> +
> +void foo() {
> +asm volatile("# " : ::"ra");
> +}
> +
> +/* { dg-final { scan-assembler "s(w|d)\[ \t\]*ra" } } */
> --
> 2.40.1
>


Re: [PATCH v3] c++: fix parsing with auto(x) [PR112410]

2023-11-15 Thread Jason Merrill

On 11/15/23 17:24, Marek Polacek wrote:
> On Tue, Nov 14, 2023 at 05:27:03PM -0500, Jason Merrill wrote:
> > On 11/14/23 10:58, Marek Polacek wrote:
> > > On Mon, Nov 13, 2023 at 09:26:41PM -0500, Jason Merrill wrote:
> > > > On 11/10/23 20:13, Marek Polacek wrote:
> > > > > On Thu, Nov 09, 2023 at 07:07:03PM -0500, Jason Merrill wrote:
> > > > > > On 11/9/23 14:58, Marek Polacek wrote:
> > > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > > >
> > > > > > > -- >8 --
> > > > > > > Here we are wrongly parsing
> > > > > > >
> > > > > > >   int y(auto(42));
> > > > > > >
> > > > > > > which uses the C++23 cast-to-prvalue feature, and initializes y to 42.
> > > > > > > However, we were treating the auto as an implicit template parameter.
> > > > > > >
> > > > > > > Fixing the auto{42} case is easy, but when auto is followed by a (,
> > > > > > > I found the fix to be much more involved.  For instance, we cannot
> > > > > > > use cp_parser_expression, because that can give hard errors.  It's
> > > > > > > also necessary to disambiguate 'auto(i)' as 'auto i', not a cast.
> > > > > > > auto(), auto(int), auto(f)(int), auto(*), auto(i[]), auto(...), etc.
> > > > > > > are all function declarations.  We have to look at more than one
> > > > > > > token to decide.
> > > > > >
> > > > > > Yeah, this is a most vexing parse problem.  The code is synthesizing
> > > > > > template parameters before we've resolved whether the auto is a
> > > > > > decl-specifier or not.
> > > > > >
> > > > > > > In this fix, I'm (ab)using cp_parser_declarator, with member_p=false
> > > > > > > so that it doesn't commit.  But it handles even more complicated
> > > > > > > cases as
> > > > > > >
> > > > > > >   int fn (auto (*const **)(int) -> char);
> > > > > >
> > > > > > But it doesn't seem to handle the extremely vexing
> > > > > >
> > > > > > struct A {
> > > > > >  A(int,int);
> > > > > > };
> > > > > >
> > > > > > int main()
> > > > > > {
> > > > > >  int a;
> > > > > >  A b(auto(a), 42);
> > > > > > }
> > > > >
> > > > > Argh.  This test should indeed be accepted and is currently rejected,
> > > > > but it's a different problem: 'b' is at block scope and you can't
> > > > > have a template there.  But when I put it into a namespace scope,
> > > > > it shows that my patch doesn't work correctly.  I've added auto-fncast14.C
> > > > > for the latter and opened c++/112482 for the block-scope problem.
> > > > > >
> > > > > > I think we need to stop synthesizing immediately when we see RID_AUTO, and
> > > > > > instead go back after we successfully parse a declaration and synthesize for
> > > > > > any autos we saw along the way.  :/
> > > > >
> > > > > That seems very complicated :(.  I had a different idea though; how
> > > > > about the following patch?  The idea is that if we see that parsing
> > > > > the parameter-declaration-list didn't work, we undo what synthesize_
> > > > > did, and let cp_parser_initializer parse "(auto(42))", which should
> > > > > succeed.  I checked that after cp_finish_decl y is initialized to 42.
> > > >
> > > > Nice, that's much simpler.  Do you also still need the changes to
> > > > cp_parser_simple_type_specifier?
> > >
> > > I do, otherwise we parse
> > >
> > > int f (auto{42});
> > >
> > > just as if it had been
> > >
> > > int f (auto);
> > >
> > > because the {42} is consumed in the cp_parser_simple_type_specifier/RID_AUTO
> > > loop.  :/
> >
> > It isn't consumed there, that loop is just scanning forward to see if
> > there's a ->.  The { is still the next token when we expect it to be a
> > closing ) in cp_parser_direct_declarator:
>
> Ok, the tokens are rolled back after consuming so we can...
>
> > >    /* Parse the parameter-declaration-clause.  */
> > >    params
> > >  = cp_parser_parameter_declaration_clause (parser, flags);
> > >    const location_t parens_end
> > >  = cp_lexer_peek_token (parser->lexer)->location;
> > >
> > >    /* Consume the `)'.  */
> > >    parens.require_close (parser);
> >
> > Maybe we want to abort_fully_implicit_template here rather than in
> > cp_parser_parameter_declaration_clause?
>
> ...do this instead.  Much better.
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

OK, thanks.

> -- >8 --
> Here we are wrongly parsing
>
>    int y(auto(42));
>
> which uses the C++23 cast-to-prvalue feature, and initializes y to 42.
> However, we were treating the auto as an implicit template parameter.
>
> Fixing the auto{42} case is easy, but when auto is followed by a (,
> I found the fix to be much more involved.  For instance, we cannot
> use cp_parser_expression, because that can give hard errors.  It's
> also necessary to disambiguate 'auto(i)' as 'auto i', not a cast.
> auto(), auto(int), auto(f)(int), auto(*), auto(i[]), auto(...), etc.
> are all function declarations.
>
> This patch rectifies that by undoing the implicit function template
> modification.  In the test above, we should notice that the parameter
> list is ill-formed, and since we've synthesized an implicit template
> parameter, we undo it by calling abort_fully_implicit_template.  Then,
> we'll parse the "(auto(42))" as an initializer.
>
> PR c++/112410
>
> gcc/cp/ChangeLog:
>
> * parser.cc (cp_parser_direct_declarator): Maybe call
> abort_fully_implicit_template if it turned out the parameter list was
> ill-formed.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/cpp23/auto-fncast13.C: New test.
> * g++.dg/cpp23/auto-fncast14.C: New test.
> ---
>   gcc/cp/parser.cc   | 13 +
>   gcc/testsuite/g++.dg/cpp23/auto-fncast13.C | 61 ++
>   gcc/testsuite/g++.dg/cpp23/auto-fncast14.C |  9 
>   3 files changed, 83 insertions(+)
>   create mode 100644 

Re: [PATCH 0/3] Option handling: add documentation URLs

2023-11-15 Thread David Malcolm
On Tue, 2023-11-14 at 00:12 +, Joseph Myers wrote:
> On Fri, 10 Nov 2023, David Malcolm wrote:
> 
> > The .opt.urls files it generates become part of the source tree,
> > and
> > would be regenerated by maintainers whenever new options are added.
> > Forgetting to update the files (or not having Python 3 handy)
> > merely
> > means that URLs might be missing or out of date until someone else
> > regenerates them.
> 
> Do I understand correctly that there are no makefile targets to
> regenerate 
> these files; it's up to maintainers to regenerate them manually?
> 
> Advantages:
> 
> * No need to update contrib/gcc_update to handle timestamps for the
> files.
> 
> * No modifications unexpectedly appearing in source trees, if the
> checked 
> in files are out of date and you run a build with the timestamps such
> that 
> the file gets regenerated.

The .opt.urls are generated from the generated HTML.  I think this
needs to be a manually-triggered process, otherwise the optionlist
depends on the generated HTML, and thus the generated HTML would become
a hard early dependency during the build (which I don't think we would
want).

> 
> Disadvantages:
> 
> * You need to know how to do the regeneration manually; "make" is the
> uniform way for generating any file the build system can generate,
> without 
> needing more specific knowledge about that file.

In the patches I posted I merely listed the commands in a comment in
the script, but I'm currently working on adding support for options
from the gdc and gfortran docs, and in doing so found that running the
script with the correct options was a pain.

So to make it easier, I'm currently thinking of adding this convenience
target, so that when a maintainer does decide to regenerate the
.opt.urls, they can simply type "make regenerate-opt-urls" in the gcc
build subdir:

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index c3ed960b8f3c..6d24b7b9db34 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3616,6 +3616,12 @@ $(build_htmldir)/gccinstall/index.html: 
$(TEXI_GCCINSTALL_FILES)
DESTDIR=$(@D) \
$(SHELL) $(srcdir)/doc/install.texi2html
 
+# Regenerate the .opt.urls files from the generated html, and from the .opt
+# files.
+.PHONY: regenerate-opt-urls
+regenerate-opt-urls:
+   $(srcdir)/regenerate-opt-urls.py $(build_htmldir) $(shell dirname 
$(srcdir))
+
 MANFILES = doc/gcov.1 doc/cpp.1 doc/gcc.1 doc/gfdl.7 doc/gpl.7 \
doc/fsf-funding.7 doc/gcov-tool.1 doc/gcov-dump.1 \
   $(if $(filter yes,@enable_lto@),doc/lto-dump.1)


> 
> Given the recent discussion starting at 
>  of 
> post-commit CI to detect auto*-generated files that aren't fully up
> to 
> date, maybe it would be appropriate to add a check for .opt.urls
> files 
> being up to date (including making sure that each .opt file does have
> a 
> corresponding .opt.urls file checked in) to that CI?
> 
> Since the Python script has hardcoded information about .opt files
> and 
> corresponding URLs for target options documentation, the patch series
> should update sourcebuild.texi, section "Back End", to identify that 
> script as one of the places to update when adding a new target back
> end.

Thanks, will do.

As mentioned, I'm currently investigating capturing per-language option
URLs (to address Iain's and Marc's comments about D and Ada); if I get
that working, I may need to add a similar note for adding a new
frontend.

Hope the overall approach seems reasonable.
Dave



Re: building GNU gettext on AIX

2023-11-15 Thread Bruno Haible
David Edelsohn wrote:
> I am using my own install of GCC for a reason.

I have built GNU gettext 0.22.3 in various configurations on the AIX 7.1
and 7.3 machines in the compilefarm, and haven't encountered issues with
'max_align_t' nor with 'getpeername'. So, from my point of view, GNU gettext
works fine on AIX with gcc and xlc (but not ibm-clang, which I haven't
tested).

You will surely understand that I cannot test a release against a compiler
that exists only on your hard disk.

The hint I gave you, based on the partial logs that you provided, is to
look at the configure test for intmax_t first.

Bruno





[PATCH v3] c++: fix parsing with auto(x) [PR112410]

2023-11-15 Thread Marek Polacek
On Tue, Nov 14, 2023 at 05:27:03PM -0500, Jason Merrill wrote:
> On 11/14/23 10:58, Marek Polacek wrote:
> > On Mon, Nov 13, 2023 at 09:26:41PM -0500, Jason Merrill wrote:
> > > On 11/10/23 20:13, Marek Polacek wrote:
> > > > On Thu, Nov 09, 2023 at 07:07:03PM -0500, Jason Merrill wrote:
> > > > > On 11/9/23 14:58, Marek Polacek wrote:
> > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > > 
> > > > > > -- >8 --
> > > > > > Here we are wrongly parsing
> > > > > > 
> > > > > >  int y(auto(42));
> > > > > > 
> > > > > > which uses the C++23 cast-to-prvalue feature, and initializes y to 
> > > > > > 42.
> > > > > > However, we were treating the auto as an implicit template 
> > > > > > parameter.
> > > > > > 
> > > > > > Fixing the auto{42} case is easy, but when auto is followed by a (,
> > > > > > I found the fix to be much more involved.  For instance, we cannot
> > > > > > use cp_parser_expression, because that can give hard errors.  It's
> > > > > > also necessary to disambiguate 'auto(i)' as 'auto i', not a cast.
> > > > > > auto(), auto(int), auto(f)(int), auto(*), auto(i[]), auto(...), etc.
> > > > > > are all function declarations.  We have to look at more than one
> > > > > > token to decide.
> > > > > 
> > > > > Yeah, this is a most vexing parse problem.  The code is synthesizing
> > > > > template parameters before we've resolved whether the auto is a
> > > > > decl-specifier or not.
> > > > > 
> > > > > > In this fix, I'm (ab)using cp_parser_declarator, with member_p=false
> > > > > > so that it doesn't commit.  But it handles even more complicated
> > > > > > cases as
> > > > > > 
> > > > > >  int fn (auto (*const **)(int) -> char);
> > > > > 
> > > > > But it doesn't seem to handle the extremely vexing
> > > > > 
> > > > > struct A {
> > > > > A(int,int);
> > > > > };
> > > > > 
> > > > > int main()
> > > > > {
> > > > > int a;
> > > > > A b(auto(a), 42);
> > > > > }
> > > > 
> > > > Argh.  This test should indeed be accepted and is currently rejected,
> > > > but it's a different problem: 'b' is at block scope and you can't
> > > > have a template there.  But when I put it into a namespace scope,
> > > > it shows that my patch doesn't work correctly.  I've added 
> > > > auto-fncast14.C
> > > > for the latter and opened c++/112482 for the block-scope problem.
> > > > > I think we need to stop synthesizing immediately when we see 
> > > > > RID_AUTO, and
> > > > > instead go back after we successfully parse a declaration and 
> > > > > synthesize for
> > > > > any autos we saw along the way.  :/
> > > > 
> > > > That seems very complicated :(.  I had a different idea though; how
> > > > about the following patch?  The idea is that if we see that parsing
> > > > the parameter-declaration-list didn't work, we undo what synthesize_
> > > > did, and let cp_parser_initializer parse "(auto(42))", which should
> > > > succeed.  I checked that after cp_finish_decl y is initialized to 42.
> > > 
> > > Nice, that's much simpler.  Do you also still need the changes to
> > > cp_parser_simple_type_specifier?
> > 
> > I do, otherwise we parse
> > 
> >int f (auto{42});
> > 
> > just as if it had been
> > 
> >int f (auto);
> > 
> > because the {42} is consumed in the cp_parser_simple_type_specifier/RID_AUTO
> > loop.  :/
> 
> It isn't consumed there, that loop is just scanning forward to see if
> there's a ->.  The { is still the next token when we expect it to be a
> closing ) in cp_parser_direct_declarator:

Ok, the tokens are rolled back after consuming so we can...
 
> >   /* Parse the parameter-declaration-clause.  */
> >   params
> > = cp_parser_parameter_declaration_clause (parser, flags);
> >   const location_t parens_end
> > = cp_lexer_peek_token (parser->lexer)->location;
> > 
> >   /* Consume the `)'.  */
> >   parens.require_close (parser);
> 
> Maybe we want to abort_fully_implicit_template here rather than in
> cp_parser_parameter_declaration_clause?

...do this instead.  Much better.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we are wrongly parsing

  int y(auto(42));

which uses the C++23 cast-to-prvalue feature, and initializes y to 42.
However, we were treating the auto as an implicit template parameter.

Fixing the auto{42} case is easy, but when auto is followed by a (,
I found the fix to be much more involved.  For instance, we cannot
use cp_parser_expression, because that can give hard errors.  It's
also necessary to disambiguate 'auto(i)' as 'auto i', not a cast.
auto(), auto(int), auto(f)(int), auto(*), auto(i[]), auto(...), etc.
are all function declarations.
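
For readers without a C++23 compiler at hand, the same declaration-versus-
expression ambiguity can be sketched with 'int' in place of 'auto'.  This is
only an analogy under that assumption, not one of the tests added by the
patch; the names x, as_function, as_variable, braced and check_variables are
hypothetical.

```cpp
#include <cassert>
#include <type_traits>

int x = 7;

// 'int(x)' can be read as the parameter declaration 'int x', so this
// line declares a function taking one int.  This mirrors 'auto(i)'
// being disambiguated as 'auto i' rather than a cast.
int as_function(int(x));

// '42' cannot be a declarator, so 'int(42)' has to be a cast and the
// line declares a variable.  This mirrors 'int y(auto(42));'.
int as_variable(int(42));

// A braced initializer is never a parameter list, so this is a variable
// initialized from the global x.
int braced{int(x)};

static_assert(std::is_function<decltype(as_function)>::value,
              "as_function is a function declaration");
static_assert(std::is_same<decltype(as_variable), int>::value,
              "as_variable is a variable of type int");

// Runtime check of the two variables; returns their sum.
int check_variables() {
  assert(as_variable == 42);
  assert(braced == 7);
  return as_variable + braced;
}
```

Calling check_variables() from main exercises the two variables; the
static_asserts cover the function-declaration case at compile time.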

This patch rectifies that by undoing the implicit function template
modification.  In the test above, we should notice that the parameter
list is ill-formed, and since we've synthesized an implicit template
parameter, we undo it by calling 

Re: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-15 Thread 钟居哲
OK. Makes sense.
LGTM as long as you remove all
GET_MODE_BITSIZE (GET_MODE_INNER (mode)) <= GET_MODE_BITSIZE (Pmode)



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-16 04:30
To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for 
rv32gcv.
On 11/15/23 15:29, 钟居哲 wrote:
> Could you show me the example?
>
> It's used for handling SEW = 64 on RV32. I don't know why this patch touches
> this code.
 
Use gather_load_run-1.c with the 64-bit index patterns disabled
on rv32.  We insert (mem:DI (reg:SI)) into a vector so use the
SEW = 64 demote handler.  There we set vl = vl * 2 (which is correct)
but the mode (i.e. vector) just changes from DI to SI while
keeping the number of elements the same.  Then we go, e.g.,
from V8DI to V8SI and slide down 16 elements, losing the lower
half.
 
Regards
Robin
 


Re: building GNU gettext on AIX

2023-11-15 Thread David Edelsohn
On Wed, Nov 15, 2023 at 4:22 PM Bruno Haible  wrote:

> David Edelsohn wrote:
> > When I try to configure gettext-0.22.3, I receive the following error:
> >
> > checking for socklen_t equivalent... configure: error: Cannot find a type
> > to use in place of socklen_t
> >
> > configure: error:
> > /nasfarm/edelsohn/src/gettext-0.22.3/libtextstyle/configure failed for
> > libtextstyle
> >
> >
> > configure:43943: /nasfarm/edelsohn/install/GCC12/bin/gcc -c -g -O2
> > -D_THREAD_SAFE
> > conftest.c >&5
> >
> > conftest.c:112:18: error: two or more data types in declaration
> specifiers
> >
> >   112 | #define intmax_t long long
> >
> >   |  ^~~~
> >
> > conftest.c:112:23: error: two or more data types in declaration
> specifiers
> >
> >   112 | #define intmax_t long long
> >
> >   |   ^~~~
> >
> > In file included from conftest.c:212:
> >
> > conftest.c:214:24: error: conflicting types for 'ngetpeername'; have
> > 'int(int,  void *, long unsigned int *)'
> >
> >   214 |int getpeername (int, void *, unsigned long
> int
> > *);
> >
> >   |^~~
> >
> >
> /nasfarm/edelsohn/install/GCC12/lib/gcc/powerpc-ibm-aix7.2.5.0/12.1.1/include-fixed/sys/socket.h:647:9:
> > note: previous declaration of 'ngetpeername' with type 'int(int,  struct
> > sockaddr * restrict,  socklen_t * restrict)' {aka 'int(int,  struct
> > sockaddr * restrict,  long unsigned int * restrict)'}
> >
> >   647 | int getpeername(int, struct sockaddr *__restrict__, socklen_t
> > *__restrict__);
> >
> >   | ^~~
> >
> >
> > configure and config.h seems to get itself confused about types.
>
> There seem to be two problems, both related to the include files of
> your compiler:
>
>   - The configure test "checking for intmax_t..." must have found the
> answer "no". But on a modern system,  should be defining
> intmax_t already.
>
>   - This configure test that tries to find the getpeername declaration,
> but cannot find it (maybe because of the first problem?):
>
>
> 
>  for arg2 in "struct sockaddr" void; do
>for t in int size_t "unsigned int" "long int" "unsigned long
> int"; do
>  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> /* end confdefs.h.  */
> #include 
>#include 
>
>int getpeername (int, $arg2 *, $t *);
> int
> main (void)
> {
> $t len;
>   getpeername (0, 0, &len);
>   ;
>   return 0;
> }
> _ACEOF
> if ac_fn_c_try_compile "$LINENO"
> then :
>   gl_cv_socklen_t_equiv="$t"
> fi
>
> 
>
> I would concentrate on the first problem. If you don't get it fixed, then
> I'd
> suggest to try 'gcc' from the AIX Toolbox [1] or 'xlc' (as an IBM product)
> instead of 'gcc' (that looks like you built it yourself).
>
> Bruno
>
> [1]
> https://public.dhe.ibm.com/aix/freeSoftware/aixtoolbox/SPECS/gcc12-12.3.0-1.spec


Bruno,

I am using my own install of GCC for a reason.  The build of GCC works for
everything else, including bootstrap of GCC, GDB, GMP, etc.  The only
problem is gettext.

Thanks, David


[committed] i386: Optimize strict_low_part QImode insn with high input registers

2023-11-15 Thread Uros Bizjak
Following testcase:

struct S1
{
  unsigned char val;
  unsigned char pad1;
  unsigned short pad2;
};

struct S2
{
  unsigned char pad1;
  unsigned char val;
  unsigned short pad2;
};

struct S1 test_add (struct S1 a, struct S2 b, struct S2 c)
{
  a.val = b.val + c.val;

  return a;
}

compiles with -O2 to:

        movl    %edi, %eax
        movzbl  %dh, %edx
        movl    %esi, %ecx
        movb    %dl, %al
        addb    %ch, %al

The insert to %al can go directly from %dh:

        movl    %edi, %eax
        movl    %esi, %ecx
        movb    %dh, %al
        addb    %ch, %al

Patch introduces strict_low_part QImode insn patterns with both of
their input arguments extracted from high register.  This invalid
insn is split after reload to a lowpart insert from the high register
and qi_ext_1_slp instruction.
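
As a rough illustration of what the two instructions produced by the split
do to the register contents, here is a minimal C++ sketch.  It models the
registers as plain 32-bit integers and assumes the little-endian layout of
the testcase above, where 'val' of struct S2 lands in bits 8..15; the
helper names high_byte, insert_low_byte and add_high_bytes_into_low are
hypothetical and do not appear in i386.md.

```cpp
#include <cassert>
#include <cstdint>

// Bits 8..15 of a 32-bit register, i.e. what %ah/%bh/%ch/%dh hold.
inline std::uint8_t high_byte(std::uint32_t reg) {
  return static_cast<std::uint8_t>(reg >> 8);
}

// Replace only bits 0..7 of 'reg' and keep bits 8..31 untouched.
// This is the effect of a strict_low_part QImode set.
inline std::uint32_t insert_low_byte(std::uint32_t reg, std::uint8_t value) {
  return (reg & ~std::uint32_t{0xff}) | value;
}

// Models the optimized sequence for test_add:
//   movb %dh, %al   (low-part insert from c's high byte)
//   addb %ch, %al   (low-part add of b's high byte, wrapping at 8 bits)
inline std::uint32_t add_high_bytes_into_low(std::uint32_t a,
                                             std::uint32_t b,
                                             std::uint32_t c) {
  std::uint32_t result = insert_low_byte(a, high_byte(c));
  std::uint8_t sum = static_cast<std::uint8_t>(
      static_cast<std::uint8_t>(result) + high_byte(b));
  return insert_low_byte(result, sum);
}
```

For example, with a = 0x11223344, b = 0x0000AB00 and c = 0x00000100 the
upper 24 bits of a survive and only the low byte becomes 0xAB + 0x01.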

PR target/78904

gcc/ChangeLog:

* config/i386/i386.md (*movstrictqi_ext_1): New insn pattern.
(*addqi_ext_2_slp): New define_insn_and_split pattern.
(*subqi_ext_2_slp): Ditto.
(*qi_ext_2_slp): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr78904-8.c: New test.
* gcc.target/i386/pr78904-8a.c: New test.
* gcc.target/i386/pr78904-8b.c: New test.
* gcc.target/i386/pr78904-9.c: New test.
* gcc.target/i386/pr78904-9a.c: New test.
* gcc.target/i386/pr78904-9b.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 32535621db4..26cdb21d3c0 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3335,6 +3335,19 @@ (define_insn "*movstrict_xor"
(set_attr "mode" "")
(set_attr "length_immediate" "0")])
 
+(define_insn "*movstrictqi_ext_1"
+  [(set (strict_low_part
+ (match_operand:QI 0 "register_operand" "+Q"))
+ (subreg:QI
+   (match_operator:SWI248 2 "extract_operator"
+ [(match_operand 1 "int248_register_operand" "Q")
+  (const_int 8)
+  (const_int 8)]) 0))]
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
+  "mov{b}\t{%h1, %0|%0, %h1}"
+  [(set_attr "type" "imov")
+   (set_attr "mode" "QI")])
+
 (define_expand "extv"
   [(set (match_operand:SWI24 0 "register_operand")
(sign_extract:SWI24 (match_operand:SWI24 1 "register_operand")
@@ -6645,6 +6658,39 @@ (define_insn_and_split "*addqi_ext_1_slp"
   [(set_attr "type" "alu")
(set_attr "mode" "QI")])
 
+(define_insn_and_split "*addqi_ext_2_slp"
+  [(set (strict_low_part (match_operand:QI 0 "register_operand" "+"))
+   (plus:QI
+ (subreg:QI
+   (match_operator:SWI248 3 "extract_operator"
+ [(match_operand 1 "int248_register_operand" "Q")
+  (const_int 8)
+  (const_int 8)]) 0)
+   (subreg:QI
+ (match_operator:SWI248 4 "extract_operator"
+   [(match_operand 2 "int248_register_operand" "Q")
+(const_int 8)
+(const_int 8)]) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
+  "#"
+  "&& reload_completed"
+  [(set (strict_low_part (match_dup 0))
+   (subreg:QI
+ (match_op_dup 4
+   [(match_dup 2) (const_int 8) (const_int 8)]) 0))
+   (parallel
+ [(set (strict_low_part (match_dup 0))
+  (plus:QI
+(subreg:QI
+  (match_op_dup 3
+[(match_dup 1) (const_int 8) (const_int 8)]) 0)
+  (match_dup 0)))
+  (clobber (reg:CC FLAGS_REG))])]
+  ""
+  [(set_attr "type" "alu")
+   (set_attr "mode" "QI")])
+
 ;; Split non destructive adds if we cannot use lea.
 (define_split
   [(set (match_operand:SWI48 0 "register_operand")
@@ -7688,6 +7734,39 @@ (define_insn_and_split "*subqi_ext_1_slp"
   [(set_attr "type" "alu")
(set_attr "mode" "QI")])
 
+(define_insn_and_split "*subqi_ext_2_slp"
+  [(set (strict_low_part (match_operand:QI 0 "register_operand" "+"))
+   (minus:QI
+ (subreg:QI
+   (match_operator:SWI248 3 "extract_operator"
+ [(match_operand 1 "int248_register_operand" "Q")
+  (const_int 8)
+  (const_int 8)]) 0)
+   (subreg:QI
+ (match_operator:SWI248 4 "extract_operator"
+   [(match_operand 2 "int248_register_operand" "Q")
+(const_int 8)
+(const_int 8)]) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
+  "#"
+  "&& reload_completed"
+  [(set (strict_low_part (match_dup 0))
+   (subreg:QI
+ (match_op_dup 3
+   [(match_dup 1) (const_int 8) (const_int 8)]) 0))
+   (parallel
+ [(set (strict_low_part (match_dup 0))
+  (minus:QI
+  (match_dup 0)
+(subreg:QI
+  (match_op_dup 4
+[(match_dup 2) (const_int 8) (const_int 8)]) 0)))
+  (clobber (reg:CC FLAGS_REG))])]
+  ""
+  [(set_attr "type" "alu")
+   

Re: building GNU gettext on AIX

2023-11-15 Thread Bruno Haible
David Edelsohn wrote:
> When I try to configure gettext-0.22.3, I receive the following error:
> 
> checking for socklen_t equivalent... configure: error: Cannot find a type
> to use in place of socklen_t
> 
> configure: error:
> /nasfarm/edelsohn/src/gettext-0.22.3/libtextstyle/configure failed for
> libtextstyle
> 
> 
> configure:43943: /nasfarm/edelsohn/install/GCC12/bin/gcc -c -g -O2
> -D_THREAD_SAFE
> conftest.c >&5
> 
> conftest.c:112:18: error: two or more data types in declaration specifiers
> 
>   112 | #define intmax_t long long
> 
>   |  ^~~~
> 
> conftest.c:112:23: error: two or more data types in declaration specifiers
> 
>   112 | #define intmax_t long long
> 
>   |   ^~~~
> 
> In file included from conftest.c:212:
> 
> conftest.c:214:24: error: conflicting types for 'ngetpeername'; have
> 'int(int,  void *, long unsigned int *)'
> 
>   214 |int getpeername (int, void *, unsigned long int
> *);
> 
>   |^~~
> 
> /nasfarm/edelsohn/install/GCC12/lib/gcc/powerpc-ibm-aix7.2.5.0/12.1.1/include-fixed/sys/socket.h:647:9:
> note: previous declaration of 'ngetpeername' with type 'int(int,  struct
> sockaddr * restrict,  socklen_t * restrict)' {aka 'int(int,  struct
> sockaddr * restrict,  long unsigned int * restrict)'}
> 
>   647 | int getpeername(int, struct sockaddr *__restrict__, socklen_t
> *__restrict__);
> 
>   | ^~~
> 
> 
> configure and config.h seems to get itself confused about types.

There seem to be two problems, both related to the include files of
your compiler:

  - The configure test "checking for intmax_t..." must have found the
answer "no". But on a modern system,  should be defining
intmax_t already.

  - This configure test that tries to find the getpeername declaration,
but cannot find it (maybe because of the first problem?):


 for arg2 in "struct sockaddr" void; do
   for t in int size_t "unsigned int" "long int" "unsigned long int"; do
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
/* end confdefs.h.  */
#include 
   #include 

   int getpeername (int, $arg2 *, $t *);
int
main (void)
{
$t len;
  getpeername (0, 0, &len);
  ;
  return 0;
}
_ACEOF
if ac_fn_c_try_compile "$LINENO"
then :
  gl_cv_socklen_t_equiv="$t"
fi


I would concentrate on the first problem. If you don't get it fixed, then I'd
suggest to try 'gcc' from the AIX Toolbox [1] or 'xlc' (as an IBM product)
instead of 'gcc' (that looks like you built it yourself).

Bruno

[1] 
https://public.dhe.ibm.com/aix/freeSoftware/aixtoolbox/SPECS/gcc12-12.3.0-1.spec





Re: Darwin: Replace environment runpath with embedded [PR88590]

2023-11-15 Thread FX Coudert
> So I currently see the following in my build logs:
> 
>[...]
>mkdir -p -- ./fixincludes
>Configuring in ./fixincludes
>configure: creating cache ./config.cache
>[...]/source-gcc/fixincludes/configure: line 3030: 
> enable_darwin_at_rpath_--srcdir=[...]/source-gcc/fixincludes=no: No such file 
> or directory
>checking build system type... x86_64-pc-linux-gnu
>checking host system type... x86_64-pc-linux-gnu
>checking target system type... nvptx-unknown-none
>[...]
> 
> I'm not convinced that's achieving what it means to achieve?

I’ve tried to understand where that line gets expanded from:

>>> +enable_darwin_at_rpath_$1=no

It comes from:

> _LT_TAGVAR(enable_darwin_at_rpath, $1)=no

in the top-level libtool.m4. I can’t say that I understand why that line is 
there. All the other definitions using this structure are inside the 
definitions of _LT_-prefixed functions, defined by m4_defun. This one line is 
alone, outside of any function.

If I remove the line from libtool.m4 (innocent smile) I see that 
fixincludes/configure is better, and it does not appear to change the 
regenerated files in other directories (I didn’t do a build yet, just tried to 
regenerate with some manual autoconf invocations).

Food for thought.
FX

Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-15 Thread Robin Dapp
On 11/15/23 15:29, 钟居哲 wrote:
> Could you show me the example?
>
> It's used for handling SEW = 64 on RV32. I don't know why this patch touches
> this code.

Use gather_load_run-1.c with the 64-bit index patterns disabled
on rv32.  We insert (mem:DI (reg:SI)) into a vector so use the
SEW = 64 demote handler.  There we set vl = vl * 2 (which is correct)
but the mode (i.e. vector) just changes from DI to SI while
keeping the number of elements the same.  Then we go, e.g.,
from V8DI to V8SI and slide down 16 elements, losing the lower
half.
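
The element-count mismatch can be sketched with a toy model of a vector
mode.  This is not the RISC-V backend code; VecMode, kV8DI, demote_keep_count,
demote_keep_size and fits are hypothetical names, and the size-preserving
variant is only what the description above appears to imply as the
consistent choice once vl has been doubled.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for a fixed-length vector mode such as V8DI or V8SI.
struct VecMode {
  std::size_t nunits;     // number of elements
  std::size_t elem_bits;  // width of one element in bits
  std::size_t total_bits() const { return nunits * elem_bits; }
};

// V8DI: eight 64-bit elements.
const VecMode kV8DI = {8, 64};

// Narrow the element but keep the element count (V8DI -> V8SI).
// The vector now covers only half of the original bits.
VecMode demote_keep_count(VecMode m) { return {m.nunits, m.elem_bits / 2}; }

// Narrow the element and double the count (V8DI -> V16SI),
// so the vector still covers all of the original bits.
VecMode demote_keep_size(VecMode m) { return {m.nunits * 2, m.elem_bits / 2}; }

// True if an operation on 'vl' elements stays within the mode.
bool fits(VecMode m, std::size_t vl) { return vl <= m.nunits; }
```

With vl doubled from 8 to 16, fits() fails for the count-preserving
demotion and holds for the size-preserving one.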

Regards
 Robin


[COMMITTED] Regenerate libiberty/aclocal.m4 with aclocal 1.15.1

2023-11-15 Thread Mark Wielaard
There is a new buildbot check that all autotool files are generated
with the correct versions (automake 1.15.1 and autoconf 2.69).
https://builder.sourceware.org/buildbot/#/builders/gcc-autoregen

Correct one file that was generated with the wrong version.

libiberty/
* aclocal.m4: Rebuild.
---
 libiberty/aclocal.m4 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libiberty/aclocal.m4 b/libiberty/aclocal.m4
index f327865aaf9..0757688d52a 100644
--- a/libiberty/aclocal.m4
+++ b/libiberty/aclocal.m4
@@ -1,6 +1,6 @@
-# generated automatically by aclocal 1.16.5 -*- Autoconf -*-
+# generated automatically by aclocal 1.15.1 -*- Autoconf -*-
 
-# Copyright (C) 1996-2021 Free Software Foundation, Inc.
+# Copyright (C) 1996-2017 Free Software Foundation, Inc.
 
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
-- 
2.39.3



Re: Darwin: Replace environment runpath with embedded [PR88590]

2023-11-15 Thread Thomas Schwinge
Hi!

On 2023-10-30T19:08:18+, Iain Sandoe  wrote:
>> On 30 Oct 2023, at 16:31, FX Coudert  wrote:
>>
>>> +enable_darwin_at_rpath_$1=no
>>
>> I actually don’t understand why this one would have $1 in the name, unlike 
>> all other regenerated configure files. What value do we expect for $1 at 
>> this point in the file? That’s just plain weird.
>
> I’ve committed the missing hunk - at least that should appease CI.
>
> Agreed, it is weird, (actually, I’ve never quite understood why fixincludes 
> wants libtool.m4 given that it is host-side and not building any libraries) ..

So I currently see the following in my build logs:

[...]
mkdir -p -- ./fixincludes
Configuring in ./fixincludes
configure: creating cache ./config.cache
[...]/source-gcc/fixincludes/configure: line 3030: 
enable_darwin_at_rpath_--srcdir=[...]/source-gcc/fixincludes=no: No such file 
or directory
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking target system type... nvptx-unknown-none
[...]

I'm not convinced that's achieving what it means to achieve?


Grüße
 Thomas


Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-15 Thread David Edelsohn
On Wed, Nov 15, 2023 at 9:22 AM Arsen Arsenović  wrote:

>
> David Edelsohn  writes:
>
> > GCC had been working on AIX with NLS, using "--with-included-gettext".
> > --disable-nls gets past the breakage, but GCC does not build for me on
> AIX
> > with NLS enabled.
>
> That should still work with gettext 0.22+ extracted in-tree (it should
> be fetched by download_prerequisites).
>
> > A change in dependencies for GCC should have been announced and more
> widely
> > socialized in the GCC development mailing list, not just GCC patches
> > mailing list.
> >
> > I have tried both the AIX Open Source libiconv and libgettext package,
> and
> > the ones that I previously built.  Both fail because GCC configure
> decides
> > to disable NLS, despite being requested, while libcpp is satisfied, so
> > tools in the gcc subdirectory don't link against libiconv and the build
> > fails.  With the included gettext, I was able to rely on a
> self-consistent
> > solution.
>
> That is interesting.  They should be using the same checks.  I've
> checked trunk and regenerated files on it, and saw no significant diff
> (some whitespace changes only).  Could you post the config.log of both?
>

GCC configured with --with-libintl-prefix and --with-libiconv-prefix

libcpp/config.log:

configure:7610: checking for GNU gettext in libc

configure:7639: /nasfarm/edelsohn/install/GCC12/bin/gcc -std=gnu99 -o
conftest -g  -static-libstdc++ -static-libgcc -Wl,-bbigtoc conftest.c  >&5

conftest.c:71:10: fatal error: libintl.h: No such file or directory

   71 | #include 

  |  ^~~

configure:8318: checking for GNU gettext in libintl

configure:8355: /nasfarm/edelsohn/install/GCC12/bin/gcc -std=gnu99 -o
conftest -g -I/nasfarm/edelsohn/install/include  -static-libstdc++
-static-libgcc -Wl,-bbigtoc conftest.c  /nasfarm/edelsohn/install/lib/
libintl.a >&5

ld: 0711-317 ERROR: Undefined symbol: .libiconv_open

ld: 0711-317 ERROR: Undefined symbol: .libiconv_set_relocation_prefix

ld: 0711-317 ERROR: Undefined symbol: .libiconv_close

ld: 0711-317 ERROR: Undefined symbol: .libiconv

ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more
information.

collect2: error: ld returned 8 exit status

configure:8355: $? = 1

configure:8392: /nasfarm/edelsohn/install/GCC12/bin/gcc -std=gnu99 -o
conftest -g -I/nasfarm/edelsohn/install/include  -static-libstdc++
-static-libgcc -Wl,-bbigtoc conftest.c  /nasfarm/edelsohn/install/lib/
libintl.a /nasfarm/edelsohn/install/lib/libiconv.a >&5

configure:8392: $? = 0

configure:8405: result: yes

configure:8440: checking whether to use NLS

configure:8442: result: yes

configure:8445: checking where the gettext function comes from

configure:8456: result: external libintl

configure:8464: checking how to link with libintl

configure:8466: result: /nasfarm/edelsohn/install/lib/libintl.a
/nasfarm/edelsohn/install/lib/libiconv.a

configure:8525: checking whether NLS is requested

configure:8531: result: yes

gcc/config.log:

configure:14002: checking for GNU gettext in libc
configure:14031: /nasfarm/edelsohn/install/GCC12/bin/g++ -std=c++11 -o conftest -g-static-libstdc++ -static-libgcc -Wl,-bbigtoc  conftest.cpp  >&5
conftest.cpp:196:10: fatal error: libintl.h: No such file or directory
  196 | #include <libintl.h>
      |          ^~~~~~~~~~~
configure:14710: checking for GNU gettext in libintl
configure:14747: /nasfarm/edelsohn/install/GCC12/bin/g++ -std=c++11 -o conftest -g-I/nasfarm/edelsohn/install/include -static-libstdc++ -static-libgcc -Wl,-bbigtoc  conftest.cpp  /nasfarm/edelsohn/install/lib/libintl.a >&5
ld: 0711-317 ERROR: Undefined symbol: .libiconv_open
ld: 0711-317 ERROR: Undefined symbol: .libiconv_set_relocation_prefix
ld: 0711-317 ERROR: Undefined symbol: .libiconv_close
ld: 0711-317 ERROR: Undefined symbol: .libiconv
ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information.
collect2: error: ld returned 8 exit status
configure:14747: $? = 1
configure:14797: result: no
configure:14832: checking whether to use NLS
configure:14834: result: no
configure:14917: checking whether NLS is requested
configure:14920: result: no




> I've never used AIX.  Can I reproduce this on one of the cfarm machines
> to poke around?  I've tried cfarm119, but that one lacked git, and I
> haven't poked around much further due to time constraints.
>
> TIA, sorry about the inconvenience.  Have a lovely day.
>
> > The current gettext-0.22.3 fails to build for me on AIX.
> >
> > libcpp configure believes that NLS functions on AIX, but gcc configure
> > fails in its tests of gettext functionality, which leads to an
> inconsistent
> > configuration and build breakage.
> >
> > Thanks, David
>
>
> --
> Arsen Arsenović
>


Re: building GNU gettext on AIX

2023-11-15 Thread David Edelsohn
When I try to configure gettext-0.22.3, I receive the following error:

checking for socklen_t equivalent... configure: error: Cannot find a type
to use in place of socklen_t

configure: error:
/nasfarm/edelsohn/src/gettext-0.22.3/libtextstyle/configure failed for
libtextstyle


configure:43943: /nasfarm/edelsohn/install/GCC12/bin/gcc -c -g -O2 -D_THREAD_SAFE conftest.c >&5
conftest.c:112:18: error: two or more data types in declaration specifiers
  112 | #define intmax_t long long
      |                  ^~~~
conftest.c:112:23: error: two or more data types in declaration specifiers
  112 | #define intmax_t long long
      |                       ^~~~
In file included from conftest.c:212:
conftest.c:214:24: error: conflicting types for 'ngetpeername'; have 'int(int,  void *, long unsigned int *)'
  214 |int getpeername (int, void *, unsigned long int *);
  |^~~
/nasfarm/edelsohn/install/GCC12/lib/gcc/powerpc-ibm-aix7.2.5.0/12.1.1/include-fixed/sys/socket.h:647:9: note: previous declaration of 'ngetpeername' with type 'int(int,  struct sockaddr * restrict,  socklen_t * restrict)' {aka 'int(int,  struct sockaddr * restrict,  long unsigned int * restrict)'}
  647 | int getpeername(int, struct sockaddr *__restrict__, socklen_t *__restrict__);
  | ^~~


configure and config.h seem to get themselves confused about types.


David



On Wed, Nov 15, 2023 at 7:29 AM Bruno Haible  wrote:

> [CCing bug-gettext]
>
> David Edelsohn wrote in
> :
> > The current gettext-0.22.3 fails to build for me on AIX.
>
> Here are some hints to get a successful build of GNU gettext on AIX:
>
> 1. Set the recommended environment variables before running configure:
>    https://gitlab.com/ghwiki/gnow-how/-/wikis/Platforms/Configuration
>
>    Namely:
>    * for a 32-bit build with gcc:
>      CC=gcc
>      CXX=g++
>      CPPFLAGS="-I$PREFIX/include"
>      LDFLAGS="-L$PREFIX/lib"
>      unset AR NM
>    * for a 32-bit build with xlc:
>      CC="xlc -qthreaded -qtls"
>      CXX="xlC -qthreaded -qtls"
>      CPPFLAGS="-I$PREFIX/include"
>      LDFLAGS="-L$PREFIX/lib"
>      unset AR NM
>    * for a 64-bit build with gcc:
>      CC="gcc -maix64"
>      CXX="g++ -maix64"
>      CPPFLAGS="-I$PREFIX/include"
>      LDFLAGS="-L$PREFIX/lib"
>      AR="ar -X 64"; NM="nm -X 64 -B"
>    * for a 64-bit build with xlc:
>      CC="xlc -q64 -qthreaded -qtls"
>      CXX="xlC -q64 -qthreaded -qtls"
>      CPPFLAGS="-I$PREFIX/include"
>      LDFLAGS="-L$PREFIX/lib"
>      AR="ar -X 64"; NM="nm -X 64 -B"
>
>    where $PREFIX is the value that you pass to the --prefix configure
>    option.
>
>    Rationale: you can run into all sorts of problems if you choose compiler
>    options at random and have no experience with compiler options on that
>    platform.
>
> 2. Don't use ibm-clang.
>
>    Rationale: It's broken.
>
> 3. Don't use -Wall with gcc 10.3.
>
>    Rationale: If you specify -Wall, gettext's configure adds -fanalyzer,
>    which has excessive memory requirements in gcc 10.x. In particular, on
>    AIX, it makes cc1 crash while compiling regex.c after it has consumed
>    1 GiB of RAM.
>
> 4. Avoid using a --prefix that contains earlier installations of the same
>    package.
>
>    Rationale: Because the AIX linker hardcodes directory names in shared
>    libraries, GNU libtool has a peculiar configuration on AIX. It ends up
>    mixing the in-build-tree libraries with the libraries in the install
>    locations, leading to all sorts of errors.
>
>    If you really need to use a --prefix that contains an earlier
>    installation of the same package:
>      - Either use --disable-shared and remove libgettextlib.a and
>        libgettextsrc.a from $PREFIX/lib before starting the build.
>      - Or use a mix of "make -k", "make -k install" and ad-hoc workarounds
>        that cannot be described in a general way.
>
> Bruno
>
>
>
>


Re: [PATCH] RISC-V: Fix ICE in non-canonical march parsing

2023-11-15 Thread Patrick O'Neill

Does relax mean no longer enforcing the canonical order of extensions?

Patrick

On 11/14/23 17:52, Kito Cheng wrote:


LGTM, and BTW...I am thinking we could relax the canonical order
during parsing. Do you have the interest and time to work on that
item?

On Wed, Nov 15, 2023 at 9:35 AM Patrick O'Neill  wrote:

Passing in a base extension in non-canonical order (i, e, g) causes GCC
to ICE:
xgcc: error: '-march=rv64ge': ISA string is not in canonical order. 'e'
xgcc: internal compiler error: in add, at common/config/riscv/riscv-common.cc:671
...

This is fixed by skipping to the next extension when a non-canonical
order is detected.

gcc/ChangeLog:

 * common/config/riscv/riscv-common.cc
 (riscv_subset_list::parse_std_ext): Emit an error and skip to
 the next extension when a non-canonical ordering is detected.

[Committed] RISC-V: Fix ICE in non-canonical march parsing

2023-11-15 Thread Patrick O'Neill
Updated testcase names and committed.

Thanks,
Patrick

---

Passing in a base extension in non-canonical order (i, e, g) causes GCC
to ICE:
xgcc: error: '-march=rv64ge': ISA string is not in canonical order. 'e'
xgcc: internal compiler error: in add, at common/config/riscv/riscv-common.cc:671
...

This is fixed by skipping to the next extension when a non-canonical
order is detected.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse_std_ext): Emit an error and skip to
the next extension when a non-canonical ordering is detected.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-27.c: New test.
* gcc.target/riscv/arch-28.c: New test.

Signed-off-by: Patrick O'Neill 
---
Tested using rv64gc glibc on QEMU.
---
 gcc/common/config/riscv/riscv-common.cc  | 17 +
 gcc/testsuite/gcc.target/riscv/arch-27.c |  7 +++
 gcc/testsuite/gcc.target/riscv/arch-28.c |  7 +++
 3 files changed, 27 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-27.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-28.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 526dbb7603b..57fe856063e 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1017,15 +1017,24 @@ riscv_subset_list::parse_std_ext (const char *p)
       std_ext = *p;
 
       /* Checking canonical order.  */
+      const char *prior_std_exts = std_exts;
+
       while (*std_exts && std_ext != *std_exts)
         std_exts++;
 
       subset[0] = std_ext;
       if (std_ext != *std_exts && standard_extensions_p (subset))
-        error_at (m_loc,
-                  "%<-march=%s%>: ISA string is not in canonical order. "
-                  "%<%c%>",
-                  m_arch, *p);
+        {
+          error_at (m_loc,
+                    "%<-march=%s%>: ISA string is not in canonical order. "
+                    "%<%c%>",
+                    m_arch, *p);
+          /* Extension ordering is invalid.  Ignore this extension and keep
+             searching for other issues with remaining extensions.  */
+          std_exts = prior_std_exts;
+          p++;
+          continue;
+        }
 
       std_exts++;
 
diff --git a/gcc/testsuite/gcc.target/riscv/arch-27.c b/gcc/testsuite/gcc.target/riscv/arch-27.c
new file mode 100644
index 000..70143b2156f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-27.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64ge -mabi=lp64d" } */
+int foo()
+{
+}
+
+/* { dg-error "ISA string is not in canonical order. 'e'" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/arch-28.c b/gcc/testsuite/gcc.target/riscv/arch-28.c
new file mode 100644
index 000..934399a7b3a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-28.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imaefcv -mabi=lp64d" } */
+int foo()
+{
+}
+
+/* { dg-error "ISA string is not in canonical order. 'e'" "" { target *-*-* } 0 } */
-- 
2.34.1
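
For readers following the fix above, the recovery strategy can be sketched
outside of GCC.  This is a minimal standalone illustration, not the code in
riscv-common.cc: the function name find_noncanonical, the hard-coded order
string and the return type are all assumptions made for the example.  It walks
the single-letter extensions; when one is out of order it records the letter,
restores the scan position in the canonical list, and carries on with the next
letter instead of stopping.

```cpp
#include <cassert>
#include <string>

// Hypothetical canonical order for single-letter extensions; the real
// list lives in GCC's riscv-common.cc and may differ.
static const char *const kCanonicalOrder = "iemafdqlcbkjtpvnh";

// Return every extension letter in EXTS that appears out of canonical
// order.  Letters not in kCanonicalOrder are ignored by this sketch.
std::string find_noncanonical (const std::string &exts)
{
  std::string bad;
  const char *pos = kCanonicalOrder;
  for (char ext : exts)
    {
      const char *prior = pos;   // remember where this scan started
      while (*pos && *pos != ext)
        pos++;
      if (*pos != ext)
        {
          // Not found at or after the current position.  Record it if it
          // is a known letter, then rewind and keep checking the rest.
          if (std::string (kCanonicalOrder).find (ext) != std::string::npos)
            bad.push_back (ext);
          pos = prior;
          continue;
        }
      pos++;   // consume the matched letter
    }
  return bad;
}
```

With the letters from the arch-28.c test string, find_noncanonical ("imaefcv")
flags only 'e', and the later letters are still checked.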




[PATCH] c++: constantness of call to function pointer [PR111703]

2023-11-15 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk/13/12 (to match the PR107939 / r13-6525-ge09bc034d1b4d6 backports)?

-- >8 --

potential_constant_expression for a CALL_EXPR to a non-overload tests
FUNCTION_POINTER_TYPE_P on the callee rather than on the type of the
callee, which means we always pass want_rval=any when recursing and so
may fail to properly treat a non-constant function pointer callee as such.
Fixing this turns out to further work around the PR111703 issue.

PR c++/111703
PR c++/107939

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1) <case CALL_EXPR>:
Fix FUNCTION_POINTER_TYPE_P test.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-fn8.C: Extend test.
* g++.dg/diagnostic/constexpr4.C: New test.
---
 gcc/cp/constexpr.cc  | 4 +++-
 gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C| 2 ++
 gcc/testsuite/g++.dg/diagnostic/constexpr4.C | 9 +
 3 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/constexpr4.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 8a6b210144a..5ecc30117a1 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -9547,7 +9547,9 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
  }
else if (fun)
   {
-   if (RECUR (fun, FUNCTION_POINTER_TYPE_P (fun) ? rval : any))
+   if (RECUR (fun, (TREE_TYPE (fun)
+&& FUNCTION_POINTER_TYPE_P (TREE_TYPE (fun))
+? rval : any)))
  /* Might end up being a constant function pointer.  But it
 could also be a function object with constexpr op(), so
 we pass 'any' so that the underlying VAR_DECL is deemed
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C b/gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C
index 3f63a5b28d7..c63d26c931d 100644
--- a/gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C
@@ -15,10 +15,12 @@ struct P {
 };
 
 void (*f)(P);
+P (*h)(P);
 
 template
 constexpr bool g() {
   P x;
   f(x); // { dg-bogus "from here" }
+  f(h(x)); // { dg-bogus "from here" }
   return true;
 }
diff --git a/gcc/testsuite/g++.dg/diagnostic/constexpr4.C b/gcc/testsuite/g++.dg/diagnostic/constexpr4.C
new file mode 100644
index 000..f971f533b08
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/constexpr4.C
@@ -0,0 +1,9 @@
+// Verify we diagnose a call to a non-constant function pointer ahead of time.
+// { dg-do compile { target c++11 } }
+
+int (*f)(int);
+
+template<int N>
+void g() {
+  static_assert(f(N) == 0, ""); // { dg-error "non-constant|'f' is not usable" }
+}
+}
-- 
2.43.0.rc1
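
The language-level distinction that the new constexpr4.C test relies on can be
shown in isolation.  The sketch below is only an illustration; the names zero,
const_fp, runtime_fp and call_at_run_time are made up for the example and are
not taken from the patch or the testsuite.  A call through a function pointer
can be part of a constant expression only when the pointer itself is a
constant.

```cpp
#include <cassert>

constexpr int zero (int) { return 0; }

// A constexpr function pointer: its value is known at compile time, so a
// call through it may appear in a constant expression.
constexpr int (*const_fp) (int) = zero;
static_assert (const_fp (1) == 0, "callable at compile time");

// An ordinary, mutable function pointer: its value is only known at run
// time, so `static_assert (runtime_fp (1) == 0, "")` would be rejected.
int (*runtime_fp) (int) = zero;

int call_at_run_time (int n)
{
  return runtime_fp (n);   // fine as a run-time call
}
```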



Re: [PATCH 04/14] c++: use _P() defines from tree.h

2023-11-15 Thread Bernhard Reutner-Fischer
On Tue, 8 Aug 2023 16:31:39 -0400
Jason Merrill  wrote:

> On 8/2/23 12:51, Patrick Palka via Gcc-patches wrote:
> > On Thu, Jun 1, 2023 at 2:11 PM Bernhard Reutner-Fischer
> >  wrote:  
> >>
> >> Hi David, Patrick,
> >>
> >> On Thu, 1 Jun 2023 18:33:46 +0200
> >> Bernhard Reutner-Fischer  wrote:
> >>  
> >>> On Thu, 1 Jun 2023 11:24:06 -0400
> >>> Patrick Palka  wrote:
> >>>  
>  On Sat, May 13, 2023 at 7:26 PM Bernhard Reutner-Fischer via
>  Gcc-patches  wrote:  
> >>>  
> > diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
> > index 131b212ff73..19dfb3ed782 100644
> > --- a/gcc/cp/tree.cc
> > +++ b/gcc/cp/tree.cc
> > @@ -1173,7 +1173,7 @@ build_cplus_array_type (tree elt_type, tree 
> > index_type, int dependent)
> >   }
> >
> > /* Avoid spurious warnings with VLAs (c++/54583).  */
> > -  if (TYPE_SIZE (t) && EXPR_P (TYPE_SIZE (t)))
> > +  if (CAN_HAVE_LOCATION_P (TYPE_SIZE (t)))  
> 
>  Hmm, this change seems undesirable...  
> >>>
> >>> mhm, yes that is misleading. I'll prepare a patch to revert this.
> >>> Let me have a look if there were other such CAN_HAVE_LOCATION_P changes
> >>> that we'd want to revert.  
> >>
> >> Sorry for that!
> >> I'd revert the hunk above and the one in gcc-rich-location.cc
> >> (maybe_range_label_for_tree_type_mismatch::get_text), please see
> >> attached. Bootstrap running, ok for trunk if it passes?  
> > 
> > LGTM!  
> 
> Yes, OK.

Now applied as r14-5508 (186331063dfbcf1eacb445c473d92634c9baa90f)

thanks


[Committed] RISC-V: fix vsetvli pass testsuite failure [PR/112447]

2023-11-15 Thread Vineet Gupta
From: Juzhe-Zhong 

Fixes: f0e28d8c1371 ("RISC-V: Fix failed hoist in LICM of vmv.v.x instruction")

Since the above commit, we have the following failures:

  FAIL: gcc.c-torture/execute/memset-3.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
  FAIL: gcc.c-torture/execute/memset-3.c   -O3 -g  execution test

The commit itself was not at fault; rather, it exposed an existing issue in
the vsetvli pass.

Here's Juzhe's analysis:

We have two types of global vsetvl insertion.
One is the earliest-fusion vsetvl at the end of each block.
The other is the edge vsetvls suggested by LCM.

Before this patch, the insertion was as follows:

|  (insn 2817 2820 2818 361 (set (reg:SI 67 vtype)
|(unspec:SI [
|(const_int 8 [0x8])
|(const_int 7 [0x7])
|(const_int 1 [0x1]) repeated x2
|] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
| (nil))
|  (insn 2818 2817 999 361 (set (reg:SI 67 vtype)
|(unspec:SI [
|(const_int 32 [0x20])
|(const_int 1 [0x1]) repeated x3
|] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
| (nil))

After this patch:

|  (insn 2817 2820 2819 361 (set (reg:SI 67 vtype)
|(unspec:SI [
|(const_int 32 [0x20])
|(const_int 1 [0x1]) repeated x3
|] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
| (nil))
|  (insn 2819 2817 999 361 (set (reg:SI 67 vtype)
|(unspec:SI [
|(const_int 8 [0x8])
|(const_int 7 [0x7])
|(const_int 1 [0x1]) repeated x2
|] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
| (nil))

The original insertion order is incorrect.

The earliest-fusion vsetvl should be inserted first, since it represents
vsetvl information that was already present and was seen by the later LCM
phase; its insertion was merely delayed.  It should therefore come before
the LCM-suggested insertion.
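
As a rough illustration of why the order of the two loops matters, here is a
toy model.  It assumes that instructions queued on an edge are emitted in the
order in which they were queued; EdgeQueue and emit_in_fixed_order are
hypothetical names for this sketch and are not part of riscv-vsetvl.cc.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the instructions pending on one CFG edge.
struct EdgeQueue
{
  std::vector<std::string> insns;
  // Each call appends after whatever is already pending on the edge.
  void insert (const std::string &insn) { insns.push_back (insn); }
};

// Queue the block-exit (earliest-fusion) vsetvl first and the
// LCM-suggested vsetvl second, so the LCM one takes effect last.
std::vector<std::string> emit_in_fixed_order (const std::string &exit_info,
                                              const std::string &lcm_info)
{
  EdgeQueue edge;
  edge.insert (exit_info);
  edge.insert (lcm_info);
  return edge.insns;
}
```

Swapping the two insert calls reproduces the pre-patch ordering, where the
delayed block-exit vsetvl ends up after the LCM one on the same edge.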

PR target/112447

gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::emit_vsetvl): Insert
local vsetvl info before LCM suggested one.

Tested-by: Patrick O'Neill  # pre-commit-CI #679
Co-developed-by: Vineet Gupta 

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv-vsetvl.cc | 70 
 1 file changed, 35 insertions(+), 35 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 8466b5d019ea..74367ec8d8e9 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3229,6 +3229,41 @@ pre_vsetvl::emit_vsetvl ()
   remove_vsetvl_insn (item);
 }
 
+  /* Insert vsetvl info that was not deleted after lift up.  */
+  for (const bb_info *bb : crtl->ssa->bbs ())
+{
+  const vsetvl_block_info &block_info = get_block_info (bb);
+  if (!block_info.has_info ())
+   continue;
+
+  const vsetvl_info &footer_info = block_info.get_exit_info ();
+
+  if (footer_info.delete_p ())
+   continue;
+
+  edge eg;
+  edge_iterator eg_iterator;
+  FOR_EACH_EDGE (eg, eg_iterator, bb->cfg_bb ()->succs)
+   {
+ gcc_assert (!(eg->flags & EDGE_ABNORMAL));
+ if (dump_file)
+   {
+ fprintf (
+   dump_file,
+   "\n  Insert missed vsetvl info at edge(bb %u -> bb %u): ",
+   eg->src->index, eg->dest->index);
+ footer_info.dump (dump_file, "");
+   }
+ start_sequence ();
+ insert_vsetvl_insn (EMIT_DIRECT, footer_info);
+ rtx_insn *rinsn = get_insns ();
+ end_sequence ();
+ default_rtl_profile ();
+ insert_insn_on_edge (rinsn, eg);
+ need_commit = true;
+   }
+}
+
   /* m_insert vsetvl as LCM suggest. */
   for (int ed = 0; ed < NUM_EDGES (m_edges); ed++)
 {
@@ -3267,41 +3302,6 @@ pre_vsetvl::emit_vsetvl ()
   insert_insn_on_edge (rinsn, eg);
 }
 
-  /* Insert vsetvl info that was not deleted after lift up.  */
-  for (const bb_info *bb : crtl->ssa->bbs ())
-{
-  const vsetvl_block_info &block_info = get_block_info (bb);
-  if (!block_info.has_info ())
-   continue;
-
-  const vsetvl_info &footer_info = block_info.get_exit_info ();
-
-  if (footer_info.delete_p ())
-   continue;
-
-  edge eg;
-  edge_iterator eg_iterator;
-  FOR_EACH_EDGE (eg, eg_iterator, bb->cfg_bb ()->succs)
-   {
- gcc_assert (!(eg->flags & EDGE_ABNORMAL));
- if (dump_file)
-   {
- fprintf (
-   dump_file,
-   "\n  Insert missed vsetvl info at edge(bb %u -> bb %u): ",
-   eg->src->index, eg->dest->index);
- footer_info.dump (dump_file, "");
-   }
- start_sequence ();
- insert_vsetvl_insn (EMIT_DIRECT, footer_info);
- rtx_insn *rinsn = get_insns ();
- end_sequence ();
- default_rtl_profile ();
- insert_insn_on_edge (rinsn, eg);
- 

[Committed] RISC-V: elide unnecessary sign extend when expanding cmp_and_jump

2023-11-15 Thread Vineet Gupta
RV64 compare and branch instructions only support 64-bit operands.
At Expand time, the backend conservatively zero/sign extends
its operands even if not needed, such as incoming function args
which ABI/ISA guarantee to be sign-extended already (this is true for
SI, HI, QI operands)

And subsequently REE fails to eliminate them, reporting
   "missing definition(s)" or "multiple definition(s)"
since function args don't have an explicit definition.

So during expand riscv_extend_comparands (), if an operand is a
subreg-promoted SI with inner DI, which is representative of a function
arg, just peel away the subreg to expose the DI, eliding the sign
extension. As Jeff noted this routine is also used in if-conversion so
potentially can also help there.

Note there are currently patches floating around to improve REE and also a
new pass to eliminate unnecessary extensions, but it is still beneficial
to not generate those extra extensions in the first place. It is obviously
less work for post-reload passes such as REE, but even for earlier
passes, such as combine, having to deal with one less thing and ensuing
fewer combinations is a win too.

Many existing tests exhibit this issue,
e.g. gcc.c-torture/compile/20190827-1.c -O2 -march=rv64gc
where the patch eliminates the SEXT.W.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sign_extend_if_not_subreg_prom): New.
* (riscv_extend_comparands): Call New function on operands.

Tested-by: Patrick O'Neill  # pre-commit-CI #676
Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e919850fc6cb..e466d4f168af 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3695,6 +3695,24 @@ riscv_zero_if_equal (rtx cmp0, rtx cmp1)
                        cmp0, cmp1, 0, 0, OPTAB_DIRECT);
 }
 
+/* Helper function for riscv_extend_comparands to Sign-extend the OP.
+   However if the OP is SI subreg promoted with an inner DI, such as
+   (subreg/s/v:SI (reg/v:DI) 0)
+   just peel off the SUBREG to get DI, avoiding extraneous extension.  */
+
+static void
+riscv_sign_extend_if_not_subreg_prom (rtx *op)
+{
+  if (GET_CODE (*op) == SUBREG
+      && SUBREG_PROMOTED_VAR_P (*op)
+      && SUBREG_PROMOTED_SIGNED_P (*op)
+      && (GET_MODE_SIZE (GET_MODE (XEXP (*op, 0))).to_constant ()
+          == GET_MODE_SIZE (word_mode)))
+    *op = XEXP (*op, 0);
+  else
+    *op = gen_rtx_SIGN_EXTEND (word_mode, *op);
+}
+
 /* Sign- or zero-extend OP0 and OP1 for integer comparisons.  */
 
 static void
@@ -3724,9 +3742,10 @@ riscv_extend_comparands (rtx_code code, rtx *op0, rtx *op1)
         }
       else
         {
-          *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
+          riscv_sign_extend_if_not_subreg_prom (op0);
+
           if (*op1 != const0_rtx)
-            *op1 = gen_rtx_SIGN_EXTEND (word_mode, *op1);
+            riscv_sign_extend_if_not_subreg_prom (op1);
         }
     }
 }
-- 
2.34.1
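
The idea behind the helper can be modelled without any RTL.  The sketch below
is only an analogy under stated assumptions: Operand, promoted_signed and
extend_for_compare are hypothetical names, and the flag stands in for what
SUBREG_PROMOTED_VAR_P together with SUBREG_PROMOTED_SIGNED_P tell the backend.
When the upper bits of the 64-bit register are already known to be the sign
extension of the low 32 bits, the register can be compared directly and no
explicit extension is needed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 32-bit operand held in a 64-bit register.
struct Operand
{
  std::uint64_t reg;      // full 64-bit register contents
  bool promoted_signed;   // true if the upper bits are known to be the
                          // sign extension of the low 32 bits
};

// Return the 64-bit value to compare and count explicit extensions.
std::int64_t extend_for_compare (const Operand &op, int &extensions_emitted)
{
  if (op.promoted_signed)
    // Already sign-extended: use the register as-is.
    return static_cast<std::int64_t> (op.reg);

  // Otherwise sign-extend the low 32 bits explicitly (the SEXT.W case).
  extensions_emitted++;
  return static_cast<std::int64_t> (
    static_cast<std::int32_t> (static_cast<std::uint32_t> (op.reg)));
}
```

Both paths produce the same comparison value for a correctly promoted
operand; the difference is only whether an extra extension is counted.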



Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-15 Thread David Edelsohn
On Wed, Nov 15, 2023 at 9:22 AM Arsen Arsenović  wrote:

>
> David Edelsohn  writes:
>
> > GCC had been working on AIX with NLS, using "--with-included-gettext".
> > --disable-nls gets past the breakage, but GCC does not build for me on
> AIX
> > with NLS enabled.
>
> That should still work with gettext 0.22+ extracted in-tree (it should
> be fetched by download_prerequisites).
>
> > A change in dependencies for GCC should have been announced and more
> widely
> > socialized in the GCC development mailing list, not just GCC patches
> > mailing list.
> >
> > I have tried both the AIX Open Source libiconv and libgettext package,
> and
> > the ones that I previously built.  Both fail because GCC configure
> decides
> > to disable NLS, despite being requested, while libcpp is satisfied, so
> > tools in the gcc subdirectory don't link against libiconv and the build
> > fails.  With the included gettext, I was able to rely on a
> self-consistent
> > solution.
>
> That is interesting.  They should be using the same checks.  I've
> checked trunk and regenerated files on it, and saw no significant diff
> (some whitespace changes only).  Could you post the config.log of both?
>
> I've never used AIX.  Can I reproduce this on one of the cfarm machines
> to poke around?  I've tried cfarm119, but that one lacked git, and I
> haven't poked around much further due to time constraints.
>

The AIX system in the Compile Farm has a complete complement of Open Source
software installed.

Please ensure that /opt/freeware/bin is in your path.  Also, the GCC Wiki
Compile Farm page has build tips that include AIX

https://gcc.gnu.org/wiki/CompileFarm#Services_and_software_installed_on_farm_machines

and that recommend the --with-included-gettext configuration option.

Thanks, David


>
> TIA, sorry about the inconvenience.  Have a lovely day.
>
> > The current gettext-0.22.3 fails to build for me on AIX.
> >
> > libcpp configure believes that NLS functions on AIX, but gcc configure
> > fails in its tests of gettext functionality, which leads to an
> inconsistent
> > configuration and build breakage.
> >
> > Thanks, David
>
>
> --
> Arsen Arsenović
>


[PATCH 4/6]AArch64: Add new generic-armv9-a CPU and make it the default for Armv9

2023-11-15 Thread Tamar Christina
Hi All,

This patch adds a new generic scheduling model "generic-armv9-a" and makes it
the default for all Armv9 architectures.

-mcpu=generic and -mtune=generic are kept around for those that really want the
deprecated cost model.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/111370
* config/aarch64/aarch64-arches.def (armv9-a, armv9.1-a, armv9.2-a,
armv9.3-a): Update to generic-armv9-a.
* config/aarch64/aarch64-cores.def (generic-armv9-a): New.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc: Include generic_armv9_a.h.
* config/aarch64/tuning_models/generic_armv9_a.h: New file.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index f89e4ea1f48acc2875c9a834d93d94c94163cddc..6b9a19c490ba0b35082077e877b19906138f039b 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -40,9 +40,9 @@ AARCH64_ARCH("armv8.6-a", generic_armv8_a,   V8_6A, 8,  (V8_5A, I8MM, BF
 AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A, LS64))
 AARCH64_ARCH("armv8.8-a", generic_armv8_a,   V8_8A, 8,  (V8_7A, MOPS))
 AARCH64_ARCH("armv8-r",   generic_armv8_a,   V8R  , 8,  (V8_4A))
-AARCH64_ARCH("armv9-a",   generic,   V9A  , 9,  (V8_5A, SVE2))
-AARCH64_ARCH("armv9.1-a", generic,   V9_1A, 9,  (V8_6A, V9A))
-AARCH64_ARCH("armv9.2-a", generic,   V9_2A, 9,  (V8_7A, V9_1A))
-AARCH64_ARCH("armv9.3-a", generic,   V9_3A, 9,  (V8_8A, V9_2A))
+AARCH64_ARCH("armv9-a",   generic_armv9_a,   V9A  , 9,  (V8_5A, SVE2))
+AARCH64_ARCH("armv9.1-a", generic_armv9_a,   V9_1A, 9,  (V8_6A, V9A))
+AARCH64_ARCH("armv9.2-a", generic_armv9_a,   V9_2A, 9,  (V8_7A, V9_1A))
+AARCH64_ARCH("armv9.3-a", generic_armv9_a,   V9_3A, 9,  (V8_8A, V9_2A))
 
 #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 30f4dd04ed71823bc34c0c405d49963b6b2d1375..16752b77f4baf8d1aa8a5406826aa29e367120c5 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -191,6 +191,7 @@ AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG,
 
 /* Generic Architecture Processors.  */
 AARCH64_CORE("generic",  generic, cortexa53, V8A,  (), generic, 0x0, 0x0, -1)
-AARCH64_CORE("generic-armv8-a",  generic_armv8_a, cortexa53, V8A,  (), generic_armv8_a, 0x0, 0x0, -1)
+AARCH64_CORE("generic-armv8-a",  generic_armv8_a, cortexa53, V8A, (), generic_armv8_a, 0x0, 0x0, -1)
+AARCH64_CORE("generic-armv9-a",  generic_armv9_a, cortexa53, V9A, (), generic_armv9_a, 0x0, 0x0, -1)
 
 #undef AARCH64_CORE
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index 0a32056f255de455f47a0b7395dfef0af84c6b5e..61bb85211252970f0a0526929d6b88353bdd930f 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic,generic_armv8_a"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
(const (symbol_ref "((enum 

[PATCH 6/6]AArch64: only emit mismatch error when features would be disabled.

2023-11-15 Thread Tamar Christina
Hi All,

At the moment we emit a warning whenever you specify both -march and -mcpu
and their architectures differ.  The idea originally was that the user may
not be aware of this change.

However this has a few problems:

1.  Architecture revisions are not an observable part of the architecture,
extensions are.  Starting with GCC 14 we have therefore relaxed the rule so
that all extensions can be enabled at any architecture level.  Therefore it's
incorrect, or at least not useful, to keep the check on architecture.

2.  It's problematic in Makefiles and other build systems, where you want to
enable CPU-specific builds for certain files, i.e. you may be building for
-march=armv8-a by default but for -mcpu=neoverse-n1 for some file.  Since
there's no easy way to remove the earlier options we end up warning, and
there's no way to disable just this warning.  Build systems compiling with
-Werror find that compiling with GCC is needlessly hard in this case.

3.  It doesn't actually warn for cases that may lead to issues, so e.g.
-march=armv8.2-a+sve -mcpu=neoverse-n1 does not give a warning that SVE would
be disabled.

For this reason I have one of two proposals:

1.  Just remove this warning altogether.

2.  Rework the warning based on extensions and only warn when features would be
disabled by the presence of the -mcpu.  This is the approach this patch has
taken.
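
The check behind the second proposal is a plain bitmask test, which can be
sketched on its own.  The feature bits and function names below are
hypothetical and chosen only for the example; GCC's real aarch64 feature flags
and the feature sets of real CPUs differ.  A warning is warranted only when
some feature requested via -march is missing from the -mcpu feature set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits; GCC's real aarch64 feature flags differ.
enum : std::uint64_t
{
  FEAT_CRC     = 1u << 0,
  FEAT_SVE     = 1u << 1,
  FEAT_RCPC    = 1u << 2,
  FEAT_DOTPROD = 1u << 3
};

// Features requested by -march that the -mcpu selection lacks.
std::uint64_t features_lost (std::uint64_t arch_flags, std::uint64_t cpu_flags)
{
  return ~cpu_flags & arch_flags;
}

// Warn only if the CPU's features are not a superset of the arch's.
bool should_warn (std::uint64_t arch_flags, std::uint64_t cpu_flags)
{
  return features_lost (arch_flags, cpu_flags) != 0;
}
```

With a made-up CPU that has CRC, RCPC and DOTPROD but not SVE, requesting SVE
via -march would warn, while requesting only DOTPROD would not.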

As examples:

> aarch64-none-linux-gnu-gcc -march=armv8.2-a+sve -mcpu=neoverse-n1
cc1: warning: switch ‘-mcpu=neoverse-n1’ conflicts with ‘-march=armv8.2-a+sve’ switch and resulted in options +crc+sve+norcpc+nodotprod being added

.arch armv8.2-a+crc+sve

> aarch64-none-linux-gnu-gcc -march=armv8.2-a -mcpu=neoverse-n1
> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-n1
> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-n2


The one remaining issue here is that if both -march and -mcpu are specified we
pick the -march.  This is not particularly obvious, and for this use case to be
more useful I think it makes sense to pick the CPU's arch?

I did not make that change in the patch as it changes semantics.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Note that I can't write a test for this because dg-warning expects warnings to
be at a particular line and doesn't support warnings at the "global" level.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_override_options): Rework warnings.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index caf80d66b3a744cc93899645aa5f9374983cd3db..3afd222ad3bdcfb922cc010dcc0b138db29caf7f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16388,12 +16388,22 @@ aarch64_override_options (void)
   if (cpu && arch)
 {
   /* If both -mcpu and -march are specified, warn if they are not
-architecturally compatible and prefer the -march ISA flags.  */
-  if (arch->arch != cpu->arch)
-   {
- warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%> switch",
+feature compatible.  feature compatible means that the inclusion of the
+cpu features would end up disabling an achitecture feature.  In
+otherwords the cpu features need to be a strict superset of the arch
+features and if so prefer the -march ISA flags.  */
+  auto full_arch_flags = arch->flags | arch_isa;
+  auto full_cpu_flags = cpu->flags | cpu_isa;
+  if (~full_cpu_flags & full_arch_flags)
+   {
+ std::string ext_diff
+   = aarch64_get_extension_string_for_isa_flags (full_arch_flags,
+ full_cpu_flags);
+ warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%> switch "
+ "and resulted in options %s being added",
   aarch64_cpu_string,
-  aarch64_arch_string);
+  aarch64_arch_string,
+  ext_diff.c_str ());
}
 
   selected_arch = arch->arch;





[PATCH 3/6]AArch64: Add new generic-armv8-a CPU and make it the default.

2023-11-15 Thread Tamar Christina
Hi All,

This patch adds a new generic scheduling model "generic-armv8-a" and makes it
the default for all Armv8 architectures.

-mcpu=generic and -mtune=generic are kept around for those who really want the
deprecated cost model.

This shows on SPECCPU 2017 the following:

generic:  SPECINT 1.0% improvement in geomean, SPECFP -0.6%.  The SPECFP is due
  to fotonik3d_r where we vectorize an FP calculation that only ever
  needs one lane of the result.  This I believe is a generic costing bug
  but at the moment we can't change costs of FP and INT independently.
  So will defer updating that cost to stage3 after Richard's other
  costing updates land.

generic SVE: SPECINT 1.1% improvement in geomean, SPECFP 0.7% improvement.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/111370
* config/aarch64/aarch64-arches.def (armv8-9, armv8-a, armv8.1-a,
armv8.2-a, armv8.3-a, armv8.4-a, armv8.5-a, armv8.6-a, armv8.7-a,
armv8.8-a): Update to generic_armv8_a.
* config/aarch64/aarch64-cores.def (generic-armv8-a): New.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc: Include generic_armv8_a.h
* config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Change to
TARGET_CPU_generic_armv8_a.
* config/aarch64/tuning_models/generic_armv8_a.h: New file.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-arches.def 
b/gcc/config/aarch64/aarch64-arches.def
index 
7ae92aa8e984e0a77efd5c5a5061c4c6f86e0118..f89e4ea1f48acc2875c9a834d93d94c94163cddc
 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -30,19 +30,19 @@
Due to the assumptions about the positions of these fields in config.gcc,
NAME should be kept as the first argument.  */
 
-AARCH64_ARCH("armv8-a",   generic,   V8A,   8,  (SIMD))
-AARCH64_ARCH("armv8.1-a", generic,   V8_1A, 8,  (V8A, LSE, CRC, 
RDMA))
-AARCH64_ARCH("armv8.2-a", generic,   V8_2A, 8,  (V8_1A))
-AARCH64_ARCH("armv8.3-a", generic,   V8_3A, 8,  (V8_2A, PAUTH, 
RCPC))
-AARCH64_ARCH("armv8.4-a", generic,   V8_4A, 8,  (V8_3A, F16FML, 
DOTPROD, FLAGM))
-AARCH64_ARCH("armv8.5-a", generic,   V8_5A, 8,  (V8_4A, SB, SSBS, 
PREDRES))
-AARCH64_ARCH("armv8.6-a", generic,   V8_6A, 8,  (V8_5A, I8MM, 
BF16))
-AARCH64_ARCH("armv8.7-a", generic,   V8_7A, 8,  (V8_6A, LS64))
-AARCH64_ARCH("armv8.8-a", generic,   V8_8A, 8,  (V8_7A, MOPS))
-AARCH64_ARCH("armv8-r",   generic,   V8R  , 8,  (V8_4A))
-AARCH64_ARCH("armv9-a",   generic,   V9A  , 9,  (V8_5A, SVE2))
-AARCH64_ARCH("armv9.1-a", generic,   V9_1A, 9,  (V8_6A, V9A))
-AARCH64_ARCH("armv9.2-a", generic,   V9_2A, 9,  (V8_7A, V9_1A))
-AARCH64_ARCH("armv9.3-a", generic,   V9_3A, 9,  (V8_8A, V9_2A))
+AARCH64_ARCH("armv8-a",   generic_armv8_a,   V8A,   8,  (SIMD))
+AARCH64_ARCH("armv8.1-a", generic_armv8_a,   V8_1A, 8,  (V8A, LSE, 
CRC, RDMA))
+AARCH64_ARCH("armv8.2-a", generic_armv8_a,   V8_2A, 8,  (V8_1A))
+AARCH64_ARCH("armv8.3-a", generic_armv8_a,   V8_3A, 8,  (V8_2A, PAUTH, 
RCPC))
+AARCH64_ARCH("armv8.4-a", generic_armv8_a,   V8_4A, 8,  (V8_3A, 
F16FML, DOTPROD, FLAGM))
+AARCH64_ARCH("armv8.5-a", generic_armv8_a,   V8_5A, 8,  (V8_4A, SB, 
SSBS, PREDRES))
+AARCH64_ARCH("armv8.6-a", generic_armv8_a,   V8_6A, 8,  (V8_5A, I8MM, 
BF16))
+AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A, LS64))
+AARCH64_ARCH("armv8.8-a", generic_armv8_a,   V8_8A, 8,  (V8_7A, MOPS))
+AARCH64_ARCH("armv8-r",   generic_armv8_a,   V8R  , 8,  (V8_4A))
+AARCH64_ARCH("armv9-a",   generic,   V9A  , 9,  (V8_5A, SVE2))
+AARCH64_ARCH("armv9.1-a", generic,   V9_1A, 9,  (V8_6A, V9A))
+AARCH64_ARCH("armv9.2-a", generic,   V9_2A, 9,  (V8_7A, V9_1A))
+AARCH64_ARCH("armv9.3-a", generic,   V9_3A, 9,  (V8_8A, V9_2A))
 
 #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
3e363bd0e8bbc10cb5b28d6183647736318e6d40..30f4dd04ed71823bc34c0c405d49963b6b2d1375
 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -191,5 +191,6 @@ AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, 
BF16, SVE2_BITPERM, RNG,
 
 /* Generic Architecture Processors.  */
 AARCH64_CORE("generic",  generic, cortexa53, V8A,  (), generic, 0x0, 0x0, -1)
+AARCH64_CORE("generic-armv8-a",  generic_armv8_a, cortexa53, V8A,  (), 
generic_armv8_a, 0x0, 0x0, -1)
 
 #undef AARCH64_CORE
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 

[PATCH 2/6]AArch64: Remove special handling of generic cpu.

2023-11-15 Thread Tamar Christina
Hi All,

In anticipation of adding new generic tuning values this removes the hardcoding
of the "generic" CPU and instead just specifies it as a normal CPU.
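
For context, the processor enums are generated by including aarch64-cores.def with AARCH64_CORE redefined, so once "generic" is an ordinary entry in that list it gets its enumerator the same way as every other core. A minimal sketch of that pattern follows. It uses an in-file macro list instead of a .def include, and the `DEMO_CORES` list, the `demo_` names and the three entries are hypothetical and carry none of the real fields.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical miniature of a cores.def-style list, for illustration only.
#define DEMO_CORES(X) \
  X (cortexa53)       \
  X (neoversev2)      \
  X (generic)

// Each list entry expands to one enumerator, in list order.
enum demo_processor
{
#define X(IDENT) demo_##IDENT,
  DEMO_CORES (X)
#undef X
  demo_none  // marks the end of the table
};

// The same list can drive other tables, here a name lookup.
inline const char *
demo_processor_name (demo_processor p)
{
  switch (p)
    {
#define X(IDENT) case demo_##IDENT: return #IDENT;
      DEMO_CORES (X)
#undef X
    default:
      return "none";
    }
}
```

With this shape there is no separately maintained enumerator that can drift out of sync with the list.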

No change in behavior is expected.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/111370
* config/aarch64/aarch64-cores.def: Add generic.
* config/aarch64/aarch64-opts.h (enum aarch64_proc): Remove generic.
* config/aarch64/aarch64-tune.md: Regenerate
* config/aarch64/aarch64.cc (all_cores): Remove generic
* config/aarch64/aarch64.h (enum target_cpus): Remove
TARGET_CPU_generic.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
eae40b29df6f8ae353d168b6f73845846d1da94b..3e363bd0e8bbc10cb5b28d6183647736318e6d40
 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -189,4 +189,7 @@ AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, 
(I8MM, BF16, SVE2_BITPER
 AARCH64_CORE("neoverse-v2", neoversev2, cortexa57, V9A, (I8MM, BF16, 
SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
 AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, 
RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
 
+/* Generic Architecture Processors.  */
+AARCH64_CORE("generic",  generic, cortexa53, V8A,  (), generic, 0x0, 0x0, -1)
+
 #undef AARCH64_CORE
diff --git a/gcc/config/aarch64/aarch64-opts.h 
b/gcc/config/aarch64/aarch64-opts.h
index 
831e28ab52a4271ef5467965039a32d078755d42..01151e93d17979f499523cabb74a449170483a70
 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -32,8 +32,6 @@ enum aarch64_processor
 #define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, 
PART, VARIANT) \
   INTERNAL_IDENT,
 #include "aarch64-cores.def"
-  /* Used to indicate that no processor has been specified.  */
-  generic,
   /* Used to mark the end of the processor table.  */
   aarch64_none
 };
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 
c969277d617ad5fd070a915bfedb83323eb71e6c..cd5d79ea9c221874578a4d5804e4f618e671ebcd
 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
d74e9116fc56cfa85558cc0810f76479e7280f69..b178bb5b62dbdcb1f5edbad4155416d6093a11f3
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -720,7 +720,6 @@ enum target_cpus
 #define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, 
PART, VARIANT) \
   TARGET_CPU_##INTERNAL_IDENT,
 #include "aarch64-cores.def"
-  TARGET_CPU_generic
 };
 
 /* If there is no CPU defined at configure, use generic as default.  */
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
07b1cde39209f5c7740e336b499e9aed31e4c515..086448632700bc97b0d4c75d85cef63f820e9944
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -427,8 +427,6 @@ static const struct processor 

[PATCH]AArch64: only discount MLA for vector and scalar statements

2023-11-15 Thread Tamar Christina
Hi All,

In testcases gcc.dg/tree-ssa/slsr-19.c  and gcc.dg/tree-ssa/slsr-20.c we have a
fairly simple computation.  On the current generic costing we generate:

f:
        add     w0, w0, 2
        madd    w1, w0, w1, w1
        lsl     w0, w1, 1
        ret

but on any cost model other than generic (including the upcoming new generic)
we generate:

f:
        adrp    x2, .LC0
        dup     v31.2s, w0
        fmov    s30, w1
        ldr     d29, [x2, #:lo12:.LC0]
        add     v31.2s, v31.2s, v29.2s
        mul     v31.2s, v31.2s, v30.s[0]
        addp    v31.2s, v31.2s, v31.2s
        fmov    w0, s31
        ret
.LC0:
        .word   2
        .word   4

This seems to be because the vectorizer thinks the vector transfers are free:

x1_4 + x2_6 1 times vector_stmt costs 0 in body
x1_4 + x2_6 1 times vec_to_scalar costs 0 in body  

This happens because the stmt it's using to get the cost of register transfers
for the given type happens to be one feeding into a MUL.  We incorrectly
discount the + for the register transfer.

This is fixed by guarding the check for aarch64_multiply_add_p with a kind
check and only do it for scalar_stmt and vector_stmt.
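
To illustrate the shape of the guard (a sketch only, not the GCC implementation): the multiply-add discount is applied only when the entry being costed is the statement itself, not a register transfer attributed to it. The enum, the function name and the cost numbers below are hypothetical.

```cpp
#include <cassert>

// Hypothetical cost kinds, mirroring only the names used in the patch.
enum demo_cost_kind { scalar_stmt, vector_stmt, vec_to_scalar, scalar_to_vec };

// Hypothetical cost adjustment: discount the add that fuses into a
// multiply-add, but only for scalar_stmt and vector_stmt entries.
inline unsigned
demo_adjust_cost (demo_cost_kind kind, bool fuses_into_mla,
                  unsigned base_cost, unsigned add_cost)
{
  bool is_stmt = kind == scalar_stmt || kind == vector_stmt;
  if (is_stmt && fuses_into_mla)
    return base_cost > add_cost ? base_cost - add_cost : 0;
  // Transfers such as vec_to_scalar keep their full cost.
  return base_cost;
}
```

Without the kind check, the vec_to_scalar entry in the dump above would take the same discount as the vector_stmt entry, which is how it ended up costing 0.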

I'm sending this separately from my patch series but it's required for it.
It also seems to fix overvectorization cases in fotonik3d_r in SPECCPU 2017.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_adjust_stmt_cost): Guard mla.
(aarch64_vector_costs::count_ops): Likewise.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
06ec22057e10fd591710aa4c795a78f34eeaa8e5..0f05877ead3dca6477ebc70f53c632e4eb48d439
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -14587,7 +14587,7 @@ aarch64_adjust_stmt_cost (vec_info *vinfo, 
vect_cost_for_stmt kind,
}
 
   gassign *assign = dyn_cast<gassign *> (STMT_VINFO_STMT (stmt_info));
-  if (assign)
+  if ((kind == scalar_stmt || kind == vector_stmt) && assign)
{
  /* For MLA we need to reduce the cost since MLA is 1 instruction.  */
  if (!vect_is_reduction (stmt_info)
@@ -14669,7 +14669,9 @@ aarch64_vector_costs::count_ops (unsigned int count, 
vect_cost_for_stmt kind,
 }
 
   /* Assume that multiply-adds will become a single operation.  */
-  if (stmt_info && aarch64_multiply_add_p (m_vinfo, stmt_info, m_vec_flags))
+  if (stmt_info
+  && (kind == scalar_stmt || kind == vector_stmt)
+  && aarch64_multiply_add_p (m_vinfo, stmt_info, m_vec_flags))
 return;
 
   /* Assume that bool AND with compare operands will become a single









Re: [PATCH] Add support for function attributes and variable attributes

2023-11-15 Thread Antoni Boucher
David: another thing I remember you mentioned when you reviewed an
earlier version of this patch is the usage of `std::pair`.
I can't find where you said that, but I remember you mentioned that we
should use a struct instead.
Can you please elaborate again?
Thanks.

On Wed, 2023-11-15 at 17:53 +0100, Guillaume Gomez wrote:
> Hi,
> 
> This patch adds the (incomplete) support for function and variable
> attributes. The added attributes are the ones we're using in
> rustc_codegen_gcc but all the groundwork is done to add more (and we
> will very likely add more as we didn't add all the ones we use in
> rustc_codegen_gcc yet).
> 
> The only big question with this patch is about `inline`. We currently
> handle it as an attribute because it is more convenient for us but is
> it ok or should we create a separate function to mark a function as
> inlined?
> 
> Thanks in advance for the review.



[PATCH] Add support for function attributes and variable attributes

2023-11-15 Thread Guillaume Gomez
Hi,

This patch adds the (incomplete) support for function and variable
attributes. The added attributes are the ones we're using in
rustc_codegen_gcc but all the groundwork is done to add more (and we
will very likely add more as we didn't add all the ones we use in
rustc_codegen_gcc yet).

The only big question with this patch is about `inline`. We currently
handle it as an attribute because it is more convenient for us but is
it ok or should we create a separate function to mark a function as
inlined?

Thanks in advance for the review.
From df75f0eb8aacba249b6e791603752e35778951a4 Mon Sep 17 00:00:00 2001
From: Guillaume Gomez 
Date: Mon, 20 Jun 2022 14:34:39 -0400
Subject: [PATCH] [PATCH] Add support for function attributes and variable
 attributes.

gcc/jit/ChangeLog:

	* dummy-frontend.cc (handle_alias_attribute): New function.
	(handle_always_inline_attribute): New function.
	(handle_cold_attribute): New function.
	(handle_fnspec_attribute): New function.
	(handle_format_arg_attribute): New function.
	(handle_format_attribute): New function.
	(handle_noinline_attribute): New function.
	(handle_target_attribute): New function.
	(handle_used_attribute): New function.
	(handle_visibility_attribute): New function.
	(handle_weak_attribute): New function.
	(handle_alias_ifunc_attribute): New function.
	* jit-playback.cc (fn_attribute_to_string): New function.
	(variable_attribute_to_string): New function.
	(global_new_decl): Add attributes support.
	(set_variable_attribute): New function.
	(new_global): Add attributes support.
	(new_global_initialized): Add attributes support.
	(new_local): Add attributes support.
	* jit-playback.h (fn_attribute_to_string): New function.
	(set_variable_attribute): New function.
	* jit-recording.cc (recording::lvalue::add_attribute): New function.
	(recording::function::function): New function.
	(recording::function::write_to_dump): Add attributes support.
	(recording::function::add_attribute): New function.
	(recording::function::add_string_attribute): New function.
	(recording::function::add_integer_array_attribute): New function.
	(recording::global::replay_into): Add attributes support.
	(recording::local::replay_into): Add attributes support.
	* libgccjit.cc (gcc_jit_function_add_attribute): New function.
	(gcc_jit_function_add_string_attribute): New function.
	(gcc_jit_function_add_integer_array_attribute): New function.
	(gcc_jit_lvalue_add_attribute): New function.
	* libgccjit.h (enum gcc_jit_fn_attribute): New enum.
	(gcc_jit_function_add_attribute): New function.
	(gcc_jit_function_add_string_attribute): New function.
	(gcc_jit_function_add_integer_array_attribute): New function.
	(enum gcc_jit_variable_attribute): New enum.
	(gcc_jit_lvalue_add_string_attribute): New function.
	* libgccjit.map: Declare new functions.

gcc/testsuite/ChangeLog:

	* jit.dg/jit.exp: Add `jit-verify-assembler-output-not` test command.
	* jit.dg/test-restrict.c: New test.
	* jit.dg/test-restrict-attribute.c: New test.
	* jit.dg/test-alias-attribute.c: New test.
	* jit.dg/test-always_inline-attribute.c: New test.
	* jit.dg/test-cold-attribute.c: New test.
	* jit.dg/test-const-attribute.c: New test.
	* jit.dg/test-noinline-attribute.c: New test.
	* jit.dg/test-nonnull-attribute.c: New test.
	* jit.dg/test-pure-attribute.c: New test.
	* jit.dg/test-used-attribute.c: New test.
	* jit.dg/test-variable-attribute.c: New test.
	* jit.dg/test-weak-attribute.c: New test.

gcc/jit/ChangeLog:
	* docs/topics/compatibility.rst: Add documentation for LIBGCCJIT_ABI_26.
	* docs/topics/types.rst: Add documentation for new functions.

Co-authored-by: Antoni Boucher 
Signed-off-by: Guillaume Gomez 
---
 gcc/jit/docs/topics/compatibility.rst |  12 +
 gcc/jit/docs/topics/types.rst |  77 +++
 gcc/jit/dummy-frontend.cc | 504 --
 gcc/jit/jit-playback.cc   | 165 +-
 gcc/jit/jit-playback.h|  37 +-
 gcc/jit/jit-recording.cc  | 166 +-
 gcc/jit/jit-recording.h   |  19 +-
 gcc/jit/libgccjit.cc  |  45 ++
 gcc/jit/libgccjit.h   |  49 ++
 gcc/jit/libgccjit.map |   8 +
 gcc/testsuite/jit.dg/jit.exp  |  33 ++
 gcc/testsuite/jit.dg/test-alias-attribute.c   |  50 ++
 .../jit.dg/test-always_inline-attribute.c | 153 ++
 gcc/testsuite/jit.dg/test-cold-attribute.c|  54 ++
 gcc/testsuite/jit.dg/test-const-attribute.c   | 134 +
 .../jit.dg/test-noinline-attribute.c  | 114 
 gcc/testsuite/jit.dg/test-nonnull-attribute.c |  94 
 gcc/testsuite/jit.dg/test-pure-attribute.c| 134 +
 ...t-restrict.c => test-restrict-attribute.c} |   4 +-
 gcc/testsuite/jit.dg/test-used-attribute.c| 112 
 .../jit.dg/test-variable-attribute.c  |  46 ++
 gcc/testsuite/jit.dg/test-weak-attribute.c|  41 ++
 22 files changed, 1986 insertions(+), 65 

[PATCH] Fortran: fix reallocation on assignment of polymorphic variables [PR110415]

2023-11-15 Thread Andrew Jenner

This patch adds the testcase from PR110415 and fixes the bug.

The problem is that in a couple of places in trans_class_assignment in 
trans-expr.cc, we need to get the run-time size of the polymorphic 
object from the vtbl, but we are currently getting that vtbl from the 
lhs of the assignment rather than the rhs. This gives us the old value 
of the size but we need to pass the new size to __builtin_malloc and 
__builtin_realloc.


I'm fixing this by adding a parameter to trans_class_vptr_len_assignment 
to retrieve the tree corresponding to the vptr from the object on the rhs 
of the assignment, and then passing this where it is needed. In the case 
where trans_class_vptr_len_assignment returns NULL_TREE for the rhs vptr 
we use the lhs vptr as before.
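
A minimal sketch of the calling convention described above, with hypothetical names throughout (this is not the trans-expr.cc code): the helper reports the rhs vptr through an optional out-parameter, and the caller falls back to the lhs vptr when nothing was reported.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a vtable entry that records the dynamic size.
struct demo_vtab { std::size_t size; };

// Returns the lhs vtab and, when from_vtabp is non-null, reports the rhs
// vtab through it (nullptr when none is available).
inline const demo_vtab *
demo_vptr_assignment (const demo_vtab *lhs, const demo_vtab *rhs,
                      const demo_vtab **from_vtabp)
{
  if (from_vtabp)
    *from_vtabp = rhs;
  return lhs;
}

// Size to pass to the allocator: prefer the rhs dynamic type, and fall back
// to the lhs one only when no rhs vtab was reported.
inline std::size_t
demo_alloc_size (const demo_vtab *lhs, const demo_vtab *rhs)
{
  const demo_vtab *from = nullptr;
  const demo_vtab *to = demo_vptr_assignment (lhs, rhs, &from);
  return (from ? from : to)->size;
}
```

Callers that do not need the rhs vptr can pass a null pointer for the out-parameter.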


To get this to work I also needed to change the implementation of 
trans_class_vptr_len_assignment to create a temporary for the assignment 
in more circumstances. Currently, the "a = func()" assignment in MAIN__ 
doesn't hit the "Create a temporary for complicated expressions" case 
on line 9951 because "DECL_P (rse->expr)" is true - the expression has 
already been placed into a temporary. That means we don't hit the "if 
(temp_rhs ..." case on line 10038 and go on to get the vptr_expr from 
"gfc_lval_expr_from_sym (gfc_find_vtab (&re->ts))" on line 10057 which 
is the vtbl of the static type rather than the dynamic one from the rhs. 
So with this fix we create an extra temporary, but that should be 
optimised away in the middle-end so there should be no run-time effect.


I'm not sure if this is the best way to fix this (the Fortran front-end 
is new territory for me) but I've verified that the testcase passes with 
this change, fails without it, and that the change does not introduce 
any FAILs when running the gfortran testcases on x86_64-pc-linux-gnu.


Is this OK for mainline, GCC 13 and OG13?

Thanks,

Andrew

gcc/fortran/
* trans-expr.cc (trans_class_vptr_len_assignment): Add
from_vptrp parameter. Populate it. Don't check for DECL_P
when deciding whether to create temporary.
(trans_class_pointer_fcn, gfc_trans_pointer_assignment): Add
NULL argument to trans_class_vptr_len_assignment calls.
(trans_class_assignment): Get rhs_vptr from
trans_class_vptr_len_assignment and use it for determining size
for allocation/reallocation.

gcc/testsuite/
* gfortran.dg/pr110415.f90: New test.

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 50c4604a025..f1618b55add 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -9936,7 +9936,8 @@ trans_get_upoly_len (stmtblock_t *block, gfc_expr *expr)
 static tree
 trans_class_vptr_len_assignment (stmtblock_t *block, gfc_expr * le,
 gfc_expr * re, gfc_se *rse,
-tree * to_lenp, tree * from_lenp)
+tree * to_lenp, tree * from_lenp,
+tree * from_vptrp)
 {
   gfc_se se;
   gfc_expr * vptr_expr;
@@ -9944,10 +9945,11 @@ trans_class_vptr_len_assignment (stmtblock_t *block, 
gfc_expr * le,
   bool set_vptr = false, temp_rhs = false;
   stmtblock_t *pre = block;
   tree class_expr = NULL_TREE;
+  tree from_vptr = NULL_TREE;
 
   /* Create a temporary for complicated expressions.  */
   if (re->expr_type != EXPR_VARIABLE && re->expr_type != EXPR_NULL
-  && rse->expr != NULL_TREE && !DECL_P (rse->expr))
+  && rse->expr != NULL_TREE)
 {
   if (re->ts.type == BT_CLASS && !GFC_CLASS_TYPE_P (TREE_TYPE (rse->expr)))
class_expr = gfc_get_class_from_expr (rse->expr);
@@ -10044,6 +10046,7 @@ trans_class_vptr_len_assignment (stmtblock_t *block, 
gfc_expr * le,
tmp = rse->expr;
 
  se.expr = gfc_class_vptr_get (tmp);
+ from_vptr = se.expr;
  if (UNLIMITED_POLY (re))
from_len = gfc_class_len_get (tmp);
 
@@ -10065,6 +10068,7 @@ trans_class_vptr_len_assignment (stmtblock_t *block, 
gfc_expr * le,
  gfc_free_expr (vptr_expr);
  gfc_add_block_to_block (block, &se.pre);
  gcc_assert (se.post.head == NULL_TREE);
+ from_vptr = se.expr;
}
   gfc_add_modify (pre, lhs_vptr, fold_convert (TREE_TYPE (lhs_vptr),
se.expr));
@@ -10093,11 +10097,13 @@ trans_class_vptr_len_assignment (stmtblock_t *block, 
gfc_expr * le,
}
 }
 
-  /* Return the _len trees only, when requested.  */
+  /* Return the _len and _vptr trees only, when requested.  */
   if (to_lenp)
 *to_lenp = to_len;
   if (from_lenp)
 *from_lenp = from_len;
+  if (from_vptrp)
+*from_vptrp = from_vptr;
   return lhs_vptr;
 }
 
@@ -10166,7 +10172,7 @@ trans_class_pointer_fcn (stmtblock_t *block, gfc_se 
*lse, gfc_se *rse,
 {
   expr1_vptr = trans_class_vptr_len_assignment (block, expr1,
expr2, rse,
- 

Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-15 Thread Xi Ruoyao
On Wed, 2023-11-15 at 15:14 +0100, Arsen Arsenović wrote:
> That is interesting.  They should be using the same checks.  I've
> checked trunk and regenerated files on it, and saw no significant diff
> (some whitespace changes only).  Could you post the config.log of
> both?

You did not regenerate config.in.  But I've regenerated it in r14-5434
anyway.

The related changes:

+/* Define to 1 if you have the Mac OS X function
+   CFLocaleCopyPreferredLanguages in the CoreFoundation framework. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_CFLOCALECOPYPREFERREDLANGUAGES
+#endif
+
+
+/* Define to 1 if you have the Mac OS X function CFPreferencesCopyAppValue in
+   the CoreFoundation framework. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_CFPREFERENCESCOPYAPPVALUE
+#endif

+/* Define if the GNU dcgettext() function is already present or preinstalled.
+   */
+#ifndef USED_FOR_TARGET
+#undef HAVE_DCGETTEXT
+#endif

+/* Define if the GNU gettext() function is already present or preinstalled. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GETTEXT
+#endif

I don't know if they are related to the issue on AIX though.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: PR111754

2023-11-15 Thread Prathamesh Kulkarni
On Wed, 8 Nov 2023 at 21:57, Prathamesh Kulkarni
 wrote:
>
> On Thu, 26 Oct 2023 at 09:43, Prathamesh Kulkarni
>  wrote:
> >
> > On Thu, 26 Oct 2023 at 04:09, Richard Sandiford
> >  wrote:
> > >
> > > Prathamesh Kulkarni  writes:
> > > > On Wed, 25 Oct 2023 at 02:58, Richard Sandiford
> > > >  wrote:
> > > >>
> > > >> Hi,
> > > >>
> > > > >> Sorry for the slow review.  I clearly didn't think this through properly
> > > >> when doing the review of the original patch, so I wanted to spend
> > > >> some time working on the code to get a better understanding of
> > > >> the problem.
> > > >>
> > > >> Prathamesh Kulkarni  writes:
> > > >> > Hi,
> > > >> > For the following test-case:
> > > >> >
> > > >> > typedef float __attribute__((__vector_size__ (16))) F;
> > > >> > F foo (F a, F b)
> > > >> > {
> > > >> >   F v = (F) { 9 };
> > > >> >   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > > >> > }
> > > >> >
> > > >> > Compiling with -O2 results in following ICE:
> > > >> > foo.c: In function ‘foo’:
> > > >> > foo.c:6:10: internal compiler error: in decompose, at rtl.h:2314
> > > >> > 6 |   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > > >> >   |  ^~
> > > >> > 0x7f3185 wi::int_traits
> > > >> >>::decompose(long*, unsigned int, std::pair
> > > >> > const&)
> > > >> > ../../gcc/gcc/rtl.h:2314
> > > >> > 0x7f3185 wide_int_ref_storage > > >> > false>::wide_int_ref_storage
> > > >> >>(std::pair const&)
> > > >> > ../../gcc/gcc/wide-int.h:1089
> > > >> > 0x7f3185 generic_wide_int
> > > >> >>::generic_wide_int
> > > >> >>(std::pair const&)
> > > >> > ../../gcc/gcc/wide-int.h:847
> > > >> > 0x7f3185 poly_int<1u, generic_wide_int > > >> > false> > >::poly_int
> > > >> >>(poly_int_full, std::pair const&)
> > > >> > ../../gcc/gcc/poly-int.h:467
> > > >> > 0x7f3185 poly_int<1u, generic_wide_int > > >> > false> > >::poly_int
> > > >> >>(std::pair const&)
> > > >> > ../../gcc/gcc/poly-int.h:453
> > > >> > 0x7f3185 wi::to_poly_wide(rtx_def const*, machine_mode)
> > > >> > ../../gcc/gcc/rtl.h:2383
> > > >> > 0x7f3185 rtx_vector_builder::step(rtx_def*, rtx_def*) const
> > > >> > ../../gcc/gcc/rtx-vector-builder.h:122
> > > >> > 0xfd4e1b vector_builder > > >> > rtx_vector_builder>::elt(unsigned int) const
> > > >> > ../../gcc/gcc/vector-builder.h:253
> > > >> > 0xfd4d11 rtx_vector_builder::build()
> > > >> > ../../gcc/gcc/rtx-vector-builder.cc:73
> > > >> > 0xc21d9c const_vector_from_tree
> > > >> > ../../gcc/gcc/expr.cc:13487
> > > >> > 0xc21d9c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> > > >> > expand_modifier, rtx_def**, bool)
> > > >> > ../../gcc/gcc/expr.cc:11059
> > > >> > 0xaee682 expand_expr(tree_node*, rtx_def*, machine_mode, 
> > > >> > expand_modifier)
> > > >> > ../../gcc/gcc/expr.h:310
> > > >> > 0xaee682 expand_return
> > > >> > ../../gcc/gcc/cfgexpand.cc:3809
> > > >> > 0xaee682 expand_gimple_stmt_1
> > > >> > ../../gcc/gcc/cfgexpand.cc:3918
> > > >> > 0xaee682 expand_gimple_stmt
> > > >> > ../../gcc/gcc/cfgexpand.cc:4044
> > > >> > 0xaf28f0 expand_gimple_basic_block
> > > >> > ../../gcc/gcc/cfgexpand.cc:6100
> > > >> > 0xaf4996 execute
> > > >> > ../../gcc/gcc/cfgexpand.cc:6835
> > > >> >
> > > >> > IIUC, the issue is that fold_vec_perm returns a vector having float 
> > > >> > element
> > > >> > type with res_nelts_per_pattern == 3, and later ICE's when it tries
> > > >> > to derive element v[3], not present in the encoding, while trying to
> > > >> > build rtx vector
> > > >> > in rtx_vector_builder::build():
> > > >> >  for (unsigned int i = 0; i < nelts; ++i)
> > > >> > RTVEC_ELT (v, i) = elt (i);
> > > >> >
> > > >> > The attached patch tries to fix this by returning false from
> > > >> > valid_mask_for_fold_vec_perm_cst if sel has a stepped sequence and
> > > >> > input vector has non-integral element type, so for VLA vectors, it
> > > >> > will only build result with dup sequence (nelts_per_pattern < 3) for
> > > >> > non-integral element type.
> > > >> >
> > > >> > For VLS vectors, this will still work for stepped sequence since it
> > > >> > will then use the "VLS exception" in fold_vec_perm_cst, and set:
> > > >> > res_npattern = res_nelts and
> > > >> > res_nelts_per_pattern = 1
> > > >> >
> > > >> > and fold the above case to:
> > > >> > F foo (F a, F b)
> > > >> > {
> > > >> >[local count: 1073741824]:
> > > >> >   return { 0.0, 9.0e+0, 0.0, 0.0 };
> > > >> > }
> > > >> >
> > > >> > But I am not sure if this is entirely correct, since:
> > > >> > tree res = out_elts.build ();
> > > >> > will canonicalize the encoding and may result in a stepped sequence
> > > >> > (vector_builder::finalize() may reduce npatterns at the cost of 
> > > >> > increasing
> > > >> > nelts_per_pattern)  ?
> > > >> >
> > > >> > PS: This issue is now latent after PR111648 fix, since
> > > >> > 

[committed] i386: Fix strict_low_part QImode insn with high input register patterns [PR112540]

2023-11-15 Thread Uros Bizjak
PR target/112540

gcc/ChangeLog:

* config/i386/i386.md (*addqi_ext_1_slp):
Correct operand numbers in split pattern.  Replace !Q constraint
of operand 1 with !qm.  Add insn constraint.
(*subqi_ext_1_slp): Ditto.
(*qi_ext_1_slp): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6136e46b1bc..29ec9425200 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6624,9 +6624,9 @@ (define_insn_and_split "*addqi_ext_1_slp"
  [(match_operand 2 "int248_register_operand" "Q,Q")
   (const_int 8)
   (const_int 8)]) 0)
- (match_operand:QI 1 "nonimmediate_operand" "0,!Q")))
+ (match_operand:QI 1 "nonimmediate_operand" "0,!qm")))
(clobber (reg:CC FLAGS_REG))]
-  ""
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
   "@
add{b}\t{%h2, %0|%0, %h2}
#"
@@ -6638,8 +6638,8 @@ (define_insn_and_split "*addqi_ext_1_slp"
   (plus:QI
 (subreg:QI
   (match_op_dup 3
-[(match_dup 0) (const_int 8) (const_int 8)]) 0)
-  (match_dup 1)))
+[(match_dup 2) (const_int 8) (const_int 8)]) 0)
+  (match_dup 0)))
   (clobber (reg:CC FLAGS_REG))])]
   ""
   [(set_attr "type" "alu")
@@ -7662,14 +7662,14 @@ (define_insn_and_split "*sub_1_slp"
 (define_insn_and_split "*subqi_ext_1_slp"
   [(set (strict_low_part (match_operand:QI 0 "register_operand" "+Q,"))
(minus:QI
- (match_operand:QI 1 "nonimmediate_operand" "0,!Q")
+ (match_operand:QI 1 "nonimmediate_operand" "0,!qm")
  (subreg:QI
(match_operator:SWI248 3 "extract_operator"
  [(match_operand 2 "int248_register_operand" "Q,Q")
   (const_int 8)
   (const_int 8)]) 0)))
(clobber (reg:CC FLAGS_REG))]
-  ""
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
   "@
sub{b}\t{%h2, %0|%0, %h2}
#"
@@ -7679,10 +7679,10 @@ (define_insn_and_split "*subqi_ext_1_slp"
(parallel
  [(set (strict_low_part (match_dup 0))
   (minus:QI
-  (match_dup 1)
+(match_dup 0)
 (subreg:QI
   (match_op_dup 3
-[(match_dup 0) (const_int 8) (const_int 8)]) 0)))
+[(match_dup 2) (const_int 8) (const_int 8)]) 0)))
   (clobber (reg:CC FLAGS_REG))])]
   ""
   [(set_attr "type" "alu")
@@ -11492,9 +11492,9 @@ (define_insn_and_split "*qi_ext_1_slp"
  [(match_operand 2 "int248_register_operand" "Q,Q")
   (const_int 8)
   (const_int 8)]) 0)
- (match_operand:QI 1 "nonimmediate_operand" "0,!Q")))
+ (match_operand:QI 1 "nonimmediate_operand" "0,!qm")))
(clobber (reg:CC FLAGS_REG))]
-  ""
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
   "@
{b}\t{%h2, %0|%0, %h2}
#"
@@ -11504,10 +11504,10 @@ (define_insn_and_split "*qi_ext_1_slp"
(parallel
  [(set (strict_low_part (match_dup 0))
   (any_logic:QI
-  (match_dup 1)
 (subreg:QI
   (match_op_dup 3
-[(match_dup 0) (const_int 8) (const_int 8)]) 0)))
+[(match_dup 2) (const_int 8) (const_int 8)]) 0)))
+  (match_dup 0)
   (clobber (reg:CC FLAGS_REG))])]
   ""
   [(set_attr "type" "alu")


[PATCH]AArch64 Add pattern for unsigned widenings (uxtl) to zip{1,2}

2023-11-15 Thread Tamar Christina
Hi All,

This changes unpack instructions to use zip{1,2} when doing a zero-extending
widening operation.  Permutes generally have a higher throughput than the
widening operations. Zeros are shuffled into the top half of the registers.

The testcase

void d2 (unsigned * restrict a, unsigned short *b, int n)
{
for (int i = 0; i < (n & -8); i++)
  a[i] = b[i];
}

now generates:

        movi    v1.4s, 0
.L3:
        ldr     q0, [x1], 16
        zip1    v2.8h, v0.8h, v1.8h
        zip2    v0.8h, v0.8h, v1.8h
        stp     q2, q0, [x0]
        add     x0, x0, 32
        cmp     x1, x2
        bne     .L3


instead of:

.L3:
        ldr     q0, [x1], 16
        uxtl    v1.4s, v0.4h
        uxtl2   v0.4s, v0.8h
        stp     q1, q0, [x0]
        add     x0, x0, 32
        cmp     x1, x2
        bne     .L3

Since we need the extra 0 register we do this only for the vectorizer's lo/hi
pairs when we know the 0 will be floated outside of the loop.

This gives an 8% speed-up in Imagick in SPECCPU 2017 on Neoverse V2.
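
As an illustration only (not part of the patch), here is a scalar C model of
why zipping with a zero vector is the same as a zero-extending widen on
little-endian.  The names zip1_u16x8, zip2_u16x8, lane32 and
zip_widen_selfcheck are made up for this sketch, and lane32 assumes
little-endian lane order (the even 16-bit lane is the low half of the 32-bit
lane).

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of ZIP1 on eight 16-bit lanes: interleave the low
   halves of A and B.  */
static void
zip1_u16x8 (const uint16_t a[8], const uint16_t b[8], uint16_t out[8])
{
  for (int i = 0; i < 4; i++)
    {
      out[2 * i] = a[i];
      out[2 * i + 1] = b[i];
    }
}

/* Scalar model of ZIP2: interleave the high halves of A and B.  */
static void
zip2_u16x8 (const uint16_t a[8], const uint16_t b[8], uint16_t out[8])
{
  for (int i = 0; i < 4; i++)
    {
      out[2 * i] = a[4 + i];
      out[2 * i + 1] = b[4 + i];
    }
}

/* Read 32-bit lane I of a register stored as eight 16-bit lanes,
   assuming little-endian lane order.  */
static uint32_t
lane32 (const uint16_t v[8], int i)
{
  return (uint32_t) v[2 * i] | ((uint32_t) v[2 * i + 1] << 16);
}

/* Check that zip{1,2} with zero yields the zero-extended low/high
   halves of the input.  */
static void
zip_widen_selfcheck (void)
{
  const uint16_t in[8] = { 1, 2, 0x8000, 0xffff, 5, 6, 7, 0xabcd };
  const uint16_t zero[8] = { 0 };
  uint16_t lo[8], hi[8];

  zip1_u16x8 (in, zero, lo);
  zip2_u16x8 (in, zero, hi);
  for (int i = 0; i < 4; i++)
    {
      assert (lane32 (lo, i) == (uint32_t) in[i]);
      assert (lane32 (hi, i) == (uint32_t) in[4 + i]);
    }
}
```

Calling zip_widen_selfcheck () from a test driver exercises both halves.  On
big-endian the zero would have to go in the other operand position for the
same effect.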

Bootstrapped and regression tested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (vec_unpack_lo__lo___zip): New.
(aarch64_uaddw__zip): New.
* config/aarch64/iterators.md (PERM_EXTEND, perm_index): New.
(perm_hilo): Add UNSPEC_ZIP1, UNSPEC_ZIP2.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/vmovl_high_1.c: Update codegen.
* gcc.target/aarch64/uxtl-combine-1.c: New test.
* gcc.target/aarch64/uxtl-combine-2.c: New test.
* gcc.target/aarch64/uxtl-combine-3.c: New test.
* gcc.target/aarch64/uxtl-combine-4.c: New test.
* gcc.target/aarch64/uxtl-combine-5.c: New test.
* gcc.target/aarch64/uxtl-combine-6.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
81ff5bad03d598fa0d48df93d172a28bc0d1d92e..3d811007dd94dcd9176d6021a41a196c12fe9c3f
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1988,26 +1988,60 @@ (define_insn "aarch64_simd_vec_unpack_hi_"
   [(set_attr "type" "neon_shift_imm_long")]
 )
 
-(define_expand "vec_unpack_hi_"
+(define_expand "vec_unpacku_hi_"
   [(match_operand: 0 "register_operand")
-   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))]
+   (match_operand:VQW 1 "register_operand")]
+  "TARGET_SIMD"
+  {
+rtx res = gen_reg_rtx (mode);
+rtx tmp = aarch64_gen_shareable_zero (mode);
+if (BYTES_BIG_ENDIAN)
+  emit_insn (gen_aarch64_zip2 (res, tmp, operands[1]));
+else
+ emit_insn (gen_aarch64_zip2 (res, operands[1], tmp));
+emit_move_insn (operands[0],
+  simplify_gen_subreg (mode, res, mode, 0));
+DONE;
+  }
+)
+
+(define_expand "vec_unpacks_hi_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")]
   "TARGET_SIMD"
   {
 rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
-emit_insn (gen_aarch64_simd_vec_unpack_hi_ (operands[0],
- operands[1], p));
+emit_insn (gen_aarch64_simd_vec_unpacks_hi_ (operands[0],
+  operands[1], p));
+DONE;
+  }
+)
+
+(define_expand "vec_unpacku_lo_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")]
+  "TARGET_SIMD"
+  {
+rtx res = gen_reg_rtx (mode);
+rtx tmp = aarch64_gen_shareable_zero (mode);
+if (BYTES_BIG_ENDIAN)
+   emit_insn (gen_aarch64_zip1 (res, tmp, operands[1]));
+else
+   emit_insn (gen_aarch64_zip1 (res, operands[1], tmp));
+emit_move_insn (operands[0],
+  simplify_gen_subreg (mode, res, mode, 0));
 DONE;
   }
 )
 
-(define_expand "vec_unpack_lo_"
+(define_expand "vec_unpacks_lo_"
   [(match_operand: 0 "register_operand")
-   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))]
+   (match_operand:VQW 1 "register_operand")]
   "TARGET_SIMD"
   {
 rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
-emit_insn (gen_aarch64_simd_vec_unpack_lo_ (operands[0],
- operands[1], p));
+emit_insn (gen_aarch64_simd_vec_unpacks_lo_ (operands[0],
+  operands[1], p));
 DONE;
   }
 )
@@ -4735,6 +4769,34 @@ (define_insn 
"aarch64_subw2_internal"
   [(set_attr "type" "neon_sub_widen")]
 )
 
+(define_insn "aarch64_usubw__zip"
+  [(set (match_operand: 0 "register_operand" "=w")
+   (minus:
+ (match_operand: 1 "register_operand" "w")
+ (subreg:
+   (unspec: [
+   (match_operand:VQW 2 "register_operand" "w")
+   (match_operand:VQW 3 "aarch64_simd_imm_zero" "Dz")
+  ] PERM_EXTEND) 0)))]
+  "TARGET_SIMD"
+  "usubw\\t%0., %1., %2."
+  [(set_attr "type" "neon_sub_widen")]
+)
+
+(define_insn 

[PATCH]middle-end: skip checking loop exits if loop malformed [PR111878]

2023-11-15 Thread Tamar Christina
Hi All,

Before my refactoring, if loop->latch was incorrect, find_loop_location
skipped checking the edges and would eventually return a dummy location.

It turns out that a loop can satisfy
loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS) and still have no latch,
in which case get_loop_exit_edges traps.

This restores the old behavior.
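
A toy model of the guard, for illustration only.  None of the types or names
below (block_model, loop_model, exit_block_model, find_loop_line) exist in
GCC; the exit-block sentinel stands in for EXIT_BLOCK_PTR_FOR_FN (cfun), and
the integer 0 stands in for the fallback/dummy location.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's basic_block and class loop.  */
struct block_model { int id; };
struct loop_model
{
  struct block_model *latch;	/* Points at the sentinel if malformed.  */
  int exit_line;		/* Line number found on some exit edge.  */
};

/* Sentinel playing the role of the function's exit block.  */
static struct block_model exit_block_model;

/* Return the line to report for LOOP, or 0 as the fallback location.
   The exit edges are only consulted when exits are recorded AND the
   latch is not the exit-block sentinel.  */
static int
find_loop_line (const struct loop_model *loop, bool exits_recorded)
{
  if (!loop)
    return 0;

  if (exits_recorded && loop->latch != &exit_block_model)
    return loop->exit_line;

  return 0;
}

static void
find_loop_line_selfcheck (void)
{
  struct block_model latch = { 1 };
  struct loop_model ok = { &latch, 42 };
  struct loop_model malformed = { &exit_block_model, 42 };

  assert (find_loop_line (&ok, true) == 42);
  assert (find_loop_line (&malformed, true) == 0);
  assert (find_loop_line (&ok, false) == 0);
}
```

The point of the sketch is only the second condition: without it, the
malformed loop would reach the exit-edge walk.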

Bootstrapped and regression tested on x86_64-pc-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/111878
* tree-vect-loop-manip.cc (find_loop_location): Skip edges check if
latch incorrect.

gcc/testsuite/ChangeLog:

PR tree-optimization/111878
* gcc.dg/graphite/pr111878.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/graphite/pr111878.c 
b/gcc/testsuite/gcc.dg/graphite/pr111878.c
new file mode 100644
index 
..6722910062e43c827e94c53b43f106af1848852a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/pr111878.c
@@ -0,0 +1,19 @@
+/* { dg-options "-O3 -fgraphite-identity -fsave-optimization-record" } */
+
+int long_c2i_ltmp;
+int *long_c2i_cont;
+
+void
+long_c2i (long utmp, int i)
+{
+  int neg = 1;
+  switch (long_c2i_cont[0])
+case 0:
+neg = 0;
+  for (; i; i++)
+if (neg)
+  utmp |= long_c2i_cont[i] ^ 5;
+else
+  utmp |= long_c2i_cont[i];
+  long_c2i_ltmp = utmp;
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 
b9161274ce401a7307f3e61ad23aa036701190d7..ff188840c1762d0b5fb6655cb93b5a8662b31343
 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1792,7 +1792,8 @@ find_loop_location (class loop *loop)
   if (!loop)
 return dump_user_location_t ();
 
-  if (loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS))
+  if (loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS)
+  && loop->latch != EXIT_BLOCK_PTR_FOR_FN (cfun))
 {
   /* We only care about the loop location, so use any exit with location
 information.  */









nvptx: Fix copy'n'paste-o in '__builtin_nvptx_brev' description (was: [PATCH] nvptx: Add suppport for __builtin_nvptx_brev instrinsic)

2023-11-15 Thread Thomas Schwinge
Hi!

On 2023-05-06T17:04:57+0100, "Roger Sayle"  wrote:
> This patch adds support for (a pair of) bit reversal intrinsics
> __builtin_nvptx_brev and __builtin_nvptx_brevll which perform 32-bit
> and 64-bit bit reversal (using nvptx's brev instruction) matching
> the __brev and __brevll intrinsics provided by NVidia's nvcc compiler.
> https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html

(That got pushed in commit c09471fbc7588db2480f036aa56a2403d3c03ae5
"nvptx: Add suppport for __builtin_nvptx_brev instrinsic".)

> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi

> @@ -17941,6 +17942,20 @@ Enable global interrupt.
>  Disable global interrupt.
>  @enddefbuiltin
>
> +@node Nvidia PTX Built-in Functions
> +@subsection Nvidia PTX Built-in Functions
> +
> +These built-in functions are available for the Nvidia PTX target:
> +
> +@defbuiltin{unsigned int __builtin_nvptx_brev (unsigned int @var{x})}
> +Reverse the bit order of a 32-bit unsigned integer.
> +Disable global interrupt.

Pushed to master branch commit 4450984d0a18cd4e352d396231ba2c457d20feea
"nvptx: Fix copy'n'paste-o in '__builtin_nvptx_brev' description", see
attached.

> +@enddefbuiltin
> +
> +@defbuiltin{unsigned long long __builtin_nvptx_brevll (unsigned long long 
> @var{x})}
> +Reverse the bit order of a 64-bit unsigned integer.
> +@enddefbuiltin
> +
>  @node Basic PowerPC Built-in Functions
>  @subsection Basic PowerPC Built-in Functions


Regards
 Thomas


-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company; Managing Directors: Thomas Heurung,
Frank Thürauf; Registered office: München; Commercial register: München,
HRB 106955
From 4450984d0a18cd4e352d396231ba2c457d20feea Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 4 Sep 2023 17:20:28 +0200
Subject: [PATCH] nvptx: Fix copy'n'paste-o in '__builtin_nvptx_brev'
 description

Minor fix-up for commit c09471fbc7588db2480f036aa56a2403d3c03ae5
"nvptx: Add suppport for __builtin_nvptx_brev instrinsic".

	gcc/
	* doc/extend.texi (Nvidia PTX Built-in Functions): Fix
	copy'n'paste-o in '__builtin_nvptx_brev' description.
---
 gcc/doc/extend.texi | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 406ccc9bc75..a95121b0124 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -18471,7 +18471,6 @@ These built-in functions are available for the Nvidia PTX target:
 
 @defbuiltin{unsigned int __builtin_nvptx_brev (unsigned int @var{x})}
 Reverse the bit order of a 32-bit unsigned integer.
-Disable global interrupt.
 @enddefbuiltin
 
 @defbuiltin{unsigned long long __builtin_nvptx_brevll (unsigned long long @var{x})}
-- 
2.34.1



Re: [nvptx PATCH] Update nvptx's bitrev2 pattern to use BITREVERSE rtx.

2023-11-15 Thread Thomas Schwinge
Hi!

On 2023-06-08T00:09:00+0100, "Roger Sayle"  wrote:
> This minor tweak to the nvptx backend switches the representation
> of the brev instruction from an UNSPEC to instead use the new BITREVERSE
> rtx.

ACK.

> This allows various RTL optimizations including evaluation (constant
> folding) of integer constant arguments at compile-time.

..., which we're then observing via
commit 61c45c055a5ccfc59463c21ab057dece822d973c
"nvptx: Extend 'brev' test cases" that I just pushed.
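
For reference, a portable C model of 32-bit bit reversal; this is the value a
compile-time evaluation of a constant argument has to produce.  It is a
sketch only: bitreverse32_ref is a made-up name, and this is not how GCC
implements the folding.

```c
#include <assert.h>
#include <stdint.h>

/* Reference bit reversal: bit I of X becomes bit 31 - I of the
   result.  */
static uint32_t
bitreverse32_ref (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    {
      r = (r << 1) | (x & 1u);
      x >>= 1;
    }
  return r;
}

/* The first few cases the test main() checks, with the constants the
   removed function-body patterns compare against.  */
static void
bitreverse32_selfcheck (void)
{
  assert (bitreverse32_ref (0u) == 0u);
  assert (bitreverse32_ref (0xffffffffu) == 0xffffffffu);
  assert (bitreverse32_ref (1u) == 0x80000000u);	/* 2147483648 */
  assert (bitreverse32_ref (2u) == 0x40000000u);	/* 1073741824 */
}
```

With constant inputs every one of these comparisons is decidable at compile
time, which is why the adjusted -O2 tests scan for no remaining brev and no
abort.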

> This patch has been tested on nvptx-none with make and make -k check
> with no new failures.  Ok for mainline?

I've thus updated the test cases for these changes here, and pushed to
master branch commit 75c20a99b3a242121eef8a532f5224c00c471b56
"Update nvptx's bitrev2 pattern to use BITREVERSE rtx.", see
attached.


Regards
 Thomas


From 75c20a99b3a242121eef8a532f5224c00c471b56 Mon Sep 17 00:00:00 2001
From: Roger Sayle 
Date: Thu, 8 Jun 2023 00:09:00 +0100
Subject: [PATCH] Update nvptx's bitrev2 pattern to use BITREVERSE rtx.

This minor tweak to the nvptx backend switches the representation
of the brev instruction from an UNSPEC to instead use the new BITREVERSE
rtx.  This allows various RTL optimizations including evaluation (constant
folding) of integer constant arguments at compile-time.

	gcc/
	* config/nvptx/nvptx.md (UNSPEC_BITREV): Delete.
	(bitrev2): Represent using bitreverse.
	gcc/testsuite/
	* gcc.target/nvptx/brev-2-O2.c: Adjust.
	* gcc.target/nvptx/brevll-2-O2.c: Likewise.

Co-authored-by: Thomas Schwinge 
---
 gcc/config/nvptx/nvptx.md|  5 +---
 gcc/testsuite/gcc.target/nvptx/brev-2-O2.c   | 25 ++--
 gcc/testsuite/gcc.target/nvptx/brevll-2-O2.c | 25 ++--
 3 files changed, 5 insertions(+), 50 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 1bb93045403..7a7c9948f45 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -34,8 +34,6 @@
UNSPEC_FPINT_CEIL
UNSPEC_FPINT_NEARBYINT
 
-   UNSPEC_BITREV
-
UNSPEC_ALLOCA
 
UNSPEC_SET_SOFTSTACK
@@ -636,8 +634,7 @@
 
 (define_insn "bitrev2"
   [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
-	(unspec:SDIM [(match_operand:SDIM 1 "nvptx_register_operand" "R")]
-		 UNSPEC_BITREV))]
+	(bitreverse:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "R")))]
   ""
   "%.\\tbrev.b%T0\\t%0, %1;")
 
diff --git a/gcc/testsuite/gcc.target/nvptx/brev-2-O2.c b/gcc/testsuite/gcc.target/nvptx/brev-2-O2.c
index e35052208d0..c707a87f356 100644
--- a/gcc/testsuite/gcc.target/nvptx/brev-2-O2.c
+++ b/gcc/testsuite/gcc.target/nvptx/brev-2-O2.c
@@ -1,7 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2" } */
 /* { dg-additional-options -save-temps } */
-/* { dg-final { check-function-bodies {**} {} } } */
 
 inline __attribute__((always_inline))
 unsigned int bitreverse32(unsigned int x)
@@ -96,26 +95,6 @@ int main(void)
 
   return 0;
 }
-/*
-** main:
-**	...
-**	mov\.u32	(%r[0-9]+), 0;
-**	brev\.b32	(%r[0-9]+), \1;
-**	setp\.[^.]+\.u32	%r[0-9]+, \2, 0;
-**	...
-**	mov\.u32	(%r[0-9]+), -1;
-**	brev\.b32	(%r[0-9]+), \3;
-**	setp\.[^.]+\.u32	%r[0-9]+, \4, -1;
-**	...
-**	mov\.u32	(%r[0-9]+), 1;
-**	brev\.b32	(%r[0-9]+), \5;
-**	setp\.[^.]+\.u32	%r[0-9]+, \6, -2147483648;
-**	...
-**	mov\.u32	(%r[0-9]+), 2;
-**	brev\.b32	(%r[0-9]+), \7;
-**	setp\.[^.]+\.u32	%r[0-9]+, \8, 1073741824;
-**	...
-*/
 
-/* { dg-final { scan-assembler-times {\tbrev\.b32\t} 40 } } */
-/* { dg-final { scan-assembler {\mabort\M} } } */
+/* { dg-final { scan-assembler-not {\tbrev\.b32\t} } } */
+/* { dg-final { scan-assembler-not {\mabort\M} } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/brevll-2-O2.c b/gcc/testsuite/gcc.target/nvptx/brevll-2-O2.c
index cbfda1b9601..c89be9627f8 100644
--- a/gcc/testsuite/gcc.target/nvptx/brevll-2-O2.c
+++ b/gcc/testsuite/gcc.target/nvptx/brevll-2-O2.c
@@ -1,7 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2" } */
 /* { dg-additional-options -save-temps } */
-/* { dg-final { check-function-bodies {**} {} } } */
 
 inline __attribute__((always_inline))
 unsigned long long bitreverse64(unsigned long long x)
@@ -156,26 +155,6 @@ int main(void)
 
   return 0;
 }
-/*
-** main:
-**	...
-**	mov\.u64	(%r[0-9]+), 0;
-**	brev\.b64	(%r[0-9]+), \1;
-**	setp\.[^.]+\.u64	%r[0-9]+, \2, 0;
-**	...
-**	mov\.u64	(%r[0-9]+), -1;
-**	brev\.b64	(%r[0-9]+), \3;
-**	setp\.[^.]+\.u64	%r[0-9]+, \4, -1;
-**	...
-**	mov\.u64	(%r[0-9]+), 1;
-**	brev\.b64	(%r[0-9]+), \5;
-**	setp\.[^.]+\.u64	%r[0-9]+, \6, -9223372036854775808;
-**	...
-**	mov\.u64	(%r[0-9]+), 2;
-**	brev\.b64	(%r[0-9]+), \7;
-**	setp\.[^.]+\.u64	

Re: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-15 Thread 钟居哲
Could you show me an example?

This is used for handling SEW = 64 on RV32. I don't know why this patch
touches this code.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-15 22:27
To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for 
rv32gcv.
> Looks wrong. Recover back.
 
When we demote we use two elements where there was one before.
Therefore the vector needs to be able to hold twice as many
elements.  We adjust vl correctly but the mode is not here.
 
Regards
Robin
 


nvptx: Extend 'brev' test cases (was: [PATCH] nvptx: Add suppport for __builtin_nvptx_brev instrinsic)

2023-11-15 Thread Thomas Schwinge
Hi!

On 2023-05-06T17:04:57+0100, "Roger Sayle"  wrote:
> This patch adds support for (a pair of) bit reversal intrinsics
> __builtin_nvptx_brev and __builtin_nvptx_brevll which perform 32-bit
> and 64-bit bit reversal (using nvptx's brev instruction) matching
> the __brev and __brevll intrinsics provided by NVidia's nvcc compiler.
> https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html
>
> This patch has been tested on nvptx-none with make and make -k check
> with no new failures.  Ok for mainline?

(That got pushed in commit c09471fbc7588db2480f036aa56a2403d3c03ae5
"nvptx: Add suppport for __builtin_nvptx_brev instrinsic".)

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/brev-1.c
> +[...]

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/brev-2.c
> +[...]

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/brevll-1.c
> +[...]

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/brevll-2.c
> +[...]

Pushed to master branch commit 61c45c055a5ccfc59463c21ab057dece822d973c
"nvptx: Extend 'brev' test cases", see attached.  That's in order to
observe effects of a later patch, and also to exercise the new nvptx
'check-function-bodies' a bit.


Regards
 Thomas


From 61c45c055a5ccfc59463c21ab057dece822d973c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 4 Sep 2023 23:06:27 +0200
Subject: [PATCH] nvptx: Extend 'brev' test cases

In order to observe effects of a later patch, extend the 'brev' test cases
added in commit c09471fbc7588db2480f036aa56a2403d3c03ae5
"nvptx: Add suppport for __builtin_nvptx_brev instrinsic".

	gcc/testsuite/
	* gcc.target/nvptx/brev-1.c: Extend.
	* gcc.target/nvptx/brev-2.c: Rename to...
	* gcc.target/nvptx/brev-2-O2.c: ... this, and extend.  Copy to...
	* gcc.target/nvptx/brev-2-O0.c: ... this, and adapt for '-O0'.
	* gcc.target/nvptx/brevll-1.c: Extend.
	* gcc.target/nvptx/brevll-2.c: Rename to...
	* gcc.target/nvptx/brevll-2-O2.c: ... this, and extend.  Copy to...
	* gcc.target/nvptx/brevll-2-O0.c: ... this, and adapt for '-O0'.
---
 gcc/testsuite/gcc.target/nvptx/brev-1.c   |  12 +-
 gcc/testsuite/gcc.target/nvptx/brev-2-O0.c| 129 
 .../nvptx/{brev-2.c => brev-2-O2.c}   |  27 +++
 gcc/testsuite/gcc.target/nvptx/brevll-1.c |  12 +-
 gcc/testsuite/gcc.target/nvptx/brevll-2-O0.c  | 189 ++
 .../nvptx/{brevll-2.c => brevll-2-O2.c}   |  27 +++
 6 files changed, 392 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/brev-2-O0.c
 rename gcc/testsuite/gcc.target/nvptx/{brev-2.c => brev-2-O2.c} (80%)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/brevll-2-O0.c
 rename gcc/testsuite/gcc.target/nvptx/{brevll-2.c => brevll-2-O2.c} (90%)

diff --git a/gcc/testsuite/gcc.target/nvptx/brev-1.c b/gcc/testsuite/gcc.target/nvptx/brev-1.c
index fbb4fff1e59..af875dd4dcc 100644
--- a/gcc/testsuite/gcc.target/nvptx/brev-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/brev-1.c
@@ -1,8 +1,16 @@
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies {**} {} } } */
+
 unsigned int foo(unsigned int x)
 {
   return __builtin_nvptx_brev(x);
 }
-
-/* { dg-final { scan-assembler "brev.b32" } } */
+/*
+** foo:
+**	...
+**	mov\.u32	(%r[0-9]+), %ar0;
+**	brev\.b32	%value, \1;
+**	st\.param\.u32	\[%value_out\], %value;
+**	ret;
+*/
diff --git a/gcc/testsuite/gcc.target/nvptx/brev-2-O0.c b/gcc/testsuite/gcc.target/nvptx/brev-2-O0.c
new file mode 100644
index 000..ca011ebf472
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/brev-2-O0.c
@@ -0,0 +1,129 @@
+/* { dg-do run } */
+/* { dg-options "-O0" } */
+/* { dg-additional-options -save-temps } */
+/* { dg-final { check-function-bodies {**} {} } } */
+
+inline __attribute__((always_inline))
+unsigned int bitreverse32(unsigned int x)
+{
+  return __builtin_nvptx_brev(x);
+}
+
+int main(void)
+{
+  if (bitreverse32(0x00000000) != 0x00000000)
+    __builtin_abort();
+  if (bitreverse32(0xffffffff) != 0xffffffff)
+    __builtin_abort();
+
+  if (bitreverse32(0x00000001) != 0x80000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000002) != 0x40000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000004) != 0x20000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000008) != 0x10000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000010) != 0x08000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000020) != 0x04000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000040) != 0x02000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000080) != 0x01000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000100) != 0x00800000)
+    __builtin_abort();
+  if (bitreverse32(0x00000200) != 0x00400000)
+    __builtin_abort();

Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-15 Thread Robin Dapp
> Looks wrong. Recover back.

When we demote we use two elements where there was one before.
Therefore the vector needs to be able to hold twice as many
elements.  We adjust vl correctly but the mode is not here.
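
To spell out the arithmetic with a toy model (vec_shape and
demote_sew64_to_sew32 are made-up names for this sketch, not anything in the
RISC-V backend): demoting SEW = 64 to SEW = 32 keeps the total number of bits
but doubles the element count, so both vl and the mode's number of elements
have to double.

```c
#include <assert.h>

/* Hypothetical description of a vector: element width in bits and
   number of elements.  */
struct vec_shape
{
  unsigned sew_bits;
  unsigned nelts;
};

/* Demote a SEW = 64 shape to SEW = 32: each 64-bit element becomes
   two 32-bit elements.  */
static struct vec_shape
demote_sew64_to_sew32 (struct vec_shape v)
{
  assert (v.sew_bits == 64);
  struct vec_shape r = { 32, v.nelts * 2 };
  return r;
}

static void
demote_selfcheck (void)
{
  struct vec_shape v = { 64, 4 };
  struct vec_shape d = demote_sew64_to_sew32 (v);

  assert (d.sew_bits == 32);
  assert (d.nelts == 8);
  /* Same number of bits before and after.  */
  assert (d.sew_bits * d.nelts == v.sew_bits * v.nelts);
}
```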

Regards
 Robin


RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits

2023-11-15 Thread Tamar Christina



> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, November 15, 2023 1:42 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction
> with support for multiple exits and different exits
> 
> On Wed, 15 Nov 2023, Tamar Christina wrote:
> 
> > Patch updated to trunk.
> >
> > This adds support to vectorizable_live_reduction to handle multiple
> > exits by
> 
> vectorizable_live_operation, but I do wonder how you handle reductions?

In the testcases I have, reductions all seem to work fine, since reductions
are placed in the merge block between the two loops and always have the
"value so far from full loop iterations".  These will just be used as the
seed for the scalar loop for any partial iterations.

> 
> > doing a search for which exit the live value should be materialized in.
> >
> > Additionally which value in the index we're after depends on whether
> > the exit it's materialized in is an early exit or whether the loop's
> > main exit is different from the loop's natural one (i.e. the one with
> > the same src block as the latch).
> >
> > In those two cases we want the first rather than the last value as
> > we're going to restart the iteration in the scalar loop.  For VLA this
> > means we need to reverse both the mask and vector since there's only a
> > way to get the last active element and not the first.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
> > * tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
> > * tree-vectorizer.h (perm_mask_for_reverse): Expose.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index
> >
> 4cf7f65dc164db27a498b31fe7ce0d9af3f3e299..2476e59ef488fd0a3b296c
> ed7b0d
> > 4d3e76a3634f 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -10627,12 +10627,60 @@ vectorizable_live_operation (vec_info
> *vinfo, stmt_vec_info stmt_info,
> >lhs' = new_tree;  */
> >
> >class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > -  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > +  /* A value can only be live in one exit.  So figure out which
> > + one.  */
> 
> Well, a value can be live across multiple exits!

The same value can only be live across multiple early exits, no?  In that
case they'll all still be in the same block, as all the early exits end in
the same merge block.

So this code is essentially just figuring out whether you're an early or
normal exit.  Perhaps the comment is unclear.
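
On the VLA point in the quoted description (only a "last active element"
extraction exists, so the first is obtained by reversing both mask and
vector), a scalar C sketch of the idea.  extract_last_active and
extract_first_active are made-up names, and MAX_LANES is an arbitrary bound
for the sketch, not a property of any target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of "extract last active": the element at the highest index
   whose mask bit is set, or FALLBACK if no lane is active.  */
static int
extract_last_active (const bool *mask, const int *vec, size_t n,
		     int fallback)
{
  int result = fallback;
  for (size_t i = 0; i < n; i++)
    if (mask[i])
      result = vec[i];
  return result;
}

/* First active element, built only from lane reversal plus
   extract_last_active.  */
static int
extract_first_active (const bool *mask, const int *vec, size_t n,
		      int fallback)
{
  enum { MAX_LANES = 64 };
  bool rmask[MAX_LANES];
  int rvec[MAX_LANES];

  assert (n <= MAX_LANES);
  for (size_t i = 0; i < n; i++)
    {
      rmask[i] = mask[n - 1 - i];
      rvec[i] = vec[n - 1 - i];
    }
  return extract_last_active (rmask, rvec, n, fallback);
}

static void
first_active_selfcheck (void)
{
  const bool mask[4] = { false, true, false, true };
  const int vec[4] = { 10, 20, 30, 40 };

  assert (extract_last_active (mask, vec, 4, -1) == 40);
  assert (extract_first_active (mask, vec, 4, -1) == 20);
}
```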

> 
> > +  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > +  /* Check if we have a loop where the chosen exit is not the main 
> > exit,
> > +in these cases for an early break we restart the iteration the vector
> code
> > +did.  For the live values we want the value at the start of the 
> > iteration
> > +rather than at the end.  */
> > +  bool restart_loop = false;
> > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +   {
> > + FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> > +   if (!is_gimple_debug (use_stmt)
> > +   && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> > + {
> 
> In fact when you get here you know the use is in a LC PHI.  Use
> FOR_EACH_IMM_USE_FAST and you can get at the edge via
> phi_arg_index_from_use and gimple_phi_arg_edge.
> 
> As said you have to process all exits the value is live on, not only the 
> first.
> 
> > +   basic_block use_bb = gimple_bb (use_stmt);
> > +   for (auto edge : get_loop_exit_edges (loop))
> > + {
> > +   /* Alternative exits can have an intermediate BB in
> > +  between to update the IV.  In those cases we need to
> > +  look one block further.  */
> > +   if (use_bb == edge->dest
> > +   || (single_succ_p (edge->dest)
> > +   && use_bb == single_succ (edge->dest)))
> > + {
> > +   exit_e = edge;
> > +   goto found;
> > + }
> > + }
> > + }
> > +found:
> > + /* If the edge isn't a single pred then split the edge so we have a
> > +location to place the live operations.  Perhaps we should always
> > +split during IV updating.  But this way the CFG is cleaner to
> > +follow.  */
> > + restart_loop = !vect_is_loop_exit_latch_pred (exit_e, loop);
> > + if (!single_pred_p (exit_e->dest))
> > +   exit_e = single_pred_edge (split_edge (exit_e));
> > +
> > + /* For early exit where the exit is not in the BB that leads to the
> > +latch then we're restarting the iteration in the scalar loop. So
> > +get the first 

Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-15 Thread Arsen Arsenović

David Edelsohn  writes:

> GCC had been working on AIX with NLS, using "--with-included-gettext".
> --disable-nls gets past the breakage, but GCC does not build for me on AIX
> with NLS enabled.

That should still work with gettext 0.22+ extracted in-tree (it should
be fetched by download_prerequisites).

> A change in dependencies for GCC should have been announced and more widely
> socialized in the GCC development mailing list, not just GCC patches
> mailing list.
>
> I have tried both the AIX Open Source libiconv and libgettext package, and
> the ones that I previously built.  Both fail because GCC configure decides
> to disable NLS, despite being requested, while libcpp is satisfied, so
> tools in the gcc subdirectory don't link against libiconv and the build
> fails.  With the included gettext, I was able to rely on a self-consistent
> solution.

That is interesting.  They should be using the same checks.  I've
checked trunk and regenerated files on it, and saw no significant diff
(some whitespace changes only).  Could you post the config.log of both?

I've never used AIX.  Can I reproduce this on one of the cfarm machines
to poke around?  I've tried cfarm119, but that one lacked git, and I
haven't poked around much further due to time constraints.

TIA, sorry about the inconvenience.  Have a lovely day.

> The current gettext-0.22.3 fails to build for me on AIX.
>
> libcpp configure believes that NLS functions on AIX, but gcc configure
> fails in its tests of gettext functionality, which leads to an inconsistent
> configuration and build breakage.
>
> Thanks, David


-- 
Arsen Arsenović




RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits

2023-11-15 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, November 15, 2023 1:23 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> breaks and arbitrary exits
> 
> On Wed, 15 Nov 2023, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Wednesday, November 15, 2023 1:01 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ;
> j...@ventanamicro.com
> > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > support early breaks and arbitrary exits
> > >
> > > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > >
> > > > Patch updated to latest trunk:
> > > >
> > > > Hi All,
> > > >
> > > > This changes the PHI node updates to support early breaks.
> > > > It has to support both the case where the loop's exit matches the
> > > > normal loop exit and one where the early exit is "inverted", i.e.
> > > > it's an early
> > > exit edge.
> > > >
> > > > In the latter case we must always restart the loop for VF iterations.
> > > > For an early exit the reason is obvious, but there are cases where
> > > > the "normal" exit is located before the early one.  This exit then
> > > > does a check on ivtmp resulting in us leaving the loop since it thinks 
> > > > we're
> done.
> > > >
> > > > In these cases we may still have side-effects to perform so we also
> > > > go to the scalar loop.
> > > >
> > > > For the "normal" exit niters has already been adjusted for
> > > > peeling, for the early exits we must find out how many iterations
> > > > we actually did.  So we have to recalculate the new position for each 
> > > > exit.
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide
> > > unused.
> > > > (vect_update_ivs_after_vectorizer): Support early break.
> > > > (vect_do_peeling): Use it.
> > > >
> > > > --- inline copy of patch ---
> > > >
> > > > diff --git a/gcc/tree-vect-loop-manip.cc
> > > > b/gcc/tree-vect-loop-manip.cc index
> > > >
> > >
> d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3
> > > d2654cf1
> > > > c842baac58f5 100644
> > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > @@ -1200,7 +1200,7 @@
> > > > vect_set_loop_condition_partial_vectors_avx512
> > > (class loop *loop,
> > > > loop handles exactly VF scalars per iteration.  */
> > > >
> > > >  static gcond *
> > > > -vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge
> > > > exit_edge,
> > > > +vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */,
> > > > +edge exit_edge,
> > > > class loop *loop, tree niters, tree 
> > > > step,
> > > > tree final_iv, bool niters_maybe_zero,
> > > > gimple_stmt_iterator loop_cond_gsi) @@ -
> > > 1412,7 +1412,7 @@
> > > > vect_set_loop_condition (class loop *loop, edge loop_e,
> > > > loop_vec_info
> > > loop_vinfo
> > > > When this happens we need to flip the understanding of main and
> other
> > > > exits by peeling and IV updates.  */
> > > >
> > > > -bool inline
> > > > +bool
> > > >  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)  {
> > > >return single_pred (loop->latch) == loop_exit->src; @@ -2142,6
> > > > +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> > > >   Input:
> > > >   - LOOP - a loop that is going to be vectorized. The last few 
> > > > iterations
> > > >of LOOP were peeled.
> > > > + - VF   - The chosen vectorization factor for LOOP.
> > > >   - NITERS - the number of iterations that LOOP executes (before it 
> > > > is
> > > >  vectorized). i.e, the number of times the ivs should 
> > > > be bumped.
> > > >   - UPDATE_E - a successor edge of LOOP->exit that is on the
> > > > (only) path
> > >
> > > the comment on this is now a bit misleading, can you try to update
> > > it and/or move the comment bits to the docs on EARLY_EXIT?
> > >
> > > > @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info
> > > loop_vinfo)
> > > >The phi args associated with the edge UPDATE_E in 
> > > > the bb
> > > >UPDATE_E->dest are updated accordingly.
> > > >
> > > > + - restart_loop - Indicates whether the scalar loop needs to
> > > > + restart the
> > >
> > > params are ALL_CAPS
> > >
> > > > + iteration count where the vector loop began.
> > > > +
> > > >   Assumption 1: Like the rest of the vectorizer, this function 
> > > > assumes
> > > >   a single loop exit that has a single predecessor.
> > > >
> > > > @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info
> > > loop_vinfo)
> > > >   */
> > > >
> > > >  static void
> > > > 

[committed] amdgcn: Add Accelerator VGPR registers

2023-11-15 Thread Andrew Stubbs
AMD GPUs since CDNA1 have had a new register file with an additional 256 
32-bit-by-64-lane vector registers.  This doubles the number of vector 
registers on the device, compared to previous models.  The way the 
hardware works is that the register file is divided between all the 
running threads, so a single thread cannot use all this capacity without 
limiting parallelism; doubling the number makes this much nicer.


The new registers can only be used for selected operations (mostly 
related to matrices), none of which GCC supports easily, but we can use 
them as spill space and avoid costly stack accesses for very large 
registers.


In CDNA2 there were additional instruction encodings added for load and 
store to and from these new registers, so that opens up more 
possibilities for optimizations.


This patch adds the new registers as CALL_USED (so they will never add 
to function call overhead), configures them as spill space and 
load/store targets (CDNA2 only), and provides the necessary move 
instructions. There are many tweaks to the target hooks to handle the 
new cases, but there are not intended to be any functional changes to 
any other registers or instructions.


The original work was done by Andrew Jenner, and I've finished off the 
task with debug and tidy-up.


Andrew

amdgcn: Add Accelerator VGPR registers

Add the new CDNA register file.  We don't support any of the specialized
instructions that use these registers, but they're useful to relieve
register pressure without spilling to stack.

Co-authored-by: Andrew Jenner  

gcc/ChangeLog:

* config/gcn/constraints.md: Add "a" AVGPR constraint.
* config/gcn/gcn-valu.md (*mov): Add AVGPR alternatives.
(*mov_4reg): Likewise.
(@mov_sgprbase): Likewise.
(gather_insn_1offset): Likewise.
(gather_insn_1offset_ds): Likewise.
(gather_insn_2offsets): Likewise.
(scatter_expr): Likewise.
(scatter_insn_1offset_ds): Likewise.
(scatter_insn_2offsets): Likewise.
* config/gcn/gcn.cc (MAX_NORMAL_AVGPR_COUNT): Define.
(gcn_class_max_nregs): Handle AVGPR_REGS and ALL_VGPR_REGS.
(gcn_hard_regno_mode_ok): Likewise.
(gcn_regno_reg_class): Likewise.
(gcn_spill_class): Allow spilling to AVGPRs on TARGET_CDNA1_PLUS.
(gcn_sgpr_move_p): Handle AVGPRs.
(gcn_secondary_reload): Reload AVGPRs via VGPRs.
(gcn_conditional_register_usage): Handle AVGPRs.
(gcn_vgpr_equivalent_register_operand): New function.
(gcn_valid_move_p): Check for validity of AVGPR moves.
(gcn_compute_frame_offsets): Handle AVGPRs.
(gcn_memory_move_cost): Likewise.
(gcn_register_move_cost): Likewise.
(gcn_vmem_insn_p): Handle TYPE_VOP3P_MAI.
(gcn_md_reorg): Handle AVGPRs.
(gcn_hsa_declare_function_name): Likewise.
(print_reg): Likewise.
(gcn_dwarf_register_number): Likewise.
* config/gcn/gcn.h (FIRST_AVGPR_REG): Define.
(AVGPR_REGNO): Define.
(LAST_AVGPR_REG): Define.
(SOFT_ARG_REG): Update.
(FRAME_POINTER_REGNUM): Update.
(DWARF_LINK_REGISTER): Update.
(FIRST_PSEUDO_REGISTER): Update.
(AVGPR_REGNO_P): Define.
(enum reg_class): Add AVGPR_REGS and ALL_VGPR_REGS.
(REG_CLASS_CONTENTS): Add new register classes and add entries for
AVGPRs to all classes.
(REGISTER_NAMES): Add AVGPRs.
* config/gcn/gcn.md (FIRST_AVGPR_REG, LAST_AVGPR_REG): Define.
(AP_REGNUM, FP_REGNUM): Update.
(define_attr "type"): Add vop3p_mai.
(define_attr "unit"): Handle vop3p_mai.
(define_attr "gcn_version"): Add "cdna2".
(define_attr "enabled"): Handle cdna2.
(*mov_insn): Add AVGPR alternatives.
(*movti_insn): Likewise.
* config/gcn/mkoffload.cc (isa_has_combined_avgprs): New.
(process_asm): Process avgpr_count.
* config/gcn/predicates.md (gcn_avgpr_register_operand): New.
(gcn_avgpr_hard_register_operand): New.
* doc/md.texi: Document the "a" constraint.

gcc/testsuite/ChangeLog:

* gcc.target/gcn/avgpr-mem-double.c: New test.
* gcc.target/gcn/avgpr-mem-int.c: New test.
* gcc.target/gcn/avgpr-mem-long.c: New test.
* gcc.target/gcn/avgpr-mem-short.c: New test.
* gcc.target/gcn/avgpr-spill-double.c: New test.
* gcc.target/gcn/avgpr-spill-int.c: New test.
* gcc.target/gcn/avgpr-spill-long.c: New test.
* gcc.target/gcn/avgpr-spill-short.c: New test.

libgomp/ChangeLog:

* plugin/plugin-gcn.c (max_isa_vgprs): New.
(run_kernel): CDNA2 devices have more VGPRs.

diff --git a/gcc/config/gcn/constraints.md b/gcc/config/gcn/constraints.md
index efe462a0bd6..b29dc5b6643 100644
--- a/gcc/config/gcn/constraints.md
+++ b/gcc/config/gcn/constraints.md
@@ -77,6 +77,9 @@ (define_constraint "Y"
 (define_register_constraint "v" "VGPR_REGS"
   "VGPR 

[committed] amdgcn: simplify secondary reload patterns

2023-11-15 Thread Andrew Stubbs
This patch makes no functional changes, but cleans up the code a little 
to make way for my next patch.


The confusing "reload_in" and "reload_out" define_expand patterns were used
solely for secondary reload and were nothing more than aliases for the 
"sgprbase" instructions.  I've now learned that the constraints on these 
patterns were active (unusually for define_expand) so having them hide 
or duplicate the constraints from the real insns is pointless.


Also, whatever restriction previously prevented use of the "@" feature, 
and led to creating the "CODE_FOR" macros, no longer exists (maybe 
moving to C++ fixed it?), so that can get cleaned up too.


Andrew


amdgcn: simplify secondary reload patterns

Remove some unnecessary complexity; no functional change is intended,
although LRA appears to use the constraints from the reload_in/out
patterns, so it's probably an improvement for it to see the real sgprbase
constraints.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (mov_sgprbase): Add @ modifier.
(reload_in): Delete.
(reload_out): Delete.
* config/gcn/gcn.cc (CODE_FOR): Delete.
(get_code_for_##PREFIX##vN##SUFFIX): Delete.
(CODE_FOR_OP): Delete.
(get_code_for_##PREFIX): Delete.
(gcn_secondary_reload): Replace "get_code_for" with "code_for".

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 8c441696ca4..8dc93e8c82e 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -641,7 +641,7 @@ (define_insn "mov_exec"
 ;   vT += Sv
 ;   flat_load v, vT
 
-(define_insn "mov_sgprbase"
+(define_insn "@mov_sgprbase"
   [(set (match_operand:V_1REG 0 "nonimmediate_operand")
(unspec:V_1REG
  [(match_operand:V_1REG 1 "general_operand")]
@@ -655,7 +655,7 @@ (define_insn "mov_sgprbase"
   [m,v ,*   ,12] #
   })
 
-(define_insn "mov_sgprbase"
+(define_insn "@mov_sgprbase"
   [(set (match_operand:V_2REG 0 "nonimmediate_operand" "= v, v, m")
(unspec:V_2REG
  [(match_operand:V_2REG 1 "general_operand"   "vDB, m, v")]
@@ -672,7 +672,7 @@ (define_insn "mov_sgprbase"
   [(set_attr "type" "vmult,*,*")
(set_attr "length" "8,12,12")])
 
-(define_insn "mov_sgprbase"
+(define_insn "@mov_sgprbase"
   [(set (match_operand:V_4REG 0 "nonimmediate_operand")
(unspec:V_4REG
  [(match_operand:V_4REG 1 "general_operand")]
@@ -685,31 +685,6 @@ (define_insn "mov_sgprbase"
   [m,v  ,*,12] #
   })
 
-; reload_in was once a standard name, but here it's only referenced by
-; gcn_secondary_reload.  It allows a reload with a scratch register.
-
-(define_expand "reload_in"
-  [(set (match_operand:V_MOV 0 "register_operand" "= v")
-   (match_operand:V_MOV 1 "memory_operand"   "  m"))
-   (clobber (match_operand: 2 "register_operand" "="))]
-  ""
-  {
-emit_insn (gen_mov_sgprbase (operands[0], operands[1], operands[2]));
-DONE;
-  })
-
-; reload_out is similar to reload_in, above.
-
-(define_expand "reload_out"
-  [(set (match_operand:V_MOV 0 "memory_operand"  "= m")
-   (match_operand:V_MOV 1 "register_operand" "  v"))
-   (clobber (match_operand: 2 "register_operand" "="))]
-  ""
-  {
-emit_insn (gen_mov_sgprbase (operands[0], operands[1], operands[2]));
-DONE;
-  })
-
 ; Expand scalar addresses into gather/scatter patterns
 
 (define_split
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index ac299259213..28065c50bfd 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -1388,64 +1388,6 @@ GEN_VN_NOEXEC (vec_series,si, A(rtx dest, rtx x, rtx c), A(dest, x, c))
 #undef GET_VN_FN
 #undef A
 
-/* Get icode for vector instructions without an optab.  */
-
-#define CODE_FOR(PREFIX, SUFFIX) \
-static int \
-get_code_for_##PREFIX##vN##SUFFIX (int nunits) \
-{ \
-  switch (nunits) \
-{ \
-case 2: return CODE_FOR_##PREFIX##v2##SUFFIX; \
-case 4: return CODE_FOR_##PREFIX##v4##SUFFIX; \
-case 8: return CODE_FOR_##PREFIX##v8##SUFFIX; \
-case 16: return CODE_FOR_##PREFIX##v16##SUFFIX; \
-case 32: return CODE_FOR_##PREFIX##v32##SUFFIX; \
-case 64: return CODE_FOR_##PREFIX##v64##SUFFIX; \
-} \
-  \
-  gcc_unreachable (); \
-  return CODE_FOR_nothing; \
-}
-
-#define CODE_FOR_OP(PREFIX) \
- CODE_FOR (PREFIX, qi) \
-   CODE_FOR (PREFIX, hi) \
-   CODE_FOR (PREFIX, hf) \
-   CODE_FOR (PREFIX, si) \
-   CODE_FOR (PREFIX, sf) \
-   CODE_FOR (PREFIX, di) \
-   CODE_FOR (PREFIX, df) \
-   CODE_FOR (PREFIX, ti) \
-static int \
-get_code_for_##PREFIX (machine_mode mode) \
-{ \
-  int vf = GET_MODE_NUNITS (mode); \
-  machine_mode smode = GET_MODE_INNER (mode); \
-  \
-  switch (smode) \
-{ \
-case E_QImode: return get_code_for_##PREFIX##vNqi (vf); \
-case E_HImode: return get_code_for_##PREFIX##vNhi (vf); \
-case E_HFmode: return get_code_for_##PREFIX##vNhf (vf); \
-case E_SImode: return get_code_for_##PREFIX##vNsi (vf); \
-case E_SFmode: return 

RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits

2023-11-15 Thread Richard Biener
On Wed, 15 Nov 2023, Tamar Christina wrote:

> Patch updated to trunk.
> 
> This adds support to vectorizable_live_reduction to handle multiple exits by

vectorizable_live_operation, but I do wonder how you handle reductions?

> doing a search for which exit the live value should be materialized in.
> 
> Additionally, which value in the index we're after depends on whether the exit
> it's materialized in is an early exit or whether the loop's main exit is
> different from the loop's natural one (i.e. the one with the same src block as
> the latch).
> 
> In those two cases we want the first rather than the last value as we're going
> to restart the iteration in the scalar loop.  For VLA this means we need to
> reverse both the mask and vector since there's only a way to get the last
> active element and not the first.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
>   * tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
>   * tree-vectorizer.h (perm_mask_for_reverse): Expose.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 4cf7f65dc164db27a498b31fe7ce0d9af3f3e299..2476e59ef488fd0a3b296ced7b0d4d3e76a3634f 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10627,12 +10627,60 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>  lhs' = new_tree;  */
>  
>class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> -  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> +  /* A value can only be live in one exit.  So figure out which one.  */

Well, a value can be live across multiple exits!

> +  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +  /* Check if we have a loop where the chosen exit is not the main exit,
> +  in these cases for an early break we restart the iteration the vector code
> +  did.  For the live values we want the value at the start of the iteration
> +  rather than at the end.  */
> +  bool restart_loop = false;
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> + {
> +   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> + if (!is_gimple_debug (use_stmt)
> + && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> +   {

In fact when you get here you know the use is in a LC PHI.  Use
FOR_EACH_IMM_USE_FAST and you can get at the edge
via phi_arg_index_from_use and gimple_phi_arg_edge.

As said you have to process all exits the value is live on, not only
the first.

> + basic_block use_bb = gimple_bb (use_stmt);
> + for (auto edge : get_loop_exit_edges (loop))
> +   {
> + /* Alternative exits can have an intermediate BB in
> +between to update the IV.  In those cases we need to
> +look one block further.  */
> + if (use_bb == edge->dest
> + || (single_succ_p (edge->dest)
> + && use_bb == single_succ (edge->dest)))
> +   {
> + exit_e = edge;
> + goto found;
> +   }
> +   }
> +   }
> +found:
> +   /* If the edge isn't a single pred then split the edge so we have a
> +  location to place the live operations.  Perhaps we should always
> +  split during IV updating.  But this way the CFG is cleaner to
> +  follow.  */
> +   restart_loop = !vect_is_loop_exit_latch_pred (exit_e, loop);
> +   if (!single_pred_p (exit_e->dest))
> + exit_e = single_pred_edge (split_edge (exit_e));
> +
> +   /* For early exit where the exit is not in the BB that leads to the
> +  latch then we're restarting the iteration in the scalar loop. So
> +  get the first live value.  */
> +   if (restart_loop)
> + {
> +   vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> +   vec_lhs = gimple_get_lhs (vec_stmt);
> +   bitstart = build_zero_cst (TREE_TYPE (bitstart));

No, this doesn't work for SLP.  Note this also gets you the "first" live
value _after_ the vector iteration.  Btw, I fail to see why you need
to handle STMT_VINFO_LIVE at all for the early exits - this is
scalar values live _after_ all iterations of the loop, thus it's
provided by the scalar epilog that always runs when we exit the vector
loop early.

The story is different for reductions though (unless we fail to support
early breaks for those at the moment).

Richard.


> + }
> + }
> +
> +  basic_block exit_bb = exit_e->dest;
>gcc_assert (single_pred_p (exit_bb));
>  
>tree vec_lhs_phi = copy_ssa_name (vec_lhs);
>gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> -  SET_PHI_ARG_DEF (phi, 

Re: [PATCH] s390: Fix generation of s390-gen-builtins.h

2023-11-15 Thread Andreas Krebbel
On 11/15/23 14:29, Stefan Schulze Frielinghaus wrote:
> By default the preprocessed output includes linemarkers.  This leads to
> an error if -pedantic is used as e.g. during bootstrap:
> 
> s390-gen-builtins.h:1:3: error: style of line directive is a GCC extension [-Werror]
> 
> Fixed by omitting linemarkers while generating s390-gen-builtins.h.
> 
> gcc/ChangeLog:
> 
>   * config/s390/t-s390: Generate s390-gen-builtins.h without
>   linemarkers.

Ok, Thanks!

Andreas


> ---
>  gcc/config/s390/t-s390 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/s390/t-s390 b/gcc/config/s390/t-s390
> index 4ab9718f6e2..2e884c367de 100644
> --- a/gcc/config/s390/t-s390
> +++ b/gcc/config/s390/t-s390
> @@ -33,4 +33,4 @@ s390-d.o: $(srcdir)/config/s390/s390-d.cc
>   $(POSTCOMPILE)
>  
>  s390-gen-builtins.h: $(srcdir)/config/s390/s390-builtins.h
> - $(COMPILER) -E $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $< > $@
> + $(COMPILER) -E -P $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $< > $@



[PATCH] s390: Fix generation of s390-gen-builtins.h

2023-11-15 Thread Stefan Schulze Frielinghaus
By default the preprocessed output includes linemarkers.  This leads to
an error if -pedantic is used as e.g. during bootstrap:

s390-gen-builtins.h:1:3: error: style of line directive is a GCC extension [-Werror]

Fixed by omitting linemarkers while generating s390-gen-builtins.h.

gcc/ChangeLog:

* config/s390/t-s390: Generate s390-gen-builtins.h without
linemarkers.
---
 gcc/config/s390/t-s390 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/t-s390 b/gcc/config/s390/t-s390
index 4ab9718f6e2..2e884c367de 100644
--- a/gcc/config/s390/t-s390
+++ b/gcc/config/s390/t-s390
@@ -33,4 +33,4 @@ s390-d.o: $(srcdir)/config/s390/s390-d.cc
$(POSTCOMPILE)
 
 s390-gen-builtins.h: $(srcdir)/config/s390/s390-builtins.h
-   $(COMPILER) -E $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $< > $@
+   $(COMPILER) -E -P $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $< > $@
-- 
2.41.0



RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits

2023-11-15 Thread Richard Biener
On Wed, 15 Nov 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener 
> > Sent: Wednesday, November 15, 2023 1:01 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> > breaks and arbitrary exits
> > 
> > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > 
> > > Patch updated to latest trunk:
> > >
> > > Hi All,
> > >
> > > This changes the PHI node updates to support early breaks.
> > > It has to support both the case where the loop's exit matches the
> > > normal loop exit and one where the early exit is "inverted", i.e. it's
> > > an early exit edge.
> > >
> > > In the latter case we must always restart the loop for VF iterations.
> > > For an early exit the reason is obvious, but there are cases where the
> > > "normal" exit is located before the early one.  This exit then does a
> > > check on ivtmp resulting in us leaving the loop since it thinks we're
> > > done.
> > >
> > > In these cases we may still have side-effects to perform so we also go
> > > to the scalar loop.
> > >
> > > For the "normal" exit niters has already been adjusted for peeling;
> > > for the early exits we must find out how many iterations we actually
> > > did.  So we have to recalculate the new position for each exit.
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide
> > unused.
> > >   (vect_update_ivs_after_vectorizer): Support early break.
> > >   (vect_do_peeling): Use it.
> > >
> > > --- inline copy of patch ---
> > >
> > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > index
> > >
> > d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3
> > d2654cf1
> > > c842baac58f5 100644
> > > --- a/gcc/tree-vect-loop-manip.cc
> > > +++ b/gcc/tree-vect-loop-manip.cc
> > > @@ -1200,7 +1200,7 @@ vect_set_loop_condition_partial_vectors_avx512
> > (class loop *loop,
> > > loop handles exactly VF scalars per iteration.  */
> > >
> > >  static gcond *
> > > -vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge
> > > exit_edge,
> > > +vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge
> > > +exit_edge,
> > >   class loop *loop, tree niters, tree step,
> > >   tree final_iv, bool niters_maybe_zero,
> > >   gimple_stmt_iterator loop_cond_gsi) @@ -
> > 1412,7 +1412,7 @@
> > > vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info
> > loop_vinfo
> > > When this happens we need to flip the understanding of main and other
> > > exits by peeling and IV updates.  */
> > >
> > > -bool inline
> > > +bool
> > >  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)  {
> > >return single_pred (loop->latch) == loop_exit->src; @@ -2142,6
> > > +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> > >   Input:
> > >   - LOOP - a loop that is going to be vectorized. The last few 
> > > iterations
> > >of LOOP were peeled.
> > > + - VF   - The chosen vectorization factor for LOOP.
> > >   - NITERS - the number of iterations that LOOP executes (before it is
> > >  vectorized). i.e, the number of times the ivs should be 
> > > bumped.
> > >   - UPDATE_E - a successor edge of LOOP->exit that is on the
> > > (only) path
> > 
> > the comment on this is now a bit misleading, can you try to update it and/or
> > move the comment bits to the docs on EARLY_EXIT?
> > 
> > > @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info
> > loop_vinfo)
> > >The phi args associated with the edge UPDATE_E in the 
> > > bb
> > >UPDATE_E->dest are updated accordingly.
> > >
> > > + - restart_loop - Indicates whether the scalar loop needs to
> > > + restart the
> > 
> > params are ALL_CAPS
> > 
> > > +   iteration count where the vector loop began.
> > > +
> > >   Assumption 1: Like the rest of the vectorizer, this function assumes
> > >   a single loop exit that has a single predecessor.
> > >
> > > @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info
> > loop_vinfo)
> > >   */
> > >
> > >  static void
> > > -vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > -   tree niters, edge update_e)
> > > +vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > +poly_uint64 vf,
> > 
> > LOOP_VINFO_VECT_FACTOR?
> > 
> > > +   tree niters, edge update_e, bool
> > restart_loop)
> > 
> > I think 'bool early_exit' is better here?  I wonder if we have an "early"
> > exit after the main exit we are probably sure there are no side-effects to
> > re-execute and could avoid this restarting?
> 
> Side effects yes, but the actual check may not have been performed yet.
> If 

[PATCH] s390: implement flags output

2023-11-15 Thread Juergen Christ
Implement flags output for inline assemblies.  Only use one output constraint
that captures the whole condition code.  No breakout into different condition
codes is allowed.  Also, only one condition code variable is allowed.

Add further logic to canonicalize various cases where we combine different
cases of possible condition codes.
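
To make the canonicalization easier to follow, here is a minimal sketch in
plain C of how the comparison shapes matched in the patch below map onto
condition code masks.  It assumes the bit assignment that the mask comments
in the patch imply (CC0 -> 8, CC1 -> 4, CC2 -> 2, CC3 -> 1).  All function
names are hypothetical and none of this is GCC code.

```c
/* Bit for one condition code in a 4-bit mask: CC0 -> 8, CC1 -> 4,
   CC2 -> 2, CC3 -> 1 (assumption, inferred from the patch comments).  */
static unsigned
cc_bit (unsigned cc)
{
  return 8u >> cc;
}

/* Hypothetical helper: OR together the bits of every CC value in 0..3
   for which PRED returns nonzero.  */
static unsigned
cc_mask (int (*pred) (unsigned))
{
  unsigned mask = 0;
  for (unsigned cc = 0; cc < 4; cc++)
    if (pred (cc))
      mask |= cc_bit (cc);
  return mask;
}

/* The comparison shapes written as C predicates.  Unsigned arithmetic
   models the LEU range checks: values below the range wrap around.  */
static int cc_0_or_2 (unsigned cc) { return (cc & -3u) == 0; }
static int cc_1_or_3 (unsigned cc) { return (cc & -3u) == 1; }
static int cc_1_or_2 (unsigned cc) { return cc - 1u <= 1u; }
static int cc_2_or_3 (unsigned cc) { return cc - 2u <= 1u; }
static int cc_1_to_3 (unsigned cc) { return cc - 1u <= 2u; }
static int cc_0_or_1 (unsigned cc) { return cc <= 1u; }
```

For example, cc_mask (cc_0_or_2) yields 0xa and cc_mask (cc_1_or_2) yields
0x6.  The GTU forms are the negations of the LEU ones, which is why the patch
switches between EQ and NE.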

Bootstrapped and tested on s390.  OK for mainline?

gcc/ChangeLog:

* config/s390/s390-c.cc (s390_cpu_cpp_builtins): Define
__GCC_ASM_FLAG_OUTPUTS__.
* config/s390/s390.cc (s390_canonicalize_comparison): More
UNSPEC_CC_TO_INT cases.
(s390_md_asm_adjust): Implement flags output.
* config/s390/s390.md (ccstore4): Allow mask operands.
* doc/extend.texi: Document flags output.

gcc/testsuite/ChangeLog:

* gcc.target/s390/ccor.c: New test.

Signed-off-by: Juergen Christ 
---
 gcc/config/s390/s390-c.cc|   1 +
 gcc/config/s390/s390.cc  | 139 ++-
 gcc/config/s390/s390.md  |   8 +-
 gcc/doc/extend.texi  |   5 +
 gcc/testsuite/gcc.target/s390/ccor.c |  88 +
 5 files changed, 232 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/ccor.c

diff --git a/gcc/config/s390/s390-c.cc b/gcc/config/s390/s390-c.cc
index 269f4f8e978d..c126e6d323d7 100644
--- a/gcc/config/s390/s390-c.cc
+++ b/gcc/config/s390/s390-c.cc
@@ -409,6 +409,7 @@ s390_cpu_cpp_builtins (cpp_reader *pfile)
 cpp_define (pfile, "__LONG_DOUBLE_128__");
   cl_target_option_save (&opts, &global_options, &global_options_set);
   s390_cpu_cpp_builtins_internal (pfile, &opts, NULL);
+  cpp_define (pfile, "__GCC_ASM_FLAG_OUTPUTS__");
 }
 
 #if S390_USE_TARGET_ATTRIBUTE
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 61c5f88de8af..a19dd7849b84 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -1877,6 +1877,97 @@ s390_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
  *code = new_code;
}
 }
+  /* Remove UNSPEC_CC_TO_INT from connectives.  This happens for
+ checks against multiple condition codes. */
+  if (GET_CODE (*op0) == AND
+  && GET_CODE (XEXP (*op0, 0)) == UNSPEC
+  && XINT (XEXP (*op0, 0), 1) == UNSPEC_CC_TO_INT
+  && XVECLEN (XEXP (*op0, 0), 0) == 1
+  && REGNO (XVECEXP (XEXP (*op0, 0), 0, 0)) == CC_REGNUM
+  && CONST_INT_P (XEXP (*op0, 1))
+  && CONST_INT_P (*op1)
+  && INTVAL (XEXP (*op0, 1)) == -3
+  && *code == EQ)
+{
+  if (INTVAL (*op1) == 0)
+   {
+ /* case cc == 0 || cc = 2 => mask = 0xa */
+ *op0 = XVECEXP (XEXP (*op0, 0), 0, 0);
+ *op1 = gen_rtx_CONST_INT (VOIDmode, 0xa);
+   }
+  else if (INTVAL (*op1) == 1)
+   {
+ /* case cc == 1 || cc == 3 => mask = 0x5 */
+ *op0 = XVECEXP (XEXP (*op0, 0), 0, 0);
+ *op1 = gen_rtx_CONST_INT (VOIDmode, 0x5);
+   }
+}
+  if (GET_CODE (*op0) == PLUS
+  && GET_CODE (XEXP (*op0, 0)) == UNSPEC
+  && XINT (XEXP (*op0, 0), 1) == UNSPEC_CC_TO_INT
+  && XVECLEN (XEXP (*op0, 0), 0) == 1
+  && REGNO (XVECEXP (XEXP (*op0, 0), 0, 0)) == CC_REGNUM
+  && CONST_INT_P (XEXP (*op0, 1))
+  && CONST_INT_P (*op1)
+  && (*code == LEU || *code == GTU))
+{
+  if (INTVAL (*op1) == 1)
+   {
+ if (INTVAL (XEXP (*op0, 1)) == -1)
+   {
+ /* case cc == 1 || cc == 2 => mask = 0x6 */
+ *op0 = XVECEXP (XEXP (*op0, 0), 0, 0);
+ *op1 = gen_rtx_CONST_INT (VOIDmode, 0x6);
+ *code = *code == GTU ? NE : EQ;
+   }
+ else if (INTVAL (XEXP (*op0, 1)) == -2)
+   {
+ /* case cc == 2 || cc == 3 => mask = 0x3 */
+ *op0 = XVECEXP (XEXP (*op0, 0), 0, 0);
+ *op1 = gen_rtx_CONST_INT (VOIDmode, 0x3);
+ *code = *code == GTU ? NE : EQ;
+   }
+   }
+  else if (INTVAL (*op1) == 2
+  && INTVAL (XEXP (*op0, 1)) == -1)
+   {
+ /* case cc == 1 || cc == 2 || cc == 3 => mask = 0x7 */
+ *op0 = XVECEXP (XEXP (*op0, 0), 0, 0);
+ *op1 = gen_rtx_CONST_INT (VOIDmode, 0x7);
+ *code = *code == GTU ? NE : EQ;
+   }
+}
+  else if (*code == LEU || *code == GTU)
+{
+  if (GET_CODE (*op0) == UNSPEC
+ && XINT (*op0, 1) == UNSPEC_CC_TO_INT
+ && XVECLEN (*op0, 0) == 1
+ && REGNO (XVECEXP (*op0, 0, 0)) == CC_REGNUM
+ && CONST_INT_P (*op1))
+   {
+ if (INTVAL (*op1) == 1)
+   {
+ /* case cc == 0 || cc == 1 => mask = 0xc */
+ *op0 = XVECEXP (*op0, 0, 0);
+ *op1 = gen_rtx_CONST_INT (VOIDmode, 0xc);
+ *code = *code == GTU ? NE : EQ;
+   }
+ else if (INTVAL (*op1) == 2)
+   {
+ /* case cc == 0 || cc == 1 || cc == 2 => mask = 0xd */
+ *op0 = XVECEXP (*op0, 0, 0);
+ *op1 = gen_rtx_CONST_INT (VOIDmode, 0xd);
+ 

[PATCH] s390: split int128 load

2023-11-15 Thread Juergen Christ
Issue two loads when using GPRs instead of one load-multiple.

Bootstrapped and tested on s390.  OK for mainline?

gcc/ChangeLog:

* config/s390/s390.md: Split TImode loads.

gcc/testsuite/ChangeLog:

* gcc.target/s390/int128load.c: New test.

Signed-off-by: Juergen Christ 
---
 gcc/config/s390/s390.md|  4 
 gcc/testsuite/gcc.target/s390/int128load.c | 14 ++
 2 files changed, 14 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/int128load.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 3f29ba214427..5bff69aeb350 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -1687,8 +1687,6 @@
   [(set (match_operand:TI 0 "nonimmediate_operand" "")
 (match_operand:TI 1 "general_operand" ""))]
   "TARGET_ZARCH && reload_completed
-   && !s_operand (operands[0], TImode)
-   && !s_operand (operands[1], TImode)
&& s390_split_ok_p (operands[0], operands[1], TImode, 0)"
   [(set (match_dup 2) (match_dup 4))
(set (match_dup 3) (match_dup 5))]
@@ -1703,8 +1701,6 @@
   [(set (match_operand:TI 0 "nonimmediate_operand" "")
 (match_operand:TI 1 "general_operand" ""))]
   "TARGET_ZARCH && reload_completed
-   && !s_operand (operands[0], TImode)
-   && !s_operand (operands[1], TImode)
&& s390_split_ok_p (operands[0], operands[1], TImode, 1)"
   [(set (match_dup 2) (match_dup 4))
(set (match_dup 3) (match_dup 5))]
diff --git a/gcc/testsuite/gcc.target/s390/int128load.c b/gcc/testsuite/gcc.target/s390/int128load.c
new file mode 100644
index ..35d5380704b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/int128load.c
@@ -0,0 +1,14 @@
+/* Check that int128 loads and stores are split.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -mzarch -march=zEC12" } */
+
+__int128 global;
+
+void f(__int128 x)
+{
+  global = x;
+}
+
+/* { dg-final { scan-assembler-times "lg\t" 2 } } */
+/* { dg-final { scan-assembler-times "stg\t" 2 } } */
-- 
2.39.3



[PATCH] s390: Fix ICE in testcase pr89233

2023-11-15 Thread Juergen Christ
When using GNU vector extensions, an access outside of the vector size
caused an ICE on s390.  Fix this by aligning with the vec_extract
builtin, i.e., computing constant index modulo number of lanes.

Fixes testcase gcc.target/s390/pr89233.c.
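
As a minimal sketch of the index handling, assuming the number of lanes is
always a power of two (which is what makes the AND in the patch equivalent to
a modulo), the wrap-around looks like this in plain C.  The helper name is
hypothetical and is not part of GCC.

```c
/* Hypothetical helper, not a GCC function: wrap a constant lane index
   into the range [0, nunits).  Assumes NUNITS is a power of two, so
   masking with NUNITS - 1 gives the same result as INDEX % NUNITS.  */
static unsigned
wrap_lane_index (unsigned long index, unsigned nunits)
{
  return (unsigned) (index & (nunits - 1));
}
```

With four lanes, an out-of-range constant index of 5 then selects lane 1
instead of reading past the end of the vector.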

Bootstrapped and tested on s390.  OK for mainline?

gcc/ChangeLog:

* config/s390/vector.md (*vec_extract): Fix.

Signed-off-by: Juergen Christ 
---
 gcc/config/s390/vector.md | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 7d1eb36e8446..deda5990a035 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -532,12 +532,14 @@
  (match_operand:V1 "nonmemory_operand"  "v,v")
  (parallel
   [(match_operand:SI 2 "nonmemory_operand" "an,I")])))]
-  "TARGET_VX
-   && (!CONST_INT_P (operands[2])
-   || UINTVAL (operands[2]) < GET_MODE_NUNITS (mode))"
-  "@
-   vlgv\t%0,%v1,%Y2
-   vste\t%v1,%0,%2"
+  "TARGET_VX"
+  {
+if (CONST_INT_P (operands[2]))
+ operands[2] = GEN_INT (UINTVAL (operands[2]) & (GET_MODE_NUNITS (mode) - 1));
+if (which_alternative == 0)
+  return "vlgv\t%0,%v1,%Y2";
+   return "vste\t%v1,%0,%2";
+  }
   [(set_attr "op_type" "VRS,VRX")])
 
 ; vlgvb, vlgvh, vlgvf, vlgvg
-- 
2.39.3



RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits

2023-11-15 Thread Tamar Christina
> -----Original Message-----
> From: Richard Biener 
> Sent: Wednesday, November 15, 2023 1:01 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> breaks and arbitrary exits
> 
> On Wed, 15 Nov 2023, Tamar Christina wrote:
> 
> > Patch updated to latest trunk:
> >
> > Hi All,
> >
> > This changes the PHI node updates to support early breaks.
> > It has to support both the case where the loop's exit matches the
> > normal loop exit and one where the early exit is "inverted", i.e. it's
> > an early exit edge.
> >
> > In the latter case we must always restart the loop for VF iterations.
> > For an early exit the reason is obvious, but there are cases where the
> > "normal" exit is located before the early one.  This exit then does a
> > check on ivtmp resulting in us leaving the loop since it thinks we're done.
> >
> > In these cases we may still have side-effects to perform so we also go
> > to the scalar loop.
> >
> > For the "normal" exit niters has already been adjusted for peeling;
> > for the early exits we must find out how many iterations we actually
> > did.  So we have to recalculate the new position for each exit.
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide unused.
> > (vect_update_ivs_after_vectorizer): Support early break.
> > (vect_do_peeling): Use it.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3d2654cf1c842baac58f5 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -1200,7 +1200,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
> > loop handles exactly VF scalars per iteration.  */
> >
> >  static gcond *
> > -vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
> > +vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
> > class loop *loop, tree niters, tree step,
> > tree final_iv, bool niters_maybe_zero,
> > gimple_stmt_iterator loop_cond_gsi)
> > @@ -1412,7 +1412,7 @@ vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo
> > When this happens we need to flip the understanding of main and other
> > exits by peeling and IV updates.  */
> >
> > -bool inline
> > +bool
> >  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)
> >  {
> >return single_pred (loop->latch) == loop_exit->src;
> > @@ -2142,6 +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> >   Input:
> >   - LOOP - a loop that is going to be vectorized. The last few iterations
> >of LOOP were peeled.
> > + - VF   - The chosen vectorization factor for LOOP.
> >   - NITERS - the number of iterations that LOOP executes (before it is
> >  vectorized). i.e, the number of times the ivs should be bumped.
> >   - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
> 
> the comment on this is now a bit misleading, can you try to update it and/or
> move the comment bits to the docs on EARLY_EXIT?
> 
> > @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> >The phi args associated with the edge UPDATE_E in the bb
> >UPDATE_E->dest are updated accordingly.
> >
> > + - restart_loop - Indicates whether the scalar loop needs to restart the
> 
> params are ALL_CAPS
> 
> > + iteration count where the vector loop began.
> > +
> >   Assumption 1: Like the rest of the vectorizer, this function assumes
> >   a single loop exit that has a single predecessor.
> >
> > @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> >   */
> >
> >  static void
> > -vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > - tree niters, edge update_e)
> > +vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, poly_uint64 vf,
> 
> LOOP_VINFO_VECT_FACTOR?
> 
> > + tree niters, edge update_e, bool restart_loop)
> 
> I think 'bool early_exit' is better here?  I wonder if we have an "early"
> exit after the main exit we are probably sure there are no side-effects
> to re-execute and could avoid this restarting?

Side effects yes, but the actual check may not have been performed yet.
If you remember https://gist.github.com/Mistuke/66f14fe5c1be32b91ce149bd9b8bb35f
there, in the clz loop, when you leave through the "main" exit you still have
to check whether that iteration contained the entry.  This is because the loop
counter is incremented before you iterate.
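
To make the restart argument concrete, here is a toy model, not vectorizer code. It assumes a search loop processed in chunks of VF elements where the counter is bumped and the "main" exit is tested before the early-break check of the current chunk has run. All names (`VF`, `scalar_find`, `chunked_find`, `sample`) are hypothetical, and the loop is not the one from the gist.

```cpp
// Toy model only: names and loop shape are assumptions, not GCC code.
constexpr int VF = 4;  // assumed vectorization factor

// Scalar loop with an early break: index of KEY in a[start..n), or -1.
constexpr int
scalar_find (const int *a, int n, int key, int start)
{
  for (int i = start; i < n; ++i)
    if (a[i] == key)
      return i;
  return -1;
}

// Simulated vector loop.  The counter is incremented and the "main" exit
// is tested before the early-break check of the current chunk.  RESTART
// selects where the scalar loop resumes after leaving via the main exit.
constexpr int
chunked_find (const int *a, int n, int key, bool restart)
{
  int vec_n = n / VF * VF;  // elements covered by whole chunks
  int iv = 0;
  while (true)
    {
      int chunk = iv;
      iv += VF;  // counter incremented first
      if (iv >= vec_n)
        // Main exit: lanes [chunk, chunk + VF) were never checked.
        return scalar_find (a, n, key, restart ? chunk : iv);
      for (int lane = 0; lane < VF; ++lane)
        if (a[chunk + lane] == key)
          // Early exit: redo this chunk in the scalar loop.
          return scalar_find (a, n, key, chunk);
    }
}

constexpr int sample[] = {7, 3, 9, 4, 1, 8, 2, 6, 5};
static_assert (chunked_find (sample, 9, 2, true) == 6,
               "restarting VF iterations back finds the entry");
static_assert (chunked_find (sample, 9, 2, false) == -1,
               "resuming at the bumped counter misses it");
```

In this model the second assertion shows the failure mode: without the restart, the chunk that was in flight when the main exit fired is never examined.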

> 
> >  {
> >gphi_iterator gsi, 

RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits

2023-11-15 Thread Richard Biener
On Wed, 15 Nov 2023, Tamar Christina wrote:

> Patch updated to latest trunk:
> 
> Hi All,
> 
> This changes the PHI node updates to support early breaks.
> It has to support both the case where the loop's exit matches the normal loop
> exit and one where the early exit is "inverted", i.e. it's an early exit edge.
> 
> In the latter case we must always restart the loop for VF iterations.  For an
> early exit the reason is obvious, but there are cases where the "normal" exit
> is located before the early one.  This exit then does a check on ivtmp
> resulting in us leaving the loop since it thinks we're done.
> 
> In these cases we may still have side-effects to perform so we also go to the
> scalar loop.
> 
> For the "normal" exit niters has already been adjusted for peeling, for the
> early exits we must find out how many iterations we actually did.  So we have
> to recalculate the new position for each exit.
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide unused.
>   (vect_update_ivs_after_vectorizer): Support early break.
>   (vect_do_peeling): Use it.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3d2654cf1c842baac58f5 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1200,7 +1200,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
> loop handles exactly VF scalars per iteration.  */
>  
>  static gcond *
> -vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
> +vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
>   class loop *loop, tree niters, tree step,
>   tree final_iv, bool niters_maybe_zero,
>   gimple_stmt_iterator loop_cond_gsi)
> @@ -1412,7 +1412,7 @@ vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo
> When this happens we need to flip the understanding of main and other
> exits by peeling and IV updates.  */
>  
> -bool inline
> +bool
>  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)
>  {
>return single_pred (loop->latch) == loop_exit->src;
> @@ -2142,6 +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
>   Input:
>   - LOOP - a loop that is going to be vectorized. The last few iterations
>of LOOP were peeled.
> + - VF   - The chosen vectorization factor for LOOP.
>   - NITERS - the number of iterations that LOOP executes (before it is
>  vectorized). i.e, the number of times the ivs should be bumped.
>   - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path

the comment on this is now a bit misleading, can you try to update it
and/or move the comment bits to the docs on EARLY_EXIT?

> @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
>The phi args associated with the edge UPDATE_E in the bb
>UPDATE_E->dest are updated accordingly.
>  
> + - restart_loop - Indicates whether the scalar loop needs to restart the

params are ALL_CAPS

> +   iteration count where the vector loop began.
> +
>   Assumption 1: Like the rest of the vectorizer, this function assumes
>   a single loop exit that has a single predecessor.
>  
> @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
>   */
>  
>  static void
> -vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> -   tree niters, edge update_e)
> +vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, poly_uint64 vf,

LOOP_VINFO_VECT_FACTOR?

> +   tree niters, edge update_e, bool restart_loop)

I think 'bool early_exit' is better here?  I wonder if we have an "early"
exit after the main exit we are probably sure there are no side-effects
to re-execute and could avoid this restarting?

>  {
>gphi_iterator gsi, gsi1;
>class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>basic_block update_bb = update_e->dest;
> -
> -  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> -
> -  /* Make sure there exists a single-predecessor exit bb:  */
> -  gcc_assert (single_pred_p (exit_bb));
> -  gcc_assert (single_succ_edge (exit_bb) == update_e);
> +  bool inversed_iv
> + = !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> +  LOOP_VINFO_LOOP (loop_vinfo));
> +  bool needs_interm_block = LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> + && flow_bb_inside_loop_p (loop, update_e->src);
> +  edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +  gcond *cond = get_loop_exit_condition (loop_e);
> +  basic_block exit_bb = loop_e->dest;
> +  basic_block iv_block 

RE: [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form for early breaks

2023-11-15 Thread Richard Biener
On Wed, 15 Nov 2023, Tamar Christina wrote:

> Patch updated to latest trunk,
> 
> This splits the part of the function that does peeling for loops at exits to
> a different function.  In this new function we also peel for early breaks.
> 
> Peeling for early breaks works by redirecting all early break exits to a
> single "early break" block and combining them with the normal exit edge
> later in a different block which then goes into the epilog preheader.
> 
> This allows us to re-use all the existing code for IV updates.  Additionally
> this also enables correct linking for multiple vector epilogues.
> 
> flush_pending_stmts cannot be used in this scenario since it updates the PHI
> nodes in the order that they are in the exit destination blocks.  This means
> they are in CFG visit order.  With a single exit this doesn't matter but with
> multiple exits with different live values through the different exits the
> order usually does not line up.
> 
> Additionally the vectorizer helper functions expect to be able to iterate over
> the nodes in the order that they occur in the loop header blocks.  This is an
> invariant we must maintain.  To do this we just inline the work of
> flush_pending_stmts but maintain the order by using the header blocks to guide
> the work.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop-manip.cc (vect_is_loop_exit_latch_pred): New.
>   (slpeel_tree_duplicate_loop_for_vectorization): New.
>   (slpeel_tree_duplicate_loop_to_edge_cfg): use it.
>   * tree-vectorizer.h (is_loop_header_bb_p): Drop assert.
>   (slpeel_tree_duplicate_loop_to_edge_cfg): Update signature.
>   (vect_is_loop_exit_latch_pred): New.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index b9161274ce401a7307f3e61ad23aa036701190d7..fafbf924e8db18eb4eec7a4a1906d10f6ce9812f 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1392,6 +1392,153 @@ vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo
>(gimple *) cond_stmt);
>  }
>  
> +/* Determine if the exit chosen by the loop vectorizer differs from the
> +   natural loop exit.  i.e. if the exit leads to the loop latch or not.
> +   When this happens we need to flip the understanding of main and other
> +   exits by peeling and IV updates.  */
> +
> +bool inline
> +vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)

Ick, bad name - didn't see its use(s) in this patch?


> +{
> +  return single_pred (loop->latch) == loop_exit->src;
> +}
> +
> +/* Perform peeling for when the peeled loop is placed after the original loop.
> +   This maintains LCSSA and creates the appropriate blocks for multiple exit
> +   vectorization.   */
> +
> +void static
> +slpeel_tree_duplicate_loop_for_vectorization (class loop *loop, edge loop_exit,
> +   vec<edge> &loop_exits,
> +   class loop *new_loop,
> +   bool flow_loops,
> +   basic_block new_preheader)

also bad name ;)  I don't see a strong reason to factor this out.

> +{
> +  bool multiple_exits_p = loop_exits.length () > 1;
> +  basic_block main_loop_exit_block = new_preheader;
> +  if (multiple_exits_p)
> +{
> +  edge loop_entry = single_succ_edge (new_preheader);
> +  new_preheader = split_edge (loop_entry);
> +}
> +
> +  auto_vec  new_phis;
> +  hash_map  new_phi_args;
> +  /* First create the empty phi nodes so that when we flush the
> + statements they can be filled in.   However because there is no order
> + between the PHI nodes in the exits and the loop headers we need to
> + order them based on the order of the two headers.  First record the new
> + phi nodes. Then redirect the edges and flush the changes.  This writes out
> + the new SSA names.  */
> +  for (auto gsi_from = gsi_start_phis (loop_exit->dest);
> +   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
> +{
> +  gimple *from_phi = gsi_stmt (gsi_from);
> +  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> +  gphi *res = create_phi_node (new_res, main_loop_exit_block);
> +  new_phis.safe_push (res);
> +}
> +
> +  for (auto exit : loop_exits)
> +{
> +  basic_block dest
> + = exit == loop_exit ? main_loop_exit_block : new_preheader;
> +  redirect_edge_and_branch (exit, dest);
> +}
> +
> +  /* Only flush the main exit, the remaining exits we need to match the order
> + in the loop->header which with multiple exits may not be the same.  */
> +  flush_pending_stmts (loop_exit);
> +
> +  /* Record the new SSA names in the cache so that we can skip materializing
> + them again when we fill in the rest of the LCSSA 

Re: building GNU gettext on AIX

2023-11-15 Thread Bruno Haible
[CCing bug-gettext]

David Edelsohn wrote:
> The current gettext-0.22.3 fails to build for me on AIX.

Here are some hints to get a successful build of GNU gettext on AIX:

1. Set the recommended environment variables before running configure:
   https://gitlab.com/ghwiki/gnow-how/-/wikis/Platforms/Configuration

   Namely:
   * for a 32-bit build with gcc:
     CC=gcc
     CXX=g++
     CPPFLAGS="-I$PREFIX/include"
     LDFLAGS="-L$PREFIX/lib"
     unset AR NM
   * for a 32-bit build with xlc:
     CC="xlc -qthreaded -qtls"
     CXX="xlC -qthreaded -qtls"
     CPPFLAGS="-I$PREFIX/include"
     LDFLAGS="-L$PREFIX/lib"
     unset AR NM
   * for a 64-bit build with gcc:
     CC="gcc -maix64"
     CXX="g++ -maix64"
     CPPFLAGS="-I$PREFIX/include"
     LDFLAGS="-L$PREFIX/lib"
     AR="ar -X 64"; NM="nm -X 64 -B"
   * for a 64-bit build with xlc:
     CC="xlc -q64 -qthreaded -qtls"
     CXX="xlC -q64 -qthreaded -qtls"
     CPPFLAGS="-I$PREFIX/include"
     LDFLAGS="-L$PREFIX/lib"
     AR="ar -X 64"; NM="nm -X 64 -B"

   where $PREFIX is the value that you pass to the --prefix configure option.

   Rationale: you can run into all sorts of problems if you choose compiler
   options at random and have no experience with compiler options on that
   platform.

2. Don't use ibm-clang.

   Rationale: It's broken.

3. Don't use -Wall with gcc 10.3.

   Rationale: If you specify -Wall, gettext's configure adds -fanalyzer, which
   has excessive memory requirements in gcc 10.x. In particular, on AIX, it
   makes cc1 crash while compiling regex.c after it has consumed 1 GiB of RAM.

4. Avoid using a --prefix that contains earlier installations of the same
   package.

   Rationale: Because the AIX linker hardcodes directory names in shared
   libraries, GNU libtool has a peculiar configuration on AIX. It ends up
   mixing the in-build-tree libraries with the libraries in the install
   locations, leading to all sorts of errors.

   If you really need to use a --prefix that contains an earlier
   installation of the same package:
     - Either use --disable-shared and remove libgettextlib.a and
       libgettextsrc.a from $PREFIX/lib before starting the build.
     - Or use a mix of "make -k", "make -k install" and ad-hoc workarounds
       that cannot be described in a general way.

Bruno





Re: [PATCH v4] gcc: Introduce -fhardened

2023-11-15 Thread Jakub Jelinek
On Fri, Nov 03, 2023 at 06:51:16PM -0400, Marek Polacek wrote:
> +  if (flag_hardened)
> +    {
> +      if (!fortify_seen_p && optimize > 0)
> +        {
> +          if (TARGET_GLIBC_MAJOR == 2 && TARGET_GLIBC_MINOR >= 35)
> +            cpp_define (parse_in, "_FORTIFY_SOURCE=3");
> +          else
> +            cpp_define (parse_in, "_FORTIFY_SOURCE=2");
> +        }

I don't like the above in generic code; the fact that gcc was configured
against glibc target headers doesn't mean it is targeting glibc.
E.g. for most *-linux* targets, config/linux.opt provides the
-mbionic/-mglibc/-muclibc/-mmusl options.

One ugly way around would be to do
#ifdef OPTION_GLIBC
  if (OPTION_GLIBC && TARGET_GLIBC_MAJOR == 2 && TARGET_GLIBC_MINOR >= 35)
    cpp_define (parse_in, "_FORTIFY_SOURCE=3");
  else
#endif
    cpp_define (parse_in, "_FORTIFY_SOURCE=2");
(assuming OPTION_GLIBC at that point is already computed); a cleaner way
would be to introduce a target hook for that, say
fortify_source_default_level or something similar, where the default hook
would return 2 and next to linux_libc_has_function one would override it
for OPTION_GLIBC && TARGET_GLIBC_MAJOR == 2 && TARGET_GLIBC_MINOR >= 35
to 3.  That way, in the future other targets (say *BSD) can choose to do
something similar more easily.
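
A rough sketch of the hook idea, purely for illustration. The function names and signatures below are hypothetical stand-ins rather than existing GCC hooks; a real hook would presumably take no arguments and consult OPTION_GLIBC, TARGET_GLIBC_MAJOR and TARGET_GLIBC_MINOR directly, whereas here they are passed as parameters so the example is self-contained.

```cpp
// Hypothetical stand-ins for the proposed target hook; not GCC code.

// Generic default: every target gets _FORTIFY_SOURCE=2.
constexpr int
default_fortify_source_default_level ()
{
  return 2;
}

// Linux override: level 3 only when glibc is actually selected and is
// version 2.35 or newer; otherwise fall back to the generic default.
constexpr int
linux_fortify_source_default_level (bool option_glibc, int glibc_major,
                                    int glibc_minor)
{
  return (option_glibc && glibc_major == 2 && glibc_minor >= 35)
         ? 3
         : default_fortify_source_default_level ();
}

static_assert (linux_fortify_source_default_level (true, 2, 35) == 3,
               "glibc 2.35 selects level 3");
static_assert (linux_fortify_source_default_level (false, 2, 38) == 2,
               "a non-glibc libc keeps level 2");
```

The generic code would then define `_FORTIFY_SOURCE` from the hook's return value instead of testing glibc macros itself.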

The rest LGTM.

Jakub


