Re: Repost [PATCH 1/6] Add -mcpu=future

2024-02-20 Thread Michael Meissner
On Tue, Feb 20, 2024 at 06:35:34PM +0800, Kewen.Lin wrote:
> Hi Mike,
> 
> Sorry for late reply (just back from vacation).
> 
> on 2024/2/8 03:58, Michael Meissner wrote:
> > On Wed, Feb 07, 2024 at 05:21:10PM +0800, Kewen.Lin wrote:
> >> on 2024/2/6 14:01, Michael Meissner wrote:
> >> Sorry for the possible confusion here, the "tune_proc" that I referred to is
> >> the variable in the above else branch:
> >>
> >>    enum processor_type tune_proc
> >>      = (TARGET_POWERPC64 ? PROCESSOR_DEFAULT64 : PROCESSOR_DEFAULT);
> >>
> >> It's either PROCESSOR_DEFAULT64 or PROCESSOR_DEFAULT, so it doesn't have a
> >> chance to be PROCESSOR_FUTURE, so the checking "tune_proc == PROCESSOR_FUTURE"
> >> is useless.
> > 
> > PROCESSOR_DEFAULT can be PROCESSOR_FUTURE if somebody configures GCC with
> > --with-cpu=future.  While in general it shouldn't occur, it is helpful to
> > consider all of the corner cases.
> 
> But it sounds not true, I think you meant TARGET_CPU_DEFAULT instead?
> 
> On one local ppc64le machine I tried to configure with --with-cpu=power10,
> I got {,OPTION_}TARGET_CPU_DEFAULT "power10" but PROCESSOR_DEFAULT is still
> PROCESSOR_POWER7 (PROCESSOR_DEFAULT64 is PROCESSOR_POWER8).  I think these
> PROCESSOR_DEFAULT{,64} are defined by various headers:

Yes, I was mistaken.  You are correct that it is TARGET_CPU_DEFAULT that is
set.  I will change the comments.

> gcc/config/rs6000/aix71.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER7
> gcc/config/rs6000/aix71.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER7
> gcc/config/rs6000/aix72.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER7
> gcc/config/rs6000/aix72.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER7
> gcc/config/rs6000/aix73.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER8
> gcc/config/rs6000/aix73.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER8
> gcc/config/rs6000/darwin.h:#define PROCESSOR_DEFAULT  PROCESSOR_PPC7400
> gcc/config/rs6000/darwin.h:#define PROCESSOR_DEFAULT64  PROCESSOR_POWER4
> gcc/config/rs6000/freebsd64.h:#define PROCESSOR_DEFAULT PROCESSOR_PPC7450
> gcc/config/rs6000/freebsd64.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER8
> gcc/config/rs6000/linux64.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER7
> gcc/config/rs6000/linux64.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER8
> gcc/config/rs6000/rs6000.h:#define PROCESSOR_DEFAULT   PROCESSOR_PPC603
> gcc/config/rs6000/rs6000.h:#define PROCESSOR_DEFAULT64 PROCESSOR_RS64A
> gcc/config/rs6000/vxworks.h:#define PROCESSOR_DEFAULT PROCESSOR_PPC604
> 
> , and they are unlikely to be updated later, no?
> 
> btw, the given --with-cpu=future will make cpu_index never negative so
> 
>   ...
>   else if (cpu_index >= 0)
>     rs6000_tune_index = tune_index = cpu_index;
>   else
>     ...
> 
> so there is no chance to enter "else" arm, that is, that arm only takes
> effect when no cpu/tune is given (neither -m{cpu,tune} nor --with-cpu=).

Note, this is existing code.  I didn't modify it.  If we want to change it, we
should do it as another patch.
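
For readers following the thread, the selection order being discussed can be
sketched roughly as below.  This is only a simplified model, not the rs6000
code: the enum values, the cpu_request struct and select_tune are hypothetical
names, the first branch (an explicit tune request wins) is an assumption about
the code elided by "...", and the two defaults are copied from the linux64.h
lines quoted above.

```cpp
// Hypothetical processor identifiers; the real enum lives in the rs6000 headers.
enum processor_type
{
  PROCESSOR_POWER7,
  PROCESSOR_POWER8,
  PROCESSOR_POWER10,
  PROCESSOR_FUTURE
};

// Per-target header defaults, as in the quoted linux64.h definitions.
constexpr processor_type PROCESSOR_DEFAULT = PROCESSOR_POWER7;
constexpr processor_type PROCESSOR_DEFAULT64 = PROCESSOR_POWER8;

// Hypothetical stand-in for "index >= 0": GIVEN is false when no option
// (and no --with-cpu= configure default) supplied a value.
struct cpu_request
{
  bool given;
  processor_type proc;
};

constexpr processor_type
select_tune (cpu_request tune, cpu_request cpu, bool target_powerpc64)
{
  if (tune.given)   /* assumed: an explicit tune request comes first.  */
    return tune.proc;
  if (cpu.given)    /* -mcpu= or --with-cpu=, i.e. cpu_index >= 0.  */
    return cpu.proc;
  /* Only reached when neither a cpu nor a tune value is available.  */
  return target_powerpc64 ? PROCESSOR_DEFAULT64 : PROCESSOR_DEFAULT;
}
```

Under this model the final arm can only yield PROCESSOR_DEFAULT or
PROCESSOR_DEFAULT64, which is the point Kewen makes about the
"tune_proc == PROCESSOR_FUTURE" check.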

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] rs6000: Neuter option -mpower{8,9}-vector [PR109987]

2024-02-20 Thread Kewen.Lin
on 2024/2/21 09:37, Peter Bergner wrote:
> On 2/20/24 3:27 AM, Kewen.Lin wrote:
>> on 2024/2/20 02:45, Segher Boessenkool wrote:
>>> On Tue, Jan 16, 2024 at 10:50:01AM +0800, Kewen.Lin wrote:
>>>> it consists of some aspects:
>>>>   - effective target powerpc_p{8,9}vector_ok are removed
>>>>     and replaced with powerpc_vsx_ok.
>>>
>>> So all such testcases already arrange to have p8 or p9 some other way?
> 
> Shouldn't that be replaced with powerpc_vsx instead of powerpc_vsx_ok?
> That way we know VSX code gen is enabled for the options being used,
> even those in RUNTESTFLAGS.
> 
> I thought we agreed that powerpc_vsx_ok was almost always useless and
> we always want to use powerpc_vsx.  ...or did I miss that we removed
> the old powerpc_vsx_ok and renamed powerpc_vsx to powerpc_vsx_ok?

Yes, I think we all agreed that powerpc_vsx matches more with what we
expect, but I'm hesitating to make such change at this stage because:

  1. if testing on an env without vsx support, the test results on these
 affected test cases may change a lot, as many test cases would
 become unsupported (they passed before with explicit -mvsx).

  2. teaching the current powerpc_vsx to make use of current_compiler_flags,
 just like some existing practices on has_arch_*, may help mitigate
 it, as quite a few test cases already have explicit -mvsx.  But AIUI
 current_compiler_flags requires the dg-options line to come before
 the effective target line for the options in dg-options to take
 effect, which means we need some work to adjust the line order for the
 affected test cases.  On the other hand, some enhancement is needed
 for current_compiler_flags, as powerpc_vsx (old powerpc_vsx_ok) isn't
 only used in test cases but can also be used in some exp checks
 where no expected flags exist.

  3. there may be some other similar effective target checks which we
 want to update as well, it means we need a re-visit on the existing
 effective target checks (rs6000 specific).

  4. powerpc_vsx_ok has been there for a long long time, and -mno-vsx
 is rarely used in RUNTESTFLAGS, this only affects testing, so it
 is not that urgent.

so I'm inclined to work on this in next stage 1.  What do you think?

> 
>>>>   - Some test cases are updated with explicit -mvsx.
>>>>   - Some test cases with those two option mixed are adjusted
>>>>     to keep the test points, like -mpower8-vector
>>>>     -mno-power9-vector are updated with -mdejagnu-cpu=power8
>>>>     -mvsx etc.
>>>
>>> -mcpu=power8 implies -mvsx already.
> 
> Then we can omit the explicit -mvsx option, correct?  Ie, if the
> user forces -mno-vsx in RUNTESTFLAGS, then we'll just skip the
> test case as UNSUPPORTED rather than trying to compile some
> vsx test case with vsx disabled via the options.

Yes, we can strip any -mvsx then, but if we want the test case
to be tested whenever it can be, we can still append an extra
-mvsx.  Even if -mno-vsx is specified, when the option order
is "-mno-vsx ... -mvsx", powerpc_vsx is supported,
so the test case can still be tested with -mvsx
enabled, while if the order is "-mvsx ... -mno-vsx",
powerpc_vsx fails and it becomes unsupported.
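
The option-order effect described above can be illustrated with a small
model.  It is only a sketch under the assumption that the last of
-mvsx/-mno-vsx on the command line decides the result; vsx_enabled and its
cpu_default parameter are hypothetical names, and the model ignores how
-mcpu= interacts with an explicit -mno-vsx.

```cpp
#include <initializer_list>
#include <string_view>

// Returns whether VSX ends up enabled for the given flag sequence.
// CPU_DEFAULT is what the selected cpu would enable when neither
// -mvsx nor -mno-vsx appears (e.g. true for power8).
constexpr bool
vsx_enabled (std::initializer_list<std::string_view> flags, bool cpu_default)
{
  bool enabled = cpu_default;
  for (std::string_view flag : flags)
    {
      if (flag == "-mvsx")
        enabled = true;      // a later -mvsx overrides an earlier -mno-vsx
      else if (flag == "-mno-vsx")
        enabled = false;     // a later -mno-vsx overrides an earlier -mvsx
    }
  return enabled;
}
```

With RUNTESTFLAGS contributing "-mno-vsx" first and the test's own options
appending "-mvsx", the model reports VSX as enabled; with the order reversed
it reports VSX as disabled, matching the two cases above.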

BR,
Kewen



[PATCH v11 23/24] c++: Implement __is_invocable built-in trait

2024-02-20 Thread Ken Matsui
This patch implements built-in trait for std::is_invocable.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_invocable.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_INVOCABLE.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.
* cp-tree.h (build_invoke): New function.
* method.cc (build_invoke): New function.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_invocable.
* g++.dg/ext/is_invocable1.C: New test.
* g++.dg/ext/is_invocable2.C: New test.
* g++.dg/ext/is_invocable3.C: New test.
* g++.dg/ext/is_invocable4.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc |   6 +
 gcc/cp/cp-trait.def  |   1 +
 gcc/cp/cp-tree.h |   2 +
 gcc/cp/method.cc | 132 +
 gcc/cp/semantics.cc  |   4 +
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |   3 +
 gcc/testsuite/g++.dg/ext/is_invocable1.C | 349 +++
 gcc/testsuite/g++.dg/ext/is_invocable2.C | 139 +
 gcc/testsuite/g++.dg/ext/is_invocable3.C |  51 
 gcc/testsuite/g++.dg/ext/is_invocable4.C |  33 +++
 10 files changed, 720 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable1.C
 create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable2.C
 create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable3.C
 create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable4.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 23ea66d9c12..c87b126fdb1 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3791,6 +3791,12 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_FUNCTION:
   inform (loc, "  %qT is not a function", t1);
   break;
+case CPTK_IS_INVOCABLE:
+  if (!t2)
+inform (loc, "  %qT is not invocable", t1);
+  else
+inform (loc, "  %qT is not invocable by %qE", t1, t2);
+  break;
 case CPTK_IS_LAYOUT_COMPATIBLE:
   inform (loc, "  %qT is not layout compatible with %qT", t1, t2);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 85056c8140b..6cb2b55f4ea 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -75,6 +75,7 @@ DEFTRAIT_EXPR (IS_EMPTY, "__is_empty", 1)
 DEFTRAIT_EXPR (IS_ENUM, "__is_enum", 1)
 DEFTRAIT_EXPR (IS_FINAL, "__is_final", 1)
 DEFTRAIT_EXPR (IS_FUNCTION, "__is_function", 1)
+DEFTRAIT_EXPR (IS_INVOCABLE, "__is_invocable", -1)
 DEFTRAIT_EXPR (IS_LAYOUT_COMPATIBLE, "__is_layout_compatible", 2)
 DEFTRAIT_EXPR (IS_LITERAL_TYPE, "__is_literal_type", 1)
 DEFTRAIT_EXPR (IS_MEMBER_FUNCTION_POINTER, "__is_member_function_pointer", 1)
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 334c11396c2..261d3a71faa 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7334,6 +7334,8 @@ extern tree get_copy_assign   (tree);
 extern tree get_default_ctor   (tree);
 extern tree get_dtor   (tree, tsubst_flags_t);
 extern tree build_stub_object  (tree);
+extern tree build_invoke   (tree, const_tree,
+tsubst_flags_t);
 extern tree strip_inheriting_ctors (tree);
 extern tree inherited_ctor_binfo   (tree);
 extern bool base_ctor_omit_inherited_parms (tree);
diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index 98c10e6a8b5..953f1bed6fc 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -1928,6 +1928,138 @@ build_trait_object (tree type)
   return build_stub_object (type);
 }
 
+/* [func.require] Build an expression of INVOKE(FN_TYPE, ARG_TYPES...).  If the
+   given is not invocable, returns error_mark_node.  */
+
+tree
+build_invoke (tree fn_type, const_tree arg_types, tsubst_flags_t complain)
+{
+  if (fn_type == error_mark_node || arg_types == error_mark_node)
+return error_mark_node;
+
+  gcc_assert (TYPE_P (fn_type));
+  gcc_assert (TREE_CODE (arg_types) == TREE_VEC);
+
+  /* Access check is required to determine if the given is invocable.  */
+  deferring_access_check_sentinel acs (dk_no_deferred);
+
+  /* INVOKE is an unevaluated context.  */
+  cp_unevaluated cp_uneval_guard;
+
+  bool is_ptrdatamem;
+  bool is_ptrmemfunc;
+  if (TREE_CODE (fn_type) == REFERENCE_TYPE)
+{
+  tree deref_fn_type = TREE_TYPE (fn_type);
+  is_ptrdatamem = TYPE_PTRDATAMEM_P (deref_fn_type);
+  is_ptrmemfunc = TYPE_PTRMEMFUNC_P (deref_fn_type);
+
+  /* Dereference fn_type if it is a pointer to member.  */
+  if (is_ptrdatamem || is_ptrmemfunc)
+   fn_type = deref_fn_type;
+}
+  else
+{
+  is_ptrdatamem = TYPE_PTRDATAMEM_P (fn_type);
+  is_ptrmemfunc = TYPE_PTRMEMFUNC_P (fn_type);
+}
+
+  if (is_ptrdatamem && TREE_VEC_LENGTH (arg_types) != 1)
+/* Only a pointer to data member with one argument is invocable.  */
+return error_mark_node;
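
(The quoted diff is cut off at this point.)  The rule enforced by the last
check, that a pointer to data member is only invocable with exactly one
argument, can be observed through the library trait the built-in is meant to
back.  The struct S below is a hypothetical example type; the assertions use
std::is_invocable_v from C++17 rather than the new __is_invocable built-in, so
they do not depend on this patch being applied.

```cpp
#include <type_traits>

// Hypothetical example type with one data member and one member function.
struct S
{
  int x;
  void f (int) {}
};

// Pointer to data member: invocable with exactly one object argument.
static_assert (std::is_invocable_v<int S::*, S &>);
static_assert (std::is_invocable_v<int S::*, S *>);
static_assert (!std::is_invocable_v<int S::*>);
static_assert (!std::is_invocable_v<int S::*, S &, int>);

// Pointer to member function: the object plus the declared parameters.
static_assert (std::is_invocable_v<void (S::*) (int), S &, int>);
static_assert (!std::is_invocable_v<void (S::*) (int), S &>);

// A type that is not callable at all.
static_assert (!std::is_invocable_v<int>);
```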

Re: [PATCH] rs6000: Neuter option -mpower{8,9}-vector [PR109987]

2024-02-20 Thread Kewen.Lin
on 2024/2/20 19:19, Segher Boessenkool wrote:
> On Tue, Feb 20, 2024 at 05:27:07PM +0800, Kewen.Lin wrote:
>> Good question, it mainly follows the practice of option direct-move here.
>> IMHO at least for power8-vector we want WarnRemoved for now as it's
>> documented before, and we can probably make it (or them) removed later on
>> trunk once all active branch releases don't support it any more.
>>
>> What's your opinion on this?
> 
> Originally I did
>   Warn(%qs is deprecated)
> which already was a mistake.  It then changed to
>   Deprecated
> and then to
>   WarnRemoved
> which makes it clearer that it is a bad plan.
> 
> If it is okay to remove an option, we should not talk about it at all
> anymore.  Well maybe warn about it for another release or so, but not
> longer.

OK, thanks for the suggestion.

> 
>>>>  (define_register_constraint "we" 
>>>> "rs6000_constraints[RS6000_CONSTRAINT_we]"
>>>> -  "@internal Like @code{wa}, if @option{-mpower9-vector} and 
>>>> @option{-m64} are
>>>> -   used; otherwise, @code{NO_REGS}.")
>>>> +  "@internal Like @code{wa}, if the cpu type is power9 or up, meanwhile
>>>> +   @option{-mvsx} and @option{-m64} are used; otherwise, @code{NO_REGS}.")
>>>
>>> "if this is a POWER9 or later and @option{-mvsx} and @option{-m64} are
>>> used".  How clumsy.  Maybe we should make the patterns that use "we"
>>> work without mtvsrdd as well?  Hrm, they will still require 64-bit GPRs
>>> of course, unless we can do something tricky.
>>>
>>> We do not need the special constraint at all of course (we can add these
>>> conditions to all patterns that use it: all *two* patterns).  So maybe
>>> that's what we should do :-)
>>
>> Not sure the original intention introducing it (Mike might know it best), but
>> removing it sounds doable.
> 
> It is for mtvsrdd.

Yes, I meant to say not sure if there was some obstacle which made us introduce
a new constraint, or just because it's simple.

> 
>>  btw, it seems more than two patterns using it?
>> like (if I didn't miss something):
>>   - vsx_concat_
>>   - vsx_splat__reg
>>   - vsx_splat_v4si_di
>>   - vsx_mov_64bit
> 
> Yes, it isn't clear we should use this constraint in those last two.  It
> looks like those do not even need the restriction to 64 bit systems.
> Well the last one obviously has that already, but then it could just use
> "wa", no?

For vsx_splat_v4si_di, it's for mtvsrws; the ISA refers to GPR[RA].bit[32:63],
which implies the context has 64-bit GPRs?  The last one still seems to need to
distinguish whether there is power9 support or not, so just using "wa", which
only implies power7, doesn't fit it?

btw, the actual guard for "we" is TARGET_POWERPC64 rather than TARGET_64BIT,
the documentation isn't accurate enough.  Just filed internal issue #1345
for further tracking on this.

> 
>>> -mcpu=power8 implies -mvsx (power7 already).  You can disable VSX, or
>>> VMX as well, but by default it is enabled.
>>
>> Yes, it's meant to consider the explicitly -mno-vsx, which suffers the option
>> order issue.  But considering we raise error for -mno-vsx -mpower{8,9}-vector
>> before, without specifying -mvsx is closer to the previous.
>>
>> I'll adjust it and the below similar ones, thanks!
> 
> It is never supported to do unsupported things :-)
> 
> We need to be able to rely on defaults.  Otherwise, we will have to
> implement all of GCC recursively, in itself, in the testsuite, and in
> individual tests.  Let's not :-)

OK, fair enough.  Thanks!

BR,
Kewen



RE: [PATCH v1] RISC-V: Upgrade RVV intrinsic version to 0.12

2024-02-20 Thread Li, Pan2
Hi kito and juzhe.

There are 2 items I'd like to double-confirm. Thanks a lot.

1. I am not sure whether we need to upgrade the version for __riscv_th_v_intrinsic.
2. Do we need to upgrade to an even newer version (like 1.0) for the GCC 14 
release, or can we do it later?

Pan

-----Original Message-----
From: Li, Pan2  
Sent: Wednesday, February 21, 2024 12:27 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; Li, Pan2 ; Wang, Yanzhang 
; kito.ch...@gmail.com
Subject: [PATCH v1] RISC-V: Upgrade RVV intrinsic version to 0.12

From: Pan Li 

Upgrade the version of RVV intrinsic from 0.11 to 0.12.

PR target/114017

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Upgrade
the version to 0.12.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-__riscv_v_intrinsic.c: Update the
version to 0.12.
* gcc.target/riscv/rvv/base/pr114017-1.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-c.cc   |  2 +-
 .../riscv/predef-__riscv_v_intrinsic.c|  2 +-
 .../gcc.target/riscv/rvv/base/pr114017-1.c| 19 +++
 3 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114017-1.c

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 3ef06dcfd2d..3755ec0b8ef 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -139,7 +139,7 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
 {
   builtin_define ("__riscv_vector");
   builtin_define_with_int_value ("__riscv_v_intrinsic",
-riscv_ext_version_value (0, 11));
+riscv_ext_version_value (0, 12));
 }
 
if (TARGET_XTHEADVECTOR)
diff --git a/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c 
b/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c
index dbbedf54f87..07f1f159a8f 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c
@@ -3,7 +3,7 @@
 
 int main () {
 
-#if __riscv_v_intrinsic != 11000
+#if __riscv_v_intrinsic != 12000
 #error "__riscv_v_intrinsic"
 #endif
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114017-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114017-1.c
new file mode 100644
index 000..8eee7c68f71
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114017-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include "riscv_vector.h"
+
+vuint8mf2_t
+test (vuint16m1_t val, size_t shift, size_t vl)
+{
+#if __riscv_v_intrinsic == 11000
+  #warning "RVV Intrinsics v0.11"
+  return __riscv_vnclipu (val, shift, vl);
+#endif
+
+#if __riscv_v_intrinsic == 12000
+  #warning "RVV Intrinsics v0.12" /* { dg-warning "RVV Intrinsics v0.12" } */
+  return __riscv_vnclipu (val, shift, 0, vl);
+#endif
+}
+
-- 
2.34.1



[PATCH v1] RISC-V: Upgrade RVV intrinsic version to 0.12

2024-02-20 Thread pan2 . li
From: Pan Li 

Upgrade the version of RVV intrinsic from 0.11 to 0.12.

PR target/114017

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Upgrade
the version to 0.12.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-__riscv_v_intrinsic.c: Update the
version to 0.12.
* gcc.target/riscv/rvv/base/pr114017-1.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-c.cc   |  2 +-
 .../riscv/predef-__riscv_v_intrinsic.c|  2 +-
 .../gcc.target/riscv/rvv/base/pr114017-1.c| 19 +++
 3 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114017-1.c

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 3ef06dcfd2d..3755ec0b8ef 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -139,7 +139,7 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
 {
   builtin_define ("__riscv_vector");
   builtin_define_with_int_value ("__riscv_v_intrinsic",
-riscv_ext_version_value (0, 11));
+riscv_ext_version_value (0, 12));
 }
 
if (TARGET_XTHEADVECTOR)
diff --git a/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c 
b/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c
index dbbedf54f87..07f1f159a8f 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-__riscv_v_intrinsic.c
@@ -3,7 +3,7 @@
 
 int main () {
 
-#if __riscv_v_intrinsic != 11000
+#if __riscv_v_intrinsic != 12000
 #error "__riscv_v_intrinsic"
 #endif
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114017-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114017-1.c
new file mode 100644
index 000..8eee7c68f71
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114017-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include "riscv_vector.h"
+
+vuint8mf2_t
+test (vuint16m1_t val, size_t shift, size_t vl)
+{
+#if __riscv_v_intrinsic == 11000
+  #warning "RVV Intrinsics v0.11"
+  return __riscv_vnclipu (val, shift, vl);
+#endif
+
+#if __riscv_v_intrinsic == 12000
+  #warning "RVV Intrinsics v0.12" /* { dg-warning "RVV Intrinsics v0.12" } */
+  return __riscv_vnclipu (val, shift, 0, vl);
+#endif
+}
+
-- 
2.34.1
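
As a side note on the numbers in the tests above: the macro values 11000 and
12000 correspond to versions 0.11 and 0.12.  A minimal sketch of such an
encoding follows.  The helper name version_value is hypothetical, and the
multiplier used for the major number is an assumption; the patch itself only
shows that a minor number of 11 or 12 with major 0 yields 11000 or 12000.

```cpp
// Assumed encoding: major * 1000000 + minor * 1000.
// Only the minor part is confirmed by the values used in the tests above.
constexpr int
version_value (int major, int minor)
{
  return major * 1000000 + minor * 1000;
}

static_assert (version_value (0, 11) == 11000, "RVV intrinsics v0.11");
static_assert (version_value (0, 12) == 12000, "RVV intrinsics v0.12");
```

A test that must compile against both versions can therefore compare
__riscv_v_intrinsic against these integers, as pr114017-1.c does when it
selects between the two __riscv_vnclipu call forms.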



Re: [PATCH v2] LoongArch: Split loongarch_option_override_internal into smaller procedures

2024-02-20 Thread Yang Yujie
v1 -> v2:
- Rebased to master.
- Specify "(void)" for the empty parameter list of loongarch_global_init.



[PATCH v2] LoongArch: Split loongarch_option_override_internal into smaller procedures

2024-02-20 Thread Yang Yujie
gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in: Mark -m[no-]recip as
aliases to -mrecip={all,none}.
* config/loongarch/loongarch.opt: Same.
* config/loongarch/loongarch-def.h: Modify ABI condition macros for
convenience.
* config/loongarch/loongarch-opts.cc: Define option-handling
procedures split from the original loongarch_option_override_internal.
* config/loongarch/loongarch-opts.h: Same.
* config/loongarch/loongarch.cc: Clean up
loongarch_option_override_internal.
---
 gcc/config/loongarch/genopts/loongarch.opt.in |   8 +-
 gcc/config/loongarch/loongarch-def.h  |  11 +-
 gcc/config/loongarch/loongarch-opts.cc| 248 +
 gcc/config/loongarch/loongarch-opts.h |  27 +-
 gcc/config/loongarch/loongarch.cc | 253 +++---
 gcc/config/loongarch/loongarch.opt|   8 +-
 6 files changed, 325 insertions(+), 230 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 02f918053f5..a77893d31d9 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -197,14 +197,14 @@ mexplicit-relocs
 Target Alias(mexplicit-relocs=, always, none)
 Use %reloc() assembly operators (for backward compatibility).
 
-mrecip
-Target RejectNegative Var(la_recip) Save
-Generate approximate reciprocal divide and square root for better throughput.
-
 mrecip=
 Target RejectNegative Joined Var(la_recip_name) Save
 Control generation of reciprocal estimates.
 
+mrecip
+Target Alias(mrecip=, all, none)
+Generate approximate reciprocal divide and square root for better throughput.
+
 ; The code model option names for -mcmodel.
 Enum
 Name(cmodel) Type(int)
diff --git a/gcc/config/loongarch/loongarch-def.h 
b/gcc/config/loongarch/loongarch-def.h
index 2dbf006d013..0cbf9476690 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -90,11 +90,16 @@ extern loongarch_def_array
 
 #define TO_LP64_ABI_BASE(C) (C)
 
-#define ABI_FPU_64(abi_base) \
+#define ABI_LP64_P(abi_base) \
+  (abi_base == ABI_BASE_LP64D \
+   || abi_base == ABI_BASE_LP64F \
+   || abi_base == ABI_BASE_LP64S)
+
+#define ABI_FPU64_P(abi_base) \
   (abi_base == ABI_BASE_LP64D)
-#define ABI_FPU_32(abi_base) \
+#define ABI_FPU32_P(abi_base) \
   (abi_base == ABI_BASE_LP64F)
-#define ABI_FPU_NONE(abi_base) \
+#define ABI_NOFPU_P(abi_base) \
   (abi_base == ABI_BASE_LP64S)
 
 
diff --git a/gcc/config/loongarch/loongarch-opts.cc 
b/gcc/config/loongarch/loongarch-opts.cc
index 7eeac43ed2f..380208f38bf 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "tm.h"
 #include "obstack.h"
+#include "opts.h"
 #include "diagnostic-core.h"
 
 #include "loongarch-cpu.h"
@@ -32,8 +33,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "loongarch-str.h"
 #include "loongarch-def.h"
 
+/* Target configuration */
 struct loongarch_target la_target;
 
+/* RTL cost information */
+const struct loongarch_rtx_cost_data *loongarch_cost;
+
 /* ABI-related configuration.  */
 #define ABI_COUNT (sizeof(abi_priority_list)/sizeof(struct loongarch_abi))
 static const struct loongarch_abi
@@ -795,3 +800,246 @@ loongarch_update_gcc_opt_status (struct loongarch_target 
*target,
   /* ISA evolution features */
   opts->x_la_isa_evolution = target->isa.evolution;
 }
+
+/* -mrecip= handling */
+static struct
+  {
+const char *string;/* option name.  */
+unsigned int mask; /* mask bits to set.  */
+  }
+const recip_options[] = {
+  { "all",   RECIP_MASK_ALL },
+  { "none",  RECIP_MASK_NONE },
+  { "div",   RECIP_MASK_DIV },
+  { "sqrt",  RECIP_MASK_SQRT },
+  { "rsqrt", RECIP_MASK_RSQRT },
+  { "vec-div",   RECIP_MASK_VEC_DIV },
+  { "vec-sqrt",  RECIP_MASK_VEC_SQRT },
+  { "vec-rsqrt", RECIP_MASK_VEC_RSQRT },
+};
+
+/* Parser for -mrecip=.  */
+unsigned int
+loongarch_parse_mrecip_scheme (const char *recip_string)
+{
+  unsigned int result_mask = RECIP_MASK_NONE;
+
+  if (recip_string)
+{
+  char *p = ASTRDUP (recip_string);
+  char *q;
+  unsigned int mask, i;
+  bool invert;
+
+  while ((q = strtok (p, ",")) != NULL)
+   {
+ p = NULL;
+ if (*q == '!')
+   {
+ invert = true;
+ q++;
+   }
+ else
+   invert = false;
+
+ if (!strcmp (q, "default"))
+   mask = RECIP_MASK_ALL;
+ else
+   {
+ for (i = 0; i < ARRAY_SIZE (recip_options); i++)
+   if (!strcmp (q, recip_options[i].string))
+ {
+   mask = recip_options[i].mask;
+   break;
+ }
+
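
(The quoted diff is cut off at this point.)  The general shape of a
comma-separated "-mrecip=" style parser, with "!" inverting an entry, can be
sketched as below.  This is not the LoongArch implementation: the mask
values, the reduced option table and the name parse_recip are hypothetical,
and both the way an inverted entry clears bits and the silent skipping of
unknown names are assumptions, since that part of the patch is not shown.

```cpp
#include <string_view>

// Hypothetical mask bits; the real RECIP_MASK_* values are not shown here.
enum : unsigned
{
  MASK_NONE = 0,
  MASK_DIV = 1u << 0,
  MASK_SQRT = 1u << 1,
  MASK_RSQRT = 1u << 2,
  MASK_ALL = MASK_DIV | MASK_SQRT | MASK_RSQRT
};

struct recip_option
{
  std::string_view name;
  unsigned mask;
};

constexpr recip_option recip_options[] = {
  { "all", MASK_ALL },   { "none", MASK_NONE },   { "div", MASK_DIV },
  { "sqrt", MASK_SQRT }, { "rsqrt", MASK_RSQRT },
};

constexpr unsigned
parse_recip (std::string_view s)
{
  unsigned result = MASK_NONE;
  while (!s.empty ())
    {
      // Split off the next comma-separated token.
      const auto comma = s.find (',');
      std::string_view tok = s.substr (0, comma);
      s = (comma == std::string_view::npos) ? std::string_view{}
                                            : s.substr (comma + 1);
      if (tok.empty ())
        continue;               // like strtok, skip empty tokens

      bool invert = false;
      if (tok.front () == '!')
        {
          invert = true;
          tok.remove_prefix (1);
        }

      unsigned mask = MASK_NONE;
      bool found = false;
      if (tok == "default")
        {
          mask = MASK_ALL;
          found = true;
        }
      else
        for (const recip_option &opt : recip_options)
          if (tok == opt.name)
            {
              mask = opt.mask;
              found = true;
              break;
            }

      if (!found)
        continue;               // assumption: unknown names are skipped here

      // Assumed semantics: "!name" clears the bits, "name" sets them.
      if (invert)
        result &= ~mask;
      else
        result |= mask;
    }
  return result;
}
```

For example, under these assumptions "all,!sqrt" leaves the div and rsqrt
bits set, and "none" yields an empty mask.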
+

[PATCH v1] LoongArch: When checking whether the assembler supports conditional branch relaxation, add compilation parameter "--fatal-warnings" to the assembler.

2024-02-20 Thread Lulu Cheng
In binutils 2.40 and earlier versions, only a warning will be reported
when a relocation immediate value is out of bounds. As a result,
the value of the macro HAVE_AS_COND_BRANCH_RELAXATION will also be
defined as 1 when the assembler does not support conditional branch
relaxation. Therefore, add the compilation option "--fatal-warnings"
to avoid this problem.

gcc/ChangeLog:

* configure: Regenerate.
* configure.ac: Add parameter "--fatal-warnings" to the assembler
when checking whether the assembler supports conditional branch
relaxation.
---
 gcc/configure| 2 +-
 gcc/configure.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index 41b978b0380..f1d434fede0 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -31136,7 +31136,7 @@ else
nop
.endr
beq $a0,$a1,a' > conftest.s
-if { ac_try='$gcc_cv_as $gcc_cv_as_flags  -o conftest.o conftest.s >&5'
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags --fatal-warnings -o conftest.o 
conftest.s >&5'
   { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
   (eval $ac_try) 2>&5
   ac_status=$?
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 72012d61e67..9ebc578e4cc 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -5486,7 +5486,7 @@ x:
[Define if your assembler supports -mrelax option.])])
 gcc_GAS_CHECK_FEATURE([conditional branch relaxation support],
   gcc_cv_as_loongarch_cond_branch_relax,
-  [],
+  [--fatal-warnings],
   [a:
.rept 32769
nop
-- 
2.39.3



Re: [PATCH v1 0/4] Fix a series of problems caused by

2024-02-20 Thread chenglulu

Sorry, this title is incomplete and has been resent.

On 2024/2/21 11:08 AM, Lulu Cheng wrote:

binutils 2.42 corrects the implementation of
".align [abs-expr,[abs-expr[,abs-expr]]]".
The macro ASM_OUTPUT_ALIGN_WITH_NOP in GCC uses this assembler directive,
so an error now occurs. See the link below for a detailed description.
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645067.html

To solve the above problem, this series does the following:

1. Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP. (cherry pick r14-4674)
2. Check whether binutils supports the relax function. (cherry pick r14-4160)
3. Disable relaxation if the assembler don't support
   conditional branch relaxation. (cherry pick r14-5434)

PR112299 is also fixed here.

Lulu Cheng (2):
   LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP.
   LoongArch: Check whether binutils supports the relax function. If
 supported, explicit relocs are turned off by default.

Xi Ruoyao (2):
   LoongArch: Disable relaxation if the assembler don't support
 conditional branch relaxation [PR112330]
   LoongArch: Define HAVE_AS_TLS to 0 if it's undefined [PR112299]

  gcc/config.in | 18 +
  gcc/config/loongarch/genopts/loongarch.opt.in |  9 +++
  gcc/config/loongarch/gnu-user.h   |  4 +-
  gcc/config/loongarch/loongarch-opts.h | 12 
  gcc/config/loongarch/loongarch.h  | 22 +--
  gcc/config/loongarch/loongarch.opt|  9 +++
  gcc/configure | 66 +++
  gcc/configure.ac  | 14 
  gcc/doc/invoke.texi   | 24 ++-
  9 files changed, 169 insertions(+), 9 deletions(-)





[PATCH v1 4/4] LoongArch: Define HAVE_AS_TLS to 0 if it's undefined [PR112299]

2024-02-20 Thread Lulu Cheng
From: Xi Ruoyao 

Now that loongarch.md uses HAVE_AS_TLS, we need this to fix the build
failure of a cross compiler when the cross assembler is not installed yet.

gcc/ChangeLog:

PR target/112299
* config/loongarch/loongarch-opts.h (HAVE_AS_TLS): Define to 0
if not defined yet.

(cherry picked from commit 6bf2cebe2bf49919c78814cb447d3aa6e3550d89)
---
 gcc/config/loongarch/loongarch-opts.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/loongarch/loongarch-opts.h 
b/gcc/config/loongarch/loongarch-opts.h
index bdf79ecc193..b4115dd7f85 100644
--- a/gcc/config/loongarch/loongarch-opts.h
+++ b/gcc/config/loongarch/loongarch-opts.h
@@ -95,4 +95,8 @@ loongarch_config_target (struct loongarch_target *target,
 #define HAVE_AS_COND_BRANCH_RELAXATION 0
 #endif
 
+#ifndef HAVE_AS_TLS
+#define HAVE_AS_TLS 0
+#endif
+
 #endif /* LOONGARCH_OPTS_H */
-- 
2.39.3
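
A brief illustration of why the fallback definition matters.  Inside "#if", an
undefined macro simply evaluates to 0, but in an ordinary C or C++ expression
an undefined name is a compile error.  The function use_tls_sequences below is
a hypothetical stand-in for whatever expression uses the macro; the patch only
says that loongarch.md uses HAVE_AS_TLS.

```cpp
// Simulates a build where configure never defined the macro, e.g. because
// no cross assembler was available to probe.
#ifndef HAVE_AS_TLS
#define HAVE_AS_TLS 0
#endif

// Hypothetical user of the macro in an expression context.  Without the
// fallback above this would not compile when HAVE_AS_TLS is undefined.
constexpr bool
use_tls_sequences ()
{
  return HAVE_AS_TLS != 0;
}
```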



[PATCH v1 3/4] LoongArch: Disable relaxation if the assembler don't support conditional branch relaxation [PR112330]

2024-02-20 Thread Lulu Cheng
From: Xi Ruoyao 

As the commit message of r14-4674 has indicated, if the assembler does
not support conditional branch relaxation, a relocation overflow may
happen on conditional branches when relaxation is enabled because the
number of NOP instructions inserted by the assembler will be more than
the number estimated by GCC.

To work around this issue, disable relaxation by default if the
assembler is detected at GCC build time to be incapable of conditional
branch relaxation.  We also need to pass -mno-relax to the assembler to
really disable relaxation.  But, if the assembler does not support
-mrelax option at all, we should not pass -mno-relax to the assembler or
it will immediately error out.  Also handle this with the build time
assembler capability probing, and add a pair of options
-m[no-]pass-mrelax-to-as to allow using a different assembler from the
build-time one.

With this change, if GCC is built with GAS 2.41, relaxation will be
disabled by default.  So the default value of -mexplicit-relocs= is also
changed to 'always' if -mno-relax is specified or implied by the
build-time default, because using assembler macros for symbol addresses
produces no benefit when relaxation is disabled.
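
The defaults described above can be summarized in a small model.  This is a
sketch only: the function names and the explicit_relocs enum are hypothetical,
the two boolean inputs stand for the HAVE_AS_MRELAX_OPTION and
HAVE_AS_COND_BRANCH_RELAXATION probes, and the default used when relaxation is
enabled is passed in as a parameter because this message does not state it.

```cpp
// Hypothetical model of the -mexplicit-relocs= choices.
enum explicit_relocs
{
  EXPLICIT_RELOCS_AUTO,
  EXPLICIT_RELOCS_NONE,
  EXPLICIT_RELOCS_ALWAYS
};

// -mrelax defaults to on only if the assembler has both capabilities.
constexpr bool
default_mrelax (bool as_has_mrelax_option, bool as_has_cond_branch_relax)
{
  return as_has_mrelax_option && as_has_cond_branch_relax;
}

// -mpass-mrelax-to-as only depends on the assembler accepting the option.
constexpr bool
default_pass_mrelax_to_as (bool as_has_mrelax_option)
{
  return as_has_mrelax_option;
}

// With relaxation off, the default becomes 'always'.
constexpr explicit_relocs
default_explicit_relocs (bool relax_enabled, explicit_relocs relaxed_default)
{
  return relax_enabled ? relaxed_default : EXPLICIT_RELOCS_ALWAYS;
}
```

For an assembler that accepts -mrelax but cannot relax conditional branches,
this model gives relaxation off, -mno-relax still passed to the assembler,
and explicit relocs defaulting to 'always'.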

gcc/ChangeLog:

PR target/112330
* config/loongarch/genopts/loongarch.opt.in: Add
-m[no]-pass-relax-to-as.  Change the default of -m[no]-relax to
account conditional branch relaxation support status.
* config/loongarch/loongarch.opt: Regenerate.
* configure.ac (gcc_cv_as_loongarch_cond_branch_relax): Check if
the assembler supports conditional branch relaxation.
* configure: Regenerate.
* config.in: Regenerate.  Note that there are some unrelated
changes introduced by r14-5424 (which does not contain a
config.in regeneration).
* config/loongarch/loongarch-opts.h
(HAVE_AS_COND_BRANCH_RELAXATION): Define to 0 if not defined.
* config/loongarch/loongarch-driver.h (ASM_MRELAX_DEFAULT):
Define.
(ASM_MRELAX_SPEC): Define.
(ASM_SPEC): Use ASM_MRELAX_SPEC instead of "%{mno-relax}".
* config/loongarch/loongarch.cc: Take the setting of
-m[no-]relax into account when determining the default of
-mexplicit-relocs=.
* doc/invoke.texi: Document -m[no-]relax and
-m[no-]pass-mrelax-to-as for LoongArch.  Update the default
value of -mexplicit-relocs=.

(cherry picked from commit fe23a2ff1f5072559552be0e41ab55bf72f5c79f)
---
 gcc/config.in |  6 
 gcc/config/loongarch/genopts/loongarch.opt.in |  6 +++-
 gcc/config/loongarch/loongarch-opts.h |  4 +++
 gcc/config/loongarch/loongarch.h  | 17 -
 gcc/config/loongarch/loongarch.opt|  6 +++-
 gcc/configure | 35 +++
 gcc/configure.ac  | 10 ++
 gcc/doc/invoke.texi   | 24 -
 8 files changed, 104 insertions(+), 4 deletions(-)

diff --git a/gcc/config.in b/gcc/config.in
index f5b6287a96a..f3bdcb4cdda 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -367,6 +367,12 @@
 #endif
 
 
+/* Define if your assembler supports conditional branch relaxation. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_COND_BRANCH_RELAXATION
+#endif
+
+
 /* Define if your assembler supports the --debug-prefix-map option. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_DEBUG_PREFIX_MAP
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index edc2ed045d7..420a3941b3b 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -179,6 +179,10 @@ Target RejectNegative Joined Enum(cmodel) Var(la_opt_cmodel) Init(CMODEL_NORMAL)
 Specify the code model.
 
 mrelax
-Target Var(loongarch_mrelax) Init(HAVE_AS_MRELAX_OPTION)
+Target Var(loongarch_mrelax) Init(HAVE_AS_MRELAX_OPTION && HAVE_AS_COND_BRANCH_RELAXATION)
 Take advantage of linker relaxations to reduce the number of instructions
 required to materialize symbol addresses.
+
+mpass-mrelax-to-as
+Target Var(loongarch_pass_mrelax_to_as) Init(HAVE_AS_MRELAX_OPTION)
+Pass -mrelax or -mno-relax option to the assembler.
diff --git a/gcc/config/loongarch/loongarch-opts.h b/gcc/config/loongarch/loongarch-opts.h
index 60e682f57a0..bdf79ecc193 100644
--- a/gcc/config/loongarch/loongarch-opts.h
+++ b/gcc/config/loongarch/loongarch-opts.h
@@ -91,4 +91,8 @@ loongarch_config_target (struct loongarch_target *target,
 #define HAVE_AS_MRELAX_OPTION 0
 #endif
 
+#ifndef HAVE_AS_COND_BRANCH_RELAXATION
+#define HAVE_AS_COND_BRANCH_RELAXATION 0
+#endif
+
 #endif /* LOONGARCH_OPTS_H */
diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 8d08b84c8eb..28ab87eb660 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -69,8 +69,23 @@ along with GCC; see the 

[PATCH v1 2/4] LoongArch: Check whether binutils supports the relax function. If supported, explicit relocs are turned off by default.

2024-02-20 Thread Lulu Cheng
gcc/ChangeLog:

* config.in: Regenerate.
* config/loongarch/genopts/loongarch.opt.in: Add compilation option
mrelax. And set the initial value of explicit-relocs according to the
detection status.
* config/loongarch/gnu-user.h: When compiling with -mno-relax, pass the
--no-relax option to the linker.
* config/loongarch/loongarch-driver.h (ASM_SPEC): When compiling with
-mno-relax, pass the -mno-relax option to the assembler.
* config/loongarch/loongarch-opts.h (HAVE_AS_MRELAX_OPTION): Define
macro.
* config/loongarch/loongarch.opt: Regenerate.
* configure: Regenerate.
* configure.ac: Add detection of support for binutils relax function.

(cherry picked from commit 9bab65a77049edcc7afc59532173206ee816e726)
---
 gcc/config.in | 12 +++
 gcc/config/loongarch/genopts/loongarch.opt.in |  5 +++
 gcc/config/loongarch/gnu-user.h   |  4 +--
 gcc/config/loongarch/loongarch-opts.h |  4 +++
 gcc/config/loongarch/loongarch.opt|  5 +++
 gcc/configure | 31 +++
 gcc/configure.ac  |  4 +++
 7 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/gcc/config.in b/gcc/config.in
index cc638759a40..f5b6287a96a 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -630,6 +630,12 @@
 #endif
 
 
+/* Define if your assembler supports -mrelax option. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_MRELAX_OPTION
+#endif
+
+
 /* Define if your assembler supports .mspabi_attribute. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_MSPABI_ATTRIBUTE
@@ -2214,6 +2220,12 @@
 #endif
 
 
+/* Define which stat syscall is able to handle 64bit indodes. */
+#ifndef USED_FOR_TARGET
+#undef HOST_STAT_FOR_64BIT_INODES
+#endif
+
+
 /* Define as const if the declaration of iconv() needs const. */
 #ifndef USED_FOR_TARGET
 #undef ICONV_CONST
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index 61e7d72a0a1..edc2ed045d7 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -177,3 +177,8 @@ Enum(cmodel) String(@@STR_CMODEL_EXTREME@@) Value(CMODEL_EXTREME)
 mcmodel=
 Target RejectNegative Joined Enum(cmodel) Var(la_opt_cmodel) Init(CMODEL_NORMAL)
 Specify the code model.
+
+mrelax
+Target Var(loongarch_mrelax) Init(HAVE_AS_MRELAX_OPTION)
+Take advantage of linker relaxations to reduce the number of instructions
+required to materialize symbol addresses.
diff --git a/gcc/config/loongarch/gnu-user.h b/gcc/config/loongarch/gnu-user.h
index f050078da52..28ac8b0e1f6 100644
--- a/gcc/config/loongarch/gnu-user.h
+++ b/gcc/config/loongarch/gnu-user.h
@@ -46,8 +46,8 @@ along with GCC; see the file COPYING3.  If not see
 #define GNU_USER_TARGET_LINK_SPEC \
   "%{G*} %{shared} -m " GNU_USER_LINK_EMULATION \
   "%{!shared: %{static} %{!static: %{rdynamic:-export-dynamic} " \
-  "-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}"
-
+  "-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}" \
+  "%{mno-relax: --no-relax}"
 
 /* Similar to standard Linux, but adding -ffast-math support.  */
 #undef GNU_USER_TARGET_MATHFILE_SPEC
diff --git a/gcc/config/loongarch/loongarch-opts.h b/gcc/config/loongarch/loongarch-opts.h
index eaa6fc07448..60e682f57a0 100644
--- a/gcc/config/loongarch/loongarch-opts.h
+++ b/gcc/config/loongarch/loongarch-opts.h
@@ -87,4 +87,8 @@ loongarch_config_target (struct loongarch_target *target,
while -m[no]-memcpy imposes a global constraint.  */
 #define TARGET_DO_OPTIMIZE_BLOCK_MOVE_P  loongarch_do_optimize_block_move_p()
 
+#ifndef HAVE_AS_MRELAX_OPTION
+#define HAVE_AS_MRELAX_OPTION 0
+#endif
+
 #endif /* LOONGARCH_OPTS_H */
diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt
index 3ff0d860413..78b5e0cc452 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -184,3 +184,8 @@ Enum(cmodel) String(extreme) Value(CMODEL_EXTREME)
 mcmodel=
 Target RejectNegative Joined Enum(cmodel) Var(la_opt_cmodel) Init(CMODEL_NORMAL)
 Specify the code model.
+
+mrelax
+Target Var(loongarch_mrelax) Init(HAVE_AS_MRELAX_OPTION)
+Take advantage of linker relaxations to reduce the number of instructions
+required to materialize symbol addresses.
diff --git a/gcc/configure b/gcc/configure
index b4907d258be..67cdd92a4f3 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -28871,6 +28871,37 @@ if test $gcc_cv_as_loongarch_dtprelword != yes; then
 $as_echo "#define HAVE_AS_DTPRELWORD 1" >>confdefs.h
 
 fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for -mrelax option" >&5
+$as_echo_n "checking assembler for -mrelax option... " >&6; }
+if ${gcc_cv_as_loongarch_relax+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_loongarch_relax=no
+  if test x$gcc_cv_as != x; then
+$as_echo '.text' > conftest.s
+if { 

[PATCH v1 1/4] LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP.

2024-02-20 Thread Lulu Cheng
There are two reasons for removing this macro definition:
1. The default in the assembler is to use the nop instruction for filling.
2. For assembly directives: .align [abs-expr[, abs-expr[, abs-expr]]]
   The third expression is the maximum number of bytes that should be
   skipped by this alignment directive.
   Therefore, the limit of 4 bytes passed by the macro can keep the
   requested alignment from taking effect, which hurts run-time efficiency.
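
To make reason 2 concrete, here is a minimal sketch of how the third
.align expression acts as a cap on padding.  It is only a model of the
documented GAS behaviour, not assembler code; the function names are
made up for illustration, and the first argument is assumed to be a
log2 value, as the LOG parameter of the macro suggests.

```cpp
#include <cstdint>

// Bytes of padding needed to round `offset` up to a 2^log2_align boundary.
constexpr std::uint64_t padding_for(std::uint64_t offset, unsigned log2_align) {
    const std::uint64_t mask = (std::uint64_t{1} << log2_align) - 1;
    return (mask + 1 - (offset & mask)) & mask;
}

// Location counter after ".align LOG, FILL, MAX" (illustrative model):
// if more than max_skip bytes of padding would be needed, the directive
// is not applied at all and the offset stays where it is.
constexpr std::uint64_t align_with_max_skip(std::uint64_t offset,
                                            unsigned log2_align,
                                            std::uint64_t max_skip) {
    const std::uint64_t pad = padding_for(offset, log2_align);
    return pad <= max_skip ? offset + pad : offset;
}
```

With a cap of 4 bytes and 4-byte instructions, a request for 16-byte
alignment only takes effect when the location counter is already
aligned or exactly one instruction short of the boundary.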

This modification relies on binutils commit
1fb3cdd87ec61715a5684925fb6d6a6cf53bb97c.
(Since the assembler adds nops based on the .align information when doing
relaxation, a conditional branch can go out of range during assembly.
That binutils commit solves this problem.)

gcc/ChangeLog:

* config/loongarch/loongarch.h (ASM_OUTPUT_ALIGN_WITH_NOP):
Delete.

Co-authored-by: Chenghua Xu 

(cherry picked from commit b20c7ee066cb7d952fa193972e8bc6362c6e4063)
---
 gcc/config/loongarch/loongarch.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index f34a7a604cc..8d08b84c8eb 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -978,11 +978,6 @@ typedef struct {
 
 #define ASM_OUTPUT_ALIGN(STREAM, LOG) fprintf (STREAM, "\t.align\t%d\n", (LOG))
 
-/* "nop" instruction 54525952 (andi $r0,$r0,0) is
-   used for padding.  */
-#define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
-  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
-
 /* This is how to output an assembler line to advance the location
counter by SIZE bytes.  */
 
-- 
2.39.3



[PATCH v1 0/4] Fix a series of problems caused by ASM_OUTPUT_ALIGN_WITH_NOP (release/gcc-12).

2024-02-20 Thread Lulu Cheng
binutils 2.42 corrects the implementation of
".align [abs-expr[, abs-expr[, abs-expr]]]".
The macro ASM_OUTPUT_ALIGN_WITH_NOP in GCC uses this assembler directive,
so an error now occurs.  See the link below for a detailed description.
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645067.html

To solve the above problems, this series does the following:

1. Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP. (cherry pick r14-4674)
2. Check whether binutils supports the relax function. (cherry pick r14-4160)
3. Disable relaxation if the assembler doesn't support
   conditional branch relaxation. (cherry pick r14-5434)

PR112299 is also fixed here.

Lulu Cheng (2):
  LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP.
  LoongArch: Check whether binutils supports the relax function. If
supported, explicit relocs are turned off by default.

Xi Ruoyao (2):
  LoongArch: Disable relaxation if the assembler don't support
conditional branch relaxation [PR112330]
  LoongArch: Define HAVE_AS_TLS to 0 if it's undefined [PR112299]

 gcc/config.in | 18 +
 gcc/config/loongarch/genopts/loongarch.opt.in |  9 +++
 gcc/config/loongarch/gnu-user.h   |  4 +-
 gcc/config/loongarch/loongarch-opts.h | 12 
 gcc/config/loongarch/loongarch.h  | 22 +--
 gcc/config/loongarch/loongarch.opt|  9 +++
 gcc/configure | 66 +++
 gcc/configure.ac  | 14 
 gcc/doc/invoke.texi   | 24 ++-
 9 files changed, 169 insertions(+), 9 deletions(-)

-- 
2.39.3



[PATCH v1 4/4] LoongArch: Define HAVE_AS_TLS to 0 if it's undefined [PR112299]

2024-02-20 Thread Lulu Cheng
From: Xi Ruoyao 

Now that loongarch.md uses HAVE_AS_TLS, we need this to fix the build
failure of a cross compiler when the cross assembler is not installed yet.
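
As a rough illustration of why the fallback matters: an undefined macro
evaluates to 0 inside "#if", but using it in an ordinary C or C++
expression is a compile error.  The sketch below assumes the macro ends
up in such an expression; the helper name is hypothetical and is not
part of the patch.

```cpp
// Fallback in the style of the patch: if configure never defined the
// macro (for example because no cross assembler was available to probe),
// give it a value so that plain expressions still compile.
#ifndef HAVE_AS_TLS
#define HAVE_AS_TLS 0
#endif

// Hypothetical helper that uses the macro as an ordinary expression,
// which would not compile if HAVE_AS_TLS were left undefined.
constexpr bool assembler_has_tls() { return HAVE_AS_TLS != 0; }
```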

gcc/ChangeLog:

PR target/112299
* config/loongarch/loongarch-opts.h (HAVE_AS_TLS): Define to 0
if not defined yet.

(cherry picked from commit 6bf2cebe2bf49919c78814cb447d3aa6e3550d89)
---
 gcc/config/loongarch/loongarch-opts.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/loongarch/loongarch-opts.h b/gcc/config/loongarch/loongarch-opts.h
index bdf79ecc193..b4115dd7f85 100644
--- a/gcc/config/loongarch/loongarch-opts.h
+++ b/gcc/config/loongarch/loongarch-opts.h
@@ -95,4 +95,8 @@ loongarch_config_target (struct loongarch_target *target,
 #define HAVE_AS_COND_BRANCH_RELAXATION 0
 #endif
 
+#ifndef HAVE_AS_TLS
+#define HAVE_AS_TLS 0
+#endif
+
 #endif /* LOONGARCH_OPTS_H */
-- 
2.39.3



[PATCH] c++/c-common: Fix convert_vector_to_array_for_subscript for qualified vector types [PR89224]

2024-02-20 Thread Andrew Pinski
After r7-987-gf17a223de829cb, accessing an element of a vector type would
lose the qualifiers.  So for `constvector[0]`, the element type of the
array would not have const on it.  This was due to a missing
build_qualified_type for the inner type of the vector when building the
array type.  Adding the call to build_qualified_type back gives the access
the correct qualifiers, so overload resolution and the lvalue/rvalue
handling of the access are now done correctly.
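
For comparison, a small sketch of the behaviour a subscript is expected
to have, using an ordinary array rather than a GNU vector so that it
does not depend on the fix.  The names are illustrative only; the point
is that an element of a const object must itself be const for the const
overload to be chosen.

```cpp
// Two overloads, distinguished only by the constness of the argument.
template <class T> constexpr int picked(T &) { return 0; }        // mutable lvalue
template <class T> constexpr int picked(const T &) { return 1; }  // const lvalue

// Subscripting a const array yields a const element.
constexpr int element_of_const_array() {
    const unsigned char bytes[16] = {};
    return picked(bytes[0]);
}

// Subscripting a mutable array yields a mutable element.
constexpr int element_of_mutable_array() {
    unsigned char bytes[16] = {};
    return picked(bytes[0]);
}
```

The new testcase checks the same property for a const vector: the const
overload has to be selected for `x[0]`.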

Note we now correctly reject the testcase gcc.dg/pr83415.c, which was
incorrectly accepted after r7-987-gf17a223de829cb.

Built and tested for aarch64-linux-gnu.

PR c++/89224

gcc/c-family/ChangeLog:

* c-common.cc (convert_vector_to_array_for_subscript): Call
build_qualified_type for the inner type.

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_array_reference): Compare main variants
for the vector/array types instead of the types directly.

gcc/testsuite/ChangeLog:

* g++.dg/torture/vector-subaccess-1.C: New test.
* gcc.dg/pr83415.c: Change warning to error.

Signed-off-by: Andrew Pinski 
---
 gcc/c-family/c-common.cc  |  7 +-
 gcc/cp/constexpr.cc   |  3 ++-
 .../g++.dg/torture/vector-subaccess-1.C   | 23 +++
 gcc/testsuite/gcc.dg/pr83415.c|  2 +-
 4 files changed, 32 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/vector-subaccess-1.C

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index e15eff698df..884dd9043f9 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -8936,6 +8936,7 @@ convert_vector_to_array_for_subscript (location_t loc,
   if (gnu_vector_type_p (TREE_TYPE (*vecp)))
 {
   tree type = TREE_TYPE (*vecp);
+  tree newitype;
 
   ret = !lvalue_p (*vecp);
 
@@ -8950,8 +8951,12 @@ convert_vector_to_array_for_subscript (location_t loc,
 for function parameters.  */
   c_common_mark_addressable_vec (*vecp);
 
+  /* Make sure qualifiers are copied from the vector type to the new element
+     of the array type.  */
+  newitype = build_qualified_type (TREE_TYPE (type), TYPE_QUALS (type));
+
   *vecp = build1 (VIEW_CONVERT_EXPR,
- build_array_type_nelts (TREE_TYPE (type),
+ build_array_type_nelts (newitype,
  TYPE_VECTOR_SUBPARTS (type)),
  *vecp);
 }
diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index fa346fe01c9..1fe91d16e8e 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -4421,7 +4421,8 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree t,
   if (!lval
   && TREE_CODE (ary) == VIEW_CONVERT_EXPR
   && VECTOR_TYPE_P (TREE_TYPE (TREE_OPERAND (ary, 0)))
-  && TREE_TYPE (t) == TREE_TYPE (TREE_TYPE (TREE_OPERAND (ary, 0))))
+  && TYPE_MAIN_VARIANT (TREE_TYPE (t))
+ == TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND (ary, 0)))))
 ary = TREE_OPERAND (ary, 0);
 
   tree oldidx = TREE_OPERAND (t, 1);
diff --git a/gcc/testsuite/g++.dg/torture/vector-subaccess-1.C b/gcc/testsuite/g++.dg/torture/vector-subaccess-1.C
new file mode 100644
index 000..0c8958a4e03
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/vector-subaccess-1.C
@@ -0,0 +1,23 @@
+/* PR c++/89224 */
+
+/* The access of `vector[i]` has the same qualifiers as the original
+   vector which was missing. */
+
+typedef __attribute__((vector_size(16))) unsigned char  Int8x8_t;
+
+template <class T>
+void g(T &x) {
+__builtin_abort();
+}
+template <class T>
+void g(const T &x) {
+  __builtin_exit(0);
+}
+void f(const Int8x8_t x) {
+  g(x[0]);
+}
+int main(void)
+{
+Int8x8_t x ={};
+f(x);
+}
diff --git a/gcc/testsuite/gcc.dg/pr83415.c b/gcc/testsuite/gcc.dg/pr83415.c
index 5934c16d97c..2fc85031505 100644
--- a/gcc/testsuite/gcc.dg/pr83415.c
+++ b/gcc/testsuite/gcc.dg/pr83415.c
@@ -7,6 +7,6 @@ int
 main (int argc, short *argv[])
 {
   int i = argc;
-  y[i] = 7 - i; /* { dg-warning "read-only" } */
+  y[i] = 7 - i; /* { dg-error "read-only" } */
   return 0;
 }
-- 
2.43.0



[PATCH v1 2/4] LoongArch: Check whether binutils supports the relax function. If supported, explicit relocs are turned off by default.

2024-02-20 Thread Lulu Cheng
gcc/ChangeLog:

* config.in: Regenerate.
* config/loongarch/genopts/loongarch.opt.in: Add compilation option
mrelax. And set the initial value of explicit-relocs according to the
detection status.
* config/loongarch/gnu-user.h: When compiling with -mno-relax, pass the
--no-relax option to the linker.
* config/loongarch/loongarch-driver.h (ASM_SPEC): When compiling with
-mno-relax, pass the -mno-relax option to the assembler.
* config/loongarch/loongarch-opts.h (HAVE_AS_MRELAX_OPTION): Define
macro.
* config/loongarch/loongarch.opt: Regenerate.
* configure: Regenerate.
* configure.ac: Add detection of support for binutils relax function.
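
As a reading aid, the Init() expressions added by this patch can be
modelled as below.  This is only a sketch of the defaults, assuming the
configure macros are always 0 or 1; the function names are illustrative
and do not exist in GCC.

```cpp
// Mirrors Init(HAVE_AS_EXPLICIT_RELOCS & !HAVE_AS_MRELAX_OPTION):
// explicit relocs default to on only when the assembler supports them
// and does not support -mrelax.
constexpr int default_explicit_relocs(int have_as_explicit_relocs,
                                      int have_as_mrelax_option) {
    return have_as_explicit_relocs & !have_as_mrelax_option;
}

// Mirrors Init(HAVE_AS_MRELAX_OPTION) for the new mrelax option.
constexpr int default_relax(int have_as_mrelax_option) {
    return have_as_mrelax_option;
}
```

Note that the bitwise & gives the intended result only because both
operands are 0 or 1.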

(cherry picked from commit 9bab65a77049edcc7afc59532173206ee816e726)
---
 gcc/config.in |  6 
 gcc/config/loongarch/genopts/loongarch.opt.in |  7 -
 gcc/config/loongarch/gnu-user.h   |  3 +-
 gcc/config/loongarch/loongarch-opts.h |  4 +++
 gcc/config/loongarch/loongarch.opt|  7 -
 gcc/configure | 31 +++
 gcc/configure.ac  |  4 +++
 7 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/gcc/config.in b/gcc/config.in
index ef35af16f2f..36a74dd5974 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -636,6 +636,12 @@
 #endif
 
 
+/* Define if your assembler supports -mrelax option. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_MRELAX_OPTION
+#endif
+
+
 /* Define if your assembler supports .mspabi_attribute. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_MSPABI_ATTRIBUTE
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index 4b9b4ac273e..e7c32e61a50 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -155,7 +155,7 @@ Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) Init
 -mmax-inline-memcpy-size=SIZE  Set the max size of memcpy to inline, default is 1024.
 
 mexplicit-relocs
-Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS)
+Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS & !HAVE_AS_MRELAX_OPTION)
 Use %reloc() assembly operators.
 
 ; The code model option names for -mcmodel.
@@ -188,3 +188,8 @@ Specify the code model.
 mdirect-extern-access
 Target Var(TARGET_DIRECT_EXTERN_ACCESS) Init(0)
 Avoid using the GOT to access external symbols.
+
+mrelax
+Target Var(loongarch_mrelax) Init(HAVE_AS_MRELAX_OPTION)
+Take advantage of linker relaxations to reduce the number of instructions
+required to materialize symbol addresses.
diff --git a/gcc/config/loongarch/gnu-user.h b/gcc/config/loongarch/gnu-user.h
index 5f1bd60ada3..e9f4bcef1d4 100644
--- a/gcc/config/loongarch/gnu-user.h
+++ b/gcc/config/loongarch/gnu-user.h
@@ -48,7 +48,8 @@ along with GCC; see the file COPYING3.  If not see
   "%{!shared: %{static} " \
   "%{!static: %{!static-pie: %{rdynamic:-export-dynamic} " \
   "-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}} " \
-  "%{static-pie: -static -pie --no-dynamic-linker -z text}}"
+  "%{static-pie: -static -pie --no-dynamic-linker -z text}}" \
+  "%{mno-relax: --no-relax}"
 
 
 /* Similar to standard Linux, but adding -ffast-math support.  */
diff --git a/gcc/config/loongarch/loongarch-opts.h b/gcc/config/loongarch/loongarch-opts.h
index b1ff54426e4..7ea02f4978c 100644
--- a/gcc/config/loongarch/loongarch-opts.h
+++ b/gcc/config/loongarch/loongarch-opts.h
@@ -92,4 +92,8 @@ loongarch_config_target (struct loongarch_target *target,
 #define HAVE_AS_EXPLICIT_RELOCS 0
 #endif
 
+#ifndef HAVE_AS_MRELAX_OPTION
+#define HAVE_AS_MRELAX_OPTION 0
+#endif
+
 #endif /* LOONGARCH_OPTS_H */
diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt
index 68018ade73f..e37ed9015de 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -162,7 +162,7 @@ Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) Init
 -mmax-inline-memcpy-size=SIZE  Set the max size of memcpy to inline, default is 1024.
 
 mexplicit-relocs
-Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS)
+Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS & !HAVE_AS_MRELAX_OPTION)
 Use %reloc() assembly operators.
 
 ; The code model option names for -mcmodel.
@@ -195,3 +195,8 @@ Specify the code model.
 mdirect-extern-access
 Target Var(TARGET_DIRECT_EXTERN_ACCESS) Init(0)
 Avoid using the GOT to access external symbols.
+
+mrelax
+Target Var(loongarch_mrelax) Init(HAVE_AS_MRELAX_OPTION)
+Take advantage of linker relaxations to reduce the number of instructions
+required to materialize symbol addresses.
diff --git a/gcc/configure b/gcc/configure
index dec2eca1a45..760bea9d4a0 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -29075,6 +29075,37 @@ if test 

[PATCH v1 3/4] LoongArch: Disable relaxation if the assembler don't support conditional branch relaxation [PR112330]

2024-02-20 Thread Lulu Cheng
From: Xi Ruoyao 

As the commit message of r14-4674 has indicated, if the assembler does
not support conditional branch relaxation, a relocation overflow may
happen on conditional branches when relaxation is enabled because the
number of NOP instructions inserted by the assembler will be more than
the number estimated by GCC.

To work around this issue, disable relaxation by default if the
assembler is detected at GCC build time to be incapable of performing
conditional branch relaxation.  We also need to pass -mno-relax to the
assembler to really disable relaxation.  But if the assembler does not
support the -mrelax option at all, we should not pass -mno-relax to it,
or it will immediately error out.  Handle this with the build-time
assembler capability probing as well, and add a pair of options
-m[no-]pass-mrelax-to-as to allow using a different assembler from the
build-time one.

With this change, if GCC is built with GAS 2.41, relaxation will be
disabled by default.  So the default value of -mexplicit-relocs= is also
changed to 'always' if -mno-relax is specified or implied by the
build-time default, because using assembler macros for symbol addresses
produces no benefit when relaxation is disabled.
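
As a rough illustration of the defaults described above, here is a small
standalone C model.  It is only a sketch: struct as_caps,
compute_relax_defaults and asm_relax_flag are hypothetical names made up
for this example and are not part of GCC.  The two booleans stand in for
the configure-time macros HAVE_AS_MRELAX_OPTION and
HAVE_AS_COND_BRANCH_RELAXATION, and the flag selection is a simplified
reading of "pass -mrelax or -mno-relax to the assembler", not the actual
driver spec.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the build-time assembler probes.  */
struct as_caps
{
  bool mrelax_option;      /* models HAVE_AS_MRELAX_OPTION */
  bool cond_branch_relax;  /* models HAVE_AS_COND_BRANCH_RELAXATION */
};

/* Defaults derived from the probes (names are illustrative only).  */
struct relax_defaults
{
  bool relax;                   /* default of -m[no-]relax */
  bool pass_mrelax_to_as;       /* default of -m[no-]pass-mrelax-to-as */
  bool explicit_relocs_always;  /* -mexplicit-relocs= defaults to 'always' */
};

static struct relax_defaults
compute_relax_defaults (struct as_caps caps)
{
  struct relax_defaults d;
  /* Relaxation is on by default only if the assembler both accepts
     -mrelax and can relax conditional branches.  */
  d.relax = caps.mrelax_option && caps.cond_branch_relax;
  /* Only forward -mrelax/-mno-relax if the assembler knows the option.  */
  d.pass_mrelax_to_as = caps.mrelax_option;
  /* Without relaxation, assembler macros bring no benefit.  */
  d.explicit_relocs_always = !d.relax;
  return d;
}

/* Flag forwarded to the assembler, or NULL if nothing is forwarded.  */
static const char *
asm_relax_flag (bool pass_mrelax_to_as, bool relax)
{
  if (!pass_mrelax_to_as)
    return NULL;
  return relax ? "-mrelax" : "-mno-relax";
}
```

With an assembler that accepts -mrelax but cannot relax conditional
branches, this model yields relax = false and forwards "-mno-relax"; with
an assembler that lacks the option entirely, nothing is forwarded.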

gcc/ChangeLog:

PR target/112330
* config/loongarch/genopts/loongarch.opt.in: Add
-m[no-]pass-mrelax-to-as.  Change the default of -m[no-]relax to
account for conditional branch relaxation support status.
* config/loongarch/loongarch.opt: Regenerate.
* configure.ac (gcc_cv_as_loongarch_cond_branch_relax): Check if
the assembler supports conditional branch relaxation.
* configure: Regenerate.
* config.in: Regenerate.  Note that there are some unrelated
changes introduced by r14-5424 (which does not contain a
config.in regeneration).
* config/loongarch/loongarch-opts.h
(HAVE_AS_COND_BRANCH_RELAXATION): Define to 0 if not defined.
* config/loongarch/loongarch-driver.h (ASM_MRELAX_DEFAULT):
Define.
(ASM_MRELAX_SPEC): Define.
(ASM_SPEC): Use ASM_MRELAX_SPEC instead of "%{mno-relax}".
* config/loongarch/loongarch.cc: Take the setting of
-m[no-]relax into account when determining the default of
-mexplicit-relocs=.
* doc/invoke.texi: Document -m[no-]relax and
-m[no-]pass-mrelax-to-as for LoongArch.  Update the default
value of -mexplicit-relocs=.

(cherry picked from commit fe23a2ff1f5072559552be0e41ab55bf72f5c79f)
---
 gcc/config.in |  6 
 gcc/config/loongarch/genopts/loongarch.opt.in |  6 +++-
 gcc/config/loongarch/loongarch-opts.h |  4 +++
 gcc/config/loongarch/loongarch.h  | 17 -
 gcc/config/loongarch/loongarch.opt|  6 +++-
 gcc/configure | 35 +++
 gcc/configure.ac  | 10 ++
 gcc/doc/invoke.texi   | 24 -
 8 files changed, 104 insertions(+), 4 deletions(-)

diff --git a/gcc/config.in b/gcc/config.in
index 36a74dd5974..83c98ae1457 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -373,6 +373,12 @@
 #endif
 
 
+/* Define if your assembler supports conditional branch relaxation. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_COND_BRANCH_RELAXATION
+#endif
+
+
 /* Define if your assembler supports the --debug-prefix-map option. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_DEBUG_PREFIX_MAP
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index e7c32e61a50..da6fedd153e 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -190,6 +190,10 @@ Target Var(TARGET_DIRECT_EXTERN_ACCESS) Init(0)
 Avoid using the GOT to access external symbols.
 
 mrelax
-Target Var(loongarch_mrelax) Init(HAVE_AS_MRELAX_OPTION)
+Target Var(loongarch_mrelax) Init(HAVE_AS_MRELAX_OPTION && HAVE_AS_COND_BRANCH_RELAXATION)
 Take advantage of linker relaxations to reduce the number of instructions
 required to materialize symbol addresses.
+
+mpass-mrelax-to-as
+Target Var(loongarch_pass_mrelax_to_as) Init(HAVE_AS_MRELAX_OPTION)
+Pass -mrelax or -mno-relax option to the assembler.
diff --git a/gcc/config/loongarch/loongarch-opts.h b/gcc/config/loongarch/loongarch-opts.h
index 7ea02f4978c..edd41c82b17 100644
--- a/gcc/config/loongarch/loongarch-opts.h
+++ b/gcc/config/loongarch/loongarch-opts.h
@@ -96,4 +96,8 @@ loongarch_config_target (struct loongarch_target *target,
 #define HAVE_AS_MRELAX_OPTION 0
 #endif
 
+#ifndef HAVE_AS_COND_BRANCH_RELAXATION
+#define HAVE_AS_COND_BRANCH_RELAXATION 0
+#endif
+
 #endif /* LOONGARCH_OPTS_H */
diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index cc719d0c796..d072522e3cf 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -69,8 +69,23 @@ along with GCC; see the file 

[PATCH v1 0/4] Fix a series of problems caused by

2024-02-20 Thread Lulu Cheng
binutils 2.42 corrects the implementation of
".align [abs-expr[, abs-expr[, abs-expr]]]".
The macro ASM_OUTPUT_ALIGN_WITH_NOP in GCC uses this assembler directive,
so an error now occurs.  See the link below for a detailed description.
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645067.html

To solve the above problems, this series does the following:

1. Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP. (cherry pick r14-4674)
2. Check whether binutils supports the relax function. (cherry pick r14-4160)
3. Disable relaxation if the assembler doesn't support
  conditional branch relaxation. (cherry pick r14-5434)

PR112299 is also fixed here.

Lulu Cheng (2):
  LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP.
  LoongArch: Check whether binutils supports the relax function. If
supported, explicit relocs are turned off by default.

Xi Ruoyao (2):
  LoongArch: Disable relaxation if the assembler don't support
conditional branch relaxation [PR112330]
  LoongArch: Define HAVE_AS_TLS to 0 if it's undefined [PR112299]

 gcc/config.in | 12 
 gcc/config/loongarch/genopts/loongarch.opt.in | 11 +++-
 gcc/config/loongarch/gnu-user.h   |  3 +-
 gcc/config/loongarch/loongarch-opts.h | 12 
 gcc/config/loongarch/loongarch.h  | 22 +--
 gcc/config/loongarch/loongarch.opt| 11 +++-
 gcc/configure | 66 +++
 gcc/configure.ac  | 14 
 gcc/doc/invoke.texi   | 24 ++-
 9 files changed, 165 insertions(+), 10 deletions(-)

-- 
2.39.3



[PATCH v1 1/4] LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP.

2024-02-20 Thread Lulu Cheng
There are two reasons for removing this macro definition:
1. The default in the assembler is to use the nop instruction for filling.
2. For the assembly directive .align [abs-expr[, abs-expr[, abs-expr]]],
   the third expression is the maximum number of bytes that should be
   skipped by this alignment directive.
   Therefore, it affects whether the requested alignment is actually
   applied, and in turn the run-time efficiency.
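
To make the second point concrete, here is a small C sketch of how the
third .align operand behaves as documented for GNU as: if reaching the
requested boundary would need more padding than the maximum, the
alignment is not done at all.  The function name align_padding and the
convention that max_skip == 0 means "no limit" are assumptions of this
example only.  The deleted macro emitted ".align LOG,54525952,4", that
is, a maximum skip of 4 bytes.

```c
#include <stdint.h>

/* Padding bytes that a log2-style ".align LOG, FILL, MAX" would insert
   at address ADDR.  MAX_SKIP == 0 means "no limit" in this sketch.  */
static uint64_t
align_padding (uint64_t addr, unsigned log2_align, uint64_t max_skip)
{
  uint64_t align = (uint64_t) 1 << log2_align;
  /* Distance from ADDR up to the next ALIGN boundary (0 if aligned).  */
  uint64_t pad = (align - (addr & (align - 1))) & (align - 1);
  /* If more than MAX_SKIP bytes would be needed, the alignment is
     skipped entirely rather than partially applied.  */
  if (max_skip != 0 && pad > max_skip)
    return 0;
  return pad;
}
```

For example, a request for 16-byte alignment at address 0x1004 needs 12
bytes of padding; with a maximum skip of 4 the directive does nothing and
the following code stays unaligned.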

This modification relies on binutils commit
1fb3cdd87ec61715a5684925fb6d6a6cf53bb97c.
(Since the assembler adds nops based on the .align information when doing
relaxation, a conditional branch can go out of range during assembly.
That binutils commit solves this problem.)

gcc/ChangeLog:

* config/loongarch/loongarch.h (ASM_OUTPUT_ALIGN_WITH_NOP):
Delete.

Co-authored-by: Chenghua Xu 

(cherry picked from commit b20c7ee066cb7d952fa193972e8bc6362c6e4063)
---
 gcc/config/loongarch/loongarch.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index f0db67f8c7b..cc719d0c796 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -982,11 +982,6 @@ typedef struct {
 
 #define ASM_OUTPUT_ALIGN(STREAM, LOG) fprintf (STREAM, "\t.align\t%d\n", (LOG))
 
-/* "nop" instruction 54525952 (andi $r0,$r0,0) is
-   used for padding.  */
-#define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
-  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
-
 /* This is how to output an assembler line to advance the location
counter by SIZE bytes.  */
 
-- 
2.39.3



[PATCH v1 4/4] LoongArch: Define HAVE_AS_TLS to 0 if it's undefined [PR112299]

2024-02-20 Thread Lulu Cheng
From: Xi Ruoyao 

Now that loongarch.md uses HAVE_AS_TLS, we need this to fix the failure
to build a cross compiler when the cross assembler is not installed yet.

gcc/ChangeLog:

PR target/112299
* config/loongarch/loongarch-opts.h (HAVE_AS_TLS): Define to 0
if not defined yet.

(cherry picked from commit 6bf2cebe2bf49919c78814cb447d3aa6e3550d89)
---
 gcc/config/loongarch/loongarch-opts.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/loongarch/loongarch-opts.h b/gcc/config/loongarch/loongarch-opts.h
index edd41c82b17..02184e2991a 100644
--- a/gcc/config/loongarch/loongarch-opts.h
+++ b/gcc/config/loongarch/loongarch-opts.h
@@ -100,4 +100,8 @@ loongarch_config_target (struct loongarch_target *target,
 #define HAVE_AS_COND_BRANCH_RELAXATION 0
 #endif
 
+#ifndef HAVE_AS_TLS
+#define HAVE_AS_TLS 0
+#endif
+
 #endif /* LOONGARCH_OPTS_H */
-- 
2.39.3



Re: [PATCH] rs6000: Neuter option -mpower{8,9}-vector [PR109987]

2024-02-20 Thread Peter Bergner
On 2/20/24 3:27 AM, Kewen.Lin wrote:
> on 2024/2/20 02:45, Segher Boessenkool wrote:
>> On Tue, Jan 16, 2024 at 10:50:01AM +0800, Kewen.Lin wrote:
>>> it consists of some aspects:
>>>   - effective target powerpc_p{8,9}vector_ok are removed
>>> and replaced with powerpc_vsx_ok.
>>
>> So all such testcases already arrange to have p8 or p9 some other way?

Shouldn't that be replaced with powerpc_vsx instead of powerpc_vsx_ok?
That way we know VSX code gen is enabled for the options being used,
even those in RUNTESTFLAGS.

I thought we agreed that powerpc_vsx_ok was almost always useless and
we always want to use powerpc_vsx.  ...or did I miss that we removed
the old powerpc_vsx_ok and renamed powerpc_vsx to powerpc_vsx_ok?



>>>   - Some test cases are updated with explicit -mvsx.
>>>   - Some test cases with those two option mixed are adjusted
>>> to keep the test points, like -mpower8-vector
>>> -mno-power9-vector are updated with -mdejagnu-cpu=power8
>>> -mvsx etc.
>>
>> -mcpu=power8 implies -mvsx already.

Then we can omit the explicit -mvsx option, correct?  Ie, if the
user forces -mno-vsx in RUNTESTFLAGS, then we'll just skip the
test case as UNSUPPORTED rather than trying to compile some
vsx test case with vsx disabled via the options.



Peter


Re: [PATCH V2] RISC-V: Specify mtune and march for PR113742

2024-02-20 Thread Kito Cheng
LGTM, thanks for fixing that issue :)

On Wed, Feb 21, 2024 at 6:03 AM Edwin Lu  wrote:
>
> The testcase pr113742.c is failing for 32-bit targets due to the following cc1
> error:
> cc1: error: ABI requires '-march=rv64'
>
> Specify '-march=rv64gc' with '-mtune=sifive-p600-series'
>
> V1: https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645609.html
>
> PR target/113742
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr113742.c: change mcpu to mtune and add march
>
> Signed-off-by: Edwin Lu 
> ---
> V1: use require-effective-target
> V2: switch to specifying march and mtune
> ---
>  gcc/testsuite/gcc.target/riscv/pr113742.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr113742.c b/gcc/testsuite/gcc.target/riscv/pr113742.c
> index ab8934c2a8a..573afd6f0ad 100644
> --- a/gcc/testsuite/gcc.target/riscv/pr113742.c
> +++ b/gcc/testsuite/gcc.target/riscv/pr113742.c
> @@ -1,4 +1,4 @@
> -//* { dg-do compile } */
> -/* { dg-options "-O2 -finstrument-functions -mabi=lp64d -mcpu=sifive-p670" } */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -finstrument-functions -march=rv64gc -mabi=lp64d -mtune=sifive-p600-series" } */
>
>  void foo(void) {}
> --
> 2.34.1
>


[pushed] analyzer: handle array-initialization from a string_cst [PR113999]

2024-02-20 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r14-9091-g0a6a5f8656ccf9.
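
For readers skimming the patch, the behaviour being added can be modelled
in plain C roughly as follows.  This is a sketch only, not the analyzer
code: read_char_from_string_cst is a hypothetical name, string_length
plays the role of TREE_STRING_LENGTH, and array_size plays the role of
the size returned by get_string_cst_size.

```c
#include <stdbool.h>
#include <stddef.h>

/* Model of a single-byte read from an array initialized from a string
   constant.  BYTES holds STRING_LENGTH explicit bytes; the array itself
   is ARRAY_SIZE bytes, with implicit trailing zeroes in between.
   Returns false if OFFSET is beyond the array, true otherwise, storing
   the byte read in *OUT.  */
static bool
read_char_from_string_cst (const char *bytes, size_t string_length,
                           size_t array_size, size_t offset, char *out)
{
  /* Beyond the whole array: the read is unsuccessful.  */
  if (offset >= array_size)
    return false;
  if (offset < string_length)
    /* Within the explicitly stored bytes.  */
    *out = bytes[offset];
  else
    /* In the padding area of trailing zeroes.  */
    *out = 0;
  return true;
}
```

So for something like "char buf[8] = "abc";", a read at offset 5 yields
0 instead of being treated as unknown, while a read at offset 8 still
fails.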

gcc/analyzer/ChangeLog:
PR analyzer/113999
* analyzer.h (get_string_cst_size): New decl.
* region-model-manager.cc (get_string_cst_size): New.
(region_model_manager::maybe_get_char_from_string_cst): Treat
single-byte accesses within string_cst but beyond
TREE_STRING_LENGTH as being 0.
* region-model.cc (string_cst_has_null_terminator): Likewise.

gcc/testsuite/ChangeLog:
PR analyzer/113999
* c-c++-common/analyzer/strlen-pr113999.c: New test.
* gcc.dg/analyzer/strlen-1.c: More test coverage.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/analyzer.h   |  3 ++
 gcc/analyzer/region-model-manager.cc  | 35 +++--
 gcc/analyzer/region-model.cc  | 17 +-
 .../c-c++-common/analyzer/strlen-pr113999.c   |  8 +++
 gcc/testsuite/gcc.dg/analyzer/strlen-1.c  | 52 +++
 5 files changed, 109 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/strlen-pr113999.c

diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
index 23e3f71df0af..20a8e3f9a1d0 100644
--- a/gcc/analyzer/analyzer.h
+++ b/gcc/analyzer/analyzer.h
@@ -430,6 +430,9 @@ byte_offset_to_json (const byte_offset_t );
 extern tristate
 compare_constants (tree lhs_const, enum tree_code op, tree rhs_const);
 
+extern tree
+get_string_cst_size (const_tree string_cst);
+
 } // namespace ana
 
 extern bool is_special_named_call_p (const gcall *call, const char *funcname,
diff --git a/gcc/analyzer/region-model-manager.cc b/gcc/analyzer/region-model-manager.cc
index 21e13b480257..93e72ec45a85 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -1407,6 +1407,20 @@ get_or_create_const_fn_result_svalue (tree type,
   return const_fn_result_sval;
 }
 
+/* Get a tree for the size of STRING_CST, or NULL_TREE.
+   Note that this may be larger than TREE_STRING_LENGTH (implying
+   a run of trailing zero bytes from TREE_STRING_LENGTH up to this
+   higher limit).  */
+
+tree
+get_string_cst_size (const_tree string_cst)
+{
+  gcc_assert (TREE_CODE (string_cst) == STRING_CST);
+  gcc_assert (TREE_CODE (TREE_TYPE (string_cst)) == ARRAY_TYPE);
+
+  return TYPE_SIZE_UNIT (TREE_TYPE (string_cst));
+}
+
 /* Given STRING_CST, a STRING_CST and BYTE_OFFSET_CST a constant,
attempt to get the character at that offset, returning either
the svalue for the character constant, or NULL if unsuccessful.  */
@@ -1420,16 +1434,27 @@ region_model_manager::maybe_get_char_from_string_cst (tree string_cst,
   /* Adapted from fold_read_from_constant_string.  */
   scalar_int_mode char_mode;
   if (TREE_CODE (byte_offset_cst) == INTEGER_CST
-  && compare_tree_int (byte_offset_cst,
-  TREE_STRING_LENGTH (string_cst)) < 0
   && is_int_mode (TYPE_MODE (TREE_TYPE (TREE_TYPE (string_cst))),
  &char_mode)
   && GET_MODE_SIZE (char_mode) == 1)
 {
+  /* If we're beyond the string_cst, the read is unsuccessful.  */
+  if (compare_constants (byte_offset_cst,
+GE_EXPR,
+get_string_cst_size (string_cst)).is_true ())
+   return NULL;
+
+  int char_val;
+  if (compare_tree_int (byte_offset_cst,
+   TREE_STRING_LENGTH (string_cst)) < 0)
+   /* We're within the area defined by TREE_STRING_POINTER.  */
+   char_val = (TREE_STRING_POINTER (string_cst)
+   [TREE_INT_CST_LOW (byte_offset_cst)]);
+  else
+   /* We're in the padding area of trailing zeroes.  */
+   char_val = 0;
   tree char_cst
-   = build_int_cst_type (TREE_TYPE (TREE_TYPE (string_cst)),
- (TREE_STRING_POINTER (string_cst)
-  [TREE_INT_CST_LOW (byte_offset_cst)]));
+   = build_int_cst_type (TREE_TYPE (TREE_TYPE (string_cst)), char_val);
   return get_or_create_constant_svalue (char_cst);
 }
   return NULL;
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index a26be7075997..6ab917465d6f 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3648,7 +3648,22 @@ string_cst_has_null_terminator (tree string_cst,
byte_offset_t *out_bytes_read)
 {
   gcc_assert (bytes.m_start_byte_offset >= 0);
-  gcc_assert (bytes.m_start_byte_offset < TREE_STRING_LENGTH (string_cst));
+
+  /* If we're beyond the string_cst, reads are unsuccessful.  */
+  if (tree cst_size = get_string_cst_size (string_cst))
+if (TREE_CODE (cst_size) == INTEGER_CST)
+  if (bytes.m_start_byte_offset >= TREE_INT_CST_LOW (cst_size))
+   return tristate::unknown ();
+
+  /* Assume all bytes after TREE_STRING_LENGTH are zero.  

[pushed] analyzer: handle empty ranges in symbolic_byte_range::intersection [PR113998]

2024-02-20 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r14-9090-g79d4c7ddc83e00.
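
The idea of the fix, restricted to concrete (non-symbolic) ranges, can be
sketched in standalone C as below.  The names struct byte_range,
byte_range_empty_p and byte_ranges_intersect_p are invented for this
illustration; the real symbolic_byte_range works on svalues and returns
a tristate.

```c
#include <stdbool.h>
#include <stddef.h>

/* A concrete range of SIZE bytes starting at START.  */
struct byte_range
{
  size_t start;
  size_t size;
};

static bool
byte_range_empty_p (struct byte_range r)
{
  return r.size == 0;
}

/* Return true if A and B share at least one byte.  */
static bool
byte_ranges_intersect_p (struct byte_range a, struct byte_range b)
{
  /* If either is empty, then there is no intersection.  Without this
     check, an empty range lying strictly inside the other one would
     pass the interval test below.  */
  if (byte_range_empty_p (a) || byte_range_empty_p (b))
    return false;
  return a.start < b.start + b.size && b.start < a.start + a.size;
}
```

An empty range at offset 5 compared against bytes 0-9 shows why the
early exit matters: the plain interval test alone would report an
overlap there.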

gcc/analyzer/ChangeLog:
PR analyzer/113998
* ranges.cc (symbolic_byte_range::intersection): Handle empty ranges.
(selftest::test_intersects): Add test coverage for empty ranges.

gcc/testsuite/ChangeLog:
PR analyzer/113998
* c-c++-common/analyzer/overlapping-buffers-pr113998.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/ranges.cc| 18 
 .../analyzer/overlapping-buffers-pr113998.c   | 21 +++
 2 files changed, 39 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/overlapping-buffers-pr113998.c

diff --git a/gcc/analyzer/ranges.cc b/gcc/analyzer/ranges.cc
index f46b04121d3f..ffdd0d4c5722 100644
--- a/gcc/analyzer/ranges.cc
+++ b/gcc/analyzer/ranges.cc
@@ -193,6 +193,12 @@ tristate
 symbolic_byte_range::intersection (const symbolic_byte_range &other,
   const region_model &model) const
 {
+  /* If either is empty, then there is no intersection.  */
+  if (empty_p ())
+return tristate::TS_FALSE;
+  if (other.empty_p ())
+return tristate::TS_FALSE;
+
   /* For brevity, consider THIS to be "range A", and OTHER to be "range B".  */
 
   region_model_manager *mgr = model.get_manager ();
@@ -262,12 +268,17 @@ static void test_intersects (void)
   ASSERT_EQ (r0_9.get_next_byte_offset (mgr), ten);
   ASSERT_EQ (r0_9.get_last_byte_offset (mgr), nine);
 
+  symbolic_byte_range concrete_empty (zero, zero);
+  ASSERT_TRUE (concrete_empty.empty_p ());
+
   ASSERT_EQ (r0_9.intersection (r0, m), tristate::TS_TRUE);
   ASSERT_EQ (r0.intersection (r0_9, m), tristate::TS_TRUE);
   ASSERT_EQ (r0_9.intersection (r9, m), tristate::TS_TRUE);
   ASSERT_EQ (r9.intersection (r0_9, m), tristate::TS_TRUE);
   ASSERT_EQ (r0_9.intersection (r10, m), tristate::TS_FALSE);
   ASSERT_EQ (r10.intersection (r0_9, m), tristate::TS_FALSE);
+  ASSERT_EQ (concrete_empty.intersection (r0_9, m), tristate::TS_FALSE);
+  ASSERT_EQ (r0_9.intersection (concrete_empty, m), tristate::TS_FALSE);
 
   ASSERT_EQ (r5_9.intersection (r0, m), tristate::TS_FALSE);
   ASSERT_EQ (r0.intersection (r5_9, m), tristate::TS_FALSE);
@@ -286,6 +297,9 @@ static void test_intersects (void)
   symbolic_byte_range ry (y_init_sval, one);
   symbolic_byte_range rx_x_plus_y_minus_1 (x_init_sval, y_init_sval);
 
+  symbolic_byte_range symbolic_empty (x_init_sval, zero);
+  ASSERT_TRUE (symbolic_empty.empty_p ());
+
   ASSERT_EQ (rx_x_plus_y_minus_1.get_start_byte_offset (), x_init_sval);
   ASSERT_EQ (rx_x_plus_y_minus_1.get_size_in_bytes (), y_init_sval);
   ASSERT_EQ
@@ -296,6 +310,10 @@ static void test_intersects (void)
  SK_BINOP);
 
   ASSERT_EQ (rx.intersection (ry, m), tristate::TS_UNKNOWN);
+  ASSERT_EQ (rx.intersection (concrete_empty, m), tristate::TS_FALSE);
+  ASSERT_EQ (concrete_empty.intersection (rx, m), tristate::TS_FALSE);
+  ASSERT_EQ (rx.intersection (symbolic_empty, m), tristate::TS_FALSE);
+  ASSERT_EQ (symbolic_empty.intersection (rx, m), tristate::TS_FALSE);
   ASSERT_EQ (r0_x_minus_1.intersection (r0, m), tristate::TS_TRUE);
 #if 0
   ASSERT_EQ (r0_x_minus_1.intersection (rx, m), tristate::TS_FALSE);
diff --git a/gcc/testsuite/c-c++-common/analyzer/overlapping-buffers-pr113998.c b/gcc/testsuite/c-c++-common/analyzer/overlapping-buffers-pr113998.c
new file mode 100644
index ..5c6352eb42f4
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/analyzer/overlapping-buffers-pr113998.c
@@ -0,0 +1,21 @@
+/* Verify we don't ICE on -Wanalyzer-overlapping-buffers on
+   execution paths where the size is constant zero, but the
+   optimizer didn't see that.  */
+
+typedef __SIZE_TYPE__ size_t;
+
+extern char a[];
+size_t n;
+
+size_t  __attribute__((noinline))
+get_hidden_zero ()
+{
+  return 0;
+}
+
+void
+test_pr113998 ()
+{
+  size_t n = get_hidden_zero ();
+  __builtin_strncpy (a, a, n); /* { dg-warning "overlapping buffers passed as arguments to" } */
+}
-- 
2.26.3



[PATCH] c++: -Wuninitialized when binding a ref to uninit DM [PR113987]

2024-02-20 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This PR asks that our -Wuninitialized for mem-initializers does
not warn when binding a reference to an uninitialized data member.
We already check !INDIRECT_TYPE_P in find_uninit_fields_r, but
that won't catch binding a parameter of a reference type to an
uninitialized field, as in:

  struct S { S (int&); };
  struct T {
  T() : s(i) {}
  S s;
  int i;
  };

This patch adds a new function to handle this case.

PR c++/113987

gcc/cp/ChangeLog:

* call.cc (conv_binds_to_reference_parm_p): New.
* cp-tree.h (conv_binds_to_reference_parm_p): Declare.
* init.cc (find_uninit_fields_r): Call it.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wuninitialized-15.C: Turn dg-warning into dg-bogus.
* g++.dg/warn/Wuninitialized-34.C: New test.
---
 gcc/cp/call.cc| 24 ++
 gcc/cp/cp-tree.h  |  1 +
 gcc/cp/init.cc|  3 +-
 gcc/testsuite/g++.dg/warn/Wuninitialized-15.C |  3 +-
 gcc/testsuite/g++.dg/warn/Wuninitialized-34.C | 32 +++
 5 files changed, 60 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wuninitialized-34.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 1dac1470d3b..c40ef2e3028 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -14551,4 +14551,28 @@ maybe_show_nonconverting_candidate (tree to, tree from, tree arg, int flags)
"function was not considered");
 }
 
+/* We're converting EXPR to TYPE.  If that conversion involves a conversion
+   function and we're binding EXPR to a reference parameter of that function,
+   return true.  */
+
+bool
+conv_binds_to_reference_parm_p (tree type, tree expr)
+{
+  conversion_obstack_sentinel cos;
+  conversion *c = implicit_conversion (type, TREE_TYPE (expr), expr,
+  /*c_cast_p=*/false, LOOKUP_NORMAL,
+  tf_none);
+  if (c && !c->bad_p && c->user_conv_p)
+for (; c; c = next_conversion (c))
+  if (c->kind == ck_user)
+   for (z_candidate *cand = c->cand; cand; cand = cand->next)
+ if (cand->viable == 1)
+   for (size_t i = 0; i < cand->num_convs; ++i)
+ if (cand->convs[i]->kind == ck_ref_bind
+ && conv_get_original_expr (cand->convs[i]) == expr)
+   return true;
+
+  return false;
+}
+
 #include "gt-cp-call.h"
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 334c11396c2..ce2d85f1f86 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6845,6 +6845,7 @@ extern void cp_warn_deprecated_use_scopes (tree);
 extern tree get_function_version_dispatcher(tree);
 extern bool any_template_arguments_need_structural_equality_p (tree);
 extern void maybe_show_nonconverting_candidate (tree, tree, tree, int);
+extern bool conv_binds_to_reference_parm_p (tree, tree);
 
 /* in class.cc */
 extern tree build_vfield_ref   (tree, tree);
diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index ac37330527e..1a341f7e606 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -906,7 +906,8 @@ find_uninit_fields_r (tree *tp, int *walk_subtrees, void *data)
warning_at (EXPR_LOCATION (init), OPT_Wuninitialized,
"reference %qD is not yet bound to a value when used "
"here", field);
- else if (!INDIRECT_TYPE_P (type) || is_this_parameter (d->member))
+ else if ((!INDIRECT_TYPE_P (type) || is_this_parameter (d->member))
+  && !conv_binds_to_reference_parm_p (type, init))
warning_at (EXPR_LOCATION (init), OPT_Wuninitialized,
"member %qD is used uninitialized", field);
  *walk_subtrees = false;
diff --git a/gcc/testsuite/g++.dg/warn/Wuninitialized-15.C b/gcc/testsuite/g++.dg/warn/Wuninitialized-15.C
index 89e90668c41..2fd33037bfd 100644
--- a/gcc/testsuite/g++.dg/warn/Wuninitialized-15.C
+++ b/gcc/testsuite/g++.dg/warn/Wuninitialized-15.C
@@ -65,8 +65,7 @@ struct H {
   G g;
   A a2;
   H() : g(a1) { }
-  // ??? clang++ doesn't warn here
-  H(int) : g(a2) { } // { dg-warning "member .H::a2. is used uninitialized" }
+  H(int) : g(a2) { } // { dg-bogus "member .H::a2. is used uninitialized" }
 };
 
 struct I {
diff --git a/gcc/testsuite/g++.dg/warn/Wuninitialized-34.C b/gcc/testsuite/g++.dg/warn/Wuninitialized-34.C
new file mode 100644
index 000..28226d8032e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wuninitialized-34.C
@@ -0,0 +1,32 @@
+// PR c++/113987
+// { dg-do compile }
+// { dg-options "-Wuninitialized" }
+
+struct t1 {
+t1(int);
+};
+struct t2 {
+t2(int&, int = 0);
+t2(double&, int = 0);
+};
+struct t3 {
+t3(int&);
+};
+struct t4 {};
+void f1(int&);
+struct t {
+t() :
+  v1(i),  // { dg-warning "is used uninitialized" }
+  v2(i),
+  v3(i),
+  v4((f1(i), t4())),
+

[PATCH v3] bpf: add inline memmove and memcpy expansion

2024-02-20 Thread David Faust
[Changes from v2: 
 - Fix incorrectly passing a location instead of OPT_W* for warning ().
 - Reword warning/error message and test accordingly.  ]
 
[Changes from v1: Jose's review comments, all of which I agree with.
 - Fix 'implments' typo in commit message.
 - Change check that alignment is CONST_INT to gcc_assert ().
 - Change default case in alignment switch to gcc_unreachable ().
 - Reword error message for non-constant size memmove/memcpy, and
   update test for the error accordingly.
 - Delete CPYMEM_EXPAND_ERR macro, since it was now only used in
   one place.  ]

BPF programs are not typically linked, which means we cannot fall back
on library calls to implement __builtin_{memmove,memcpy} and should
always expand them inline if possible.

GCC already successfully expands these builtins inline in many cases,
but failed to do so for a few simple cases involving overlapping
memmove in the kernel BPF selftests and was instead emitting a libcall.

This patch implements a simple inline expansion of memcpy and memmove in
the BPF backend in a verifier-friendly way, with the caveat that the
size must be an integer constant, which is also required by clang.
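
To illustrate the overlap handling such an expansion needs for memmove,
here is a byte-granular C sketch.  It is an assumption-laden model, not
the backend code: move_bytes_overlap_safe is a hypothetical name, and the
real expansion emits a fully unrolled sequence in chunks as large as the
alignment permits rather than run-time loops.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy N bytes from SRC to DST, correct even if the regions overlap.
   The direction is chosen by one run-time address comparison.  */
static void
move_bytes_overlap_safe (unsigned char *dst, const unsigned char *src,
                         size_t n)
{
  if ((uintptr_t) src >= (uintptr_t) dst)
    {
      /* SRC is at or after DST: copying forwards never overwrites
         source bytes that are still to be read.  */
      for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    }
  else
    {
      /* SRC is before DST: copy "backwards", starting from the end.  */
      for (size_t i = n; i > 0; i--)
        dst[i - 1] = src[i - 1];
    }
}
```

For plain memcpy the comparison and the backwards path are unnecessary,
since overlap is not a concern there.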

Tested for bpf-unknown-none on x86_64-linux-gnu host.

Also tested against the BPF verifier by compiling and loading a test
program with overlapping memmove (essentially the memmove-1.c test)
which failed before due to a libcall, and now successfully loads and
passes the verifier.

gcc/

* config/bpf/bpf-protos.h (bpf_expand_cpymem): New.
* config/bpf/bpf.cc (emit_move_loop, bpf_expand_cpymem): New.
* config/bpf/bpf.md (cpymemdi, movmemdi): New define_expands.

gcc/testsuite/

* gcc.target/bpf/memcpy-1.c: New test.
* gcc.target/bpf/memmove-1.c: New test.
* gcc.target/bpf/memmove-2.c: New test.
---
 gcc/config/bpf/bpf-protos.h  |   2 +
 gcc/config/bpf/bpf.cc| 115 +++
 gcc/config/bpf/bpf.md|  36 +++
 gcc/testsuite/gcc.target/bpf/memcpy-1.c  |  26 +
 gcc/testsuite/gcc.target/bpf/memmove-1.c |  46 +
 gcc/testsuite/gcc.target/bpf/memmove-2.c |  23 +
 6 files changed, 248 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/bpf/memcpy-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/memmove-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/memmove-2.c

diff --git a/gcc/config/bpf/bpf-protos.h b/gcc/config/bpf/bpf-protos.h
index 46d950bd990..366acb87ae4 100644
--- a/gcc/config/bpf/bpf-protos.h
+++ b/gcc/config/bpf/bpf-protos.h
@@ -35,4 +35,6 @@ const char *bpf_add_core_reloc (rtx *operands, const char *templ);
 class gimple_opt_pass;
 gimple_opt_pass *make_pass_lower_bpf_core (gcc::context *ctxt);
 
+bool bpf_expand_cpymem (rtx *, bool);
+
 #endif /* ! GCC_BPF_PROTOS_H */
diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index d6ca47eeecb..f9ac263613a 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -1184,6 +1184,121 @@ bpf_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
 #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
   bpf_use_by_pieces_infrastructure_p
 
+/* Helper for bpf_expand_cpymem.  Emit an unrolled loop moving the bytes
+   from SRC to DST.  */
+
+static void
+emit_move_loop (rtx src, rtx dst, machine_mode mode, int offset, int inc,
+   unsigned iters, unsigned remainder)
+{
+  rtx reg = gen_reg_rtx (mode);
+
+  /* First copy in chunks as large as alignment permits.  */
+  for (unsigned int i = 0; i < iters; i++)
+{
+  emit_move_insn (reg, adjust_address (src, mode, offset));
+  emit_move_insn (adjust_address (dst, mode, offset), reg);
+  offset += inc;
+}
+
+  /* Handle remaining bytes which might be smaller than the chunks
+ used above.  */
+  if (remainder & 4)
+{
+  emit_move_insn (reg, adjust_address (src, SImode, offset));
+  emit_move_insn (adjust_address (dst, SImode, offset), reg);
+  offset += (inc < 0 ? -4 : 4);
+  remainder -= 4;
+}
+  if (remainder & 2)
+{
+  emit_move_insn (reg, adjust_address (src, HImode, offset));
+  emit_move_insn (adjust_address (dst, HImode, offset), reg);
+  offset += (inc < 0 ? -2 : 2);
+  remainder -= 2;
+}
+  if (remainder & 1)
+{
+  emit_move_insn (reg, adjust_address (src, QImode, offset));
+  emit_move_insn (adjust_address (dst, QImode, offset), reg);
+}
+}
+
+/* Expand cpymem/movmem, as from __builtin_memcpy/memmove.
+   OPERANDS are the same as the cpymem/movmem patterns.
+   IS_MOVE is true if this is a memmove, false for memcpy.
+   Return true if we successfully expanded, or false if we cannot
+   and must punt to a libcall.  */
+
+bool
+bpf_expand_cpymem (rtx *operands, bool is_move)
+{
+  /* Size must be constant for this expansion to work.  */
+  if (!CONST_INT_P (operands[2]))
+{
+  const char *name = is_move ? "memmove" : "memcpy";
+  if (flag_building_libgcc)
+   warning (0, "could 

Re: [PATCH v2] bpf: add inline memmove and memcpy expansion

2024-02-20 Thread David Faust



On 2/20/24 12:37, Jose E. Marchesi wrote:
> 
> Hi Faust.
> 
>> +bool
>> +bpf_expand_cpymem (rtx *operands, bool is_move)
>> +{
>> +  /* Size must be constant for this expansion to work.  */
>> +  if (!CONST_INT_P (operands[2]))
>> +{
>> +  const char *name = is_move ? "memmove" : "memcpy";
>> +  if (flag_building_libgcc)
>> +warning (RTL_LOCATION (operands[2]),
>> + "could not expand call to %<__builtin_%s%> inline: "
>> + "size must be constant", name);
>> +  else
>> +error ("could not expand call to %<__builtin_%s%> inline: "
>> +   "size must be constant", name);
>> +  return false;
>> +}
> 
> I think you want to use warning_at and error_at above...  the first
> argument to `warning' (and second to warning_at) is not a location but
> an OPT_W* value.  You should pass 0 to the opt argument in this case as
> there is no -W option to handle this warning.

Oops. Thanks.

After fixing the arguments, `warning' and `error' generally give better
messages than `warning_at' and `error_at' since the
RTL_LOCATION (operands[2]) seems to be NULL for most (all?) cases, which
causes the diagnostic message to refer only to the containing function.
The non-`_at' versions highlight the builtin call being expanded, which
is much nicer.

> 
> Also, I would not mention 'expand' in the error/warning message, as it
> is a GCC internal concept.  I would use perhaps "could not inline call
> to %<__builtin_%s%>".

OK.

> 
>> +  /* Alignment is a CONST_INT.  */
>> +  gcc_assert (CONST_INT_P (operands[3]));
>> +
>> +  rtx dst = operands[0];
>> +  rtx src = operands[1];
>> +  rtx size = operands[2];
>> +  unsigned HOST_WIDE_INT size_bytes = UINTVAL (size);
>> +  unsigned align = UINTVAL (operands[3]);
>> +  enum machine_mode mode;
>> +  switch (align)
>> +{
>> +case 1: mode = QImode; break;
>> +case 2: mode = HImode; break;
>> +case 4: mode = SImode; break;
>> +case 8: mode = DImode; break;
>> +default:
>> +  gcc_unreachable ();
>> +}
>> +
>> +  unsigned iters = size_bytes >> ceil_log2 (align);
>> +  unsigned remainder = size_bytes & (align - 1);
>> +
>> +  int inc = GET_MODE_SIZE (mode);
>> +  rtx_code_label *fwd_label, *done_label;
>> +  if (is_move)
>> +{
>> +  /* For memmove, be careful of overlap.  It is not a concern for memcpy.
>> + To handle overlap, we check (at runtime) if SRC < DST, and if so do
>> + the move "backwards" starting from SRC + SIZE.  */
>> +  fwd_label = gen_label_rtx ();
>> +  done_label = gen_label_rtx ();
>> +
>> +  rtx dst_addr = copy_to_mode_reg (Pmode, XEXP (dst, 0));
>> +  rtx src_addr = copy_to_mode_reg (Pmode, XEXP (src, 0));
>> +  emit_cmp_and_jump_insns (src_addr, dst_addr, GEU, NULL_RTX, Pmode,
>> +   true, fwd_label, profile_probability::even ());
>> +
>> +  /* Emit the "backwards" unrolled loop.  */
>> +  emit_move_loop (src, dst, mode, size_bytes, -inc, iters, remainder);
>> +  emit_jump_insn (gen_jump (done_label));
>> +  emit_barrier ();
>> +
>> +  emit_label (fwd_label);
>> +}
>> +
>> +  emit_move_loop (src, dst, mode, 0, inc, iters, remainder);
>> +
>> +  if (is_move)
>> +emit_label (done_label);
>> +
>> +  return true;
>> +}
>> +
>>  /* Finally, build the GCC target.  */
>>  
>>  struct gcc_target targetm = TARGET_INITIALIZER;
>> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
>> index 50df1aaa3e2..ca677bc6b50 100644
>> --- a/gcc/config/bpf/bpf.md
>> +++ b/gcc/config/bpf/bpf.md
>> @@ -627,4 +627,40 @@ (define_insn "ldabs"
>>"{ldabs\t%0|r0 = *( *) skb[%0]}"
>>[(set_attr "type" "ld")])
>>  
>> +;;; memmove and memcopy
>> +
>> +;; 0 is dst
>> +;; 1 is src
>> +;; 2 is size of copy in bytes
>> +;; 3 is alignment
>> +
>> +(define_expand "cpymemdi"
>> +  [(match_operand:BLK 0 "memory_operand")
>> +   (match_operand:BLK 1 "memory_operand")
>> +   (match_operand:DI 2 "general_operand")
>> +   (match_operand:DI 3 "immediate_operand")]
>> +   ""
>> +{
>> +  if (bpf_expand_cpymem (operands, false))
>> +DONE;
>> +  FAIL;
>> +})
>> +
>> +;; 0 is dst
>> +;; 1 is src
>> +;; 2 is size of copy in bytes
>> +;; 3 is alignment
>> +
>> +(define_expand "movmemdi"
>> +  [(match_operand:BLK 0 "memory_operand")
>> +   (match_operand:BLK 1 "memory_operand")
>> +   (match_operand:DI 2 "general_operand")
>> +   (match_operand:DI 3 "immediate_operand")]
>> +   ""
>> +{
>> +  if (bpf_expand_cpymem (operands, true))
>> +DONE;
>> +  FAIL;
>> +})
>> +
>>  (include "atomic.md")
>> diff --git a/gcc/testsuite/gcc.target/bpf/memcpy-1.c b/gcc/testsuite/gcc.target/bpf/memcpy-1.c
>> new file mode 100644
>> index 000..6c9707f24e8
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/bpf/memcpy-1.c
>> @@ -0,0 +1,26 @@
>> +/* Ensure memcpy is expanded inline rather than emitting a libcall.  */
>> +
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +struct context {
>> + unsigned int data;
>> 

Re: [PATCH] rs6000: Update instruction counts due to combine changes [PR112103]

2024-02-20 Thread Segher Boessenkool
On Tue, Feb 20, 2024 at 01:49:30PM -0600, Peter Bergner wrote:
> I think this will become less fragile after we fix PR114004 which is

You call it "fragile".  I call it the testcase found the exact kind of
bug this testcase was meant to find!

Yes, the test should become quieter when the compiler has fewer bugs :-)


Segher


[PATCH v10 24/24] libstdc++: Optimize std::is_invocable compilation performance

2024-02-20 Thread Ken Matsui
This patch optimizes the compilation performance of std::is_invocable
by dispatching to the new __is_invocable built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_invocable): Use __is_invocable
built-in trait.
* testsuite/20_util/is_invocable/incomplete_args_neg.cc: Handle
the new error from __is_invocable.
* testsuite/20_util/is_invocable/incomplete_neg.cc: Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits  | 4 
 .../testsuite/20_util/is_invocable/incomplete_args_neg.cc | 1 +
 libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc | 1 +
 3 files changed, 6 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 1577042a5b8..9af233bcc75 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -3235,7 +3235,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// std::is_invocable
   template<typename _Fn, typename... _ArgTypes>
 struct is_invocable
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_invocable)
+: public __bool_constant<__is_invocable(_Fn, _ArgTypes...)>
+#else
 : __is_invocable_impl<__invoke_result<_Fn, _ArgTypes...>, void>::type
+#endif
 {
   static_assert(std::__is_complete_or_unbounded(__type_identity<_Fn>{}),
"_Fn must be a complete class or an unbounded array");
diff --git a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
index a575750f9e9..9619129b817 100644
--- a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
@@ -18,6 +18,7 @@
 // .
 
 // { dg-error "must be a complete class" "" { target *-*-* } 0 }
+// { dg-prune-output "invalid use of incomplete type" }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
index 05848603555..b478ebce815 100644
--- a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
@@ -18,6 +18,7 @@
 // .
 
 // { dg-error "must be a complete class" "" { target *-*-* } 0 }
+// { dg-prune-output "invalid use of incomplete type" }
 
 #include 
 
-- 
2.43.2



[PATCH V2] RISC-V: Specify mtune and march for PR113742

2024-02-20 Thread Edwin Lu
The testcase pr113742.c is failing for 32 bit targets due to the following cc1
error:
cc1: error: ABI requires '-march=rv64'

Specify '-march=rv64gc' with '-mtune=sifive-p600-series'

V1: https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645609.html

PR target/113742

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr113742.c: change mcpu to mtune and add march

Signed-off-by: Edwin Lu 
---
V1: use require-effective-target
V2: switch to specifying march and mtune
---
 gcc/testsuite/gcc.target/riscv/pr113742.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/pr113742.c b/gcc/testsuite/gcc.target/riscv/pr113742.c
index ab8934c2a8a..573afd6f0ad 100644
--- a/gcc/testsuite/gcc.target/riscv/pr113742.c
+++ b/gcc/testsuite/gcc.target/riscv/pr113742.c
@@ -1,4 +1,4 @@
-//* { dg-do compile } */
-/* { dg-options "-O2 -finstrument-functions -mabi=lp64d -mcpu=sifive-p670" } */
+/* { dg-do compile } */
+/* { dg-options "-O2 -finstrument-functions -march=rv64gc -mabi=lp64d -mtune=sifive-p600-series" } */
 
 void foo(void) {}
-- 
2.34.1



Re: [PATCH] libgccjit: Add option to allow special characters in function names

2024-02-20 Thread Iain Sandoe



> On 20 Feb 2024, at 20:50, David Malcolm  wrote:
> 
> On Thu, 2024-02-15 at 17:08 -0500, Antoni Boucher wrote:
>> Hi.
>> This patch adds a new option to allow special characters like . and $
>> in function names.
>> This is useful to allow for mangling using those characters.
>> Thanks for the review.
> 
> Thanks for the patch.
> 
>> diff --git a/gcc/jit/docs/topics/contexts.rst b/gcc/jit/docs/topics/contexts.rst
>> index 10a0e50f9f6..4af75ea7418 100644
>> --- a/gcc/jit/docs/topics/contexts.rst
>> +++ b/gcc/jit/docs/topics/contexts.rst
>> @@ -453,6 +453,10 @@ Boolean options
>>  If true, the :type:`gcc_jit_context` will not clean up intermediate files
>>  written to the filesystem, and will display their location on stderr.
>> 
>> +  .. macro:: GCC_JIT_BOOL_OPTION_SPECIAL_CHARS_IN_FUNC_NAMES
>> +
>> + If true, allow special characters like . and $ in function names.
> 
> The documentation and the comment in libgccjit.h say:
>  "allow special characters like . and $ in function names."
> and on reading the implementation, the special characters are exactly
> '.' and '$'.
> 
> The API seems rather arbitrary and inflexible to me; why the choice of
> those characters?  Presumably those are the ones that Rust's mangling
> scheme uses, but do other mangling schemes require other chars?
> 
> How about an API for setting the valid chars, something like:
> 
> extern void
> gcc_jit_context_set_valid_symbol_chars (gcc_jit_context *ctxt,
>const char *chars);
> 
> to specify the chars that are valid in addition to underscore and
> alphanumeric.
> 
> In your case you'd call:
> 
>  gcc_jit_context_set_valid_symbol_chars (ctxt, ".$");
> 
> Or is that overkill?

If we ever wanted to support Objective-C (NeXT runtime) then we’d need to
be able to support at least +, -, [, ], space and :.  The interesting thing
there is that most assemblers do not support those either (and the symbols
then need to be quoted for the assembler).

So, it’s not (IMO) overkill considering at least one potential extension.

Iain

> 
> Dave
> 



Re: [PATCH] libgccjit: Add option to allow special characters in function names

2024-02-20 Thread David Malcolm
On Thu, 2024-02-15 at 17:08 -0500, Antoni Boucher wrote:
> Hi.
> This patch adds a new option to allow special characters like . and $
> in function names.
> This is useful to allow for mangling using those characters.
> Thanks for the review.

Thanks for the patch.

> diff --git a/gcc/jit/docs/topics/contexts.rst b/gcc/jit/docs/topics/contexts.rst
> index 10a0e50f9f6..4af75ea7418 100644
> --- a/gcc/jit/docs/topics/contexts.rst
> +++ b/gcc/jit/docs/topics/contexts.rst
> @@ -453,6 +453,10 @@ Boolean options
>   If true, the :type:`gcc_jit_context` will not clean up intermediate files
>   written to the filesystem, and will display their location on stderr.
>  
> +  .. macro:: GCC_JIT_BOOL_OPTION_SPECIAL_CHARS_IN_FUNC_NAMES
> +
> + If true, allow special characters like . and $ in function names.

The documentation and the comment in libgccjit.h say:
  "allow special characters like . and $ in function names."
and on reading the implementation, the special characters are exactly
'.' and '$'.

The API seems rather arbitrary and inflexible to me; why the choice of
those characters?  Presumably those are the ones that Rust's mangling
scheme uses, but do other mangling schemes require other chars?

How about an API for setting the valid chars, something like:

extern void
gcc_jit_context_set_valid_symbol_chars (gcc_jit_context *ctxt,
const char *chars);

to specify the chars that are valid in addition to underscore and
alphanumeric.

In your case you'd call:

  gcc_jit_context_set_valid_symbol_chars (ctxt, ".$");

Or is that overkill?

Dave
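
For illustration only, the check implied by the proposed
gcc_jit_context_set_valid_symbol_chars could be sketched as below.  This
is not code from the patch or from libgccjit: symbol_name_is_valid is a
hypothetical helper, and the rule that a name may not start with a digit
is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical helper, not part of libgccjit.  Returns true if NAME is
// acceptable when EXTRA_CHARS lists the characters that are valid in
// addition to underscore and alphanumerics.
bool
symbol_name_is_valid (const char *name, const char *extra_chars)
{
  if (name == nullptr || *name == '\0')
    return false;

  // Assumption for this sketch: a symbol may not start with a digit.
  if (std::isdigit (static_cast<unsigned char> (name[0])))
    return false;

  for (const char *p = name; *p != '\0'; ++p)
    {
      unsigned char ch = static_cast<unsigned char> (*p);
      if (std::isalnum (ch) || ch == '_')
	continue;
      // Accept any character the client explicitly allowed.
      if (extra_chars != nullptr && std::strchr (extra_chars, *p) != nullptr)
	continue;
      return false;
    }
  return true;
}
```

With ".$" as the extra set a name such as "foo.bar$baz" passes, while an
Objective-C style name such as "-[Foo bar:]" would only pass once the
client also allows "+-[] :".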



Re: [PATCH 3/5] btf: moved btf deallocation to final.

2024-02-20 Thread David Faust
Hi Cupertino,

On 2/20/24 02:24, Cupertino Miranda wrote:
> Dissociated .BTF.ext from the CO-RE relocations creation. Improvement of
> allocation/deallocation of BTF structures. Moving deallocation to final
> when needed.
> 
> gcc/ChangeLog:
> 
>   * config/bpf/bpf.cc (bpf_option_override): Make BTF.ext enabled
>   by default for BPF.
>   (btf_asm_init_sections): Add btf deallocation.
>   * dwarf2ctf.cc (ctf_debug_finalize): Fixed btf deallocation.

I find the commit message and ChangeLog here overly brief and a little
bit confusing.

You refer to moving the BTF deallocation to 'final', but IMO in this
context 'final' has the particular meaning of referring to pass_final,
which is not where the deallocation is moved. Rather, the patch moves it
to bpf_file_end, which implements the TARGET_ASM_FILE_END hook and is
called after pass_final. So I suggest to avoid calling it 'final', and
to explain a little in the commit message under what circumstances the
deallocation must be moved.

Also, I think the ChangeLog is missing an entry for bpf_file_end.

Please find a little note on a typo inline below.

> ---
>  gcc/config/bpf/bpf.cc | 20 +---
>  gcc/dwarf2ctf.cc  |  5 -
>  2 files changed, 13 insertions(+), 12 deletions(-)
> 
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index d6ca47eeecbe..4318b26b9cda 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -195,10 +195,8 @@ bpf_option_override (void)
>if (TARGET_BPF_CORE && !btf_debuginfo_p ())
>  error ("BPF CO-RE requires BTF debugging information, use %<-gbtf%>");
>  
> -  /* To support the portability needs of BPF CO-RE approach, BTF debug
> - information includes the BPF CO-RE relocations.  */
> -  if (TARGET_BPF_CORE)
> -write_symbols |= BTF_WITH_CORE_DEBUG;
> +  /* BPF applications always generate .BTF.ext.  */
> +  write_symbols |= BTF_WITH_CORE_DEBUG;
>  
>/* Unlike much of the other BTF debug information, the information necessary
>   for CO-RE relocations is added to the CTF container by the BPF backend.
> @@ -218,10 +216,7 @@ bpf_option_override (void)
>/* -gbtf implies -mcore when using the BPF backend, unless -mno-co-re
>   is specified.  */
>if (btf_debuginfo_p () && !(target_flags_explicit & MASK_BPF_CORE))
> -{
> -  target_flags |= MASK_BPF_CORE;
> -  write_symbols |= BTF_WITH_CORE_DEBUG;
> -}
> +target_flags |= MASK_BPF_CORE;
>  
>/* Determine available features from ISA setting (-mcpu=).  */
>if (bpf_has_jmpext == -1)
> @@ -267,7 +262,7 @@ bpf_option_override (void)
>  static void
>  bpf_asm_init_sections (void)
>  {
> -  if (TARGET_BPF_CORE)
> +  if (btf_debuginfo_p () && btf_with_core_debuginfo_p ())
>  btf_ext_init ();
>  }
>  
> @@ -279,8 +274,11 @@ bpf_asm_init_sections (void)
>  static void
>  bpf_file_end (void)
>  {
> -  if (TARGET_BPF_CORE)
> -btf_ext_output ();
> +  if (btf_debuginfo_p () && btf_with_core_debuginfo_p ())
> +{
> +  btf_ext_output ();
> +  btf_finalize ();
> +}
>  }
>  
>  #undef TARGET_ASM_FILE_END
> diff --git a/gcc/dwarf2ctf.cc b/gcc/dwarf2ctf.cc
> index 93e5619933fa..b9dfecf2c1c4 100644
> --- a/gcc/dwarf2ctf.cc
> +++ b/gcc/dwarf2ctf.cc
> @@ -944,7 +944,10 @@ ctf_debug_finalize (const char *filename, bool btf)
>if (btf)
>  {
>btf_output (filename);
> -  btf_finalize ();
> +  /* btf_finalize when compiling BPF applciations gets deallocated by the
> +  BPF target in bpf_file_end.  */

typo: applications

> +  if (btf_debuginfo_p () && !btf_with_core_debuginfo_p ())
> + btf_finalize ();
>  }
>  
>else


Re: [PATCH v2] bpf: add inline memmove and memcpy expansion

2024-02-20 Thread Jose E. Marchesi


Hi Faust.

> +bool
> +bpf_expand_cpymem (rtx *operands, bool is_move)
> +{
> +  /* Size must be constant for this expansion to work.  */
> +  if (!CONST_INT_P (operands[2]))
> +{
> +  const char *name = is_move ? "memmove" : "memcpy";
> +  if (flag_building_libgcc)
> + warning (RTL_LOCATION (operands[2]),
> +  "could not expand call to %<__builtin_%s%> inline: "
> +  "size must be constant", name);
> +  else
> + error ("could not expand call to %<__builtin_%s%> inline: "
> +"size must be constant", name);
> +  return false;
> +}

I think you want to use warning_at and error_at above...  the first
argument to `warning' (and second to warning_at) is not a location but
an OPT_W* value.  You should pass 0 to the opt argument in this case as
there is no -W option to handle this warning.

Also, I would not mention 'expand' in the error/warning message, as it
is a GCC internal concept.  I would use perhaps "could not inline call
to %<__builtin_%s%>".

> +  /* Alignment is a CONST_INT.  */
> +  gcc_assert (CONST_INT_P (operands[3]));
> +
> +  rtx dst = operands[0];
> +  rtx src = operands[1];
> +  rtx size = operands[2];
> +  unsigned HOST_WIDE_INT size_bytes = UINTVAL (size);
> +  unsigned align = UINTVAL (operands[3]);
> +  enum machine_mode mode;
> +  switch (align)
> +{
> +case 1: mode = QImode; break;
> +case 2: mode = HImode; break;
> +case 4: mode = SImode; break;
> +case 8: mode = DImode; break;
> +default:
> +  gcc_unreachable ();
> +}
> +
> +  unsigned iters = size_bytes >> ceil_log2 (align);
> +  unsigned remainder = size_bytes & (align - 1);
> +
> +  int inc = GET_MODE_SIZE (mode);
> +  rtx_code_label *fwd_label, *done_label;
> +  if (is_move)
> +{
> +  /* For memmove, be careful of overlap.  It is not a concern for memcpy.
> +  To handle overlap, we check (at runtime) if SRC < DST, and if so do
> +  the move "backwards" starting from SRC + SIZE.  */
> +  fwd_label = gen_label_rtx ();
> +  done_label = gen_label_rtx ();
> +
> +  rtx dst_addr = copy_to_mode_reg (Pmode, XEXP (dst, 0));
> +  rtx src_addr = copy_to_mode_reg (Pmode, XEXP (src, 0));
> +  emit_cmp_and_jump_insns (src_addr, dst_addr, GEU, NULL_RTX, Pmode,
> +true, fwd_label, profile_probability::even ());
> +
> +  /* Emit the "backwards" unrolled loop.  */
> +  emit_move_loop (src, dst, mode, size_bytes, -inc, iters, remainder);
> +  emit_jump_insn (gen_jump (done_label));
> +  emit_barrier ();
> +
> +  emit_label (fwd_label);
> +}
> +
> +  emit_move_loop (src, dst, mode, 0, inc, iters, remainder);
> +
> +  if (is_move)
> +emit_label (done_label);
> +
> +  return true;
> +}
> +
>  /* Finally, build the GCC target.  */
>  
>  struct gcc_target targetm = TARGET_INITIALIZER;
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index 50df1aaa3e2..ca677bc6b50 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -627,4 +627,40 @@ (define_insn "ldabs"
>"{ldabs\t%0|r0 = *( *) skb[%0]}"
>[(set_attr "type" "ld")])
>  
> +;;; memmove and memcopy
> +
> +;; 0 is dst
> +;; 1 is src
> +;; 2 is size of copy in bytes
> +;; 3 is alignment
> +
> +(define_expand "cpymemdi"
> +  [(match_operand:BLK 0 "memory_operand")
> +   (match_operand:BLK 1 "memory_operand")
> +   (match_operand:DI 2 "general_operand")
> +   (match_operand:DI 3 "immediate_operand")]
> +   ""
> +{
> +  if (bpf_expand_cpymem (operands, false))
> +DONE;
> +  FAIL;
> +})
> +
> +;; 0 is dst
> +;; 1 is src
> +;; 2 is size of copy in bytes
> +;; 3 is alignment
> +
> +(define_expand "movmemdi"
> +  [(match_operand:BLK 0 "memory_operand")
> +   (match_operand:BLK 1 "memory_operand")
> +   (match_operand:DI 2 "general_operand")
> +   (match_operand:DI 3 "immediate_operand")]
> +   ""
> +{
> +  if (bpf_expand_cpymem (operands, true))
> +DONE;
> +  FAIL;
> +})
> +
>  (include "atomic.md")
> diff --git a/gcc/testsuite/gcc.target/bpf/memcpy-1.c b/gcc/testsuite/gcc.target/bpf/memcpy-1.c
> new file mode 100644
> index 000..6c9707f24e8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/memcpy-1.c
> @@ -0,0 +1,26 @@
> +/* Ensure memcpy is expanded inline rather than emitting a libcall.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +struct context {
> + unsigned int data;
> + unsigned int data_end;
> + unsigned int data_meta;
> + unsigned int ingress;
> + unsigned int queue_index;
> + unsigned int egress;
> +};
> +
> +void
> +cpy_1(struct context *ctx)
> +{
> +  void *data = (void *)(long)ctx->data;
> +  char *dest;
> +  dest = data;
> +  dest += 16;
> +
> +  __builtin_memcpy (dest, data, 8);
> +}
> +
> +/* { dg-final { scan-assembler-times "call" 0 } } */
> diff --git a/gcc/testsuite/gcc.target/bpf/memmove-1.c b/gcc/testsuite/gcc.target/bpf/memmove-1.c
> new file mode 100644
> index 000..3b8ba82639e
> --- /dev/null
> +++ 

Re: [PATCH] libgcc, aarch64: Allow for BE platforms in heap trampolines.

2024-02-20 Thread Richard Sandiford
Iain Sandoe  writes:
> Andrew Pinski pointed out on irc, that the current implementation of the
> heap trampoline code fragment would make the instruction byte order follow
> memory byte order for BE AArch64, which is not what is required.
>
> This patch revises the initializers so that instruction byte order is
> independent of memory byte order.
>
> I have tested this on aarch64-linux-gnu, aarch64-darwin and on a cross to
> aarch64_be-linux-gnu (including compile tests on the latter, but I have no
> way, at present, to carry out execute tests).
>
> (Note that this patch is applied on top of the one for PR113971).
>
> OK for trunk, or what would be a way forward?
> thanks
> Iain 
>
> --- 8< ---
>
> This arranges that the byte order of the instruction sequences is
> independent of the byte order of memory.
>
> libgcc/ChangeLog:
>
>   * config/aarch64/heap-trampoline.c
>   (aarch64_trampoline_insns): Arrange to encode instructions as a
>   byte array so that the order is independent of memory byte order.
>   (struct aarch64_trampoline): Likewise.
>
> Signed-off-by: Iain Sandoe 

OK, thanks.

Richard

> ---
>  libgcc/config/aarch64/heap-trampoline.c | 30 -
>  1 file changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/libgcc/config/aarch64/heap-trampoline.c b/libgcc/config/aarch64/heap-trampoline.c
> index 1e3460b1601..885df629da7 100644
> --- a/libgcc/config/aarch64/heap-trampoline.c
> +++ b/libgcc/config/aarch64/heap-trampoline.c
> @@ -30,23 +30,23 @@ void __gcc_nested_func_ptr_created (void *chain, void *func, void *dst);
>  void __gcc_nested_func_ptr_deleted (void);
>  
>  #if defined(__linux__)
> -static const uint32_t aarch64_trampoline_insns[] = {
> -  0xd503245f, /* hint 34 */
> -  0x580000b1, /* ldr x17, .+20 */
> -  0x580000d2, /* ldr x18, .+24 */
> -  0xd61f0220, /* br  x17 */
> -  0xd5033f9f, /* dsb sy */
> -  0xd5033fdf /* isb */
> +static const unsigned char aarch64_trampoline_insns[6][4] = {
> +  {0x5f, 0x24, 0x03, 0xd5}, /* hint 34 */
> +  {0xb1, 0x00, 0x00, 0x58}, /* ldr x17, .+20 */
> +  {0xd2, 0x00, 0x00, 0x58}, /* ldr x18, .+24 */
> +  {0x20, 0x02, 0x1f, 0xd6}, /* br  x17 */
> +  {0x9f, 0x3f, 0x03, 0xd5}, /* dsb sy */
> +  {0xdf, 0x3f, 0x03, 0xd5} /* isb */
>  };
>  
>  #elif __APPLE__
> -static const uint32_t aarch64_trampoline_insns[] = {
> -  0xd503245f, /* hint 34 */
> -  0x580000b1, /* ldr x17, .+20 */
> -  0x580000d0, /* ldr x16, .+24 */
> -  0xd61f0220, /* br  x17 */
> -  0xd5033f9f, /* dsb sy */
> -  0xd5033fdf /* isb */
> +static const unsigned char aarch64_trampoline_insns[6][4] = {
> +  {0x5f, 0x24, 0x03, 0xd5}, /* hint 34 */
> +  {0xb1, 0x00, 0x00, 0x58}, /* ldr x17, .+20 */
> +  {0xd0, 0x00, 0x00, 0x58}, /* ldr x16, .+24 */
> +  {0x20, 0x02, 0x1f, 0xd6}, /* br  x17 */
> +  {0x9f, 0x3f, 0x03, 0xd5}, /* dsb sy */
> +  {0xdf, 0x3f, 0x03, 0xd5} /* isb */
>  };
>  
>  #else
> @@ -54,7 +54,7 @@ static const uint32_t aarch64_trampoline_insns[] = {
>  #endif
>  
>  struct aarch64_trampoline {
> -  uint32_t insns[6];
> +  unsigned char insns[6][4];
>void *func_ptr;
>void *chain_ptr;
>  };
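
As a rough illustration of why the byte-array form is independent of
memory byte order: AArch64 instructions are stored little-endian even on
big-endian systems, so each 32-bit encoding can be split into bytes with
shifts, least significant byte first.  The helper name encode_insn_le
below is hypothetical and not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: store the 32-bit instruction word INSN into OUT
// as four bytes, least significant byte first.  Shifts operate on the
// value rather than on its in-memory layout, so the result is the same
// on little-endian and big-endian hosts.
void
encode_insn_le (std::uint32_t insn, unsigned char out[4])
{
  assert (out != nullptr);
  for (int i = 0; i < 4; ++i)
    out[i] = static_cast<unsigned char> ((insn >> (8 * i)) & 0xff);
}
```

Applied to 0xd503245f this yields the bytes 0x5f, 0x24, 0x03, 0xd5,
which is the first row of the new initializer in the patch.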


Re: [PATCH] Fortran: fix passing array component to polymorphic argument [PR105658]

2024-02-20 Thread Steve Kargl
On Tue, Feb 20, 2024 at 08:53:37PM +0100, Harald Anlauf wrote:
> On 2/19/24 16:19, Peter Hill wrote:
> > Hi Harald,
> > 
> > Thanks for your help, please see the updated and signed-off patch below.
> 
> Pushed: https://gcc.gnu.org/g:14ba8d5b87acd5f91ab8b8c02165a0fd53dcc2f2
> 

Harald, Thanks for taking care of this commit.

Peter, welcome to the menagerie.

-- 
Steve


Re: [PATCH 1/5] btf: fixed type id in BTF_KIND_FUNC struct data.

2024-02-20 Thread David Faust



On 2/20/24 02:24, Cupertino Miranda wrote:
> This patch corrects the addition of +1 on the type id, which originally
> was done in the wrong location and caused func_sts->dtd_type for
> BTF_KIND_FUNCS struct data to contain the type id of the previous entry.
> 
> gcc/ChangeLog:
>   * btfout.cc (btf_collect_dataset): Corrected BTF type id.

OK, thanks.

> ---
>  gcc/btfout.cc | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/btfout.cc b/gcc/btfout.cc
> index dcf751f8fe0d..7e114e224449 100644
> --- a/gcc/btfout.cc
> +++ b/gcc/btfout.cc
> @@ -457,7 +457,8 @@ btf_collect_datasec (ctf_container_ref ctfc)
>func_dtd->dtd_data.ctti_type = dtd->dtd_type;
>func_dtd->linkage = dtd->linkage;
>func_dtd->dtd_name = dtd->dtd_name;
> -  func_dtd->dtd_type = num_types_added + num_types_created;
> +  /* +1 for the sentinel type not in the types map.  */
> +  func_dtd->dtd_type = num_types_added + num_types_created + 1;
>  
>/* Only the BTF_KIND_FUNC type actually references the name. The
>BTF_KIND_FUNC_PROTO is always anonymous.  */
> @@ -480,8 +481,7 @@ btf_collect_datasec (ctf_container_ref ctfc)
>  
> struct btf_var_secinfo info;
>  
> -   /* +1 for the sentinel type not in the types map.  */
> -   info.type = func_dtd->dtd_type + 1;
> +   info.type = func_dtd->dtd_type;
>  
> /* Both zero at compile time.  */
> info.size = 0;


Re: [PATCH] Fortran: fix passing array component to polymorphic argument [PR105658]

2024-02-20 Thread Harald Anlauf

On 2/19/24 16:19, Peter Hill wrote:

Hi Harald,

Thanks for your help, please see the updated and signed-off patch below.


Pushed: https://gcc.gnu.org/g:14ba8d5b87acd5f91ab8b8c02165a0fd53dcc2f2



Re: [PATCH] rs6000: Update instruction counts due to combine changes [PR112103]

2024-02-20 Thread Peter Bergner
On 2/20/24 3:29 AM, Kewen.Lin wrote:
> on 2024/2/20 06:35, Peter Bergner wrote:
>> rs6000: Update instruction counts due to combine changes [PR112103]
>>
>> The PR91865 combine fix changed instruction counts slightly for rlwinm-0.c.
>> Adjust expected instruction counts accordingly.
>>
>> This passed on both powerpc64le-linux and powerpc64-linux running the
>> testsuite in both 32-bit and 64-bit modes.  Ok for trunk?
> 
> OK for trunk, thanks for fixing!

Ok, pushed.  Thanks.


>> FYI, I will open a new bug to track the removing of the superfluous
>> insns detected in PR112103.
> 
> Hope this test case will become not fragile any more once this filed
> issue gets fixed. :)

I think this will become less fragile after we fix PR114004 which is
the bug I opened to track fixing the superfluous insn that was emitted
that we found in this bug.  The fragility was due to the superfluous
insn being different before and after Roger's patch.  Once we don't
emit it anymore, this test case should be less fragile.

Peter



[PATCH v2] bpf: add inline memmove and memcpy expansion

2024-02-20 Thread David Faust
[Changes from v1: Jose's review comments, all of which I agree with.
 - Fix 'implments' typo in commit message.
 - Change check that alignment is CONST_INT to gcc_assert ().
 - Change default case in alignment switch to gcc_unreachable ().
 - Reword error message for non-constant size memmove/memcpy, and
   update test for the error accordingly.
 - Delete CPYMEM_EXPAND_ERR macro, since it was now only used in
   one place.  ]

BPF programs are not typically linked, which means we cannot fall back
on library calls to implement __builtin_{memmove,memcpy} and should
always expand them inline if possible.

GCC already successfully expands these builtins inline in many cases,
but failed to do so for a few simple cases involving overlapping
memmove in the kernel BPF selftests and was instead emitting a libcall.

This patch implements a simple inline expansion of memcpy and memmove in
the BPF backend in a verifier-friendly way, with the caveat that the
size must be an integer constant, which is also required by clang.

Tested for bpf-unknown-none on x86_64-linux-gnu host.

Also tested against the BPF verifier by compiling and loading a test
program with overlapping memmove (essentially the memmove-1.c test)
which failed before due to a libcall, and now successfully loads and
passes the verifier.
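
As a plain C++ illustration of the overlap handling (a sketch only, not
the RTL the backend emits): the size is a compile-time constant, and the
copy direction is chosen at runtime by comparing the two addresses.  The
name inline_memmove is hypothetical, and the sketch copies byte by byte
rather than in alignment-sized chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: overlap-safe move of N bytes, N known at compile
// time.  If SRC is at or above DST, copying forwards never overwrites
// bytes that are still to be read; otherwise copy backwards, starting
// from the last byte.
template <std::size_t N>
void
inline_memmove (unsigned char *dst, const unsigned char *src)
{
  if (reinterpret_cast<std::uintptr_t> (src)
      >= reinterpret_cast<std::uintptr_t> (dst))
    {
      for (std::size_t i = 0; i < N; ++i)
	dst[i] = src[i];
    }
  else
    {
      for (std::size_t i = N; i-- > 0; )
	dst[i] = src[i];
    }
}
```

Because N is constant, a compiler can fully unroll either loop, which is
the property the verifier-friendly expansion relies on.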

gcc/

* config/bpf/bpf-protos.h (bpf_expand_cpymem): New.
* config/bpf/bpf.cc (emit_move_loop, bpf_expand_cpymem): New.
* config/bpf/bpf.md (cpymemdi, movmemdi): New define_expands.

gcc/testsuite/

* gcc.target/bpf/memcpy-1.c: New test.
* gcc.target/bpf/memmove-1.c: New test.
* gcc.target/bpf/memmove-2.c: New test.
---
 gcc/config/bpf/bpf-protos.h  |   2 +
 gcc/config/bpf/bpf.cc| 116 +++
 gcc/config/bpf/bpf.md|  36 +++
 gcc/testsuite/gcc.target/bpf/memcpy-1.c  |  26 +
 gcc/testsuite/gcc.target/bpf/memmove-1.c |  46 +
 gcc/testsuite/gcc.target/bpf/memmove-2.c |  23 +
 6 files changed, 249 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/bpf/memcpy-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/memmove-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/memmove-2.c

diff --git a/gcc/config/bpf/bpf-protos.h b/gcc/config/bpf/bpf-protos.h
index 46d950bd990..366acb87ae4 100644
--- a/gcc/config/bpf/bpf-protos.h
+++ b/gcc/config/bpf/bpf-protos.h
@@ -35,4 +35,6 @@ const char *bpf_add_core_reloc (rtx *operands, const char *templ);
 class gimple_opt_pass;
 gimple_opt_pass *make_pass_lower_bpf_core (gcc::context *ctxt);
 
+bool bpf_expand_cpymem (rtx *, bool);
+
 #endif /* ! GCC_BPF_PROTOS_H */
diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index d6ca47eeecb..5db031a4551 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -1184,6 +1184,122 @@ bpf_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
 #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
   bpf_use_by_pieces_infrastructure_p
 
+/* Helper for bpf_expand_cpymem.  Emit an unrolled loop moving the bytes
+   from SRC to DST.  */
+
+static void
+emit_move_loop (rtx src, rtx dst, machine_mode mode, int offset, int inc,
+   unsigned iters, unsigned remainder)
+{
+  rtx reg = gen_reg_rtx (mode);
+
+  /* First copy in chunks as large as alignment permits.  */
+  for (unsigned int i = 0; i < iters; i++)
+{
+  emit_move_insn (reg, adjust_address (src, mode, offset));
+  emit_move_insn (adjust_address (dst, mode, offset), reg);
+  offset += inc;
+}
+
+  /* Handle remaining bytes which might be smaller than the chunks
+ used above.  */
+  if (remainder & 4)
+{
+  emit_move_insn (reg, adjust_address (src, SImode, offset));
+  emit_move_insn (adjust_address (dst, SImode, offset), reg);
+  offset += (inc < 0 ? -4 : 4);
+  remainder -= 4;
+}
+  if (remainder & 2)
+{
+  emit_move_insn (reg, adjust_address (src, HImode, offset));
+  emit_move_insn (adjust_address (dst, HImode, offset), reg);
+  offset += (inc < 0 ? -2 : 2);
+  remainder -= 2;
+}
+  if (remainder & 1)
+{
+  emit_move_insn (reg, adjust_address (src, QImode, offset));
+  emit_move_insn (adjust_address (dst, QImode, offset), reg);
+}
+}
+
+/* Expand cpymem/movmem, as from __builtin_memcpy/memmove.
+   OPERANDS are the same as the cpymem/movmem patterns.
+   IS_MOVE is true if this is a memmove, false for memcpy.
+   Return true if we successfully expanded, or false if we cannot
+   and must punt to a libcall.  */
+
+bool
+bpf_expand_cpymem (rtx *operands, bool is_move)
+{
+  /* Size must be constant for this expansion to work.  */
+  if (!CONST_INT_P (operands[2]))
+{
+  const char *name = is_move ? "memmove" : "memcpy";
+  if (flag_building_libgcc)
+   warning (RTL_LOCATION (operands[2]),
+"could not expand call to %<__builtin_%s%> inline: "
+"size must be constant", name);
+  else
+ 

Re: [PATCH][_GLIBCXX_DEBUG] Fix std::__niter_base behavior

2024-02-20 Thread Jonathan Wakely
On Tue, 20 Feb 2024 at 18:43, François Dumont wrote:
>
>libstdc++: [_GLIBCXX_DEBUG] Fix std::__niter_wrap behavior
>
> In _GLIBCXX_DEBUG mode the std::__niter_base can remove 2 layers, the
> __gnu_debug::_Safe_iterator<> and the __gnu_cxx::__normal_iterator<>.
> When std::__niter_wrap is called to build a __gnu_debug::_Safe_iterator<>
> from a __gnu_cxx::__normal_iterator<> we then have a consistency issue
> as the difference between the 2 iterators will be done on a __normal_iterator
> on one side and a C pointer on the other. To avoid this problem call
> std::__niter_base on both input iterators.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/stl_algobase.h (std::__niter_wrap): Add a call to
> std::__niter_base on res iterator.
>
> Tested under Linux x86_64 normal and _GLIBCXX_DEBUG modes in c++98, c++11, 
> c++17.
>
> Ok to commit ?
>

OK, thanks.



Re: [PATCH][_GLIBCXX_DEBUG] Fix std::__niter_base behavior

2024-02-20 Thread François Dumont

   libstdc++: [_GLIBCXX_DEBUG] Fix std::__niter_wrap behavior

    In _GLIBCXX_DEBUG mode the std::__niter_base can remove 2 layers, the
    __gnu_debug::_Safe_iterator<> and the __gnu_cxx::__normal_iterator<>.
    When std::__niter_wrap is called to build a __gnu_debug::_Safe_iterator<>
    from a __gnu_cxx::__normal_iterator<> we then have a consistency issue
    as the difference between the 2 iterators will be done on a __normal_iterator
    on one side and a C pointer on the other. To avoid this problem call
    std::__niter_base on both input iterators.

    libstdc++-v3/ChangeLog:

    * include/bits/stl_algobase.h (std::__niter_wrap): Add a call to
    std::__niter_base on res iterator.

Tested under Linux x86_64 normal and _GLIBCXX_DEBUG modes in c++98, c++11, c++17.

Ok to commit?

François


On 19/02/2024 09:21, Jonathan Wakely wrote:



On Mon, 19 Feb 2024, 08:12 Jonathan Wakely wrote:

On Mon, 19 Feb 2024, 07:08 Stephan Bergmann wrote:

On 2/17/24 15:14, François Dumont wrote:
> Thanks for the link, tested and committed.

I assume this is the cause for the below failure now,


Yes, the new >= C++11 overload of __niter_base recursively unwraps
multiple layers of wrapping, so that a safe iterator wrapping a
normal iterator wrapping a pointer is unwrapped to just a pointer.
But then __niter_wrap doesn't restore both layers.



Actually that's not the problem. __niter_wrap would restore both 
layers, except that it uses __niter_base itself:


>   347 |     { return __from + (__res - std::__niter_base(__from)); }
>       |  ~~~^~~~

And it seems to be getting called with the wrong types. Maybe that's
just a bug in std::erase or maybe __niter_wrap needs adjusting.


I'll check in a couple of hours if François doesn't get to it first.

I have to wonder how this wasn't caught by existing tests though.
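
For illustration only (this is not libstdc++ code and not part of the patch), the mismatch discussed above can be sketched with toy types. `Wrapped`, `toy_base` and `toy_wrap` are hypothetical names; `toy_wrap` only mirrors the shape of the patched `__niter_wrap`, which unwraps both arguments before taking the difference.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical one-layer iterator wrapper around a raw pointer.
struct Wrapped {
  int *ptr;
  Wrapped operator+(std::ptrdiff_t n) const { return Wrapped{ptr + n}; }
};

// Toy counterparts of std::__niter_base: strip the wrapper, or pass a
// raw pointer through unchanged.
inline int *toy_base(Wrapped w) { return w.ptr; }
inline int *toy_base(int *p) { return p; }

// Toy counterpart of the patched std::__niter_wrap: both arguments are
// unwrapped, so the subtraction always happens between raw pointers,
// whether `res` arrives wrapped or already unwrapped.
template <typename From, typename To>
From toy_wrap(From from, To res) {
  return from + (toy_base(res) - toy_base(from));
}

// Usage example: rewrap a raw pointer and a wrapped pointer.
inline void example_usage() {
  int data[4] = {10, 20, 30, 40};
  Wrapped first{data};
  assert(toy_wrap(first, data + 2).ptr == data + 2);
  assert(toy_wrap(first, Wrapped{data + 3}).ptr == data + 3);
}
```

Without the second unwrapping step, the subtraction in the toy would be between a `Wrapped` and an `int *`, which is the kind of type mismatch the quoted compiler error points at.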


diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index 0f73da13172..d534e02871f 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -344,7 +344,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX20_CONSTEXPR
 inline _From
 __niter_wrap(_From __from, _To __res)
-{ return __from + (__res - std::__niter_base(__from)); }
+{ return __from + (std::__niter_base(__res) - std::__niter_base(__from)); }
 
   // No need to wrap, iterator already has the right type.
   template


[PATCH 11/11] rs6000, make test vec-cmpne.c a runnable test

2024-02-20 Thread Carl Love
 GCC maintainers:

The patch changes vec-cmpne.c from a compile-only test to a runnable test.
The macros to create the functions needed to test the built-ins and verify the
results are all there in the include file.  The .c file just needed to have
the macro calls inserted and the header changed from compile to run.  The
test can now do functional verification of the results in addition to verifying
that the expected instructions are generated.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

rs6000, make test vec-cmpne.c a runnable test

The macros in vec-cmpne.h define test functions.  They also set up
test value functions, verification functions and execute test functions.
The test is set up as a compile-only test, so none of the verification and
execute functions are being used.

The patch adds the macro definitions to create the initialization,
verify and execute functions to a main program so the test can not only
verify that the correct instructions are generated but also run the
tests and verify the results.  The test is then changed from a compile
to a run test.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-cmpne.c (main): Add main function with
macro calls to define the test functions, create the verify
functions and execute functions.
Update scan-assembler-times (vcmpequ): Updated count to include
instructions used to generate expected test results.
* gcc.target/powerpc/vec-cmpne.h (vector_tests_##NAME): Remove
line continuation after closing bracket.  Remove extra blank line.
---
 gcc/testsuite/gcc.target/powerpc/vec-cmpne.c | 41 +++-
 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h |  3 +-
 2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
index b57e0ac8638..2c369976a44 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
@@ -1,20 +1,41 @@
-/* { dg-do compile } */
+/* { dg-do run } */
 /* { dg-require-effective-target powerpc_altivec_ok } */
-/* { dg-options "-maltivec -O2" } */
+/* { dg-options "-maltivec -O2 -save-temps" } */
 
 /* Test that the vec_cmpne builtin generates the expected Altivec
instructions.  */
 
 #include "vec-cmpne.h"
 
-define_test_functions (int, signed int, signed int, si);
-define_test_functions (int, unsigned int, unsigned int, ui);
-define_test_functions (short, signed short, signed short, ss);
-define_test_functions (short, unsigned short, unsigned short, us);
-define_test_functions (char, signed char, signed char, sc);
-define_test_functions (char, unsigned char, unsigned char, uc);
-define_test_functions (int, signed int, float, ff);
+int main ()
+{
+  define_test_functions (int, signed int, signed int, si);
+  define_test_functions (int, unsigned int, unsigned int, ui);
+  define_test_functions (short, signed short, signed short, ss);
+  define_test_functions (short, unsigned short, unsigned short, us);
+  define_test_functions (char, signed char, signed char, sc);
+  define_test_functions (char, unsigned char, unsigned char, uc);
+  define_test_functions (int, signed int, float, ff);
+
+  define_init_verify_functions (int, signed int, signed int, si);
+  define_init_verify_functions (int, unsigned int, unsigned int, ui);
+  define_init_verify_functions (short, signed short, signed short, ss);
+  define_init_verify_functions (short, unsigned short, unsigned short, us);
+  define_init_verify_functions (char, signed char, signed char, sc);
+  define_init_verify_functions (char, unsigned char, unsigned char, uc);
+  define_init_verify_functions (int, signed int, float, ff);
+
+  execute_test_functions (int, signed int, signed int, si);
+  execute_test_functions (int, unsigned int, unsigned int, ui);
+  execute_test_functions (short, signed short, signed short, ss);
+  execute_test_functions (short, unsigned short, unsigned short, us);
+  execute_test_functions (char, signed char, signed char, sc);
+  execute_test_functions (char, unsigned char, unsigned char, uc);
+  execute_test_functions (int, signed int, float, ff);
+
+  return 0;
+}
 
 /* { dg-final { scan-assembler-times {\mvcmpequb\M}  2 } } */
 /* { dg-final { scan-assembler-times {\mvcmpequh\M}  2 } } */
-/* { dg-final { scan-assembler-times {\mvcmpequw\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequw\M}  32 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h
index a304de01d86..374cca360b3 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h
+++ b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h
@@ -33,7 +33,7 @@ __attribute__((noinline)) void vector_tests_##NAME () \
   tmp_##NAME = vec_cmpne (v1_##NAME, v2_##NAME); \
 

[PATCH 10/11] rs6000, add test cases for __builtin_vec_init* and __builtin_vec_set*

2024-02-20 Thread Carl Love
GCC maintainers:

The patch adds test cases for the __builtin_vec_init* and __builtin_vec_set* 
built-ins.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 


rs6000, add test cases for __builtin_vec_init* and __builtin_vec_set*

Add test cases for the following built-ins:

__builtin_vec_init_v1ti
__builtin_vec_init_v2df
__builtin_vec_init_v2di
__builtin_vec_set_v1ti
__builtin_vec_set_v2df
__builtin_vec_set_v2di

Note, the above built-ins are documented in extend.texi.
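
For illustration only (not part of the patch), a portable scalar model of what the tests below check for the two-element double case. `V2`, `model_init_v2` and `model_set_v2` are hypothetical names; the real built-ins operate on VSX vector types, and the "set returns a modified copy" behaviour is inferred from how the test calls `__builtin_vec_set_v2df`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar stand-in for a two-element vector such as v2df.
using V2 = std::array<double, 2>;

// Model of an "init" built-in: build a vector from its scalar elements.
inline V2 model_init_v2(double e0, double e1) { return V2{{e0, e1}}; }

// Model of a "set" built-in: return a copy with one element replaced.
inline V2 model_set_v2(V2 v, double value, std::size_t index) {
  assert(index < v.size());  // the index must select an existing element
  v[index] = value;
  return v;
}

// Usage example: element 0 is replaced, element 1 is preserved.
inline void example_usage() {
  V2 v = model_init_v2(1.0, 2.0);
  V2 w = model_set_v2(v, 5.0, 0);
  assert(w[0] == 5.0 && w[1] == 2.0);
}
```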

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-21.c: New test file.
---
 .../gcc.target/powerpc/vsx-builtin-21.c   | 181 ++
 1 file changed, 181 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-21.c

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-21.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-21.c
new file mode 100644
index 000..b7e1201f37e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-21.c
@@ -0,0 +1,181 @@
+/* { dg-do run { target int128 } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-mvsx" } */
+
+/* This test should run the same on any target that supports vsx
+   instructions.  Intentionally not specifying cpu in order to test
+   all code generation paths.  */
+
+#define DEBUG 0
+
+#include 
+
+#if DEBUG
+#include 
+#include 
+
+void print_i128 (__int128_t val)
+{
+  printf(" %lld %llu (0x%llx %llx)",
+(signed long long)(val >> 64),
+(unsigned long long)(val & 0x),
+(unsigned long long)(val >> 64),
+(unsigned long long)(val & 0x));
+}
+#endif
+
+void abort (void);
+
+void test_vec_init_v1ti (__int128_t ti_arg,
+vector __int128_t v1ti_expected_result)
+{
+  vector __int128_t v1ti_result;
+
+  v1ti_result = __builtin_vec_init_v1ti (ti_arg);
+  if (v1ti_result[0] != v1ti_expected_result[0])
+{
+#if DEBUG
+   printf ("test_vec_init_v1ti: v1ti_result[0] = ");
+   print_i128 (v1ti_result[0]);
+   printf( "vf_expected_result[0] = ");
+   print_i128 (v1ti_expected_result[0]);
+   printf("\n");
+#else
+   abort();
+#endif
+}
+}
+
+void test_vec_init_v2df (double d_arg1, double d_arg2,
+vector double v2df_expected_result)
+{
+  vector double v2df_result;
+  int i;
+
+  v2df_result = __builtin_vec_init_v2df (d_arg1, d_arg2);
+
+  for ( i= 0; i < 2; i++)
+if (v2df_result[i] != v2df_expected_result[i])
+#if DEBUG
+  printf ("test_vec_init_v2df: v2df_result[%d] = %f, v2df_expected_result[%d] = %f\n",
+ i, v2df_result[i], i, v2df_expected_result[i]);
+#else
+   abort();
+#endif
+}
+
+void test_vec_init_v2di (signed long long sl_arg1, signed long long sl_arg2,
+vector signed long long v2di_expected_result)
+{
+  vector signed long long v2di_result;
+  int i;
+
+  v2di_result = __builtin_vec_init_v2di (sl_arg1, sl_arg2);
+
+  for ( i= 0; i < 2; i++)
+if (v2di_result[i] != v2di_expected_result[i])
+#if DEBUG
+  printf ("test_vec_init_v2di: v2di_result[%d] = %lld, v2di_expected_result[%d] = %lld\n",
+ i, v2di_result[i], i, v2di_expected_result[i]);
+#else
+   abort();
+#endif
+}
+
+void test_vec_set_v1ti (vector __int128_t v1ti_arg, __int128_t ti_arg,
+   vector __int128_t v1ti_expected_result)
+{
+  vector __int128_t v1ti_result;
+
+  v1ti_result = __builtin_vec_set_v1ti (v1ti_arg, ti_arg, 0);
+  if (v1ti_result[0] != v1ti_expected_result[0])
+{
+#if DEBUG
+   printf ("test_vec_set_v1ti: v1ti_result[0] = ");
+   print_i128 (v1ti_result[0]);
+   printf( "vf_expected_result[0] = ");
+   print_i128 (v1ti_expected_result[0]);
+   printf("\n");
+#else
+   abort();
+#endif
+}
+}
+
+void test_vec_set_v2df (vector double v2df_arg, double d_arg,
+   vector double v2df_expected_result)
+{
+  vector double v2df_result;
+  int i;
+
+  v2df_result = __builtin_vec_set_v2df (v2df_arg, d_arg, 0);
+
+  for ( i= 0; i < 2; i++)
+if (v2df_result[i] != v2df_expected_result[i])
+#if DEBUG
+  printf ("test_vec_set_v2df: v2df_result[%d] = %f, v2df_expected_result[%d] = %f\n",
+ i, v2df_result[i], i, v2df_expected_result[i]);
+#else
+   abort();
+#endif
+}
+
+void test_vec_set_v2di (vector signed long long v2di_arg, signed long long sl_arg,
+   vector signed long long v2di_expected_result)
+{
+  vector signed long long v2di_result;
+  int i;
+
+  v2di_result = __builtin_vec_set_v2di (v2di_arg, sl_arg, 1);
+
+  for ( i= 0; i < 2; i++)
+if (v2di_result[i] != v2di_expected_result[i])
+#if DEBUG
+  printf ("test_vec_set_v2di: v2di_result[%d] = %lld, v2di_expected_result[%d] = %lld\n",
+ i, v2di_result[i], i, 

[PATCH 09/11] rs6000, add test cases for the vec_cmpne built-ins

2024-02-20 Thread Carl Love
GCC maintainers:

The patch adds test cases for the vec_cmpne built-ins.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

rs6000, add test cases for the vec_cmpne built-ins

Add test cases for the signed int, unsigned int, signed short, unsigned
short, signed char and unsigned char built-ins.

Note, the built-ins are documented in the Power Vector Intrinsic
Programming Reference manual.
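
For illustration only (not part of the patch), a portable model of the element-wise comparison convention the test header relies on: each result element is all ones when the comparison holds and all zeros otherwise. `Ints`, `Mask`, `model_compare`, `model_cmpne` and `model_cmple` are hypothetical names, and four 32-bit lanes are assumed; the inputs reuse the 1/2, 2/1 and 3/3 operand pairs from the test's init function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;  // assumed: four 32-bit lanes
using Ints = std::array<std::int32_t, kLanes>;
using Mask = std::array<std::uint32_t, kLanes>;

// Apply a scalar predicate lane by lane: true becomes all ones, false zero.
template <typename Pred>
Mask model_compare(const Ints &a, const Ints &b, Pred pred) {
  Mask out{};
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = pred(a[i], b[i]) ? 0xFFFFFFFFu : 0u;
  return out;
}

inline Mask model_cmpne(const Ints &a, const Ints &b) {
  return model_compare(a, b,
                       [](std::int32_t x, std::int32_t y) { return x != y; });
}

inline Mask model_cmple(const Ints &a, const Ints &b) {
  return model_compare(a, b,
                       [](std::int32_t x, std::int32_t y) { return x <= y; });
}

// Usage example: lanes are (1,2), (2,1), (3,3), (1,2).
inline void example_usage() {
  Ints a = {{1, 2, 3, 1}};
  Ints b = {{2, 1, 3, 2}};
  assert(model_cmpne(a, b)[2] == 0u);           // 3 != 3 is false
  assert(model_cmple(a, b)[1] == 0u);           // 2 <= 1 is false
  assert(model_cmple(a, b)[2] == 0xFFFFFFFFu);  // 3 <= 3 is true
}
```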

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-cmple.c: New test case.
* gcc.target/powerpc/vec-cmple.h: New test case include file.
---
 gcc/testsuite/gcc.target/powerpc/vec-cmple.c | 35 
 gcc/testsuite/gcc.target/powerpc/vec-cmple.h | 84 
 2 files changed, 119 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmple.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmple.h

diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmple.c b/gcc/testsuite/gcc.target/powerpc/vec-cmple.c
new file mode 100644
index 000..766a1c770e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-cmple.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2" } */
+
+/* Test that the vec_cmpne builtin generates the expected Altivec
+   instructions.  */
+
+#include "vec-cmple.h"
+
+int main ()
+{
+  /* Note macro expansions for "signed long long int" and
+ "unsigned long long int" do not work for the vec_vsx_ld builtin.  */
+  define_test_functions (int, signed int, signed int, si);
+  define_test_functions (int, unsigned int, unsigned int, ui);
+  define_test_functions (short, signed short, signed short, ss);
+  define_test_functions (short, unsigned short, unsigned short, us);
+  define_test_functions (char, signed char, signed char, sc);
+  define_test_functions (char, unsigned char, unsigned char, uc);
+
+  define_init_verify_functions (int, signed int, signed int, si);
+  define_init_verify_functions (int, unsigned int, unsigned int, ui);
+  define_init_verify_functions (short, signed short, signed short, ss);
+  define_init_verify_functions (short, unsigned short, unsigned short, us);
+  define_init_verify_functions (char, signed char, signed char, sc);
+  define_init_verify_functions (char, unsigned char, unsigned char, uc);
+
+  execute_test_functions (int, signed int, signed int, si);
+  execute_test_functions (int, unsigned int, unsigned int, ui);
+  execute_test_functions (short, signed short, signed short, ss);
+  execute_test_functions (short, unsigned short, unsigned short, us);
+  execute_test_functions (char, signed char, signed char, sc);
+  execute_test_functions (char, unsigned char, unsigned char, uc);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmple.h b/gcc/testsuite/gcc.target/powerpc/vec-cmple.h
new file mode 100644
index 000..4126706b99a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-cmple.h
@@ -0,0 +1,84 @@
+#include "altivec.h"
+
+#define N 4096
+
+#include 
+void abort ();
+
+#define PRAGMA(X) _Pragma (#X)
+#define UNROLL0 PRAGMA (GCC unroll 0)
+
+#define define_test_functions(VBTYPE, RTYPE, STYPE, NAME)  \
+\
+RTYPE result_le_##NAME[N] __attribute__((aligned(16))); \
+STYPE operand1_##NAME[N] __attribute__((aligned(16))); \
+STYPE operand2_##NAME[N] __attribute__((aligned(16))); \
+RTYPE expected_##NAME[N] __attribute__((aligned(16))); \
+\
+__attribute__((noinline)) void vector_tests_##NAME () \
+{ \
+  vector STYPE v1_##NAME, v2_##NAME; \
+  vector bool VBTYPE tmp_##NAME; \
+  int i; \
+  UNROLL0 \
+  for (i = 0; i < N; i+=16/sizeof (STYPE)) \
+{ \
+  /* result_le = operand1!=operand2.  */ \
+  v1_##NAME = vec_vsx_ld (0, (const vector STYPE*)&operand1_##NAME[i]); \
+  v2_##NAME = vec_vsx_ld (0, (const vector STYPE*)&operand2_##NAME[i]); \
+\
+  tmp_##NAME = vec_cmple (v1_##NAME, v2_##NAME); \
+  vec_vsx_st (tmp_##NAME, 0, &result_le_##NAME[i]); \
+} \
+}
+
+#define define_init_verify_functions(VBTYPE, RTYPE, STYPE, NAME)   \
+__attribute__((noinline)) void init_##NAME () \
+{ \
+  int i; \
+  for (i = 0; i < N; ++i) \
+{ \
+  result_le_##NAME[i] = 7; \
+  if (i%3 == 0) \
+   { \
+ /* op1 < op2.  */ \
+ operand1_##NAME[i] = 1; \
+ operand2_##NAME[i] = 2; \
+   } \
+  else if (i%3 == 1) \
+   { \
+ /* op1 > op2.  */ \
+ operand1_##NAME[i] = 2; \
+ operand2_##NAME[i] = 1; \
+   } \
+  else if (i%3 == 2) \
+   { \
+ /* op1 == op2.  */ \
+ operand1_##NAME[i] = 3; \
+ operand2_##NAME[i] = 3; \
+   } \
+  /* For vector comparisons: "For each element of the result_le, the \
+ value of each bit is 1 if the corresponding elements of ARG1 and \
+ ARG2 are equal." {or whatever the comparison is} 

[PATCH 07/11] rs6000, __builtin_vsx_xvcmpeq[sp, dp, sp_p] add documentation and test case

2024-02-20 Thread Carl Love


 GCC maintainers:

The patch adds documentation and a test case for the
__builtin_vsx_xvcmpeq[sp, dp, sp_p] built-ins.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 


rs6000, __builtin_vsx_xvcmpeq[sp, dp, sp_p] add documentation and test case

Add a test case for the __builtin_vsx_xvcmpeqsp_p built-in.

Add documentation for the __builtin_vsx_xvcmpeqsp_p,
__builtin_vsx_xvcmpeqdp, and __builtin_vsx_xvcmpeqsp builtins.
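
For illustration only (not part of the patch), a scalar model of the predicate form as the documentation added below words it: with a first argument of 1 the result is 1 when one or more element pairs are equal, and with a first argument of 0 the result is 1 when no pair is equal. `F4` and `model_cmpeq_p` are hypothetical names, and the model follows the patch's text rather than verified hardware behaviour.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using F4 = std::array<float, 4>;  // stand-in for a vector of four floats

// selector == 1: result is 1 if one or more element pairs are equal.
// selector == 0: result is 1 if all element pairs are unequal.
inline int model_cmpeq_p(int selector, const F4 &a, const F4 &b) {
  assert(selector == 0 || selector == 1);  // documented range is 0 to 1
  bool any_equal = false;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (a[i] == b[i])
      any_equal = true;
  return (selector == 1 ? any_equal : !any_equal) ? 1 : 0;
}

// Usage example with the first pair of inputs from the test case.
inline void example_usage() {
  F4 a = {{1.0f, 2.0f, 3.0f, 4.0f}};
  F4 b = {{1.0f, 3.0f, 2.0f, 8.0f}};  // only element 0 matches
  assert(model_cmpeq_p(1, a, b) == 1);
  assert(model_cmpeq_p(0, a, b) == 0);
}
```

The two asserted values match what the test case expects for the same inputs (`result != 1` and `result != 0` are the failure conditions there).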

gcc/ChangeLog:
* doc/extend.texi (__builtin_vsx_xvcmpeqsp_p,
__builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpeqsp): Add
documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-runnable-4.c: New test case.
---
 gcc/doc/extend.texi   |  23 +++
 .../powerpc/vsx-builtin-runnable-4.c  | 135 ++
 2 files changed, 158 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-4.c

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 22f67ebab31..87fd30bfa9e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22700,6 +22700,18 @@ vectors of their defined type.  The corresponding result element is set to
 all ones if the two argument elements are less than or equal and all zeros
 otherwise.
 
+@smallexample
+const vf __builtin_vsx_xvcmpeqsp (vf, vf);
+const vd __builtin_vsx_xvcmpeqdp (vd, vd);
+@end smallexample
+
+The built-ins @code{__builtin_vsx_xvcmpeqsp} and
+@code{__builtin_vsx_xvcmpeqdp} compare two floating point vectors and return
+a vector.  If the corresponding elements are equal then the corresponding
+vector element of the result is set to all ones, it is set to all zeros
+otherwise.
+
+
 @node PowerPC AltiVec Built-in Functions Available on ISA 2.07
 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07
 
@@ -23989,6 +24001,17 @@ is larger than 128 bits, the result is undefined.
 The result is the modulo result of dividing the first input  by the second
 input.
 
+@smallexample
+const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
+@end smallexample
+
+The first argument of the built-in @code{__builtin_vsx_xvcmpeqdp_p} is an
+integer in the range of 0 to 1.  The second and third arguments are floating
+point vectors to be compared.  The result is 1 if the first argument is a 1
+and one or more of the corresponding vector elements are equal.  The result is
+1 if the first argument is 0 and all of the corresponding vector elements are
+not equal.  The result is zero otherwise.
+
 The following builtins perform 128-bit vector comparisons.  The
 @code{vec_all_xx}, @code{vec_any_xx}, and @code{vec_cmpxx}, where @code{xx} is
 one of the operations @code{eq, ne, gt, lt, ge, le} perform pairwise
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-4.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-4.c
new file mode 100644
index 000..8ac07c7c807
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-4.c
@@ -0,0 +1,135 @@
+/* { dg-do run { target { power10_hw } } } */
+/* { dg-do link { target { ! power10_hw } } } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -save-temps" } */
+/* { dg-require-effective-target power10_ok } */
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+#include 
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+  int result;
+  vector float vf_arg1, vf_arg2;
+  vector double d_arg1, d_arg2;
+
+  /* Compare vectors with one equal element, check
+ for all elements unequal, i.e. first arg is 1.  */
+  vf_arg1 = (vector float) {1.0, 2.0, 3.0, 4.0};
+  vf_arg2 = (vector float) {1.0, 3.0, 2.0, 8.0};
+  result = __builtin_vsx_xvcmpeqsp_p (1, vf_arg1, vf_arg2);
+
+#if DEBUG
+  printf("result = 0x%x\n", (unsigned int) result);
+#endif
+
+  if (result != 1)
+for (i = 0; i < 4; i++)
+#if DEBUG
+  printf("ERROR, __builtin_vsx_xvcmpeqsp_p 1: arg 1 = 1, varg3[%d] = %f, varg3[%d] = %f\n",
+i, vf_arg1[i], i, vf_arg2[i]);
+#else
+  abort();
+#endif
+  /* Compare vectors with one equal element, check
+ for all elements unequal, i.e. first arg is 0.  */
+  vf_arg1 = (vector float) {1.0, 2.0, 3.0, 4.0};
+  vf_arg2 = (vector float) {1.0, 3.0, 2.0, 8.0};
+  result = __builtin_vsx_xvcmpeqsp_p (0, vf_arg1, vf_arg2);
+
+#if DEBUG
+  printf("result = 0x%x\n", (unsigned int) result);
+#endif
+
+  if (result != 0)
+for (i = 0; i < 4; i++)
+#if DEBUG
+  printf("ERROR, __builtin_vsx_xvcmpeqsp_p 2: arg 1 = 0, varg3[%d] = %f, varg3[%d] = %f\n",
+i, vf_arg1[i], i, vf_arg2[i]);
+#else
+  abort();
+#endif
+
+  /* Compare vectors with all unequal elements, check
+ for all elements unequal, i.e. first arg is 1.  */
+  vf_arg1 = (vector float) {1.0, 2.0, 3.0, 4.0};
+  vf_arg2 = (vector float) {8.0, 3.0, 2.0, 8.0};
+  result = __builtin_vsx_xvcmpeqsp_p 

[PATCH 03/11] rs6000, remove duplicated built-ins

2024-02-20 Thread Carl Love
GCC maintainers:

There are a number of undocumented built-ins that are duplicates of other 
documented built-ins.  This patch removes the duplicates so users will only use 
the documented built-in.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

-

rs6000, remove duplicated built-ins

The following undocumented built-ins are the same as existing documented
overloaded built-ins.

  const vf __builtin_vsx_xxmrghw (vf, vf);
same as  vf __builtin_vec_mergeh (vf, vf);  (overloaded vec_mergeh)

  const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi);
same as vsi __builtin_vec_mergeh (vsi, vsi);   (overloaded vec_mergeh)

  const vf __builtin_vsx_xxmrglw (vf, vf);
same as vf __builtin_vec_mergel (vf, vf);  (overloaded vec_mergel)

  const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi);
same as vsi __builtin_vec_mergel (vsi, vsi);   (overloaded vec_mergel)

  const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
same as vsc __builtin_vec_sel (vsc, vsc, vuc);  (overloaded vec_sel)

  const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
same as vuc __builtin_vec_sel (vuc, vuc, vuc);  (overloaded vec_sel)

  const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
same as  vd __builtin_vec_sel (vd, vd, vull);   (overloaded vec_sel)

  const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
same as vsll __builtin_vec_sel (vsll, vsll, vsll);  (overloaded vec_sel)

  const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
same as vull __builtin_vec_sel (vull, vull, vsll);  (overloaded vec_sel)

  const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
same as vf __builtin_vec_sel (vf, vf, vsi)  (overloaded vec_sel)

  const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
same as vsi __builtin_vec_sel (vsi, vsi, vbi);  (overloaded vec_sel)

  const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
same as vui __builtin_vec_sel (vui, vui, vui);  (overloaded vec_sel)

  const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
same as vss __builtin_vec_sel (vss, vss, vbs);  (overloaded vec_sel)

  const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
same as vus __builtin_vec_sel (vus, vus, vus);  (overloaded vec_sel)

This patch removes the duplicate built-in definitions so only the
documented built-ins will be available for use.  The case statements in
rs6000_gimple_fold_builtin that are no longer needed are also removed.
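
For illustration only (not part of the patch), a portable model of the operations the remaining overloaded built-ins provide. `U4`, `model_mergeh`, `model_mergel` and `model_sel` are hypothetical names; elements are numbered by array index, and endianness effects on real hardware are not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using U4 = std::array<std::uint32_t, 4>;  // stand-in for a 4 x 32-bit vector

// vec_mergeh-style: interleave the first halves of a and b.
inline U4 model_mergeh(const U4 &a, const U4 &b) {
  return U4{{a[0], b[0], a[1], b[1]}};
}

// vec_mergel-style: interleave the second halves of a and b.
inline U4 model_mergel(const U4 &a, const U4 &b) {
  return U4{{a[2], b[2], a[3], b[3]}};
}

// vec_sel-style: take each bit from b where mask is 1, else from a.
inline U4 model_sel(const U4 &a, const U4 &b, const U4 &mask) {
  U4 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = (a[i] & ~mask[i]) | (b[i] & mask[i]);
  return out;
}

// Usage example.
inline void example_usage() {
  U4 a = {{1, 2, 3, 4}};
  U4 b = {{5, 6, 7, 8}};
  assert(model_mergeh(a, b) == (U4{{1, 5, 2, 6}}));
  assert(model_mergel(a, b) == (U4{{3, 7, 4, 8}}));
}
```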

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxmrghw,
__builtin_vsx_xxmrghw_4si, __builtin_vsx_xxmrglw,
__builtin_vsx_xxmrglw_4si, __builtin_vsx_xxsel_16qi,
__builtin_vsx_xxsel_16qi_uns, __builtin_vsx_xxsel_2df,
__builtin_vsx_xxsel_2di, __builtin_vsx_xxsel_2di_uns,
__builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_4si,
__builtin_vsx_xxsel_4si_uns, __builtin_vsx_xxsel_8hi,
__builtin_vsx_xxsel_8hi_uns): Remove built-in definitions.
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin):
Remove case entries RS6000_BIF_XXMRGLW_4SI,
RS6000_BIF_XXMRGLW_4SF, RS6000_BIF_XXMRGHW_4SI,
RS6000_BIF_XXMRGHW_4SF.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xxsel_4si,
__builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_16qi,
__builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_2df): Remove test
cases for removed built-ins.
---
 gcc/config/rs6000/rs6000-builtin.cc   |  4 --
 gcc/config/rs6000/rs6000-builtins.def | 42 ---
 .../gcc.target/powerpc/vsx-builtin-3.c|  6 ---
 3 files changed, 52 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 6698274031b..e436cbe4935 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2110,20 +2110,16 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 /* vec_mergel (integrals).  */
 case RS6000_BIF_VMRGLH:
 case RS6000_BIF_VMRGLW:
-case RS6000_BIF_XXMRGLW_4SI:
 case RS6000_BIF_VMRGLB:
 case RS6000_BIF_VEC_MERGEL_V2DI:
-case RS6000_BIF_XXMRGLW_4SF:
 case RS6000_BIF_VEC_MERGEL_V2DF:
   fold_mergehl_helper (gsi, stmt, 1);
   return true;
 /* vec_mergeh (integrals).  */
 case RS6000_BIF_VMRGHH:
 case RS6000_BIF_VMRGHW:
-case RS6000_BIF_XXMRGHW_4SI:
 case RS6000_BIF_VMRGHB:
 case RS6000_BIF_VEC_MERGEH_V2DI:
-case RS6000_BIF_XXMRGHW_4SF:
 case RS6000_BIF_VEC_MERGEH_V2DF:
   fold_mergehl_helper (gsi, stmt, 0);
   return true;
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index fd316f629e5..96d095da2cb 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1925,18 +1925,6 @@
   const signed int __builtin_vsx_xvtsqrtsp_fg (vf);
 XVTSQRTSP_FG vsx_tsqrtv4sf2_fg {}
 

[PATCH 08/11] rs6000, add tests and documentation for various built-ins

2024-02-20 Thread Carl Love
 
 GCC maintainers:

The patch adds a test case and documentation for a number of built-ins.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

 rs6000, add tests and documentation for various built-ins

This patch adds a test case and documentation in extend.texi for the
following built-ins:

__builtin_altivec_fix_sfsi
__builtin_altivec_fixuns_sfsi
__builtin_altivec_float_sisf
__builtin_altivec_uns_float_sisf
__builtin_altivec_vrsqrtfp
__builtin_altivec_mask_for_load
__builtin_altivec_vsel_1ti
__builtin_altivec_vsel_1ti_uns
__builtin_vec_init_v16qi
__builtin_vec_init_v4sf
__builtin_vec_init_v4si
__builtin_vec_init_v8hi
__builtin_vec_set_v16qi
__builtin_vec_set_v4sf
__builtin_vec_set_v4si
__builtin_vec_set_v8hi
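
For illustration only (not part of the patch), a portable model of the mask that the documentation added below describes for __builtin_altivec_mask_for_load: take the 32-byte sequence 0x00 through 0x1F and return bytes sh to sh+15, where sh is the low four bits of the address. `Bytes16` and `model_mask_for_load` are hypothetical names, and the model follows the wording in the patch rather than verified hardware behaviour.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// X is the byte sequence 0x00, 0x01, ..., 0x1F, so X[k] == k and the
// window starting at sh is simply sh, sh + 1, ..., sh + 15.
inline Bytes16 model_mask_for_load(std::uintptr_t address) {
  const unsigned sh = static_cast<unsigned>(address & 0xF);
  Bytes16 out{};
  for (unsigned i = 0; i < 16; ++i)
    out[i] = static_cast<std::uint8_t>(sh + i);
  return out;
}

// Usage example: an address misaligned by 3 bytes.
inline void example_usage() {
  Bytes16 m = model_mask_for_load(0x1003);
  assert(m[0] == 3 && m[15] == 18);
}
```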

gcc/ChangeLog:
* doc/extend.texi (__builtin_altivec_fix_sfsi,
__builtin_altivec_fixuns_sfsi, __builtin_altivec_float_sisf,
__builtin_altivec_uns_float_sisf, __builtin_altivec_vrsqrtfp,
__builtin_altivec_mask_for_load, __builtin_altivec_vsel_1ti,
__builtin_altivec_vsel_1ti_uns, __builtin_vec_init_v16qi,
__builtin_vec_init_v4sf, __builtin_vec_init_v4si,
__builtin_vec_init_v8hi, __builtin_vec_set_v16qi,
__builtin_vec_set_v4sf, __builtin_vec_set_v4si,
__builtin_vec_set_v8hi): Add documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-38.c: New test case.
---
 gcc/doc/extend.texi   |  98 
 gcc/testsuite/gcc.target/powerpc/altivec-38.c | 503 ++
 2 files changed, 601 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-38.c

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22678,6 +22678,104 @@ if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
 @samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
 
 
+@smallexample
+vector signed int __builtin_altivec_fix_sfsi (vector float);
+vector signed int __builtin_altivec_fixuns_sfsi (vector float);
+vector float __builtin_altivec_float_sisf (vector int);
+vector float __builtin_altivec_uns_float_sisf (vector int);
+vector float __builtin_altivec_vrsqrtfp (vector float);
+@end smallexample
+
+The @code{__builtin_altivec_fix_sfsi} converts a vector of single precision
+floating point values to a vector of signed integers with round to zero.
+
+The @code{__builtin_altivec_fixuns_sfsi} converts a vector of single precision
+floating point values to a vector of unsigned integers with round to zero.  If
+the rounded floating point value is less than 0 the result is 0 and VXCVI
+is set to 1.
+
+The @code{__builtin_altivec_float_sisf} converts a vector of single precision
+signed integers to a vector of floating point values using the rounding mode
+specified by RN.
+
+The @code{__builtin_altivec_uns_float_sisf} converts a vector of single
+precision unsigned integers to a vector of floating point values using the
+rounding mode specified by RN.
+
+The @code{__builtin_altivec_vrsqrtfp} returns a vector of floating point
+estimates of the reciprocal square root of each floating point source vector
+element.
+
+@smallexample
+vector signed char __builtin_altivec_mask_for_load (const void *);
+@end smallexample
+
+The @code{__builtin_altivec_mask_for_load} returns a vector mask based on the
+bottom four bits of the argument.  Let X be the 32-byte value:
+0x00 || 0x01 || 0x02 || ... || 0x1D || 0x1E || 0x1F.
+Bytes sh to sh+15 are returned where sh is given by the least significant 4
+bits of the argument.  See the description of the lvsl and lvsr instructions.
+
+@smallexample
+vector signed __int128 __builtin_altivec_vsel_1ti (vector signed __int128,
+   vector signed __int128,
+   vector unsigned __int128);
+vector unsigned __int128
+  __builtin_altivec_vsel_1ti_uns (vector unsigned __int128,
+  vector unsigned __int128,
+  vector unsigned __int128)
+@end smallexample
+
+Let the arguments of @code{__builtin_altivec_vsel_1ti} and
+@code{__builtin_altivec_vsel_1ti_uns} be src1, src2, mask.  The result is
+given by (src1 & ~mask) | (src2 & mask).
+
+@smallexample
+vector signed char
+__builtin_vec_init_v16qi (signed char, signed char, signed char, signed char,
+  signed char, signed char, signed char, signed char,
+  signed char, signed char, signed char, signed char,
+  signed char, signed char, signed char, signed char);
+
+vector short int __builtin_vec_init_v8hi (short int, short int, short int,
+  short int, short int, short int,
+  short int, short int);

[PATCH 04/11] rs6000, Update comment for the __builtin_vsx_vper* built-ins.

2024-02-20 Thread Carl Love
GCC maintainers:

The patch expands an existing comment to document that the duplicates are 
covered by an overloaded built-in.  I am wondering if we should just go ahead 
and remove the duplicates?

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

-
rs6000, Update comment for the __builtin_vsx_vper* built-ins.

There is a comment about the __builtin_vsx_vper* built-ins being
duplicates of the __builtin_altivec_* built-ins.  The note says we
should consider deprecation/removal of the __builtin_vsx_vper*.  Add a
note that the __builtin_vsx_vper* built-ins are covered by the overloaded
vec_perm built-ins which use the __builtin_altivec_* built-in definitions.
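
For illustration only (not part of the patch), a portable model of the byte permute that the overloaded vec_perm provides: each result byte is picked from the 32-byte concatenation of the two inputs using the low five bits of the matching control byte. `Bytes16` and `model_perm` are hypothetical names; bytes are numbered by array index, and endianness effects on real hardware are not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Indices 0..15 select from a, indices 16..31 select from b; only the
// low five bits of each control byte are used.
inline Bytes16 model_perm(const Bytes16 &a, const Bytes16 &b,
                          const Bytes16 &control) {
  Bytes16 out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    const unsigned idx = control[i] & 0x1Fu;
    out[i] = idx < 16 ? a[idx] : b[idx - 16];
  }
  return out;
}

// Usage example: byte 0 comes from a[3], every other byte from b[15].
inline void example_usage() {
  Bytes16 a{}, b{}, c{};
  for (unsigned i = 0; i < 16; ++i) {
    a[i] = static_cast<std::uint8_t>(i);
    b[i] = static_cast<std::uint8_t>(100 + i);
    c[i] = 31;
  }
  c[0] = 0x23;  // the upper bits are ignored, leaving index 3
  Bytes16 r = model_perm(a, b, c);
  assert(r[0] == 3 && r[1] == 115);
}
```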

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_vperm_*):
Add comment to existing comment about the built-ins.
---
 gcc/config/rs6000/rs6000-builtins.def | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 96d095da2cb..4c95429f137 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1556,6 +1556,14 @@
 ; These are duplicates of __builtin_altivec_* counterparts, and are being
 ; kept for backwards compatibility.  The reason for their existence is
 ; unclear.  TODO: Consider deprecation/removal at some point.
+; Note, __builtin_vsx_vperm_16qi, __builtin_vsx_vperm_16qi_uns,
+; __builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns,
+; __builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
+; __builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
+; __builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns,
+; __builtin_vsx_vperm_8hi, __builtin_vsx_vperm_8hi_uns
+; are all covered by the overloaded vec_perm built-in which uses the
+; __builtin_altivec_* built-in definitions.
   const vsc __builtin_vsx_vperm_16qi (vsc, vsc, vuc);
 VPERM_16QI_X altivec_vperm_v16qi {}
 
-- 
2.43.0



[PATCH 06/11] rs6000, __builtin_vsx_xxpermdi_1ti add documentation, and test case

2024-02-20 Thread Carl Love
GCC maintainers:

The patch adds documentation and test case for the __builtin_vsx_xxpermdi_1ti 
built-in.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 


rs6000, __builtin_vsx_xxpermdi_1ti add documentation and test case

Add documentation to the extend.texi file for the
__builtin_vsx_xxpermdi_1ti built-in.

Add test cases for the __builtin_vsx_xxpermdi_1ti built-in.
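
The selection rule in the extend.texi hunk of this patch can be sketched as a scalar model.  The sketch follows the wording of the documentation as posted and has not been checked against the ISA or against little-endian element ordering; u128_model and xxpermdi_1ti_model are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value: hi holds bits [127:64], lo holds [63:0].  */
typedef struct { uint64_t hi, lo; } u128_model;

/* Model of the documented rule; only the two low bits of sel are used.  */
static u128_model
xxpermdi_1ti_model (u128_model srcA, u128_model srcB, int sel)
{
  u128_model r;
  /* sel[1] picks which half of srcB becomes result bits [127:64].  */
  r.hi = (sel & 2) ? srcB.lo : srcB.hi;
  /* sel[0] picks which half of srcA becomes result bits [63:0].  */
  r.lo = (sel & 1) ? srcA.lo : srcA.hi;
  return r;
}

/* Self-check with the selector value 2 that the new test case uses.  */
static void
xxpermdi_1ti_model_check (void)
{
  u128_model a = { 0x1111, 0x2222 };
  u128_model b = { 0x3333, 0x4444 };
  u128_model r = xxpermdi_1ti_model (a, b, 2);
  assert (r.hi == 0x4444 && r.lo == 0x1111);
}
```

With sel = 2 the model takes the low half of the second argument for the upper result bits and the high half of the first argument for the lower result bits, which is what the documentation text describes for that selector.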

gcc/ChangeLog:
* doc/extend.texi (__builtin_vsx_xxpermdi_1ti): Add documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-runnable-3.c: New test case.
---
 gcc/doc/extend.texi   |  7 +++
 .../powerpc/vsx-builtin-runnable-3.c  | 48 +++
 2 files changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-3.c

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 83eed9e334b..22f67ebab31 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21508,6 +21508,13 @@ vector __int128  __builtin_vsx_xxpermdi_1ti (vector 
__int128, vector __int128,
 const int);
 
 @end smallexample
+
+For @code{__builtin_vsx_xxpermdi_1ti}, let srcA[127:0] be the 128-bit first
+argument and srcB[127:0] be the 128-bit second argument.  Let sel[1:0] be the
+least significant bits of the const int argument (third input argument).
+Result bits [127:64] are srcB[127:64] if sel[1] = 0, srcB[63:0] otherwise.
+Result bits [63:0] are srcA[127:64] if sel[0] = 0, srcA[63:0] otherwise.
+
 @node Basic PowerPC Built-in Functions Available on ISA 2.07
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.07
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-3.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-3.c
new file mode 100644
index 000..ba287597cec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-3.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { lp64 } } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
+
+#include 
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+#include 
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+
+  vector signed __int128 vsq_arg1, vsq_arg2, vsq_result, vsq_expected_result;
+
+  vsq_arg1[0] = (__int128) 0x;
+  vsq_arg1[0] = vsq_arg1[0] << 64 | (__int128) 0x;
+  vsq_arg2[0] = (__int128) 0x1100110011001100;
+  vsq_arg2[0] = (vsq_arg2[0]  << 64) | (__int128) 0x;
+
+  vsq_expected_result[0] = (__int128) 0x;
+  vsq_expected_result[0] = (vsq_expected_result[0] << 64)
+| (__int128) 0x;
+
+  vsq_result = __builtin_vsx_xxpermdi_1ti (vsq_arg1, vsq_arg2, 2);
+
+  if (vsq_result[0] != vsq_expected_result[0])
+{
+#if DEBUG
+   printf("ERROR, __builtin_vsx_xxpermdi_1ti: vsq_result = 0x%016llx 
%016llx\n",
+ (unsigned long long) (vsq_result[0] >> 64),
+ (unsigned long long) vsq_result[0]);
+   printf(" vsq_expected_resultd = 0x%016llx 
%016llx\n",
+ (unsigned long long)(vsq_expected_result[0] >> 64),
+ (unsigned long long) vsq_expected_result[0]);
+#else
+  abort();
+#endif
+ }
+
+  return 0;
+}
-- 
2.43.0



[PATCH 05/11] rs6000, __builtin_vsx_xvneg[sp,dp] add documentation, and test cases

2024-02-20 Thread Carl Love
GCC maintainers:

The patch adds documentation and test cases for the __builtin_vsx_xvnegsp, 
__builtin_vsx_xvnegdp built-ins.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

rs6000, __builtin_vsx_xvneg[sp,dp] add documentation and test cases

Add documentation to the extend.texi file for the two built-ins
__builtin_vsx_xvnegsp, __builtin_vsx_xvnegdp.

Add test cases for the two built-ins.

gcc/ChangeLog:
* doc/extend.texi (__builtin_vsx_xvnegsp, __builtin_vsx_xvnegdp):
Add documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-runnable-2.c: New test case.
---
 gcc/doc/extend.texi   | 13 +
 .../powerpc/vsx-builtin-runnable-2.c  | 51 +++
 2 files changed, 64 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-2.c

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 583b1d890bf..83eed9e334b 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21495,6 +21495,19 @@ The @code{__builtin_vsx_xvcvuxwdp} converts single 
precision unsigned integer
 value to a double precision floating point value.  Input element at index 2*i
 is stored in the destination element i.
 
+@smallexample
+vector float __builtin_vsx_xvnegsp (vector float);
+vector double __builtin_vsx_xvnegdp (vector double);
+@end smallexample
+
+The @code{__builtin_vsx_xvnegsp} and @code{__builtin_vsx_xvnegdp} built-ins
+negate each vector element.
+
+@smallexample
+vector __int128  __builtin_vsx_xxpermdi_1ti (vector __int128, vector __int128,
+const int);
+
+@end smallexample
 @node Basic PowerPC Built-in Functions Available on ISA 2.07
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.07
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-2.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-2.c
new file mode 100644
index 000..7906a8e01d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-2.c
@@ -0,0 +1,51 @@
+/* { dg-do run { target { lp64 } } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+#include 
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+  vector double vd_arg1, vd_result, vd_expected_result;
+  vector float vf_arg1, vf_result, vf_expected_result;
+
+  /* VSX Vector Negate Single-Precision.  */
+
+  vf_arg1 = (vector float) {-1.0, 12345.98, -2.1234, 238.9};
+  vf_result = __builtin_vsx_xvnegsp (vf_arg1);
+  vf_expected_result = (vector float) {1.0, -12345.98, 2.1234, -238.9};
+
+  for (i = 0; i < 4; i++)
+if (vf_result[i] != vf_expected_result[i])
+#if DEBUG
+  printf("ERROR, __builtin_vsx_xvnegsp: vf_result[%d] = %f, 
vf_expected_result[%d] = %f\n",
+i, vf_result[i], i, vf_expected_result[i]);
+#else
+  abort();
+#endif
+
+  /* VSX Vector Negate Double-Precision.  */
+
+  vd_arg1 = (vector double) {12345.98, -2.1234};
+  vd_result = __builtin_vsx_xvnegdp (vd_arg1);
+  vd_expected_result = (vector double) {-12345.98, 2.1234};
+
+  for (i = 0; i < 2; i++)
+if (vd_result[i] != vd_expected_result[i])
+#if DEBUG
+  printf("ERROR, __builtin_vsx_xvnegdp: vd_result[%d] = %f, 
vd_expected_result[%d] = %f\n",
+i, vd_result[i], i, vd_expected_result[i]);
+#else
+  abort();
+#endif
+
+  return 0;
+}
-- 
2.43.0



[PATCH 02/11] rs6000, fix arguments, add documentation for vector, element conversions

2024-02-20 Thread Carl Love


GCC maintainers:

This patch fixes the return type for the __builtin_vsx_xvcvdpuxws, 
__builtin_vsx_xvcvspuxds and __builtin_vsx_xvcvspuxws built-ins.  They were 
defined as signed but should have been defined as unsigned.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

-
rs6000, fix arguments, add documentation for vector element conversions

The return type for the __builtin_vsx_xvcvdpuxws, __builtin_vsx_xvcvspuxds,
__builtin_vsx_xvcvspuxws built-ins should be unsigned.  This patch changes
the return types from signed to unsigned.

The documentation for the vector element conversion built-ins:

__builtin_vsx_xvcvspsxws
__builtin_vsx_xvcvspsxds
__builtin_vsx_xvcvspuxds
__builtin_vsx_xvcvdpsxws
__builtin_vsx_xvcvdpuxws
__builtin_vsx_xvcvdpuxds_uns
__builtin_vsx_xvcvspdp
__builtin_vsx_xvcvdpsp
__builtin_vsx_xvcvspuxws
__builtin_vsx_xvcvsxwdp
__builtin_vsx_xvcvuxddp_uns
__builtin_vsx_xvcvuxwdp

is missing from extend.texi.  This patch adds the missing documentation.

This patch also adds runnable test cases for each of the built-ins.
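
As a rough illustration of the saturating behaviour that the new documentation describes for __builtin_vsx_xvcvspsxws, here is a scalar model of one element.  It is a sketch under stated assumptions: cvt_sp_to_sxw_model is a made-up name, only the result value is modelled, and the status bits (VXCVI, VXSNAN, XX) are not.

```c
#include <assert.h>
#include <stdint.h>

/* Convert one float to a signed 32-bit word, rounding toward zero and
   saturating, with NaN mapped to 0x80000000.  */
static int32_t
cvt_sp_to_sxw_model (float x)
{
  if (x != x)                   /* NaN compares unequal to itself.  */
    return INT32_MIN;           /* 0x80000000 */
  if (x >= 2147483648.0f)       /* above 2^31 - 1 */
    return INT32_MAX;           /* 0x7FFFFFFF */
  if (x < -2147483648.0f)       /* below -2^31 */
    return INT32_MIN;
  return (int32_t) x;           /* C conversion truncates toward zero.  */
}

/* Self-check covering truncation and both saturation directions.  */
static void
cvt_sp_to_sxw_model_check (void)
{
  assert (cvt_sp_to_sxw_model (1.9f) == 1);
  assert (cvt_sp_to_sxw_model (-1.9f) == -1);
  assert (cvt_sp_to_sxw_model (3.0e9f) == INT32_MAX);
  assert (cvt_sp_to_sxw_model (-3.0e9f) == INT32_MIN);
}
```

The range checks come before the cast because converting an out-of-range float to int32_t is undefined in C, whereas the documentation describes the instruction as saturating.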

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvdpuxws,
__builtin_vsx_xvcvspuxds, __builtin_vsx_xvcvspuxws): Change
return type from signed to unsigned.
* doc/extend.texi (__builtin_vsx_xvcvspsxws,
__builtin_vsx_xvcvspsxds, __builtin_vsx_xvcvspuxds,
__builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws,
__builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspdp,
__builtin_vsx_xvcvdpsp, __builtin_vsx_xvcvspuxws,
__builtin_vsx_xvcvsxwdp, __builtin_vsx_xvcvuxddp_uns,
__builtin_vsx_xvcvuxwdp): Add documentation for builtins.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-runnable-1.c: New test file.
---
 gcc/config/rs6000/rs6000-builtins.def |   6 +-
 gcc/doc/extend.texi   | 135 ++
 .../powerpc/vsx-builtin-runnable-1.c  | 233 ++
 3 files changed, 371 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-1.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index d66a53a0fab..fd316f629e5 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1724,7 +1724,7 @@
   const vull __builtin_vsx_xvcvdpuxds_uns (vd);
 XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {}
 
-  const vsi __builtin_vsx_xvcvdpuxws (vd);
+  const vui __builtin_vsx_xvcvdpuxws (vd);
 XVCVDPUXWS vsx_xvcvdpuxws {}
 
   const vd __builtin_vsx_xvcvspdp (vf);
@@ -1736,10 +1736,10 @@
   const vsi __builtin_vsx_xvcvspsxws (vf);
 XVCVSPSXWS vsx_fix_truncv4sfv4si2 {}
 
-  const vsll __builtin_vsx_xvcvspuxds (vf);
+  const vull __builtin_vsx_xvcvspuxds (vf);
 XVCVSPUXDS vsx_xvcvspuxds {}
 
-  const vsi __builtin_vsx_xvcvspuxws (vf);
+  const vui __builtin_vsx_xvcvspuxws (vf);
 XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
 
   const vd __builtin_vsx_xvcvsxddp (vsll);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 4d8610f6aa8..583b1d890bf 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21360,6 +21360,141 @@ __float128 __builtin_sqrtf128 (__float128);
 __float128 __builtin_fmaf128 (__float128, __float128, __float128);
 @end smallexample
 
+@smallexample
+vector int __builtin_vsx_xvcvspsxws (vector float);
+@end smallexample
+
+The @code{__builtin_vsx_xvcvspsxws} converts the single precision floating
+point vector element i to a signed word integer value using round to zero,
+storing the result in element i.  If the source element is NaN the result
+is set to 0x80000000 and VXCVI is set to 1.  If the source element is SNaN
+then VXSNAN is also set to 1.  If the rounded value is greater than
+2^31 - 1 the result is 0x7FFFFFFF and VXCVI is set to 1.  If the rounded
+value is less than -2^31, the result is set to 0x80000000 and VXCVI is set
+to 1.  If the rounded result is inexact then XX is set to 1.
+
+@smallexample
+vector signed long long int __builtin_vsx_xvcvspsxds (vector float);
+@end smallexample
+
+The @code{__builtin_vsx_xvcvspsxds} converts the single precision floating
+point vector element to a signed doubleword integer value using the
+round to zero rounding mode.  If the source element is NaN the result
+is set to 0x8000000000000000 and VXCVI is set to 1.  If the source element
+is SNaN then VXSNAN is also set to 1.  If the rounded value is greater than
+2^63 - 1 the result is 0x7FFFFFFFFFFFFFFF and VXCVI is set to 1.  If the
+rounded value is less than -2^63, the result is set to 0x8000000000000000
+and VXCVI is set to 1.  If the rounded result is inexact then XX is set to 1.
+
+@smallexample
+vector unsigned long long __builtin_vsx_xvcvspuxds (vector float);
+@end smallexample
+
+The @code{__builtin_vsx_xvcvspuxds} 

[PATCH 01/11] rs6000, Fix __builtin_vsx_cmple* args and documentation, builtins

2024-02-20 Thread Carl Love


GCC maintainers:

This patch fixes the arguments and return type for the various 
__builtin_vsx_cmple* built-ins.  They were defined as signed but should have 
been defined as unsigned.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

-

rs6000, Fix __builtin_vsx_cmple* args and documentation, builtins

The built-ins __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
__builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi should take
unsigned arguments and return an unsigned result.  This patch changes
the arguments and return type from signed to unsigned.
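
To see why the signedness of the prototypes matters, consider a scalar model of one byte element.  The names cmple_u8_model and cmple_s8_model are hypothetical and exist only for this sketch; the real built-ins operate on whole vectors.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned element compare: all ones if a <= b, all zeros otherwise.  */
static uint8_t
cmple_u8_model (uint8_t a, uint8_t b)
{
  return a <= b ? 0xFF : 0x00;
}

/* Signed element compare: all ones (-1) if a <= b, all zeros otherwise.  */
static int8_t
cmple_s8_model (int8_t a, int8_t b)
{
  return a <= b ? -1 : 0;
}

/* Self-check: the byte pattern 0xFF is 255 when unsigned and -1 when
   signed, so comparing it with 1 gives opposite answers.  */
static void
cmple_model_check (void)
{
  assert (cmple_u8_model (0xFF, 1) == 0x00);
  assert (cmple_s8_model (-1, 1) == -1);
}
```

The same bit pattern yields different results depending on how it is interpreted, which is why a prototype that declares the wrong element type for the unsigned variants is misleading.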

The documentation for the signed and unsigned versions of
__builtin_vsx_cmple is missing from extend.texi.  This patch adds the
missing documentation.

Test cases are added for each of the signed and unsigned built-ins.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_cmple_u16qi,
__builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si): Change
arguments and return from signed to unsigned.
* doc/extend.texi (__builtin_vsx_cmple_16qi,
__builtin_vsx_cmple_8hi, __builtin_vsx_cmple_4si,
__builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u8hi,
__builtin_vsx_cmple_u4si): Add documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-cmple.c: New test file.
---
 gcc/config/rs6000/rs6000-builtins.def|  10 +-
 gcc/doc/extend.texi  |  23 
 gcc/testsuite/gcc.target/powerpc/vsx-cmple.c | 127 +++
 3 files changed, 155 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-cmple.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 3bc7fed6956..d66a53a0fab 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1349,16 +1349,16 @@
   const vss __builtin_vsx_cmple_8hi (vss, vss);
 CMPLE_8HI vector_ngtv8hi {}
 
-  const vsc __builtin_vsx_cmple_u16qi (vsc, vsc);
+  const vuc __builtin_vsx_cmple_u16qi (vuc, vuc);
 CMPLE_U16QI vector_ngtuv16qi {}
 
-  const vsll __builtin_vsx_cmple_u2di (vsll, vsll);
+  const vull __builtin_vsx_cmple_u2di (vull, vull);
 CMPLE_U2DI vector_ngtuv2di {}
 
-  const vsi __builtin_vsx_cmple_u4si (vsi, vsi);
+  const vui __builtin_vsx_cmple_u4si (vui, vui);
 CMPLE_U4SI vector_ngtuv4si {}
 
-  const vss __builtin_vsx_cmple_u8hi (vss, vss);
+  const vus __builtin_vsx_cmple_u8hi (vus, vus);
 CMPLE_U8HI vector_ngtuv8hi {}
 
   const vd __builtin_vsx_concat_2df (double, double);
@@ -1769,7 +1769,7 @@
   const vf __builtin_vsx_xvcvuxdsp (vull);
 XVCVUXDSP vsx_xvcvuxdsp {}
 
-  const vd __builtin_vsx_xvcvuxwdp (vsi);
+  const vd __builtin_vsx_xvcvuxwdp (vui);
 XVCVUXWDP vsx_xvcvuxwdp {}
 
   const vf __builtin_vsx_xvcvuxwsp (vsi);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 2b8ba1949bf..4d8610f6aa8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22522,6 +22522,29 @@ if the VSX instruction set is available.  The 
@samp{vec_vsx_ld} and
 @samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
 @samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
 
+
+@smallexample
+vector signed char __builtin_vsx_cmple_16qi (vector signed char,
+ vector signed char);
+vector signed short __builtin_vsx_cmple_8hi (vector signed short,
+ vector signed short);
+vector signed int __builtin_vsx_cmple_4si (vector signed int,
+ vector signed int);
+vector unsigned char __builtin_vsx_cmple_u16qi (vector unsigned char,
+vector unsigned char);
+vector unsigned short __builtin_vsx_cmple_u8hi (vector unsigned short,
+vector unsigned short);
+vector unsigned int __builtin_vsx_cmple_u4si (vector unsigned int,
+  vector unsigned int);
+@end smallexample
+
+The built-ins @code{__builtin_vsx_cmple_16qi}, @code{__builtin_vsx_cmple_8hi},
+@code{__builtin_vsx_cmple_4si}, @code{__builtin_vsx_cmple_u16qi},
+@code{__builtin_vsx_cmple_u8hi} and @code{__builtin_vsx_cmple_u4si} compare
+vectors of their defined type.  The corresponding result element is set to
+all ones if the first argument element is less than or equal to the second
+and all zeros otherwise.
+
 @node PowerPC AltiVec Built-in Functions Available on ISA 2.07
 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-cmple.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-cmple.c
new file mode 100644
index 000..081817b4ba3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-cmple.c
@@ -0,0 +1,127 @@
+/* { 

Re: [PATCH] bpf: Add documentation for the -mcpu option

2024-02-20 Thread Will Hawkins
On Tue, Feb 20, 2024 at 7:35 AM Jose E. Marchesi
 wrote:
>
>
> Hello Will.
>
> Thanks for the patch.
> I just installed it on your behalf.

Thank you!

>
> > Add documentation describing the meaning and values for the -mcpu
> > command-line option.
> >
> > Tested for bpf-unknown-none on x86_64-linux-gnu host.
> >
> > gcc/ChangeLog:
> >
> >   * config/bpf/bpf.opt: Add help information for -mcpu.
> >
> > Signed-off-by: Will Hawkins 
> > ---
> >  gcc/config/bpf/bpf.opt | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/gcc/config/bpf/bpf.opt b/gcc/config/bpf/bpf.opt
> > index bc5b2220116..acfddebdad7 100644
> > --- a/gcc/config/bpf/bpf.opt
> > +++ b/gcc/config/bpf/bpf.opt
> > @@ -77,9 +77,11 @@ Enable signed move and memory load instructions.
> >
> >  mcpu=
> >  Target RejectNegative Joined Var(bpf_isa) Enum(bpf_isa) Init(ISA_V4)
> > +Select the eBPF ISA version to target in code generation.
> >
> >  Enum
> >  Name(bpf_isa) Type(enum bpf_isa_version)
> > +Valid ISA versions (for use with the -mcpu= option)
> >
> >  EnumValue
> >  Enum(bpf_isa) String(v1) Value(ISA_V1)


rs6000, built-in cleanup patch series

2024-02-20 Thread Carl Love
GCC maintainers:

The following series of patches cleans up some of the rs6000 built-in support.  
Some of the first patches fix errors in the definition of a few of the 
built-ins.  The built-ins are supposed to have unsigned arguments but are 
listed as signed.  Some of the built-ins are supposed to return unsigned values 
but were defined to return a signed value.

There are a number of built-ins that are not documented but are duplicates of 
other documented built-ins.  The duplicate definitions are removed so users 
will only use the supported documented built-ins.

There are a number of the built-ins that are not documented in either the Power 
Vector Intrinsic Reference manual or in the gcc/doc/extend.texi file.  The 
patch adds the missing documentation as needed.  

Also most of the built-ins do not have test cases.  The patch adds test cases 
for the various built-ins.

Carl 


Re: [Patch] OpenMP/nvptx: support 'arch(nvptx64)' as context selector

2024-02-20 Thread Jakub Jelinek
On Tue, Feb 20, 2024 at 05:39:39PM +0100, Tobias Burnus wrote:
> clang/lib/Headers/openmp_wrappers/complex:device = {arch(amdgcn, nvptx, 
> nvptx64)},   \

That one doesn't really need the nvptx64 support.
> --- a/gcc/config/nvptx/nvptx.cc
> +++ b/gcc/config/nvptx/nvptx.cc
> @@ -6403,7 +6403,7 @@ nvptx_omp_device_kind_arch_isa (enum 
> omp_device_kind_arch_isa trait,
>  case omp_device_kind:
>return strcmp (name, "gpu") == 0;
>  case omp_device_arch:
> -  return strcmp (name, "nvptx") == 0;
> +  return strcmp (name, "nvptx") == 0 || strcmp (name, "nvptx64") == 0;

Maybe guard the nvptx64 on TARGET_ABI64, at least as long as we have that?
Just in case we'd reconsider at some point the -m64 only thing.
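
A stand-alone sketch of what such a guard could look like is given here.  It is not the actual hook: nvptx_arch_matches is a hypothetical helper, and the target_abi64 parameter stands in for GCC's TARGET_ABI64 macro so that the logic can be compiled and tested outside the compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Accept "nvptx" unconditionally and "nvptx64" only for the 64-bit ABI.
   target_abi64 is a stand-in for TARGET_ABI64.  */
static bool
nvptx_arch_matches (const char *name, bool target_abi64)
{
  if (strcmp (name, "nvptx") == 0)
    return true;
  return target_abi64 && strcmp (name, "nvptx64") == 0;
}

/* Self-check of the guarded and unguarded cases.  */
static void
nvptx_arch_matches_check (void)
{
  assert (nvptx_arch_matches ("nvptx", false));
  assert (nvptx_arch_matches ("nvptx64", true));
  assert (!nvptx_arch_matches ("nvptx64", false));
}
```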

Otherwise LGTM.

Jakub



[Patch] OpenMP/nvptx: support 'arch(nvptx64)' as context selector

2024-02-20 Thread Tobias Burnus

I just encountered 'arch(nvptx64)'. I think it makes sense to support
it as alias for 'nvptx' in the context selector for better compatibility.

Comments, remarks, suggestions?

Tobias

PS: See the LLVM documentation below. I do note that those are not identical
as LLVM uses 'nvptx' for 32bit while we effectively only support 64bit
(at least for offloading). Thus, while 'nvptx' might cause problems, adding
'nvptx64' in addition should be harmless.

* * *
From LLVM's 
https://android.googlesource.com/toolchain/llvm/+/refs/heads/master/docs/NVPTXUsage.rst

"The NVPTX target uses the module triple to select between 32/64-bit code
generation and the driver-compiler interface to use. The triple architecture
can be one of ``nvptx`` (32-bit PTX) or ``nvptx64`` (64-bit PTX). The
operating system should be one of ``cuda`` or ``nvcl``, which determines the
interface used by the generated code to communicate with the driver.  Most
users will want to use ``cuda`` as the operating system, which makes the
generated PTX compatible with the CUDA Driver API.

Example: 32-bit PTX for CUDA Driver API: ``nvptx-nvidia-cuda``
Example: 64-bit PTX for CUDA Driver API: ``nvptx64-nvidia-cuda``"

And usage inside LLVM:

clang/lib/Headers/openmp_wrappers/complex:device = {arch(amdgcn, nvptx, 
nvptx64)},   \

OpenMP/nvptx: support 'arch(nvptx64)' as context selector

The main 'arch' context selector for nvptx is, well, 'nvptx';
however, as 'nvptx64' is used by LLVM, it makes sense
to support it as well.

Note that LLVM has: "The triple architecture can be one of
``nvptx`` (32-bit PTX) or ``nvptx64`` (64-bit PTX)."
GCC effectively only supports the 64bit variant (at least for
offloading). Thus, GCC's 'nvptx' is not quite the same as LLVM's.

gcc/ChangeLog:

	* config/nvptx/gen-omp-device-properties.sh: Add 'nvptx64' to arch.
	* config/nvptx/nvptx.cc (nvptx_omp_device_kind_arch_isa): Likewise.

libgomp/ChangeLog:

	* libgomp.texi (OpenMP Context Selectors): Add 'nvptx64' as additional
	'arch' value for nvptx.

 gcc/config/nvptx/gen-omp-device-properties.sh | 2 +-
 gcc/config/nvptx/nvptx.cc | 2 +-
 libgomp/libgomp.texi  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/nvptx/gen-omp-device-properties.sh b/gcc/config/nvptx/gen-omp-device-properties.sh
index 95c754a164f..3666f9746d1 100644
--- a/gcc/config/nvptx/gen-omp-device-properties.sh
+++ b/gcc/config/nvptx/gen-omp-device-properties.sh
@@ -23,7 +23,7 @@ nvptx_sm_def="$1/nvptx-sm.def"
 sms=$(grep ^NVPTX_SM $nvptx_sm_def | sed 's/.*(//;s/,.*//')
 
 echo kind: gpu
-echo arch: nvptx
+echo arch: nvptx nvptx64
 
 isa=""
 for sm in $sms; do
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 9363d3ecc6a..3b46b70fc3b 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -6403,7 +6403,7 @@ nvptx_omp_device_kind_arch_isa (enum omp_device_kind_arch_isa trait,
 case omp_device_kind:
   return strcmp (name, "gpu") == 0;
 case omp_device_arch:
-  return strcmp (name, "nvptx") == 0;
+  return strcmp (name, "nvptx") == 0 || strcmp (name, "nvptx64") == 0;
 case omp_device_isa:
 #define NVPTX_SM(XX, SEP)\
   {			\
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index d7da799a922..9de6e15f1c2 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6193,7 +6193,7 @@ on more architectures, GCC currently does not match any @code{arch} or
 @item @code{amdgcn}, @code{gcn}
   @tab See @code{-march=} in ``AMD GCN Options''@footnote{Additionally,
   @code{gfx803} is supported as an alias for @code{fiji}.}
-@item @code{nvptx}
+@item @code{nvptx}, @code{nvptx64}
   @tab See @code{-march=} in ``Nvidia PTX Options''
 @end multitable
 


[PATCH v9 23/24] c++: Implement __is_invocable built-in trait

2024-02-20 Thread Ken Matsui
This patch implements built-in trait for std::is_invocable.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_invocable.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_INVOCABLE.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.
* cp-tree.h (build_invoke): New function.
* method.cc (build_invoke): New function.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_invocable.
* g++.dg/ext/is_invocable1.C: New test.
* g++.dg/ext/is_invocable2.C: New test.
* g++.dg/ext/is_invocable3.C: New test.
* g++.dg/ext/is_invocable4.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc |   6 +
 gcc/cp/cp-trait.def  |   1 +
 gcc/cp/cp-tree.h |   2 +
 gcc/cp/method.cc | 134 +
 gcc/cp/semantics.cc  |   4 +
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |   3 +
 gcc/testsuite/g++.dg/ext/is_invocable1.C | 349 +++
 gcc/testsuite/g++.dg/ext/is_invocable2.C | 139 +
 gcc/testsuite/g++.dg/ext/is_invocable3.C |  51 
 gcc/testsuite/g++.dg/ext/is_invocable4.C |  33 +++
 10 files changed, 722 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable1.C
 create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable2.C
 create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable3.C
 create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable4.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 23ea66d9c12..c87b126fdb1 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3791,6 +3791,12 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_FUNCTION:
   inform (loc, "  %qT is not a function", t1);
   break;
+case CPTK_IS_INVOCABLE:
+  if (!t2)
+inform (loc, "  %qT is not invocable", t1);
+  else
+inform (loc, "  %qT is not invocable by %qE", t1, t2);
+  break;
 case CPTK_IS_LAYOUT_COMPATIBLE:
   inform (loc, "  %qT is not layout compatible with %qT", t1, t2);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 85056c8140b..6cb2b55f4ea 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -75,6 +75,7 @@ DEFTRAIT_EXPR (IS_EMPTY, "__is_empty", 1)
 DEFTRAIT_EXPR (IS_ENUM, "__is_enum", 1)
 DEFTRAIT_EXPR (IS_FINAL, "__is_final", 1)
 DEFTRAIT_EXPR (IS_FUNCTION, "__is_function", 1)
+DEFTRAIT_EXPR (IS_INVOCABLE, "__is_invocable", -1)
 DEFTRAIT_EXPR (IS_LAYOUT_COMPATIBLE, "__is_layout_compatible", 2)
 DEFTRAIT_EXPR (IS_LITERAL_TYPE, "__is_literal_type", 1)
 DEFTRAIT_EXPR (IS_MEMBER_FUNCTION_POINTER, "__is_member_function_pointer", 1)
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 334c11396c2..261d3a71faa 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7334,6 +7334,8 @@ extern tree get_copy_assign   (tree);
 extern tree get_default_ctor   (tree);
 extern tree get_dtor   (tree, tsubst_flags_t);
 extern tree build_stub_object  (tree);
+extern tree build_invoke   (tree, const_tree,
+tsubst_flags_t);
 extern tree strip_inheriting_ctors (tree);
 extern tree inherited_ctor_binfo   (tree);
 extern bool base_ctor_omit_inherited_parms (tree);
diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index 98c10e6a8b5..14bd70ecf06 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -1928,6 +1928,140 @@ build_trait_object (tree type)
   return build_stub_object (type);
 }
 
+/* [func.require] Build an expression of INVOKE(FN_TYPE, ARG_TYPES...).  If the
+   given is not invocable, returns error_mark_node.  */
+
+tree
+build_invoke (tree fn_type, const_tree arg_types, tsubst_flags_t complain)
+{
+  if (fn_type == error_mark_node || arg_types == error_mark_node)
+return error_mark_node;
+
+  gcc_assert (TYPE_P (fn_type));
+  gcc_assert (TREE_CODE (arg_types) == TREE_VEC);
+
+  /* Access check is required to determine if the given is invocable.  */
+  deferring_access_check_sentinel acs (dk_no_deferred);
+
+  /* INVOKE is an unevaluated context.  */
+  cp_unevaluated cp_uneval_guard;
+
+  bool is_ptrdatamem;
+  bool is_ptrmemfunc;
+  if (TREE_CODE (fn_type) == REFERENCE_TYPE)
+{
+  tree deref_fn_type = TREE_TYPE (fn_type);
+  is_ptrdatamem = TYPE_PTRDATAMEM_P (deref_fn_type);
+  is_ptrmemfunc = TYPE_PTRMEMFUNC_P (deref_fn_type);
+
+  /* Dereference fn_type if it is a pointer to member.  */
+  if (is_ptrdatamem || is_ptrmemfunc)
+   fn_type = deref_fn_type;
+}
+  else
+{
+  is_ptrdatamem = TYPE_PTRDATAMEM_P (fn_type);
+  is_ptrmemfunc = TYPE_PTRMEMFUNC_P (fn_type);
+}
+
+  if (is_ptrdatamem && TREE_VEC_LENGTH (arg_types) != 1)
+/* Only a pointer to data member with one argument is invocable.  */
+return error_mark_node;

[PATCH] doc: RISC-V: Document that -mcpu doesn't override -march or -mtune

2024-02-20 Thread Palmer Dabbelt
This came up recently as Edwin was looking through the test suite.  A
few of us were talking about this during the patchwork meeting and were
surprised.  Looks like this is the desired behavior, so let's at least
document it.

gcc/ChangeLog:

* doc/invoke.texi: Document -mcpu.

Signed-off-by: Palmer Dabbelt 
---
 gcc/doc/invoke.texi | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6ec56493e59..4a4bba9f1cd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -30670,6 +30670,8 @@ Permissible values for this option are: 
@samp{sifive-e20}, @samp{sifive-e21},
 @samp{sifive-s21}, @samp{sifive-s51}, @samp{sifive-s54}, @samp{sifive-s76},
 @samp{sifive-u54}, @samp{sifive-u74}, and @samp{sifive-x280}.
 
+Note that @option{-mcpu} does not override @option{-march} or @option{-mtune}.
+
 @opindex mtune
 @item -mtune=@var{processor-string}
 Optimize the output for the given processor, specified by microarchitecture or
-- 
2.43.0



[patch,avr,applied] Use int types of exact width and signedness in built-ins prototypes

2024-02-20 Thread Georg-Johann Lay

AVR: Use types of exact size and signedness in built-ins.

The AVR built-ins used types like "int" or "char" whose size and
signedness are not fixed but depend on -mint8,
-f[no-][un-]signed-char etc.  As the built-ins model machine
instructions with given operand sizes and signedness, use the
corresponding types in their prototypes.
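
For illustration, here is a host-side model of two of the affected built-ins written with exact-width types, which makes the operand sizes independent of how the target defines "int" or "char".  The function names are made up for this sketch, and the arithmetic (nibble swap for SWAP, product shifted left by one for FMUL) is my reading of the corresponding AVR instructions rather than something stated in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* SWAP: exchange the high and low nibble of an 8-bit value
   (a rotate left by 4, matching the rotlqi3_4 pattern name).  */
static uint8_t
avr_swap_model (uint8_t x)
{
  return (uint8_t) ((x << 4) | (x >> 4));
}

/* FMUL: unsigned fractional multiply, 8 x 8 -> 16 bits, with the
   product shifted left by one; bits above bit 15 are dropped.  */
static uint16_t
avr_fmul_model (uint8_t a, uint8_t b)
{
  return (uint16_t) (((uint32_t) a * b) << 1);
}

/* Self-check with easily verified values.  */
static void
avr_model_check (void)
{
  assert (avr_swap_model (0xA5) == 0x5A);
  assert (avr_fmul_model (0x80, 0x80) == 0x8000);
}
```

With uint8_t and uint16_t in the signatures, the model behaves the same whether the compiler's "int" is 8, 16 or 32 bits wide, which is the property the patch is after for the real prototypes.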

gcc/
* config/avr/builtins.def: Use function prototypes of given size
and signedness.
	* config/avr/avr.cc (avr_init_builtins): Adjust types required by 
builtins.def.

* doc/extend.texi (AVR Built-in Functions): Adjust accordingly.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 4a55f14bff7..d3756a2f036 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -14605,35 +14605,35 @@ avr_init_builtins (void)
 {
   tree void_ftype_void
 = build_function_type_list (void_type_node, NULL_TREE);
-  tree uchar_ftype_uchar
-= build_function_type_list (unsigned_char_type_node,
-unsigned_char_type_node,
+  tree uintQI_ftype_uintQI
+= build_function_type_list (unsigned_intQI_type_node,
+unsigned_intQI_type_node,
 NULL_TREE);
-  tree uint_ftype_uchar_uchar
-= build_function_type_list (unsigned_type_node,
-unsigned_char_type_node,
-unsigned_char_type_node,
+  tree uintHI_ftype_uintQI_uintQI
+= build_function_type_list (unsigned_intHI_type_node,
+unsigned_intQI_type_node,
+unsigned_intQI_type_node,
 NULL_TREE);
-  tree int_ftype_char_char
-= build_function_type_list (integer_type_node,
-char_type_node,
-char_type_node,
+  tree intHI_ftype_intQI_intQI
+= build_function_type_list (intHI_type_node,
+intQI_type_node,
+intQI_type_node,
 NULL_TREE);
-  tree int_ftype_char_uchar
-= build_function_type_list (integer_type_node,
-char_type_node,
-unsigned_char_type_node,
+  tree intHI_ftype_intQI_uintQI
+= build_function_type_list (intHI_type_node,
+intQI_type_node,
+unsigned_intQI_type_node,
 NULL_TREE);
-  tree void_ftype_ulong
+  tree void_ftype_uintSI
 = build_function_type_list (void_type_node,
-long_unsigned_type_node,
+unsigned_intSI_type_node,
 NULL_TREE);
 
-  tree uchar_ftype_ulong_uchar_uchar
-= build_function_type_list (unsigned_char_type_node,
-long_unsigned_type_node,
-unsigned_char_type_node,
-unsigned_char_type_node,
+  tree uintQI_ftype_uintSI_uintQI_uintQI
+= build_function_type_list (unsigned_intQI_type_node,
+unsigned_intSI_type_node,
+unsigned_intQI_type_node,
+unsigned_intQI_type_node,
 NULL_TREE);
 
   tree const_memx_void_node
@@ -14644,8 +14644,8 @@ avr_init_builtins (void)
   tree const_memx_ptr_type_node
 = build_pointer_type_for_mode (const_memx_void_node, PSImode, false);
 
-  tree char_ftype_const_memx_ptr
-= build_function_type_list (char_type_node,
+  tree intQI_ftype_const_memx_ptr
+= build_function_type_list (intQI_type_node,
 const_memx_ptr_type_node,
 NULL);
 
diff --git a/gcc/config/avr/builtins.def b/gcc/config/avr/builtins.def
index b4bf7beb590..316bdebe498 100644
--- a/gcc/config/avr/builtins.def
+++ b/gcc/config/avr/builtins.def
@@ -43,17 +43,17 @@ DEF_BUILTIN (SLEEP, 0, void_ftype_void, sleep, NULL)
 /* Mapped to respective instruction but might also be folded away
or emit as libgcc call if ISA does not provide the instruction.  */
 
-DEF_BUILTIN (SWAP,   1, uchar_ftype_uchar,  rotlqi3_4, NULL)
-DEF_BUILTIN (FMUL,   2, uint_ftype_uchar_uchar, fmul, NULL)
-DEF_BUILTIN (FMULS,  2, int_ftype_char_char,fmuls, NULL)
-DEF_BUILTIN (FMULSU, 2, int_ftype_char_uchar,   fmulsu, NULL)
+DEF_BUILTIN (SWAP,   1, uintQI_ftype_uintQI,rotlqi3_4, NULL)
+DEF_BUILTIN (FMUL,   2, uintHI_ftype_uintQI_uintQI, fmul, NULL)
+DEF_BUILTIN (FMULS,  2, intHI_ftype_intQI_intQI,fmuls, NULL)
+DEF_BUILTIN (FMULSU, 2, intHI_ftype_intQI_uintQI,   fmulsu, NULL)
 
 /* More complex stuff that cannot be mapped 1:1 to an instruction.  */
 
-DEF_BUILTIN (DELAY_CYCLES, -1, void_ftype_ulong, nothing, NULL)
-DEF_BUILTIN (NOPS, -1, void_ftype_ulong, nothing, NULL)
-DEF_BUILTIN (INSERT_BITS, 3, uchar_ftype_ulong_uchar_uchar, insert_bits, NULL)
-DEF_BUILTIN (FLASH_SEGMENT, 1, char_ftype_const_memx_ptr, flash_segment, NULL)
+DEF_BUILTIN (DELAY_CYCLES, -1, void_ftype_uintSI, nothing, NULL)
+DEF_BUILTIN (NOPS, -1, void_ftype_uintSI, nothing, NULL)
+DEF_BUILTIN (INSERT_BITS, 3, uintQI_ftype_uintSI_uintQI_uintQI, insert_bits, NULL)
+DEF_BUILTIN (FLASH_SEGMENT, 1, intQI_ftype_const_memx_ptr, flash_segment, NULL)
 
 /* ISO/IEC TR 18037 "Embedded C"
The following builtins are undocumented and used by stdfix.h.  */
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b2383b55666..2135dfde9c8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -16783,32 +16783,30 @@ or if not a specific built-in is implemented or not. For example, if
 @code{__BUILTIN_AVR_NOP} is defined to @code{1} and undefined otherwise.
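
The motivation given in the commit message above (prototypes that do not change with options such as -mint8) can be illustrated with a small host-side model. This is only a sketch, assuming the usual description of the AVR FMUL instruction as an unsigned 8x8 multiply whose 16-bit result is shifted left by one bit; `fmul_model` is a hypothetical name and is not part of GCC or AVR-LibC.

```c
#include <stdint.h>

/* Hypothetical host-side model of the AVR FMUL instruction: an unsigned
   8x8 -> 16 bit multiply whose product is shifted left by one bit.
   Fixed-width types keep the prototype independent of how wide "int"
   or "char" happen to be under options such as -mint8.  */
uint16_t
fmul_model (uint8_t a, uint8_t b)
{
  uint32_t product = (uint32_t) a * (uint32_t) b;  /* at most 0xfe01 */
  return (uint16_t) (product << 1);                /* bit 15 of the product is dropped */
}
```

With `unsigned int` as the return type, the same prototype would describe an 8-bit result under -mint8, which is what the uintHI/uintQI spelling in builtins.def avoids.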
 
 

Re: [PATCH] libcpp: Improve location for macro names [PR66290]

2024-02-20 Thread Lewis Hyatt
On Mon, Feb 19, 2024 at 11:36 PM Alexandre Oliva  wrote:
>
> This backport for gcc-13 is the first of two required for the
> g++.dg/pch/line-map-3.C test to stop hitting a variant of the known
> problem mentioned in that testcase: on riscv64-elf and riscv32-elf,
> after restoring the PCH, the location of the macros is mentioned as if
> they were on line 3 rather than 2, so even the existing xfails fail.  I
> think this might be too much to backport, and I'm ready to use an xfail
> instead, but since this would bring more predictability, I thought I'd
> ask whether you'd find this backport acceptable.
>
> Regstrapped on x86_64-linux-gnu, along with other backports, and tested
> manually on riscv64-elf.  Ok to install?

Sorry that test is causing a problem, I hadn't realized at first that
the wrong output was target-dependent. I feel like simply deleting
this test g++.dg/pch/line-map-3.C from GCC 13 branch should be fine
too, as a safer alternative, if release managers prefer? It doesn't
really need to be on the branch, its only purpose is to remind me to
fix the underlying issue for GCC 15...

-Lewis


Re: [PATCH] RISC-V: Fix CTZ unnecessary sign extension [PR #106888]

2024-02-20 Thread Alexandre Oliva
On Feb 20, 2024, Jeff Law  wrote:

> On 2/19/24 21:26, Alexandre Oliva wrote:
>> This backport for gcc-13 is required for pr90838.c to get the expected
>> count of andi instructions on riscv64-elf.
> In general, shouldn't backports be focused on correctness issues?

*nod*.

> It's unclear what the motivation is for backporting this change into
> gcc-13.

There's this unexpected fail in gcc-13 (pr90838.c), one out of a handful
that we've hit while transitioning our riscv toolchains to gcc-13.

I set out to understand them, I identified the patches that got them to
pass in the trunk, and so I've proposed their backports to fix the fails
in gcc-13.

Surely there are other ways to address each one of the fails.

But even if we choose to just xfail them, or leave them failing noisily,
I've already gone through the process of identifying the fix, so I
figured I might as well share it.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH] RISC-V: Fix riscv/arch-19.c with different ISA spec version

2024-02-20 Thread Kito Cheng
LGTM :)

On Tue, Feb 20, 2024 at 12:03 PM Alexandre Oliva  wrote:
>
> This testcase is failing with riscv64-elf and riscv32-elf in the gcc-13
> branch, if configured to use an assembler that supports -misa-spec; with
> an assembler that doesn't, the test passes both with and without the
> following backport from the trunk, so I'd like to install it in gcc-13.
> Regstrapped on x86_64-linux-gnu, along with other backports, and tested
> manually on riscv64-elf.  Ok to install?
>
> From: Kito Cheng 
>
> In newer ISA spec, F will implied zicsr, add that into -march option to
> prevent different test result on different default -misa-spec version.
>
> gcc/testsuite/
>
> * gcc.target/riscv/arch-19.c: Add -misa-spec.
>
> (cherry picked from commit 9fde76a3be8e1717d9d38492c40675e742611e45)
> ---
>  gcc/testsuite/gcc.target/riscv/arch-19.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/arch-19.c b/gcc/testsuite/gcc.target/riscv/arch-19.c
> index b042e1a49fe6f..95204ede26a69 100644
> --- a/gcc/testsuite/gcc.target/riscv/arch-19.c
> +++ b/gcc/testsuite/gcc.target/riscv/arch-19.c
> @@ -1,4 +1,4 @@
>  /* { dg-do compile } */
> -/* { dg-options "-march=rv64if_zfinx -mabi=lp64" } */
> +/* { dg-options "-march=rv64if_zicsr_zfinx -mabi=lp64" } */
>  int foo() {}
> -/* { dg-error "'-march=rv64if_zfinx': z\\*inx conflicts with floating-point extensions" "" { target *-*-* } 0 } */
> +/* { dg-error "'-march=rv64if_zicsr_zfinx': z\\*inx conflicts with floating-point extensions" "" { target *-*-* } 0 } */
>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>    Free Software Activist                   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH] aarch64: Allow aarch64-linux-musl for heap trampolines [PR113971].

2024-02-20 Thread Richard Biener
On Tue, Feb 20, 2024 at 11:27 AM Iain Sandoe  wrote:
>
> Tested on aarch64-linux-gnu, aarch64-darwin by me and on aarch64-linux-musl
> by Sam James (thanks!).  OK for trunk?

OK

> thanks
> Iain
>
> --- 8< ---
>
>
> This allows the same trampoline pattern to be used on all linux variants
> rather than restricting it to linux gnu.
>
> PR target/113971.
>
> libgcc/ChangeLog:
>
> * config/aarch64/heap-trampoline.c: Allow all linux variants.
>
> Signed-off-by: Iain Sandoe 
> ---
>  libgcc/config/aarch64/heap-trampoline.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libgcc/config/aarch64/heap-trampoline.c b/libgcc/config/aarch64/heap-trampoline.c
> index 9d5b19983b1..1e3460b1601 100644
> --- a/libgcc/config/aarch64/heap-trampoline.c
> +++ b/libgcc/config/aarch64/heap-trampoline.c
> @@ -29,7 +29,7 @@ void *allocate_trampoline_page (void);
>  void __gcc_nested_func_ptr_created (void *chain, void *func, void *dst);
>  void __gcc_nested_func_ptr_deleted (void);
>
> -#if defined(__gnu_linux__)
> +#if defined(__linux__)
>  static const uint32_t aarch64_trampoline_insns[] = {
>    0xd503245f, /* hint    34 */
>    0x580000b1, /* ldr     x17, .+20 */
> @@ -82,7 +82,7 @@ allocate_trampoline_page (void)
>  {
>void *page;
>
> -#if defined(__gnu_linux__)
> +#if defined(__linux__)
>page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
>MAP_ANON | MAP_PRIVATE, 0, 0);
>  #elif __APPLE__
> --
> 2.39.2 (Apple Git-143)
>


[patch,avr,applied] Use @defbuiltin to document built-ins.

2024-02-20 Thread Georg-Johann Lay

This patch uses @defbuiltin to document built-in
functions so that the functions are listed in the index.
Previously, @table @code was used.

Johann

--

AVR: extend.texi - Use @defbuiltin to document built-ins.

gcc/
* doc/extend.texi (AVR Built-in Functions): Use @defbuiltin
instead of @table.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e048404dffe..b2383b55666 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -16782,37 +16782,41 @@ or if not a specific built-in is implemented or not. For example, if
 @code{__builtin_avr_nop} is available the macro
 @code{__BUILTIN_AVR_NOP} is defined to @code{1} and undefined otherwise.
 
-@table @code
+@defbuiltin{void __builtin_avr_nop (void)}
+@defbuiltinx{void __builtin_avr_nop (void)}
+@defbuiltinx{void __builtin_avr_sei (void)}
+@defbuiltinx{void __builtin_avr_cli (void)}
+@defbuiltinx{void __builtin_avr_sleep (void)}
+@defbuiltinx{void __builtin_avr_wdr (void)}
+@defbuiltinx{{unsigned char} __builtin_avr_swap (unsigned char)}
+@defbuiltinx{{unsigned int} __builtin_avr_fmul (unsigned char, unsigned char)}
+@defbuiltinx{int __builtin_avr_fmuls (char, char)}
+@defbuiltinx{int __builtin_avr_fmulsu (char, unsigned char)}
 
-@item void __builtin_avr_nop (void)
-@itemx void __builtin_avr_sei (void)
-@itemx void __builtin_avr_cli (void)
-@itemx void __builtin_avr_sleep (void)
-@itemx void __builtin_avr_wdr (void)
-@itemx unsigned char __builtin_avr_swap (unsigned char)
-@itemx unsigned int __builtin_avr_fmul (unsigned char, unsigned char)
-@itemx int __builtin_avr_fmuls (char, char)
-@itemx int __builtin_avr_fmulsu (char, unsigned char)
 These built-in functions map to the respective machine
 instruction, i.e.@: @code{nop}, @code{sei}, @code{cli}, @code{sleep},
 @code{wdr}, @code{swap}, @code{fmul}, @code{fmuls}
 resp. @code{fmulsu}. The three @code{fmul*} built-ins are implemented
 as library call if no hardware multiplier is available.
 
-@item void __builtin_avr_delay_cycles (unsigned long ticks)
+@enddefbuiltin
+
+@defbuiltin{void __builtin_avr_delay_cycles (unsigned long @var{ticks})}
 Delay execution for @var{ticks} cycles. Note that this
 built-in does not take into account the effect of interrupts that
 might increase delay time. @var{ticks} must be a compile-time
 integer constant; delays with a variable number of cycles are not supported.
+@enddefbuiltin
 
-@item char __builtin_avr_flash_segment (const __memx void*)
+@defbuiltin{char __builtin_avr_flash_segment (const __memx void*)}
 This built-in takes a byte address to the 24-bit
 @ref{AVR Named Address Spaces,address space} @code{__memx} and returns
 the number of the flash segment (the 64 KiB chunk) where the address
 points to.  Counting starts at @code{0}.
 If the address does not point to flash memory, return @code{-1}.
+@enddefbuiltin
 
-@item uint8_t __builtin_avr_insert_bits (uint32_t map, uint8_t bits, uint8_t val)
+@defbuiltin{uint8_t __builtin_avr_insert_bits (uint32_t @var{map}, uint8_t @var{bits}, uint8_t @var{val})}
 Insert bits from @var{bits} into @var{val} and return the resulting
 value. The nibbles of @var{map} determine how the insertion is
 performed: Let @var{X} be the @var{n}-th nibble of @var{map}
@@ -16856,12 +16860,12 @@ __builtin_avr_insert_bits (0x3210, bits, val);
 // reverse the bit order of bits
 __builtin_avr_insert_bits (0x01234567, bits, 0);
 @end smallexample
+@enddefbuiltin
 
-@item void __builtin_avr_nops (unsigned count)
+@defbuiltin{void __builtin_avr_nops (unsigned @var{count})}
 Insert @var{count} @code{NOP} instructions.
 The number of instructions must be a compile-time integer constant.
-
-@end table
+@enddefbuiltin
 
 @noindent
 There are many more AVR-specific built-in functions that are used to


Re: [PATCH] bpf: Add documentation for the -mcpu option

2024-02-20 Thread Jose E. Marchesi


Hello Will.

Thanks for the patch.
I just installed it on your behalf.

> Add documentation describing the meaning and values for the -mcpu
> command-line option.
>
> Tested for bpf-unknown-none on x86_64-linux-gnu host.
>
> gcc/ChangeLog:
>
>   * config/bpf/bpf.opt: Add help information for -mcpu.
>
> Signed-off-by: Will Hawkins 
> ---
>  gcc/config/bpf/bpf.opt | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/config/bpf/bpf.opt b/gcc/config/bpf/bpf.opt
> index bc5b2220116..acfddebdad7 100644
> --- a/gcc/config/bpf/bpf.opt
> +++ b/gcc/config/bpf/bpf.opt
> @@ -77,9 +77,11 @@ Enable signed move and memory load instructions.
>  
>  mcpu=
>  Target RejectNegative Joined Var(bpf_isa) Enum(bpf_isa) Init(ISA_V4)
> +Select the eBPF ISA version to target in code generation.
>  
>  Enum
>  Name(bpf_isa) Type(enum bpf_isa_version)
> +Valid ISA versions (for use with the -mcpu= option)
>  
>  EnumValue
>  Enum(bpf_isa) String(v1) Value(ISA_V1)


Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-20 Thread chenglulu



On 2024/2/20 at 7:54 PM, Xi Ruoyao wrote:
> On Tue, 2024-02-20 at 19:50 +0800, chenglulu wrote:
> > On 2024/2/20 at 7:31 PM, Xi Ruoyao wrote:
> > > On Tue, 2024-02-20 at 19:25 +0800, Xi Ruoyao wrote:
> > > > On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:
> > > > >
> > > > > So I think that without worrying about performance and ensuring that
> > > > > there is no problem
> > > > > with binutils, I think we can make the following modifications:
> > > > >
> > > > >     -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
> > > > >     -   used for padding.  */
> > > > >     +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by
> > > > >     +   default.  */
> > > > >      #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
> > > > >     -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
> > > > >     +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))
> > > > >
> > > > > What do you think of it?
> > > >
> > > > Unfortunately it will cause warnings with GAS 2.41 or earlier like
> > > >
> > > > t1.s:1: Warning: expected fill pattern missing
> > > > t1.s:5: Warning: expected fill pattern missing
> > > >
> > > > And AFAIK these things may cause many test failures due to "excessive
> > > > errors" if running the GCC test suite with these earlier GAS versions.
> > > > Maybe we'll have to add some autoconf-based probing for the linker
> > > > anyway?
> > >
> > > Or just silence the warning passing "--no-warn" to the assembler but I'm
> > > highly unsure if this is really a good idea :(.
> >
> > I am not opposed to adding detection code, but I looked at this problem
> > today and I think this change is the smallest change. I asked Meng
> > Qinggang and he said that the warning of GAS 2.41 can be removed.
>
> Yes, but we cannot change a released binutils-2.41 tarball and Binutils
> folks don't make point releases like GCC.

OK, I agree with you. I will backport r14-4674 and r14-5434 to gcc12
and gcc13.




Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-20 Thread Xi Ruoyao
On Tue, 2024-02-20 at 19:50 +0800, chenglulu wrote:
> 
> On 2024/2/20 at 7:31 PM, Xi Ruoyao wrote:
> > On Tue, 2024-02-20 at 19:25 +0800, Xi Ruoyao wrote:
> > > On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:
> > > 
> > > > So I think that without worrying about performance and ensuring that
> > > > there is no problem
> > > > 
> > > > with binutils, I think we can make the following modifications:
> > > > 
> > > >     -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
> > > >     -   used for padding.  */
> > > >     +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by
> > > >     +   default.  */
> > > >  #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
> > > >     -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
> > > >     +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))
> > > > 
> > > > What do you think of it?
> > > Unfortunately it will cause warnings with GAS 2.41 or earlier like
> > > 
> > > t1.s:1: Warning: expected fill pattern missing
> > > t1.s:5: Warning: expected fill pattern missing
> > > 
> > > And AFAIK these things may cause many test failures due to "excessive
> > > errors" if running the GCC test suite with these earlier GAS versions.
> > > Maybe we'll have to add some autoconf-based probing for the linker
> > > anyway?
> > Or just silence the warning passing "--no-warn" to the assembler but I'm
> > highly unsure if this is really a good idea :(.
> > 
> I am not opposed to adding detection code, but I looked at this problem 
> today
> 
> and I think this change is the smallest change. I asked Meng Qinggang and he
> 
> said that the warning of GAS 2.41 can be removed.

Yes, but we cannot change a released binutils-2.41 tarball and Binutils
folks don't make point releases like GCC.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-20 Thread chenglulu



On 2024/2/20 at 7:31 PM, Xi Ruoyao wrote:
> On Tue, 2024-02-20 at 19:25 +0800, Xi Ruoyao wrote:
> > On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:
> > >
> > > So I think that without worrying about performance and ensuring that
> > > there is no problem
> > > with binutils, I think we can make the following modifications:
> > >
> > >     -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
> > >     -   used for padding.  */
> > >     +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by
> > >     +   default.  */
> > >      #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
> > >     -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
> > >     +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))
> > >
> > > What do you think of it?
> >
> > Unfortunately it will cause warnings with GAS 2.41 or earlier like
> >
> > t1.s:1: Warning: expected fill pattern missing
> > t1.s:5: Warning: expected fill pattern missing
> >
> > And AFAIK these things may cause many test failures due to "excessive
> > errors" if running the GCC test suite with these earlier GAS versions.
> > Maybe we'll have to add some autoconf-based probing for the linker
> > anyway?
>
> Or just silence the warning passing "--no-warn" to the assembler but I'm
> highly unsure if this is really a good idea :(.

I am not opposed to adding detection code, but I looked at this problem
today and I think this change is the smallest change. I asked Meng
Qinggang and he said that the warning of GAS 2.41 can be removed.



Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-20 Thread Xi Ruoyao
On Tue, 2024-02-20 at 19:25 +0800, Xi Ruoyao wrote:
> On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:
> 
> > So I think that without worrying about performance and ensuring that
> > there is no problem
> > 
> > with binutils, I think we can make the following modifications:
> > 
> >    -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
> >    -   used for padding.  */
> >    +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by
> >    +   default.  */
> >     #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
> >    -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
> >    +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))
> > 
> > What do you think of it?
> 
> Unfortunately it will cause warnings with GAS 2.41 or earlier like
> 
> t1.s:1: Warning: expected fill pattern missing
> t1.s:5: Warning: expected fill pattern missing
> 
> And AFAIK these things may cause many test failures due to "excessive
> errors" if running the GCC test suite with these earlier GAS versions.
> Maybe we'll have to add some autoconf-based probing for the linker
> anyway?

Or just silence the warning passing "--no-warn" to the assembler but I'm
highly unsure if this is really a good idea :(.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[pushed] aarch64: Fix streaming-compatible code with -mtrack-speculation [PR113805]

2024-02-20 Thread Richard Sandiford
This patch makes -mtrack-speculation work on streaming-compatible
functions.  There were two related issues.  The first is that the
streaming-compatible code was using TB(N)Z unconditionally, whereas
those instructions are not allowed with speculation tracking.
That part can be fixed in a similar way to the recent eh_return
fix (PR112987).

The second issue was that the speculation-tracking pass runs
before some of the conditional branches are inserted.  It isn't
safe to insert the branches any earlier, so the patch instead adds
a second speculation-tracking pass that runs afterwards.  The new
pass is only used for streaming-compatible functions.
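
The split between the two passes can be summarised as a small predicate. The following is a hypothetical model only: the function and parameter names are illustrative and are not GCC's internal names; it merely restates the rule that the late pass handles streaming-compatible functions and the early pass handles everything else.

```c
#include <stdbool.h>

/* Hypothetical model of the gating rule described above; the names are
   illustrative and do not match GCC's internals.  */
bool
track_speculation_gate (bool track_speculation_enabled,
                        bool is_late_pass,
                        bool fn_is_streaming_compatible)
{
  /* Nothing runs unless -mtrack-speculation is in effect.  */
  if (!track_speculation_enabled)
    return false;

  /* Late pass: streaming-compatible functions only.
     Early pass: all other functions.  */
  return is_late_pass == fn_is_streaming_compatible;
}
```

Under this rule exactly one of the two passes runs for any given function, so no function is instrumented twice.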

The testcase is adapted from call_sm_switch_1.c.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
PR target/113805
* config/aarch64/aarch64-passes.def (pass_late_track_speculation):
New pass.
* config/aarch64/aarch64-protos.h (make_pass_late_track_speculation):
Declare.
* config/aarch64/aarch64.md (is_call): New attribute.
(*and3nr_compare0): Rename to...
(@aarch64_and3nr_compare0): ...this.
* config/aarch64/aarch64-sme.md (aarch64_get_sme_state)
(aarch64_tpidr2_save, aarch64_tpidr2_restore): Add is_call attributes.
* config/aarch64/aarch64-speculation.cc: Update file comment to
describe the new late pass.
(aarch64_do_track_speculation): Handle is_call insns like other calls.
(pass_track_speculation): Add an is_late member variable.
(pass_track_speculation::gate): Run the late pass for streaming-
compatible functions and the early pass for other functions.
(make_pass_track_speculation): Update accordingly.
(make_pass_late_track_speculation): New function.
* config/aarch64/aarch64.cc (aarch64_gen_test_and_branch): New
function.
(aarch64_guard_switch_pstate_sm): Use it.

gcc/testsuite/
PR target/113805
* gcc.target/aarch64/sme/call_sm_switch_11.c: New test.
---
 gcc/config/aarch64/aarch64-passes.def |   1 +
 gcc/config/aarch64/aarch64-protos.h   |   1 +
 gcc/config/aarch64/aarch64-sme.md |   3 +
 gcc/config/aarch64/aarch64-speculation.cc |  64 --
 gcc/config/aarch64/aarch64.cc |  26 ++-
 gcc/config/aarch64/aarch64.md |   8 +-
 .../aarch64/sme/call_sm_switch_11.c   | 209 ++
 7 files changed, 291 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_11.c

diff --git a/gcc/config/aarch64/aarch64-passes.def b/gcc/config/aarch64/aarch64-passes.def
index 769d48f4faa..e20e6f4395b 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -22,6 +22,7 @@ INSERT_PASS_BEFORE (pass_sched, 1, pass_aarch64_early_ra);
 INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
 INSERT_PASS_BEFORE (pass_reorder_blocks, 1, pass_track_speculation);
 INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, pass_switch_pstate_sm);
+INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, pass_late_track_speculation);
 INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
 INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
 INSERT_PASS_AFTER (pass_if_after_combine, 1, pass_cc_fusion);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index a0b142e0b94..bd719b992a5 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1075,6 +1075,7 @@ std::string aarch64_get_extension_string_for_isa_flags (aarch64_feature_flags,
 rtl_opt_pass *make_pass_aarch64_early_ra (gcc::context *);
 rtl_opt_pass *make_pass_fma_steering (gcc::context *);
 rtl_opt_pass *make_pass_track_speculation (gcc::context *);
+rtl_opt_pass *make_pass_late_track_speculation (gcc::context *);
 rtl_opt_pass *make_pass_tag_collision_avoidance (gcc::context *);
 rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt);
 rtl_opt_pass *make_pass_cc_fusion (gcc::context *ctxt);
diff --git a/gcc/config/aarch64/aarch64-sme.md b/gcc/config/aarch64/aarch64-sme.md
index 6edcdd96679..81d941871ac 100644
--- a/gcc/config/aarch64/aarch64-sme.md
+++ b/gcc/config/aarch64/aarch64-sme.md
@@ -105,6 +105,7 @@ (define_insn "aarch64_get_sme_state"
(clobber (reg:CC CC_REGNUM))]
   ""
   "bl\t__arm_sme_state"
+  [(set_attr "is_call" "yes")]
 )
 
 (define_insn "aarch64_read_svcr"
@@ -242,6 +243,7 @@ (define_insn "aarch64_tpidr2_save"
(clobber (reg:CC CC_REGNUM))]
   ""
   "bl\t__arm_tpidr2_save"
+  [(set_attr "is_call" "yes")]
 )
 
 ;; Set PSTATE.ZA to 1.  If ZA was previously dormant or active,
@@ -358,6 +360,7 @@ (define_insn "aarch64_tpidr2_restore"
(clobber (reg:CC CC_REGNUM))]
   ""
   "bl\t__arm_tpidr2_restore"
+  [(set_attr "is_call" "yes")]
 )
 
 ;; Check whether a lazy save set up by aarch64_save_za was committed
diff --git 

Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-20 Thread Xi Ruoyao
On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:

> So I think that without worrying about performance and ensuring that 
> there is no problem
> 
> with binutils, I think we can make the following modifications:
> 
>    -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
>    -   used for padding.  */
>    +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by
>    +   default.  */
>     #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
>    -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
>    +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))
> 
> What do you think of it?

Unfortunately it will cause warnings with GAS 2.41 or earlier like

t1.s:1: Warning: expected fill pattern missing
t1.s:5: Warning: expected fill pattern missing

And AFAIK these things may cause many test failures due to "excessive
errors" if running the GCC test suite with these earlier GAS versions. 
Maybe we'll have to add some autoconf-based probing for the linker
anyway?
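
For reference, the two directive forms being compared can be reproduced on the host. This is a minimal sketch: `format_align` and `LOONGARCH_NOP_FILL` are hypothetical names introduced here, not part of GCC. It also checks that the decimal fill value 54525952 quoted in the macro is the same number as 0x03400000.

```c
#include <stdio.h>

/* Fill value used by the original macro: the "andi $r0,$r0,0" (nop)
   encoding written in decimal.  */
#define LOONGARCH_NOP_FILL 54525952u

/* The decimal constant is simply 0x03400000 in hexadecimal.  */
_Static_assert (LOONGARCH_NOP_FILL == 0x03400000u,
                "decimal and hexadecimal spellings must agree");

/* Hypothetical helper mirroring the two macro bodies quoted above.
   With explicit_fill != 0 it formats the original directive, otherwise
   the proposed one that leaves the fill pattern to the assembler.
   Returns the number of characters written, as snprintf does.  */
int
format_align (char *buf, size_t size, int log, int explicit_fill)
{
  if (explicit_fill)
    return snprintf (buf, size, "\t.align\t%d,%u,4\n", log, LOONGARCH_NOP_FILL);
  return snprintf (buf, size, "\t.align\t%d,,4\n", log);
}
```

The second form is the one that triggers "expected fill pattern missing" with the GAS versions mentioned above, because the fill operand is left empty.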

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] rs6000: Neuter option -mpower{8,9}-vector [PR109987]

2024-02-20 Thread Segher Boessenkool
On Tue, Feb 20, 2024 at 05:27:07PM +0800, Kewen.Lin wrote:
> > -mcpu=power8 implies -mvsx already.
> 
> Yes, but users can specify -mno-vsx in RUNTESTFLAGS, dejagnu
> framework can have different behaviors (options order) for
> different versions, this explicit -mvsx is mainly for the
> consistency between the checking and the actual testing.

It is not supported at all.  It is better to assume users do not try
to hang themselves.

> > It is mostly a testsuite patch, and testcase patches are fine (and much
> > wanted!) in stage 4.  The actual compiler options remain, and behaviour
> > does not change for anyone who used the option as intended,
> 
> Yes, excepting for one unexpected use that users having one cpu type which
> doesn't support power8/power9 capability but meanwhile specifies option
> -mpower{8,9}-vector to gain power8/power9 capability (as currently these
> options can enable the corresponding flags).  But I don't think it's an
> expected use case.

Yeah, it is why we do not want such options at all :-)

> >>* config/rs6000/rs6000.opt: Make option power{8,9}-vector as
> >>WarnRemoved.
> > 
> > Do we want this, or do we want it silent?  Should we remove the options
> > later, if we now warn for it?
> 
> Good question, it mainly follows the practice of option direct-move here.
> IMHO at least for power8-vector we want WarnRemoved for now as it's
> documented before, and we can probably make it (or them) removed later on
> trunk once all active branch releases don't support it any more.
> 
> What's your opinion on this?

Originally I did
  Warn(%qs is deprecated)
which already was a mistake.  It then changed to
  Deprecated
and then to
  WarnRemoved
which make it clearer that it is a bad plan.

If it is okay to remove an option, we should not talk about it at all
anymore.  Well maybe warn about it for another release or so, but not
longer.

> >>  (define_register_constraint "we" "rs6000_constraints[RS6000_CONSTRAINT_we]"
> >> -  "@internal Like @code{wa}, if @option{-mpower9-vector} and @option{-m64} are
> >> -   used; otherwise, @code{NO_REGS}.")
> >> +  "@internal Like @code{wa}, if the cpu type is power9 or up, meanwhile
> >> +   @option{-mvsx} and @option{-m64} are used; otherwise, @code{NO_REGS}.")
> > 
> > "if this is a POWER9 or later and @option{-mvsx} and @option{-m64} are
> > used".  How clumsy.  Maybe we should make the patterns that use "we"
> > work without mtvsrdd as well?  Hrm, they will still require 64-bit GPRs
> > of course, unless we can do something tricky.
> > 
> > We do not need the special constraint at all of course (we can add these
> > conditions to all patterns that use it: all *two* patterns).  So maybe
> > that's what we should do :-)
> 
> Not sure the original intention introducing it (Mike might know it best), but
> removing it sounds doable.

It is for mtvsrdd.

>  btw, it seems more than two patterns using it?
> like (if I didn't miss something):
>   - vsx_concat_
>   - vsx_splat__reg
>   - vsx_splat_v4si_di
>   - vsx_mov_64bit

Yes, it isn't clear we should use this constraint in those last two.  It
looks like those do not even need the restriction to 64 bit systems.
Well the last one obviously has that already, but then it could just use
"wa", no?

> > -mcpu=power8 implies -mvsx (power7 already).  You can disable VSX, or
> > VMX as well, but by default it is enabled.
> 
> Yes, it's meant to consider the explicitly -mno-vsx, which suffers the option
> order issue.  But considering we raise error for -mno-vsx -mpower{8,9}-vector
> before, without specifying -mvsx is closer to the previous.
> 
> I'll adjust it and the below similar ones, thanks!

It is never supported to do unsupported things :-)

We need to be able to rely on defaults.  Otherwise, we will have to
implement all of GCC recursively, in itself, in the testsuite, and in
individual tests.  Let's not :-)

Cheers,


Segher


Re: Repost [PATCH 1/6] Add -mcpu=future

2024-02-20 Thread Kewen.Lin
Hi Mike,

Sorry for late reply (just back from vacation).

on 2024/2/8 03:58, Michael Meissner wrote:
> On Wed, Feb 07, 2024 at 05:21:10PM +0800, Kewen.Lin wrote:
>> on 2024/2/6 14:01, Michael Meissner wrote:
>> Sorry for the possible confusion here, the "tune_proc" that I referred to is
>> the variable in the above else branch:
>>
>>    enum processor_type tune_proc = (TARGET_POWERPC64 ? PROCESSOR_DEFAULT64 : PROCESSOR_DEFAULT);
>>
>> It's either PROCESSOR_DEFAULT64 or PROCESSOR_DEFAULT, so it doesn't have a
>> chance to be PROCESSOR_FUTURE, so the checking "tune_proc == PROCESSOR_FUTURE"
>> is useless.
> 
> PROCESSOR_DEFAULT can be PROCESSOR_FUTURE if somebody configures GCC with
> --with-cpu=future.  While in general it shouldn't occur, it is helpful to
> consider all of the corner cases.

But that does not sound right; I think you meant TARGET_CPU_DEFAULT instead?

On one local ppc64le machine I tried to configure with --with-cpu=power10,
I got {,OPTION_}TARGET_CPU_DEFAULT "power10" but PROCESSOR_DEFAULT is still
PROCESSOR_POWER7 (PROCESSOR_DEFAULT64 is PROCESSOR_POWER8).  I think these
PROCESSOR_DEFAULT{,64} are defined by various headers:

$ grep -r "define PROCESSOR_DEFAULT" gcc/config/rs6000/
gcc/config/rs6000/aix71.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER7
gcc/config/rs6000/aix71.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER7
gcc/config/rs6000/aix72.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER7
gcc/config/rs6000/aix72.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER7
gcc/config/rs6000/aix73.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER8
gcc/config/rs6000/aix73.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER8
gcc/config/rs6000/darwin.h:#define PROCESSOR_DEFAULT  PROCESSOR_PPC7400
gcc/config/rs6000/darwin.h:#define PROCESSOR_DEFAULT64  PROCESSOR_POWER4
gcc/config/rs6000/freebsd64.h:#define PROCESSOR_DEFAULT PROCESSOR_PPC7450
gcc/config/rs6000/freebsd64.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER8
gcc/config/rs6000/linux64.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER7
gcc/config/rs6000/linux64.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER8
gcc/config/rs6000/rs6000.h:#define PROCESSOR_DEFAULT   PROCESSOR_PPC603
gcc/config/rs6000/rs6000.h:#define PROCESSOR_DEFAULT64 PROCESSOR_RS64A
gcc/config/rs6000/vxworks.h:#define PROCESSOR_DEFAULT PROCESSOR_PPC604

, and they are unlikely to be updated later, no?

btw, with --with-cpu=future given, cpu_index is never negative, so in

  ...
  else if (cpu_index >= 0)
    rs6000_tune_index = tune_index = cpu_index;
  else
    ...

there is no chance to enter the "else" arm; that is, that arm only takes
effect when no cpu/tune is given (neither -m{cpu,tune} nor --with-cpu=).
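
The selection order described above can be sketched in isolation.  This is
only a minimal model, not the rs6000 code: select_tune_index, NOT_GIVEN and
the three integer parameters are hypothetical stand-ins, and the first arm
(an explicit -mtune= wins) is an assumption about the part elided by "...".

```c
#include <assert.h>

/* Hypothetical marker for "option not given on the command line or at
   configure time".  */
enum { NOT_GIVEN = -1 };

/* Return the processor-table index used for tuning.
   tune_index:    from -mtune=, or NOT_GIVEN.
   cpu_index:     from -mcpu= or --with-cpu=, or NOT_GIVEN.
   default_index: index matching the header-provided PROCESSOR_DEFAULT{,64}.  */
static int
select_tune_index (int tune_index, int cpu_index, int default_index)
{
  assert (default_index >= 0);
  if (tune_index >= 0)
    return tune_index;     /* An explicit tune setting wins (assumed).  */
  else if (cpu_index >= 0)
    return cpu_index;      /* -mcpu= or --with-cpu= supplies the tuning.  */
  else
    return default_index;  /* Reached only when nothing was given.  */
}
```

With any non-negative cpu_index the last arm is never reached, which is why
a PROCESSOR_FUTURE check inside it would have no effect.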

BR,
Kewen



[PATCH] libgcc, aarch64: Allow for BE platforms in heap trampolines.

2024-02-20 Thread Iain Sandoe
Andrew Pinski pointed out on IRC that the current implementation of the
heap trampoline code fragment would make the instruction byte order follow
memory byte order for BE AArch64, which is not what is required.

This patch revises the initializers so that instruction byte order is
independent of memory byte order.
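
As an illustration of why a byte array sidesteps the problem, here is a
minimal sketch (not the libgcc code).  A64 instructions are stored
least-significant byte first regardless of data endianness, so spelling each
instruction as four bytes fixes the layout at compile time.  The helper names
encode_insn_le and insn_bytes_match are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit instruction word as four bytes, least significant byte
   first, which is the in-memory order of A64 instructions.  */
static void
encode_insn_le (uint32_t insn, unsigned char out[4])
{
  out[0] = (unsigned char) (insn & 0xff);
  out[1] = (unsigned char) ((insn >> 8) & 0xff);
  out[2] = (unsigned char) ((insn >> 16) & 0xff);
  out[3] = (unsigned char) ((insn >> 24) & 0xff);
}

/* Return 1 if BYTES is the byte-array spelling of the instruction word
   INSN, independent of the byte order of the machine running this.  */
static int
insn_bytes_match (uint32_t insn, const unsigned char bytes[4])
{
  unsigned char expect[4];
  assert (bytes != NULL);
  encode_insn_le (insn, expect);
  return memcmp (expect, bytes, 4) == 0;
}
```

For example, the word 0xd61f0220 (br x17) corresponds to the initializer
{0x20, 0x02, 0x1f, 0xd6} used in the patch.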

I have tested this on aarch64-linux-gnu, aarch64-darwin and on a cross to
aarch64_be-linux-gnu (including compile tests on the latter, but I have no
way, at present, to carry out execute tests).

(Note that this patch is applied on top of the one for PR113971).

OK for trunk, or what would be a way forward?
thanks
Iain 

--- 8< ---

This arranges that the byte order of the instruction sequences is
independent of the byte order of memory.

libgcc/ChangeLog:

* config/aarch64/heap-trampoline.c
(aarch64_trampoline_insns): Arrange to encode instructions as a
byte array so that the order is independent of memory byte order.
(struct aarch64_trampoline): Likewise.

Signed-off-by: Iain Sandoe 
---
 libgcc/config/aarch64/heap-trampoline.c | 30 -
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/libgcc/config/aarch64/heap-trampoline.c b/libgcc/config/aarch64/heap-trampoline.c
index 1e3460b1601..885df629da7 100644
--- a/libgcc/config/aarch64/heap-trampoline.c
+++ b/libgcc/config/aarch64/heap-trampoline.c
@@ -30,23 +30,23 @@ void __gcc_nested_func_ptr_created (void *chain, void *func, void *dst);
 void __gcc_nested_func_ptr_deleted (void);
 
 #if defined(__linux__)
-static const uint32_t aarch64_trampoline_insns[] = {
-  0xd503245f, /* hint 34 */
-  0x580000b1, /* ldr x17, .+20 */
-  0x580000d2, /* ldr x18, .+24 */
-  0xd61f0220, /* br  x17 */
-  0xd5033f9f, /* dsb sy */
-  0xd5033fdf /* isb */
+static const unsigned char aarch64_trampoline_insns[6][4] = {
+  {0x5f, 0x24, 0x03, 0xd5}, /* hint 34 */
+  {0xb1, 0x00, 0x00, 0x58}, /* ldr x17, .+20 */
+  {0xd2, 0x00, 0x00, 0x58}, /* ldr x18, .+24 */
+  {0x20, 0x02, 0x1f, 0xd6}, /* br  x17 */
+  {0x9f, 0x3f, 0x03, 0xd5}, /* dsb sy */
+  {0xdf, 0x3f, 0x03, 0xd5} /* isb */
 };
 
 #elif __APPLE__
-static const uint32_t aarch64_trampoline_insns[] = {
-  0xd503245f, /* hint 34 */
-  0x580000b1, /* ldr x17, .+20 */
-  0x580000d0, /* ldr x16, .+24 */
-  0xd61f0220, /* br  x17 */
-  0xd5033f9f, /* dsb sy */
-  0xd5033fdf /* isb */
+static const unsigned char aarch64_trampoline_insns[6][4] = {
+  {0x5f, 0x24, 0x03, 0xd5}, /* hint 34 */
+  {0xb1, 0x00, 0x00, 0x58}, /* ldr x17, .+20 */
+  {0xd0, 0x00, 0x00, 0x58}, /* ldr x16, .+24 */
+  {0x20, 0x02, 0x1f, 0xd6}, /* br  x17 */
+  {0x9f, 0x3f, 0x03, 0xd5}, /* dsb sy */
+  {0xdf, 0x3f, 0x03, 0xd5} /* isb */
 };
 
 #else
@@ -54,7 +54,7 @@ static const uint32_t aarch64_trampoline_insns[] = {
 #endif
 
 struct aarch64_trampoline {
-  uint32_t insns[6];
+  unsigned char insns[6][4];
   void *func_ptr;
   void *chain_ptr;
 };
-- 
2.39.2 (Apple Git-143)



Re: [PATCH]AArch64: update vget_set_lane_1.c test output

2024-02-20 Thread Richard Sandiford
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Thursday, February 1, 2024 4:42 PM
>> To: Tamar Christina 
>> Cc: Andrew Pinski ; gcc-patches@gcc.gnu.org; nd
>> ; Richard Earnshaw ; Marcus
>> Shawcroft ; Kyrylo Tkachov
>> 
>> Subject: Re: [PATCH]AArch64: update vget_set_lane_1.c test output
>> 
>> Tamar Christina  writes:
>> >> -Original Message-
>> >> From: Richard Sandiford 
>> >> Sent: Thursday, February 1, 2024 2:24 PM
>> >> To: Andrew Pinski 
>> >> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; nd
>> >> ; Richard Earnshaw ; Marcus
>> >> Shawcroft ; Kyrylo Tkachov
>> >> 
>> >> Subject: Re: [PATCH]AArch64: update vget_set_lane_1.c test output
>> >>
>> >> Andrew Pinski  writes:
>> >> > On Thu, Feb 1, 2024 at 1:26 AM Tamar Christina wrote:
>> >> >>
>> >> >> Hi All,
>> >> >>
>> >> >> In the vget_set_lane_1.c test the following entries now generate a zip1
>> >> >> instead of an INS
>> >> >>
>> >> >> BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
>> >> >> BUILD_TEST (int32x2_t,   int32x2_t,   , , s32, 1, 0)
>> >> >> BUILD_TEST (uint32x2_t,  uint32x2_t,  , , u32, 1, 0)
>> >> >>
>> >> >> This is because the non-Q variant for indices 0 and 1 are just shuffling
>> >> >> values.
>> >> >> There is no perf difference between INS SIMD to SIMD and ZIP, as such just
>> >> >> update the test file.
>> >> > Hmm, is this true on all cores? I suspect there is a core out there
>> >> > where INS is implemented with a much lower latency than ZIP.
>> >> > If we look at config/aarch64/thunderx.md, we can see INS is 2 cycles
>> >> > while ZIP is 6 cycles (3/7 for q versions).
>> >> > Now I don't have any invested interest in that core any more but I
>> >> > just wanted to point out that is not exactly true for all cores.
>> >>
>> >> Thanks for the pointer.  In that case, perhaps we should prefer
>> >> aarch64_evpc_ins over aarch64_evpc_zip in aarch64_expand_vec_perm_const_1?
>> >> That's enough to fix this failure, but it'll probably require other
>> >> tests to be adjusted...
>> >
>> > I think that, given that ThunderX is a 10-year-old micro-architecture that has
>> > several cases where often-used instructions have very high latencies, generic
>> > codegen should not be blocked from progressing because of it.
>> >
>> > We use zips in many things, and if ThunderX codegen is really of that much
>> > importance then I think the old codegen should be gated behind -mcpu=thunderx
>> > rather than preventing generic changes.
>> 
>> But you said there was no perf difference between INS and ZIP, so it
>> sounds like for all known cases, using INS rather than ZIP is either
>> neutral or better.
>> 
>> There's also the possible secondary benefit that the INS patterns use
>> standard RTL operations whereas the ZIP patterns use unspecs.
>> 
>> Keeping ZIP seems OK if there's a specific reason to prefer it over INS for
>> more modern cores though.
>
> Ok, that's a fair point.  Doing some due diligence, Neoverse-E1 and
> Cortex-A65 SWoGs seem to imply that there ZIPs have better throughput
> than INSs. However the entries are inconsistent and I can't measure the
> difference so I believe this to be a documentation bug.
>
> That said, switching the operands seems to show one issue in that preferring
> INS degenerates code in cases where we are inserting the top bits of the first
> parameter into the bottom of the second parameter and returning,
>
> Zip being a three-operand instruction allows us to put the result into the
> final destination register with one operation, whereas INS requires an fmov:
>
> foo_uzp1_s32:
>         ins     v0.s[1], v1.s[0]
>         fmov    d0, d0
>         ret
> foo_uzp2_s32:
>         ins     v1.s[0], v0.s[1]
>         fmov    d0, d1
>         ret

Ah, yeah, I should have thought about that.

In that case, the original patch is OK, thanks.

Richard
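
To make the equivalence discussed in this thread concrete, here is a minimal
scalar model of the two operations on 2-lane vectors.  It is a sketch only:
the v2si type and the zip1_v2si / ins_v2si helpers are hypothetical and say
nothing about latency or register allocation, which is what the thread was
actually weighing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit vector with two 32-bit lanes.  */
typedef struct { int32_t lane[2]; } v2si;

/* ZIP1 on 2-lane vectors interleaves the low lanes: { a[0], b[0] }.
   It has separate source and destination operands.  */
static v2si
zip1_v2si (v2si a, v2si b)
{
  v2si r = {{ a.lane[0], b.lane[0] }};
  return r;
}

/* INS copies one lane of SRC into one lane of DST, overwriting DST.  */
static v2si
ins_v2si (v2si dst, int dst_lane, v2si src, int src_lane)
{
  assert (dst_lane >= 0 && dst_lane < 2);
  assert (src_lane >= 0 && src_lane < 2);
  dst.lane[dst_lane] = src.lane[src_lane];
  return dst;
}
```

In this model, zip1_v2si (a, b) and ins_v2si (a, 1, b, 0) give the same
lanes; the difference Tamar points out is that INS modifies its destination
in place, so an extra move can be needed when the result must live in a
different register.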


Re: [PATCH] c-family, c++, v2: Fix up handling of types which may have padding in __atomic_{compare_}exchange

2024-02-20 Thread Jakub Jelinek
On Tue, Feb 20, 2024 at 11:11:22AM +0100, Richard Biener wrote:
> > --- gcc/c-family/c-common.cc.jj 2024-02-17 16:40:42.831571693 +0100
> > +++ gcc/c-family/c-common.cc	2024-02-20 10:58:56.599865656 +0100
> > @@ -7793,9 +7793,14 @@ resolve_overloaded_atomic_exchange (loca
> >/* Convert object pointer to required type.  */
> >p0 = build1 (VIEW_CONVERT_EXPR, I_type_ptr, p0);
> >(*params)[0] = p0; 
> > -  /* Convert new value to required type, and dereference it.  */
> > -  p1 = build_indirect_ref (loc, p1, RO_UNARY_STAR);
> > -  p1 = build1 (VIEW_CONVERT_EXPR, I_type, p1);
> > +  /* Convert new value to required type, and dereference it.
> > + If *p1 type can have padding or may involve floating point which
> > + could e.g. be promoted to wider precision and demoted afterwards,
> > + state of padding bits might not be preserved.  */
> > +  build_indirect_ref (loc, p1, RO_UNARY_STAR);
> > +  p1 = build2_loc (loc, MEM_REF, I_type,
> > +  build1 (VIEW_CONVERT_EXPR, I_type_ptr, p1),
> 
> Why the V_C_E to I_type_ptr?  The type of p1 doesn't
> really matter (unless it could be a non-pointer).

Just to help the FE when trying to constexpr evaluate it, because
for constexpr evaluation it evaluates MEM_REF the same as INDIRECT_REF
(and punts on non-0 second argument).  The actual call is non-constexpr,
just wanted to avoid ICEs or something similar, constexpr evaluation can
try to process the arguments (and succeed or fail, doesn't matter,
but not ICE) and then the call will not be constant expression and so
everything won't be.
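
The padding concern behind the patch can be illustrated outside the
compiler.  The following is a minimal sketch under stated assumptions, not
what the front end generates: load_repr_u64 and store_repr_u64 are
hypothetical helpers that move an 8-byte object representation through a
same-sized integer with memcpy, so every byte, including any that would be
padding in the original type, is carried over unchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load the full 8-byte object representation at P as an integer.
   memcpy copies every byte, so no bit is dropped or normalized the way
   it could be if the value went through its original type.  */
static uint64_t
load_repr_u64 (const void *p)
{
  uint64_t bits;
  assert (p != NULL);
  memcpy (&bits, p, sizeof bits);
  return bits;
}

/* Store BITS back as an 8-byte object representation at P.  */
static void
store_repr_u64 (void *p, uint64_t bits)
{
  assert (p != NULL);
  memcpy (p, &bits, sizeof bits);
}
```

A round trip through these helpers is byte-for-byte exact, which is the
property the patch's comment asks for when the state of padding bits has to
be preserved.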

> Also note that I_type needs to be properly address-space qualified
> in case the access should be to an address-space.  Formerly with
> the INDIRECT_REF that would likely be automagic.

I don't think using __atomic_*exchange on a non-default AS (address space) is valid,
if one doesn't have the exact __atomic_*exchange_N, it is handled
as a library call which is passed pointers and that definitely will
not deal with non-default address spaces.
Furthermore, the type should be the same in all arguments, and
the first argument is just converted to I_type_ptr and dealt with later, so
I don't think it ever worked even for the supported sizes.
int
foo (__seg_gs int *a, __seg_gs int *b, __seg_gs int *c)
{
  return __atomic_compare_exchange (a, b, c, 0, __ATOMIC_RELAXED,
                                    __ATOMIC_RELAXED);
}
results in
        movl    (%rsi), %eax
        movl    %gs:(%rdx), %edx
        lock cmpxchgl   %edx, (%rdi)
        sete    %dl
        je      .L2
        movl    %eax, (%rsi)
.L2:
i.e. pretty much random what is loaded from a different AS and what is not.

Jakub



[PATCH] aarch64: Allow aarch64-linux-musl for heap trampolines [PR113971].

2024-02-20 Thread Iain Sandoe
Tested on aarch64-linux-gnu, aarch64-darwin by me and on aarch64-linux-musl
by Sam James (thanks!).  OK for trunk?
thanks
Iain

--- 8< ---


This allows the same trampoline pattern to be used on all linux variants
rather than restricting it to linux gnu.
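
A minimal sketch of the kind of guard the patch switches to, assuming only
the predefined macros already named in the diff.  The functions
trampoline_platform and trampoline_supported are hypothetical and exist just
to make the preprocessor choice observable.

```c
#include <assert.h>
#include <string.h>

/* Report which branch of the guard is selected at compile time.
   __linux__ covers Linux targets whatever the C library is, whereas the
   patch description says the old guard restricted the code to linux gnu.  */
static const char *
trampoline_platform (void)
{
#if defined(__linux__)
  return "linux";
#elif defined(__APPLE__)
  return "darwin";
#else
  return "unsupported";
#endif
}

/* Return 1 if one of the supported branches was selected.  */
static int
trampoline_supported (void)
{
  const char *p = trampoline_platform ();
  assert (p != NULL);
  return strcmp (p, "unsupported") != 0;
}
```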

PR target/113971.

libgcc/ChangeLog:

* config/aarch64/heap-trampoline.c: Allow all linux variants.

Signed-off-by: Iain Sandoe 
---
 libgcc/config/aarch64/heap-trampoline.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgcc/config/aarch64/heap-trampoline.c b/libgcc/config/aarch64/heap-trampoline.c
index 9d5b19983b1..1e3460b1601 100644
--- a/libgcc/config/aarch64/heap-trampoline.c
+++ b/libgcc/config/aarch64/heap-trampoline.c
@@ -29,7 +29,7 @@ void *allocate_trampoline_page (void);
 void __gcc_nested_func_ptr_created (void *chain, void *func, void *dst);
 void __gcc_nested_func_ptr_deleted (void);
 
-#if defined(__gnu_linux__)
+#if defined(__linux__)
 static const uint32_t aarch64_trampoline_insns[] = {
   0xd503245f, /* hint 34 */
   0x580000b1, /* ldr x17, .+20 */
@@ -82,7 +82,7 @@ allocate_trampoline_page (void)
 {
   void *page;
 
-#if defined(__gnu_linux__)
+#if defined(__linux__)
   page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
   MAP_ANON | MAP_PRIVATE, 0, 0);
 #elif __APPLE__
-- 
2.39.2 (Apple Git-143)



[PATCH 4/5] bpf: implementation of func_info in .BTF.ext.

2024-02-20 Thread Cupertino Miranda
The kernel verifier complains in some particular cases about the missing
func_info implementation in .BTF.ext. This patch implements it.

Strings are cached locally in coreout.cc to avoid adding duplicated
strings to the string list. This string deduplication should eventually
be moved to the CTFC functions so that it applies everywhere.

With this implementation, the CO-RE relocation information was also
simplified and integrated with the FuncInfo structures.
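
The string caching mentioned above can be sketched as follows.  This is a
minimal model and not the coreout.cc code: struct str_cache, str_cache_add
and MAX_STRINGS are hypothetical, the lookup is a plain linear scan, and
offsets start at 0 for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STRINGS 64  /* Hypothetical fixed capacity for the sketch.  */

/* Remembers each distinct string and the offset it was given in the
   (imaginary) string table.  */
struct str_cache
{
  const char *str[MAX_STRINGS];
  size_t offset[MAX_STRINGS];
  size_t count;        /* Number of distinct strings stored.  */
  size_t next_offset;  /* Offset the next new string will receive.  */
};

/* Return the string-table offset for S, adding S only if it has not
   been seen before.  */
static size_t
str_cache_add (struct str_cache *c, const char *s)
{
  assert (c != NULL && s != NULL);
  for (size_t i = 0; i < c->count; i++)
    if (strcmp (c->str[i], s) == 0)
      return c->offset[i];  /* Duplicate: reuse the existing offset.  */

  assert (c->count < MAX_STRINGS);
  size_t off = c->next_offset;
  c->str[c->count] = s;
  c->offset[c->count] = off;
  c->count++;
  c->next_offset += strlen (s) + 1;  /* +1 for the terminating NUL.  */
  return off;
}
```

Adding the same function name twice returns the same offset and leaves the
table size unchanged, which is the behaviour the commit message describes.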

gcc/ChangeLog:
PR target/113453
* config/bpf/bpf.cc (bpf_function_prologue): Defined target
hook.
* config/bpf/coreout.cc (btf_ext_info_section)
(btf_ext_info): Moved from coreout.h
(btf_ext_funcinfo, btf_ext_lineinfo): Added struct.
(bpf_core_reloc): Renamed to btf_ext_core_reloc.
(btf_ext): Added static variable.
(btfext_info_sec_find_or_add, SEARCH_NODE_AND_RETURN)
(bpf_create_or_find_funcinfo, bpt_create_core_reloc)
(btf_ext_add_string, btf_funcinfo_type_callback)
(btf_add_func_info_for, btf_validate_funcinfo)
(btf_ext_info_len, output_btfext_func_info): Added function.
(output_btfext_header, bpf_core_reloc_add)
(output_btfext_core_relocs, btf_ext_init, btf_ext_output):
Changed to support new structs.
* config/bpf/coreout.h (btf_ext_funcinfo, btf_ext_lineinfo):
Moved and changed in coreout.cc.
(btf_add_func_info_for, btf_ext_add_string): Added prototypes.

gcc/testsuite/ChangeLog:
PR target/113453
* gcc.target/bpf/btfext-funcinfo-nocore.c: Added.
* gcc.target/bpf/btfext-funcinfo.c: Added.
* gcc.target/bpf/core-attr-5.c: Fixed regexp.
* gcc.target/bpf/core-attr-6.c: Fixed regexp.
* gcc.target/bpf/core-builtin-fieldinfo-offset-1.c: Fixed regexp.
* gcc.target/bpf/core-section-1.c: Fixed regexp
---
 gcc/config/bpf/bpf.cc |  12 +
 gcc/config/bpf/coreout.cc | 518 +-
 gcc/config/bpf/coreout.h  |  20 +-
 .../gcc.target/bpf/btfext-funcinfo-nocore.c   |  42 ++
 .../gcc.target/bpf/btfext-funcinfo.c  |  46 ++
 gcc/testsuite/gcc.target/bpf/core-attr-5.c|   9 +-
 gcc/testsuite/gcc.target/bpf/core-attr-6.c|   6 +-
 .../bpf/core-builtin-fieldinfo-offset-1.c |  13 +-
 gcc/testsuite/gcc.target/bpf/core-section-1.c |   2 +-
 9 files changed, 506 insertions(+), 162 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/btfext-funcinfo-nocore.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/btfext-funcinfo.c

diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index 4318b26b9cda..ea47e3a8dbfb 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -385,6 +385,18 @@ bpf_compute_frame_layout (void)
 #undef TARGET_COMPUTE_FRAME_LAYOUT
 #define TARGET_COMPUTE_FRAME_LAYOUT bpf_compute_frame_layout
 
+/* Defined to initialize data for func_info region in .BTF.ext section.  */
+
+static void
+bpf_function_prologue (FILE *f ATTRIBUTE_UNUSED)
+{
+  if (btf_debuginfo_p ())
+btf_add_func_info_for (cfun->decl, current_function_func_begin_label);
+}
+
+#undef TARGET_ASM_FUNCTION_PROLOGUE
+#define TARGET_ASM_FUNCTION_PROLOGUE bpf_function_prologue
+
 /* Expand to the instructions in a function prologue.  This function
is called when expanding the 'prologue' pattern in bpf.md.  */
 
diff --git a/gcc/config/bpf/coreout.cc b/gcc/config/bpf/coreout.cc
index 2f06ec2a0f29..31b2abc3151b 100644
--- a/gcc/config/bpf/coreout.cc
+++ b/gcc/config/bpf/coreout.cc
@@ -31,6 +31,7 @@
 #include "btf.h"
 #include "rtl.h"
 #include "tree-pretty-print.h"
+#include "cgraph.h"
 
 #include "coreout.h"
 
@@ -95,64 +96,193 @@
result, a single .BTF.ext section can contain CO-RE relocations for multiple
programs in distinct sections.  */
 
-/* Internal representation of a BPF CO-RE relocation record.  */
+/* BTF.ext debug info section.  */
+static GTY (()) section * btf_ext_info_section;
+
+#ifndef BTF_EXT_INFO_SECTION_NAME
+#define BTF_EXT_INFO_SECTION_NAME ".BTF.ext"
+#endif
+#define BTF_EXT_INFO_SECTION_FLAGS (SECTION_DEBUG)
+
+#ifndef BTF_EXT_INFO_SECTION_LABEL
+#define BTF_EXT_INFO_SECTION_LABEL "Lbtfext"
+#endif
+
+#define MAX_BTF_EXT_LABEL_BYTES 40
+static char btf_ext_info_section_label[MAX_BTF_EXT_LABEL_BYTES];
+
+/* A funcinfo record, in the .BTF.ext funcinfo section.  */
+struct GTY ((chain_next ("%h.next"))) btf_ext_funcinfo
+{
+  uint32_t type; /* Type ID of a BTF_KIND_FUNC type.  */
+  const char *fnname;
+  const char *label;
+
+  struct btf_ext_funcinfo *next; /* Linked list to collect func_info elems.  */
+};
+
+/* A lineinfo record, in the .BTF.ext lineinfo section.  */
+struct GTY ((chain_next ("%h.next"))) btf_ext_lineinfo
+{
+  uint32_t insn_off;  /* Offset of the instruction.  */
+  uint32_t file_name_off; /* Offset of file name in BTF string table.  */
+  uint32_t line_off;  /* Offset of source line in BTF string table.  */
+  uint32_t 

[PATCH 5/5] bpf: renamed coreout.* files to btfext-out.*.

2024-02-20 Thread Cupertino Miranda
gcc/ChangeLog:
* config.gcc (target_gtfiles): changed coreout to btfext-out.
(extra_objs): changed coreout to btfext-out.
* config/bpf/coreout.cc: Renamed to btfext-out.cc
* config/bpf/btfext-out.cc: Added
* config/bpf/coreout.h: Renamed to btfext-out.h
* config/bpf/btfext-out.h: Added
* config/bpf/core-builtins.cc: Changed include
* config/bpf/core-builtins.h: Changed include
* config/bpf/t-bpf: Renamed file.
---
 gcc/config.gcc   | 4 ++--
 gcc/config/bpf/{coreout.cc => btfext-out.cc} | 4 ++--
 gcc/config/bpf/{coreout.h => btfext-out.h}   | 2 +-
 gcc/config/bpf/core-builtins.cc  | 2 +-
 gcc/config/bpf/core-builtins.h   | 2 +-
 gcc/config/bpf/t-bpf | 4 ++--
 6 files changed, 9 insertions(+), 9 deletions(-)
 rename gcc/config/bpf/{coreout.cc => btfext-out.cc} (99%)
 rename gcc/config/bpf/{coreout.h => btfext-out.h} (98%)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index a0f9c6723083..1ca033d75b66 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1653,8 +1653,8 @@ bpf-*-*)
 tmake_file="${tmake_file} bpf/t-bpf"
 use_collect2=no
 use_gcc_stdint=provide
-extra_objs="coreout.o core-builtins.o"
-target_gtfiles="$target_gtfiles \$(srcdir)/config/bpf/coreout.cc \$(srcdir)/config/bpf/core-builtins.cc"
+extra_objs="btfext-out.o core-builtins.o"
+target_gtfiles="$target_gtfiles \$(srcdir)/config/bpf/btfext-out.cc \$(srcdir)/config/bpf/core-builtins.cc"
 ;;
 cris-*-elf | cris-*-none)
tm_file="elfos.h newlib-stdint.h ${tm_file}"
diff --git a/gcc/config/bpf/coreout.cc b/gcc/config/bpf/btfext-out.cc
similarity index 99%
rename from gcc/config/bpf/coreout.cc
rename to gcc/config/bpf/btfext-out.cc
index 31b2abc3151b..4281cca83e13 100644
--- a/gcc/config/bpf/coreout.cc
+++ b/gcc/config/bpf/btfext-out.cc
@@ -33,7 +33,7 @@
 #include "tree-pretty-print.h"
 #include "cgraph.h"
 
-#include "coreout.h"
+#include "btfext-out.h"
 
 /* This file contains data structures and routines for construction and output
of BPF Compile Once - Run Everywhere (BPF CO-RE) information.
@@ -618,4 +618,4 @@ btf_ext_output (void)
   dw2_asm_output_data (4, 0, "Required padding by libbpf structs");
 }
 
-#include "gt-coreout.h"
+#include "gt-btfext-out.h"
diff --git a/gcc/config/bpf/coreout.h b/gcc/config/bpf/btfext-out.h
similarity index 98%
rename from gcc/config/bpf/coreout.h
rename to gcc/config/bpf/btfext-out.h
index 1c26b9274739..b36309475c97 100644
--- a/gcc/config/bpf/coreout.h
+++ b/gcc/config/bpf/btfext-out.h
@@ -1,4 +1,4 @@
-/* coreout.h - Declarations and definitions related to
+/* btfext-out.h - Declarations and definitions related to
BPF Compile Once - Run Everywhere (CO-RE) support.
Copyright (C) 2021-2024 Free Software Foundation, Inc.
 
diff --git a/gcc/config/bpf/core-builtins.cc b/gcc/config/bpf/core-builtins.cc
index aa75fd68cae6..8d8c54c1fb3d 100644
--- a/gcc/config/bpf/core-builtins.cc
+++ b/gcc/config/bpf/core-builtins.cc
@@ -45,7 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "ctfc.h"
 #include "btf.h"
-#include "coreout.h"
+#include "btfext-out.h"
 #include "core-builtins.h"
 
 /* BPF CO-RE builtins definition.
diff --git a/gcc/config/bpf/core-builtins.h b/gcc/config/bpf/core-builtins.h
index c54f6ddac812..e56b55b94e0c 100644
--- a/gcc/config/bpf/core-builtins.h
+++ b/gcc/config/bpf/core-builtins.h
@@ -1,7 +1,7 @@
 #ifndef BPF_CORE_BUILTINS_H
 #define BPF_CORE_BUILTINS_H
 
-#include "coreout.h"
+#include "btfext-out.h"
 
 enum bpf_builtins
 {
diff --git a/gcc/config/bpf/t-bpf b/gcc/config/bpf/t-bpf
index 18f1fa67794d..dc50332350c4 100644
--- a/gcc/config/bpf/t-bpf
+++ b/gcc/config/bpf/t-bpf
@@ -1,7 +1,7 @@
 
-TM_H += $(srcdir)/config/bpf/coreout.h $(srcdir)/config/bpf/core-builtins.h
+TM_H += $(srcdir)/config/bpf/btfext-out.h $(srcdir)/config/bpf/core-builtins.h
 
-coreout.o: $(srcdir)/config/bpf/coreout.cc
+btfext-out.o: $(srcdir)/config/bpf/btfext-out.cc
$(COMPILE) $<
$(POSTCOMPILE)
 
-- 
2.39.2



[PATCH 3/5] btf: moved btf deallocation to final.

2024-02-20 Thread Cupertino Miranda
Dissociate .BTF.ext from the creation of CO-RE relocations. Improve the
allocation/deallocation of BTF structures, moving deallocation to the
final stage when needed.

gcc/ChangeLog:

* config/bpf/bpf.cc (bpf_option_override): Make BTF.ext enabled
by default for BPF.
(btf_asm_init_sections): Add btf deallocation.
* dwarf2ctf.cc (ctf_debug_finalize): Fixed btf deallocation.
---
 gcc/config/bpf/bpf.cc | 20 +---
 gcc/dwarf2ctf.cc  |  5 -
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index d6ca47eeecbe..4318b26b9cda 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -195,10 +195,8 @@ bpf_option_override (void)
   if (TARGET_BPF_CORE && !btf_debuginfo_p ())
 error ("BPF CO-RE requires BTF debugging information, use %<-gbtf%>");
 
-  /* To support the portability needs of BPF CO-RE approach, BTF debug
- information includes the BPF CO-RE relocations.  */
-  if (TARGET_BPF_CORE)
-write_symbols |= BTF_WITH_CORE_DEBUG;
+  /* BPF applications always generate .BTF.ext.  */
+  write_symbols |= BTF_WITH_CORE_DEBUG;
 
   /* Unlike much of the other BTF debug information, the information necessary
  for CO-RE relocations is added to the CTF container by the BPF backend.
@@ -218,10 +216,7 @@ bpf_option_override (void)
   /* -gbtf implies -mcore when using the BPF backend, unless -mno-co-re
  is specified.  */
   if (btf_debuginfo_p () && !(target_flags_explicit & MASK_BPF_CORE))
-{
-  target_flags |= MASK_BPF_CORE;
-  write_symbols |= BTF_WITH_CORE_DEBUG;
-}
+target_flags |= MASK_BPF_CORE;
 
   /* Determine available features from ISA setting (-mcpu=).  */
   if (bpf_has_jmpext == -1)
@@ -267,7 +262,7 @@ bpf_option_override (void)
 static void
 bpf_asm_init_sections (void)
 {
-  if (TARGET_BPF_CORE)
+  if (btf_debuginfo_p () && btf_with_core_debuginfo_p ())
 btf_ext_init ();
 }
 
@@ -279,8 +274,11 @@ bpf_asm_init_sections (void)
 static void
 bpf_file_end (void)
 {
-  if (TARGET_BPF_CORE)
-btf_ext_output ();
+  if (btf_debuginfo_p () && btf_with_core_debuginfo_p ())
+{
+  btf_ext_output ();
+  btf_finalize ();
+}
 }
 
 #undef TARGET_ASM_FILE_END
diff --git a/gcc/dwarf2ctf.cc b/gcc/dwarf2ctf.cc
index 93e5619933fa..b9dfecf2c1c4 100644
--- a/gcc/dwarf2ctf.cc
+++ b/gcc/dwarf2ctf.cc
@@ -944,7 +944,10 @@ ctf_debug_finalize (const char *filename, bool btf)
   if (btf)
 {
   btf_output (filename);
-  btf_finalize ();
+  /* btf_finalize when compiling BPF applications gets deallocated by the
+BPF target in bpf_file_end.  */
+  if (btf_debuginfo_p () && !btf_with_core_debuginfo_p ())
+   btf_finalize ();
 }
 
   else
-- 
2.39.2



[PATCH 2/5] btf: added KIND_FUNC traversal function.

2024-02-20 Thread Cupertino Miranda
Added a traversal function to traverse all BTF_KIND_FUNC nodes with a
callback function. Used for .BTF.ext section content creation.
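
The traversal pattern can be shown on its own with plain integers instead of
BTF type records.  This is a minimal sketch: traverse_elems, find_cb and
struct find_state are hypothetical names, and the only behaviour taken from
the patch is that a callback returning true stops the walk and makes the
traversal return true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Callback type: return true to stop the traversal early.  */
typedef bool (*traverse_callback) (int elem, void *data);

/* Call CALLBACK on each of the N elements in order.  Return true as soon
   as a callback asks to stop, false if every element was visited.  */
static bool
traverse_elems (const int *elems, size_t n, traverse_callback callback,
                void *data)
{
  assert (callback != NULL);
  for (size_t i = 0; i < n; i++)
    if (callback (elems[i], data))
      return true;
  return false;
}

/* Example callback state: look for TARGET and count visited elements.  */
struct find_state
{
  int target;
  size_t visited;
};

static bool
find_cb (int elem, void *data)
{
  struct find_state *st = data;
  st->visited++;
  return elem == st->target;
}
```

The void pointer lets each caller carry its own state without the traversal
function knowing anything about it.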

gcc/ChangeLog

* btfout.cc (output_btf_func_types): use FOR_EACH_VEC_ELT.
(traverse_btf_func_types): Defined function.
* ctfc.h (funcs_traverse_callback): typedef for function
prototype.
(traverse_btf_func_types): Added prototype.
---
 gcc/btfout.cc | 22 --
 gcc/ctfc.h|  3 +++
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index 7e114e224449..7aabd99f3e7c 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -1276,8 +1276,10 @@ output_btf_types (ctf_container_ref ctfc)
 static void
 output_btf_func_types (ctf_container_ref ctfc)
 {
-  for (size_t i = 0; i < vec_safe_length (funcs); i++)
-btf_asm_func_type (ctfc, (*funcs)[i], i);
+  ctf_dtdef_ref ref;
+  unsigned i;
+  FOR_EACH_VEC_ELT (*funcs, i, ref)
+btf_asm_func_type (ctfc, ref, i);
 }
 
 /* Output all BTF_KIND_DATASEC records.  */
@@ -1452,4 +1454,20 @@ btf_finalize (void)
   tu_ctfc = NULL;
 }
 
+/* Traversal function for all BTF_KIND_FUNC type records.  */
+
+bool
+traverse_btf_func_types (funcs_traverse_callback callback, void *data)
+{
+  ctf_dtdef_ref ref;
+  unsigned i;
+  FOR_EACH_VEC_ELT (*funcs, i, ref)
+{
+  bool stop = callback (ref, data);
+  if (stop == true)
+   return true;
+}
+  return false;
+}
+
 #include "gt-btfout.h"
diff --git a/gcc/ctfc.h b/gcc/ctfc.h
index 7aac57edac55..fa188bf2f5a4 100644
--- a/gcc/ctfc.h
+++ b/gcc/ctfc.h
@@ -441,6 +441,9 @@ extern int ctf_add_variable (ctf_container_ref, const char *, ctf_id_t,
 extern ctf_id_t ctf_lookup_tree_type (ctf_container_ref, const tree);
 extern ctf_id_t get_btf_id (ctf_id_t);
 
+typedef bool (*funcs_traverse_callback) (ctf_dtdef_ref, void *);
+bool traverse_btf_func_types (funcs_traverse_callback, void *);
+
 /* CTF section does not emit location information; at this time, location
information is needed for BTF CO-RE use-cases.  */
 
-- 
2.39.2



[PATCH 1/5] btf: fixed type id in BTF_KIND_FUNC struct data.

2024-02-20 Thread Cupertino Miranda
This patch corrects the addition of +1 to the type id, which was originally
done in the wrong location and led to func_dtd->dtd_type for the
BTF_KIND_FUNC struct data containing the type id of the previous entry.
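
The arithmetic behind the fix, as a minimal sketch.  The helper
next_type_id is hypothetical; the only facts taken from the patch are the
two counters and the +1 for the sentinel type that is not in the types map.

```c
#include <assert.h>
#include <stdint.h>

/* Return the ID for the next type to be created, given how many types
   came from the types map and how many were created so far.  One ID is
   taken by the sentinel type, hence the +1.  */
static uint32_t
next_type_id (uint32_t num_types_added, uint32_t num_types_created)
{
  /* Guard against wrap-around in the sum below.  */
  assert (num_types_added < UINT32_MAX - num_types_created);
  return num_types_added + num_types_created + 1;
}
```

With 10 types in the map and none created yet, the new record gets ID 11.
Without the +1 it would be stored as 10, the ID of the last type already in
the map, which matches the "type id of the previous entry" symptom described
above.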

gcc/ChangeLog:
* btfout.cc (btf_collect_datasec): Corrected BTF type id.
---
 gcc/btfout.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index dcf751f8fe0d..7e114e224449 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -457,7 +457,8 @@ btf_collect_datasec (ctf_container_ref ctfc)
   func_dtd->dtd_data.ctti_type = dtd->dtd_type;
   func_dtd->linkage = dtd->linkage;
   func_dtd->dtd_name = dtd->dtd_name;
-  func_dtd->dtd_type = num_types_added + num_types_created;
+  /* +1 for the sentinel type not in the types map.  */
+  func_dtd->dtd_type = num_types_added + num_types_created + 1;
 
   /* Only the BTF_KIND_FUNC type actually references the name. The
 BTF_KIND_FUNC_PROTO is always anonymous.  */
@@ -480,8 +481,7 @@ btf_collect_datasec (ctf_container_ref ctfc)
 
  struct btf_var_secinfo info;
 
- /* +1 for the sentinel type not in the types map.  */
- info.type = func_dtd->dtd_type + 1;
+ info.type = func_dtd->dtd_type;
 
  /* Both zero at compile time.  */
  info.size = 0;
-- 
2.39.2



bpf: PR target/113453 func_info .BTF.ext implementation

2024-02-20 Thread Cupertino Miranda
Good morning,

This patch series implements the func_info region within the BPF target's
.BTF.ext section.
The required changes also implied some changes to BTF and to the original
CO-RE implementation, more specifically to the structures used and to how
the relocations are created.

Looking forward to your review.

Best regards,
Cupertino




Re: [PATCH] ipa: Convert lattices from pure array to vector (PR 113476)

2024-02-20 Thread Jan Hubicka
> On Tue, Feb 13 2024, Martin Jambor wrote:
> > On Mon, Feb 12 2024, Jan Hubicka wrote:
> >>> Believe it or not, even though I have re-worked the internals of the
> >>> lattices completely, the array itself is older than my involvement with
> >>> GCC (or at least with ipa-cp.c ;-).
> >>> 
> >>> So it being an array and not a vector is historical coincidence, as far
> >>> as I am concerned :-).  But that may be the reason, or because vector
> >>> macros at that time looked scary, or perhaps the initialization by
> >>> XCNEWVEC zeroing everything out was considered attractive (I kind of
> >>> like that but constructors would probably be cleaner), I don't know.
> >>
> >> If your class is no longer a POD, then the clearing before construction
> >> is dead and GCC may optimize it out.  So fixing this may avoid some
> >> surprises in the foreseeable future when we try to compile older GCCs
> >> with newer ones.
> >>
> >
> > That's a good point.  I'll prepare a patch converting the whole thing to
> > use constructors and vectors.
> >
> 
> In PR 113476 we have discovered that ipcp_param_lattices is no longer
> a POD and should be destructed.  In a follow-up discussion it
> transpired that their initialization done by memsetting their backing
> memory to zero is also invalid because now any write there before
> construction can be considered dead.  Plus that having them in an
> array is a little bit old-school and does not get the extra checking
> offered by vector along with automatic construction and destruction
> when necessary.
> 
> So this patch converts the array to a vector.  That however means that
> ipcp_param_lattices cannot be just a forward declared type but must be
> known to all code that deal with ipa_node_params and thus to all code
> that includes ipa-prop.h.  Therefore I have moved ipcp_param_lattices
> and the type it depends on to a new header ipa-cp.h which now
> ipa-prop.h depends on.  Because we have the (IMHO not a very wise)
> rule that headers don't include what they need themselves, I had to
> add inclusions of ipa-cp.h and sreal.h (on which it depends) to very
> many files, which made the patch rather ugly.
> 
> Bootstrapped and tested on x86_64-linux.  I also had it checked by our
> script which builds more than a hundred of cross-compilers, so other
> targets are hopefully also fine.
> 
> OK for master?
> 
> Martin
> 
> 
> gcc/lto/ChangeLog:
> 
> 2024-02-16  Martin Jambor  
> 
>   * lto-common.cc: Include sreal.h and ipa-cp.h.
>   * lto-partition.cc: Include ipa-cp.h, move inclusion of sreal higher.
>   * lto.cc: Include sreal.h and ipa-cp.h.
> 
> gcc/ChangeLog:
> 
> 2024-02-16  Martin Jambor  
> 
>   * ipa-prop.h (ipa_node_params): Convert lattices to a vector, adjust
>   initializers in the constructor.
>   (ipa_node_params::~ipa_node_params): Release lattices as a vector.
>   * ipa-cp.h: New file.
>   * ipa-cp.cc: Include sreal.h and ipa-cp.h.
>   (ipcp_value_source): Move to ipa-cp.h.
>   (ipcp_value_base): Likewise.
>   (ipcp_value): Likewise.
>   (ipcp_lattice): Likewise.
>   (ipcp_agg_lattice): Likewise.
>   (ipcp_bits_lattice): Likewise.
>   (ipcp_vr_lattice): Likewise.
>   (ipcp_param_lattices): Likewise.
>   (ipa_get_parm_lattices): Remove assert that lattices is non-NULL.
>   (ipa_value_from_jfunc): Adjust a check for empty lattices.
>   (ipa_context_from_jfunc): Likewise.
>   (ipa_agg_value_from_jfunc): Likewise.
>   (merge_agg_lats_step): Do not memset new aggregate lattices to zero.
>   (ipcp_propagate_stage): Allocate lattices in a vector as opposed to
>   just in contiguous memory.
>   (ipcp_store_vr_results): Adjust a check for empty lattices.
>   * auto-profile.cc: Include sreal.h and ipa-cp.h.
>   * cgraph.cc: Likewise.
>   * cgraphclones.cc: Likewise.
>   * cgraphunit.cc: Likewise.
>   * config/aarch64/aarch64.cc: Likewise.
>   * config/i386/i386-builtins.cc: Likewise.
>   * config/i386/i386-expand.cc: Likewise.
>   * config/i386/i386-features.cc: Likewise.
>   * config/i386/i386-options.cc: Likewise.
>   * config/i386/i386.cc: Likewise.
>   * config/rs6000/rs6000.cc: Likewise.
>   * config/s390/s390.cc: Likewise.
>   * gengtype.cc (open_base_files): Added sreal.h and ipa-cp.h to the
>   files to be included in gtype-desc.cc.
>   * gimple-range-fold.cc: Include sreal.h and ipa-cp.h.
>   * ipa-devirt.cc: Likewise.
>   * ipa-fnsummary.cc: Likewise.
>   * ipa-icf.cc: Likewise.
>   * ipa-inline-analysis.cc: Likewise.
>   * ipa-inline-transform.cc: Likewise.
>   * ipa-inline.cc: Include ipa-cp.h, move inclusion of sreal.h higher.
>   * ipa-modref.cc: Include sreal.h and ipa-cp.h.
>   * ipa-param-manipulation.cc: Likewise.
>   * ipa-predicate.cc: Likewise.
>   * ipa-profile.cc: Likewise.
>   * ipa-prop.cc: Likewise.
>   (ipa_node_params_t::duplicate): Assert new lattices 

Re: [PATCH] c-family, c++, v2: Fix up handling of types which may have padding in __atomic_{compare_}exchange

2024-02-20 Thread Richard Biener
On Tue, 20 Feb 2024, Jakub Jelinek wrote:

> On Tue, Feb 20, 2024 at 09:01:10AM +0100, Richard Biener wrote:
> > I'm not sure those would be really equivalent (MEM_REF vs. V_C_E
> > as well as combined vs. split).  It really depends how RTL expansion
> > handles this (as you can see padding can be fun here).
> > 
> > So I'd be nervous for a match.pd rule here (also we can't match
> > memory defs).
> 
> Ok.  Perhaps forwprop then; anyway, that would be an optimization.
> 
> > As for your patch I'd go with a MEM_REF unconditionally, I don't
> > think we want different behavior whether there's padding or not ...
> 
> I've made it conditional so that the MEM_REFs don't appear that often in the
> FE trees, but maybe that is fine.
> 
> The unconditional patch would then be:
> 
> 2024-02-20  Jakub Jelinek  
> 
> gcc/c-family/
>   * c-common.cc (resolve_overloaded_atomic_exchange): Instead of setting
>   p1 to VIEW_CONVERT_EXPR (*p1), set it to MEM_REF with p1 and
>   (typeof (p1)) 0 operands and I_type type.
>   (resolve_overloaded_atomic_compare_exchange): Similarly for p2.
> gcc/cp/
>   * pt.cc (tsubst_expr): Handle MEM_REF.
> gcc/testsuite/
>   * g++.dg/ext/atomic-5.C: New test.
> 
> --- gcc/c-family/c-common.cc.jj   2024-02-17 16:40:42.831571693 +0100
> +++ gcc/c-family/c-common.cc  2024-02-20 10:58:56.599865656 +0100
> @@ -7793,9 +7793,14 @@ resolve_overloaded_atomic_exchange (loca
>/* Convert object pointer to required type.  */
>p0 = build1 (VIEW_CONVERT_EXPR, I_type_ptr, p0);
>(*params)[0] = p0; 
> -  /* Convert new value to required type, and dereference it.  */
> -  p1 = build_indirect_ref (loc, p1, RO_UNARY_STAR);
> -  p1 = build1 (VIEW_CONVERT_EXPR, I_type, p1);
> +  /* Convert new value to required type, and dereference it.
> + If *p1 type can have padding or may involve floating point which
> + could e.g. be promoted to wider precision and demoted afterwards,
> + state of padding bits might not be preserved.  */
> +  build_indirect_ref (loc, p1, RO_UNARY_STAR);
> +  p1 = build2_loc (loc, MEM_REF, I_type,
> +build1 (VIEW_CONVERT_EXPR, I_type_ptr, p1),

Why the V_C_E to I_type_ptr?  The type of p1 doesn't
really matter (unless it could be a non-pointer).

Also note that I_type needs to be properly address-space qualified
in case the access should be to an address-space.  Formerly with
the INDIRECT_REF that would likely be automagic.

> +build_zero_cst (TREE_TYPE (p1)));
>(*params)[1] = p1;
>  
>/* Move memory model to the 3rd position, and end param list.  */
> @@ -7873,9 +7878,14 @@ resolve_overloaded_atomic_compare_exchan
>p1 = build1 (VIEW_CONVERT_EXPR, I_type_ptr, p1);
>(*params)[1] = p1;
>  
> -  /* Convert desired value to required type, and dereference it.  */
> -  p2 = build_indirect_ref (loc, p2, RO_UNARY_STAR);
> -  p2 = build1 (VIEW_CONVERT_EXPR, I_type, p2);
> +  /* Convert desired value to required type, and dereference it.
> + If *p2 type can have padding or may involve floating point which
> + could e.g. be promoted to wider precision and demoted afterwards,
> + state of padding bits might not be preserved.  */
> +  build_indirect_ref (loc, p2, RO_UNARY_STAR);
> +  p2 = build2_loc (loc, MEM_REF, I_type,
> +build1 (VIEW_CONVERT_EXPR, I_type_ptr, p2),
> +build_zero_cst (TREE_TYPE (p2)));
>(*params)[2] = p2;
>  
>/* The rest of the parameters are fine. NULL means no special return value
> --- gcc/cp/pt.cc.jj   2024-02-17 16:40:42.868571182 +0100
> +++ gcc/cp/pt.cc  2024-02-20 10:57:36.646973603 +0100
> @@ -20088,6 +20088,14 @@ tsubst_expr (tree t, tree args, tsubst_f
>   RETURN (r);
>}
>  
> +case MEM_REF:
> +  {
> + tree op0 = RECUR (TREE_OPERAND (t, 0));
> + tree op1 = RECUR (TREE_OPERAND (t, 1));
> + tree new_type = tsubst (TREE_TYPE (t), args, complain, in_decl);
> + RETURN (build2_loc (EXPR_LOCATION (t), MEM_REF, new_type, op0, op1));
> +  }
> +
>  case NOP_EXPR:
>{
>   tree type = tsubst (TREE_TYPE (t), args, complain, in_decl);
> --- gcc/testsuite/g++.dg/ext/atomic-5.C.jj  2024-02-20 10:57:36.647973589 +0100
> +++ gcc/testsuite/g++.dg/ext/atomic-5.C     2024-02-20 10:57:36.647973589 +0100
> @@ -0,0 +1,42 @@
> +// { dg-do compile { target c++14 } }
> +
> +template <int N>
> +void
> +foo (long double *ptr, long double *val, long double *ret)
> +{
> +  __atomic_exchange (ptr, val, ret, __ATOMIC_RELAXED);
> +}
> +
> +template <int N>
> +bool
> +bar (long double *ptr, long double *exp, long double *des)
> +{
> +  return __atomic_compare_exchange (ptr, exp, des, false,
> + __ATOMIC_RELAXED, __ATOMIC_RELAXED);
> +}
> +
> +bool
> +baz (long double *p, long double *q, long double *r)
> +{
> +  foo<0> (p, q, r);
> +  foo<1> (p + 1, q + 1, r + 1);
> +  return bar<0> (p + 2, q + 2, r + 2) || bar<1> (p + 3, q + 3, r + 3);
> +}
> +
> +constexpr int
> +qux 

Re: [PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]

2024-02-20 Thread Richard Sandiford
Alex Coplan  writes:
> On 14/02/2024 11:18, Richard Sandiford wrote:
>> Alex Coplan  writes:
>> > This is a backport of the GCC 13 fix for PR111677 to the GCC 12 branch.
>> > The only part of the patch that isn't a straight cherry-pick is due to
>> > the TX iterator lacking TDmode for GCC 12, so this version adjusts
>> > TX_V16QI accordingly.
>> >
>> > Bootstrapped/regtested on aarch64-linux-gnu, the only changes in the
>> > testsuite I saw were in
>> > gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c where the dg-output
>> > "READ of size 4 [...]" check appears to be flaky on the GCC 12 branch
>> > since libhwasan gained the short granule tag feature, I've requested a
>> > backport of the following patch (committed as
>> > r13-100-g3771486daa1e904ceae6f3e135b28e58af33849f) which should fix that
>> > (independent) issue for GCC 12:
>> > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645278.html
>> >
>> > OK for the GCC 12 branch?
>> 
>> OK, thanks.
>
> Thanks.  The patch cherry-picks cleanly on the GCC 11 branch, and
> bootstraps/regtests OK there.  Is it OK for GCC 11 too, even though the
> issue is latent there (at least for the testcase in the patch)?

Yeah, I think we should apply it there too, since the bug is likely
to manifest with other testcases.

Richard

> Alex
>
>> 
>> Richard
>> 
>> > Thanks,
>> > Alex
>> >
>> > -- >8 --
>> >
>> > The PR shows us ICEing due to an unrecognizable TFmode save emitted by
>> > aarch64_process_components.  The problem is that for T{I,F,D}mode we
>> > conservatively require mems to be in range for x-register ldp/stp.  That
>> > is because (at least for TImode) it can be allocated to both GPRs and
>> > FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
>> > a q-register load/store.
>> >
>> > As Richard pointed out in the PR, aarch64_get_separate_components
>> > already checks that the offsets are suitable for a single load, so we
>> > just need to choose a mode in aarch64_reg_save_mode that gives the full
>> > q-register range.  In this patch, we choose V16QImode as an alternative
>> > 16-byte "bag-of-bits" mode that doesn't have the artificial range
>> > restrictions imposed on T{I,F,D}mode.
>> >
>> > Unlike for GCC 14 we need additional handling in the load/store pair
>> > code as various cases are not expecting to see V16QImode (particularly
>> > the writeback patterns, but also aarch64_gen_load_pair).
>> >
>> > gcc/ChangeLog:
>> >
>> >PR target/111677
>> >* config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
>> >V16QImode for the full 16-byte FPR saves in the vector PCS case.
>> >(aarch64_gen_storewb_pair): Handle V16QImode.
>> >(aarch64_gen_loadwb_pair): Likewise.
>> >(aarch64_gen_load_pair): Likewise.
>> >* config/aarch64/aarch64.md (loadwb_pair<TX:mode>_<P:mode>):
>> >Rename to ...
>> >(loadwb_pair<TX_V16QI:mode>_<P:mode>): ... this, extending to
>> >V16QImode.
>> >(storewb_pair<TX:mode>_<P:mode>): Rename to ...
>> >(storewb_pair<TX_V16QI:mode>_<P:mode>): ... this, extending to
>> >V16QImode.
>> >* config/aarch64/iterators.md (TX_V16QI): New.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >PR target/111677
>> >* gcc.target/aarch64/torture/pr111677.c: New test.
>> >
>> > (cherry picked from commit 2bd8264a131ee1215d3bc6181722f9d30f5569c3)
>> > ---
>> >  gcc/config/aarch64/aarch64.cc | 13 ++-
>> >  gcc/config/aarch64/aarch64.md | 35 ++-
>> >  gcc/config/aarch64/iterators.md   |  3 ++
>> >  .../gcc.target/aarch64/torture/pr111677.c | 28 +++
>> >  4 files changed, 61 insertions(+), 18 deletions(-)
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
>> >
>> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> > index 3bccd96a23d..2bbba323770 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -4135,7 +4135,7 @@ aarch64_reg_save_mode (unsigned int regno)
>> >case ARM_PCS_SIMD:
>> >/* The vector PCS saves the low 128 bits (which is the full
>> >   register on non-SVE targets).  */
>> > -  return TFmode;
>> > +  return V16QImode;
>> >  
>> >case ARM_PCS_SVE:
>> >/* Use vectors of DImode for registers that need frame
>> > @@ -8602,6 +8602,10 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx 
>> > base, rtx reg, rtx reg2,
>> >return gen_storewb_pairtf_di (base, base, reg, reg2,
>> >GEN_INT (-adjustment),
>> >GEN_INT (UNITS_PER_VREG - adjustment));
>> > +case E_V16QImode:
>> > +  return gen_storewb_pairv16qi_di (base, base, reg, reg2,
>> > + GEN_INT (-adjustment),
>> > + GEN_INT (UNITS_PER_VREG - adjustment));
>> >  default:
>> >gcc_unreachable ();
>> >  }
>> > @@ -8647,6 +8651,10 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx 
>> > base, rtx reg, rtx reg2,
>> >  case E_TFmode:
>> >  

Re: [PATCH]AArch64: remove ls64 from being mandatory on armv8.7-a..

2024-02-20 Thread Richard Sandiford
Tamar Christina  writes:
> Hi,  this is a new version of the patch updating some additional tests
> because some of the LTO tests required a newer binutils than my distro had.
>
> ---
>
> The Arm Architectural Reference Manual (Version J.a, section A2.9 on FEAT_LS64)
> shows that ls64 is an optional extension and should not be enabled by default
> for Armv8.7-a.
>
> This drops it from the mandatory bits for the architecture and brings GCC in
> line with LLVM and the architecture.
>
> Note that we will not be changing binutils to preserve compatibility with
> older released compilers.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master? and backport to GCC 13,12,11?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-arches.def (AARCH64_ARCH): Remove LS64 from
> Armv8.7-a.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/aarch64/acle/ls64.C: Add +ls64.
> * g++.target/aarch64/acle/ls64_lto.C: Likewise.
> * gcc.target/aarch64/acle/ls64_lto.c: Likewise.
> * gcc.target/aarch64/acle/pr110100.c: Likewise.
> * gcc.target/aarch64/acle/pr110132.c: Likewise.
> * gcc.target/aarch64/options_set_28.c: Drop check for nols64.
> * gcc.target/aarch64/pragma_cpp_predefs_2.c: Correct header checks.

OK, thanks.

Richard

> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64-arches.def 
> b/gcc/config/aarch64/aarch64-arches.def
> index 
> b7115ff7c3d4a7ee7abbedcb091ef15a7efacc79..9bec30e9203bac01155281ef3474846c402bb29e
>  100644
> --- a/gcc/config/aarch64/aarch64-arches.def
> +++ b/gcc/config/aarch64/aarch64-arches.def
> @@ -37,7 +37,7 @@ AARCH64_ARCH("armv8.3-a", generic_armv8_a,   V8_3A, 
> 8,  (V8_2A, PAUTH, R
>  AARCH64_ARCH("armv8.4-a", generic_armv8_a,   V8_4A, 8,  (V8_3A, 
> F16FML, DOTPROD, FLAGM))
>  AARCH64_ARCH("armv8.5-a", generic_armv8_a,   V8_5A, 8,  (V8_4A, SB, 
> SSBS, PREDRES))
>  AARCH64_ARCH("armv8.6-a", generic_armv8_a,   V8_6A, 8,  (V8_5A, 
> I8MM, BF16))
> -AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A, 
> LS64))
> +AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A))
>  AARCH64_ARCH("armv8.8-a", generic_armv8_a,   V8_8A, 8,  (V8_7A, 
> MOPS))
>  AARCH64_ARCH("armv8.9-a", generic_armv8_a,   V8_9A, 8,  (V8_8A))
>  AARCH64_ARCH("armv8-r",   generic_armv8_a,   V8R  , 8,  (V8_4A))
> diff --git a/gcc/testsuite/g++.target/aarch64/acle/ls64.C 
> b/gcc/testsuite/g++.target/aarch64/acle/ls64.C
> index 
> d9002785b578741bde1202761f0881dc3d47e608..dcfe6f1af6711a7f3ec2562f6aabf56baecf417d
>  100644
> --- a/gcc/testsuite/g++.target/aarch64/acle/ls64.C
> +++ b/gcc/testsuite/g++.target/aarch64/acle/ls64.C
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-additional-options "-march=armv8.7-a" } */
> +/* { dg-additional-options "-march=armv8.7-a+ls64" } */
>  #include <arm_acle.h>
>  int main()
>  {
> diff --git a/gcc/testsuite/g++.target/aarch64/acle/ls64_lto.C 
> b/gcc/testsuite/g++.target/aarch64/acle/ls64_lto.C
> index 
> 274a4771e1c1d13bcb1a7bdc77c2e499726f024c..0198fe2a1b78627b873bf22e3d8416dbdcc77078
>  100644
> --- a/gcc/testsuite/g++.target/aarch64/acle/ls64_lto.C
> +++ b/gcc/testsuite/g++.target/aarch64/acle/ls64_lto.C
> @@ -1,5 +1,5 @@
>  /* { dg-do link { target aarch64_asm_ls64_ok } } */
> -/* { dg-additional-options "-march=armv8.7-a -flto" } */
> +/* { dg-additional-options "-march=armv8.7-a+ls64 -flto" } */
>  #include <arm_acle.h>
>  int main()
>  {
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/ls64_lto.c 
> b/gcc/testsuite/gcc.target/aarch64/acle/ls64_lto.c
> index 
> 8b4f24277717675badc39dd145d365f75f5ceb27..0e5ae0b052b50b08d35151f4bc113617c1569bd3
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/acle/ls64_lto.c
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/ls64_lto.c
> @@ -1,5 +1,5 @@
>  /* { dg-do link { target aarch64_asm_ls64_ok } } */
> -/* { dg-additional-options "-march=armv8.7-a -flto" } */
> +/* { dg-additional-options "-march=armv8.7-a+ls64 -flto" } */
>  #include <arm_acle.h>
>  int main(void)
>  {
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/pr110100.c 
> b/gcc/testsuite/gcc.target/aarch64/acle/pr110100.c
> index 
> f56d5e619e8ac23cdf720574bd6ee08fbfd36423..62a82b97c56debad092cc8fd1ed48f0219109cd7
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/acle/pr110100.c
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/pr110100.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-march=armv8.7-a -O2" } */
> +/* { dg-options "-march=armv8.7-a+ls64 -O2" } */
>  #include <arm_acle.h>
>  void do_st64b(data512_t data) {
>__arm_st64b((void*)0x1000, data);
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/pr110132.c 
> b/gcc/testsuite/gcc.target/aarch64/acle/pr110132.c
> index 
> fb88d633dd20772fd96e976a400fe52ae0bc3647..423d91b9a99f269d01d07428414ade7cc518c711
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/acle/pr110132.c
> +++ 

[PATCH] c-family, c++, v2: Fix up handling of types which may have padding in __atomic_{compare_}exchange

2024-02-20 Thread Jakub Jelinek
On Tue, Feb 20, 2024 at 09:01:10AM +0100, Richard Biener wrote:
> I'm not sure those would be really equivalent (MEM_REF vs. V_C_E
> as well as combined vs. split).  It really depends how RTL expansion
> handles this (as you can see padding can be fun here).
> 
> So I'd be nervous for a match.pd rule here (also we can't match
> memory defs).

Ok.  Perhaps forwprop then; anyway, that would be an optimization.

> As for your patch I'd go with a MEM_REF unconditionally, I don't
> think we want different behavior whether there's padding or not ...

I've made it conditional so that the MEM_REFs don't appear that often in the
FE trees, but maybe that is fine.
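
For readers following along, the kind of object this thread is about can be
sketched as below.  This is a minimal illustration and not part of the patch:
Padded, padding_bytes and exchange_relaxed are hypothetical names, and the
layout comment assumes a typical ABI where int is aligned more strictly than
char.  It only shows a type with padding bytes going through the generic
__atomic_exchange builtin; it does not reproduce the padding-preservation
problem itself.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example type: on typical ABIs there are padding bytes
// between `tag` and `value`.
struct Padded
{
  char tag;
  int value;
};

// Number of bytes in Padded that are not occupied by its two members.
constexpr std::size_t
padding_bytes ()
{
  return sizeof (Padded) - sizeof (char) - sizeof (int);
}

// Swap *obj with `desired` using the generic builtin and return the
// previous contents.  The builtin copies whole objects through pointers,
// which is why the state of padding bits matters to how it is lowered.
Padded
exchange_relaxed (Padded *obj, Padded desired)
{
  assert (obj != nullptr);
  Padded previous;
  __atomic_exchange (obj, &desired, &previous, __ATOMIC_RELAXED);
  return previous;
}
```

The struct is used instead of long double only so that the sketch stays at a
size that is usually handled inline; that choice is an assumption about the
target, not something the patch relies on.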

The unconditional patch would then be:

2024-02-20  Jakub Jelinek  

gcc/c-family/
* c-common.cc (resolve_overloaded_atomic_exchange): Instead of setting
p1 to VIEW_CONVERT_EXPR (*p1), set it to MEM_REF with p1 and
(typeof (p1)) 0 operands and I_type type.
(resolve_overloaded_atomic_compare_exchange): Similarly for p2.
gcc/cp/
* pt.cc (tsubst_expr): Handle MEM_REF.
gcc/testsuite/
* g++.dg/ext/atomic-5.C: New test.

--- gcc/c-family/c-common.cc.jj 2024-02-17 16:40:42.831571693 +0100
+++ gcc/c-family/c-common.cc2024-02-20 10:58:56.599865656 +0100
@@ -7793,9 +7793,14 @@ resolve_overloaded_atomic_exchange (loca
   /* Convert object pointer to required type.  */
   p0 = build1 (VIEW_CONVERT_EXPR, I_type_ptr, p0);
   (*params)[0] = p0; 
-  /* Convert new value to required type, and dereference it.  */
-  p1 = build_indirect_ref (loc, p1, RO_UNARY_STAR);
-  p1 = build1 (VIEW_CONVERT_EXPR, I_type, p1);
+  /* Convert new value to required type, and dereference it.
+ If *p1 type can have padding or may involve floating point which
+ could e.g. be promoted to wider precision and demoted afterwards,
+ state of padding bits might not be preserved.  */
+  build_indirect_ref (loc, p1, RO_UNARY_STAR);
+  p1 = build2_loc (loc, MEM_REF, I_type,
+  build1 (VIEW_CONVERT_EXPR, I_type_ptr, p1),
+  build_zero_cst (TREE_TYPE (p1)));
   (*params)[1] = p1;
 
   /* Move memory model to the 3rd position, and end param list.  */
@@ -7873,9 +7878,14 @@ resolve_overloaded_atomic_compare_exchan
   p1 = build1 (VIEW_CONVERT_EXPR, I_type_ptr, p1);
   (*params)[1] = p1;
 
-  /* Convert desired value to required type, and dereference it.  */
-  p2 = build_indirect_ref (loc, p2, RO_UNARY_STAR);
-  p2 = build1 (VIEW_CONVERT_EXPR, I_type, p2);
+  /* Convert desired value to required type, and dereference it.
+ If *p2 type can have padding or may involve floating point which
+ could e.g. be promoted to wider precision and demoted afterwards,
+ state of padding bits might not be preserved.  */
+  build_indirect_ref (loc, p2, RO_UNARY_STAR);
+  p2 = build2_loc (loc, MEM_REF, I_type,
+  build1 (VIEW_CONVERT_EXPR, I_type_ptr, p2),
+  build_zero_cst (TREE_TYPE (p2)));
   (*params)[2] = p2;
 
   /* The rest of the parameters are fine. NULL means no special return value
--- gcc/cp/pt.cc.jj 2024-02-17 16:40:42.868571182 +0100
+++ gcc/cp/pt.cc2024-02-20 10:57:36.646973603 +0100
@@ -20088,6 +20088,14 @@ tsubst_expr (tree t, tree args, tsubst_f
RETURN (r);
   }
 
+case MEM_REF:
+  {
+   tree op0 = RECUR (TREE_OPERAND (t, 0));
+   tree op1 = RECUR (TREE_OPERAND (t, 1));
+   tree new_type = tsubst (TREE_TYPE (t), args, complain, in_decl);
+   RETURN (build2_loc (EXPR_LOCATION (t), MEM_REF, new_type, op0, op1));
+  }
+
 case NOP_EXPR:
   {
tree type = tsubst (TREE_TYPE (t), args, complain, in_decl);
--- gcc/testsuite/g++.dg/ext/atomic-5.C.jj  2024-02-20 10:57:36.647973589 +0100
+++ gcc/testsuite/g++.dg/ext/atomic-5.C 2024-02-20 10:57:36.647973589 +0100
@@ -0,0 +1,42 @@
+// { dg-do compile { target c++14 } }
+
+template <int N>
+void
+foo (long double *ptr, long double *val, long double *ret)
+{
+  __atomic_exchange (ptr, val, ret, __ATOMIC_RELAXED);
+}
+
+template <int N>
+bool
+bar (long double *ptr, long double *exp, long double *des)
+{
+  return __atomic_compare_exchange (ptr, exp, des, false,
+   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+}
+
+bool
+baz (long double *p, long double *q, long double *r)
+{
+  foo<0> (p, q, r);
+  foo<1> (p + 1, q + 1, r + 1);
+  return bar<0> (p + 2, q + 2, r + 2) || bar<1> (p + 3, q + 3, r + 3);
+}
+
+constexpr int
+qux (long double *ptr, long double *val, long double *ret)
+{
+  __atomic_exchange (ptr, val, ret, __ATOMIC_RELAXED);
+  return 0;
+}
+
+constexpr bool
+corge (long double *ptr, long double *exp, long double *des)
+{
+  return __atomic_compare_exchange (ptr, exp, des, false,
+   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+}
+
+long double a[6];
+const int b = qux (a, a + 1, a + 2);
+const bool c = corge (a + 3, a + 4, a + 5);


Jakub



Re: [PATCH] AArch64: Update system register database.

2024-02-20 Thread Richard Sandiford
Victor Do Nascimento  writes:
> [...]
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 157a0b9dfa5..45e901cda64 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -297,6 +297,26 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
> AARCH64_FL_SM_OFF;
>  #define AARCH64_FL_SCXTNUM  AARCH64_FL_V8_5A
>  #define AARCH64_FL_ID_PFR2  AARCH64_FL_V8_5A
>  
> +/* Armv8.9-A extension feature bits defined in Binutils but absent from GCC,
> +   aliased to their base architecture.  */
> +#define AARCH64_FL_AIE  AARCH64_FL_V8_9A
> +#define AARCH64_FL_DEBUGv8p9AARCH64_FL_V8_9A
> +#define AARCH64_FL_FGT2 AARCH64_FL_V8_9A
> +#define AARCH64_FL_ITE  AARCH64_FL_V8_9A

For the record, I think this leaves things in a bit of an inconsistent
state.  Something like:

#include <arm_acle.h>

unsigned long long f1() {
  return __arm_rsr64 ("trcitecr_el1");
}

unsigned long long f2() {
  unsigned long long x;
  asm volatile ("mrs %0, trcitecr_el1" : "=r" (x));
  return x;
}

compiles OK with -march=armv8.9-a, but doesn't assemble.  GAS treats ITE
as an independent feature that can be enabled for armv8.8-a, but is not
enabled by default for armv8.9-a.  GCC instead treats it as something
that is enabled by default for armv8.9-a but that cannot be used with
armv8.8-a.
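
To make the mismatch concrete, here is a toy model of the two policies.  It is
only a sketch: the bit names and both predicate functions are hypothetical and
do not correspond to real GCC or GAS data structures.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits, for illustration only.
enum : std::uint32_t
{
  FL_V8_8A = 1u << 0,
  FL_V8_9A = 1u << 1,
  FL_ITE = 1u << 2
};

// Assembler-like policy: ITE is its own feature and must be enabled
// explicitly, whatever the base architecture is.
constexpr bool
assembler_accepts_trcitecr (std::uint32_t enabled)
{
  return (enabled & FL_ITE) != 0;
}

// Compiler-like policy from the patch: the ITE gate is an alias for the
// Armv8.9-A architecture bit.
constexpr bool
compiler_accepts_trcitecr (std::uint32_t enabled)
{
  return (enabled & FL_V8_9A) != 0;
}

// The two configurations discussed above.
void
demo ()
{
  const std::uint32_t armv8_9a = FL_V8_8A | FL_V8_9A;   // plain armv8.9-a
  const std::uint32_t armv8_8a_ite = FL_V8_8A | FL_ITE; // armv8.8-a plus ITE

  // Accepted by the compiler model, rejected by the assembler model.
  assert (compiler_accepts_trcitecr (armv8_9a));
  assert (!assembler_accepts_trcitecr (armv8_9a));

  // Accepted by the assembler model, not expressible to the compiler model.
  assert (assembler_accepts_trcitecr (armv8_8a_ite));
  assert (!compiler_accepts_trcitecr (armv8_8a_ite));
}
```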

Thanks,
Richard


> +#define AARCH64_FL_PFAR AARCH64_FL_V8_9A
> +#define AARCH64_FL_PMUv3_ICNTR  AARCH64_FL_V8_9A
> +#define AARCH64_FL_PMUv3_SS AARCH64_FL_V8_9A
> +#define AARCH64_FL_PMUv3p9  AARCH64_FL_V8_9A
> +#define AARCH64_FL_RASv2AARCH64_FL_V8_9A
> +#define AARCH64_FL_S1PIEAARCH64_FL_V8_9A
> +#define AARCH64_FL_S1POEAARCH64_FL_V8_9A
> +#define AARCH64_FL_S2PIEAARCH64_FL_V8_9A
> +#define AARCH64_FL_S2POEAARCH64_FL_V8_9A
> +#define AARCH64_FL_SCTLR2   AARCH64_FL_V8_9A
> +#define AARCH64_FL_SEBEPAARCH64_FL_V8_9A
> +#define AARCH64_FL_SPE_FDS  AARCH64_FL_V8_9A
> +#define AARCH64_FL_TCR2 AARCH64_FL_V8_9A
> +
>  /* SHA2 is an optional extension to AdvSIMD.  */
>  #define TARGET_SHA2 (AARCH64_ISA_SHA2)
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr-armv8p9.c 
> b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-armv8p9.c
> new file mode 100644
> index 000..e2f297bbeeb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-armv8p9.c
> @@ -0,0 +1,99 @@
> +/* Ensure support is present for all armv8.9-a system registers.  */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv8.9-a" } */
> +#include <arm_acle.h>
> +void
> +readwrite_armv8p9a_sysregs ()
> +{
> +  long long int a;
> +
> +  /* Write-only system registers.  */
> +  __arm_wsr64 ("pmzr_el0", a); /* { dg-final { scan-assembler 
> "msr\ts3_3_c9_c13_4, x0" } } */
> +
> +  /* Read/write or write-only system registers.  */
> +  a = __arm_rsr64 ("amair2_el1");/* { { dg-final { 
> scan-assembler "s3_0_c10_c3_1" } } */
> +  a = __arm_rsr64 ("amair2_el12"); /* { { dg-final { scan-assembler 
> "mrs\tx0, s3_5_c10_c3_1" } } */
> +  a = __arm_rsr64 ("amair2_el2"); /* { { dg-final { scan-assembler "mrs\tx0, 
> s3_4_c10_c3_1" } } */
> +  a = __arm_rsr64 ("amair2_el3"); /* { { dg-final { scan-assembler "mrs\tx0, 
> s3_6_c10_c3_1" } } */
> +  a = __arm_rsr64 ("erxgsr_el1"); /* { { dg-final { scan-assembler "mrs\tx0, 
> s3_0_c5_c3_2" } } */
> +  a = __arm_rsr64 ("hdfgrtr2_el2"); /* { { dg-final { scan-assembler 
> "mrs\tx0, s3_4_c3_c1_0" } } */
> +  a = __arm_rsr64 ("hdfgwtr2_el2"); /* { { dg-final { scan-assembler 
> "mrs\tx0, s3_4_c3_c1_1" } } */
> +  a = __arm_rsr64 ("hfgrtr2_el2"); /* { { dg-final { scan-assembler 
> "mrs\tx0, s3_4_c3_c1_2" } } */
> +  a = __arm_rsr64 ("hfgwtr2_el2"); /* { { dg-final { scan-assembler 
> "mrs\tx0, s3_4_c3_c1_3" } } */
> +  a = __arm_rsr64 ("id_aa64mmfr3_el1"); /* { { dg-final { scan-assembler 
> "mrs\tx0, s3_0_c0_c7_3" } } */
> +  a = __arm_rsr64 ("id_aa64mmfr4_el1"); /* { { dg-final { scan-assembler 
> "mrs\tx0, s3_0_c0_c7_4" } } */
> +  a = __arm_rsr64 ("mair2_el1"); /* { { dg-final { scan-assembler "mrs\tx0, 
> s3_0_c10_c2_1" } } */
> +  a = __arm_rsr64 ("mair2_el12"); /* { { dg-final { scan-assembler "mrs\tx0, 
> s3_5_c10_c2_1" } } */
> +  a = __arm_rsr64 ("mair2_el2"); /* { { dg-final { scan-assembler "mrs\tx0, 
> s3_4_c10_c1_1" } } */
> +  a = __arm_rsr64 ("mair2_el3"); /* { { dg-final { scan-assembler "mrs\tx0, 
> s3_6_c10_c1_1" } } */
> +  a = __arm_rsr64 ("mdselr_el1"); /* { { dg-final { scan-assembler "mrs\tx0, 
> s2_0_c0_c4_2" } } */
> +  a = __arm_rsr64 ("pir_el1"); /* { { dg-final { scan-assembler "mrs\tx0, 
> s3_0_c10_c2_3" } } */
> +  a = __arm_rsr64 ("pir_el12"); /* { { dg-final { scan-assembler "mrs\tx0, 
> s3_5_c10_c2_3" } } */
> +  a = __arm_rsr64 ("pir_el2"); /* { { dg-final { scan-assembler "mrs\tx0, 
> s3_4_c10_c2_3" } } */
> +  a = __arm_rsr64 ("pir_el3"); /* { { dg-final { scan-assembler "mrs\tx0, 
> s3_6_c10_c2_3" } } */
> +  a = __arm_rsr64 

[PATCH] c++: Fix explicit instantiation of const variable templates after earlier implicit instantiation [PR113976]

2024-02-20 Thread Jakub Jelinek
Hi!

Already previously instantiated const variable templates had
cp_apply_type_quals_to_decl called when they were instantiated,
but if they need runtime initialization, their TREE_READONLY flag
has been subsequently cleared.
Explicit variable template instantiation calls grokdeclarator which
calls cp_apply_type_quals_to_decl on them again, setting TREE_READONLY
flag again, but nothing clears it afterwards, so we emit such
instantiations into rodata sections and segfault when the dynamic
initialization attempts to initialize them.

The following patch fixes that by not calling cp_apply_type_quals_to_decl
on already instantiated variable declarations.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-02-19  Jakub Jelinek  
Patrick Palka  

PR c++/113976
* decl.cc (grokdeclarator): Don't call cp_apply_type_quals_to_decl
on DECL_TEMPLATE_INSTANTIATED VAR_DECLs.

* g++.dg/cpp1y/var-templ87.C: New test.

--- gcc/cp/decl.cc.jj   2024-02-15 09:51:34.460065992 +0100
+++ gcc/cp/decl.cc  2024-02-19 19:18:09.839188137 +0100
@@ -15263,7 +15263,12 @@ grokdeclarator (const cp_declarator *dec
 /* Record constancy and volatility on the DECL itself .  There's
no need to do this when processing a template; we'll do this
for the instantiated declaration based on the type of DECL.  */
-if (!processing_template_decl)
+if (!processing_template_decl
+   /* Don't do it for instantiated variable templates either,
+  cp_apply_type_quals_to_decl should have been called on it
+  already and might have been overridden in cp_finish_decl
+  if initializer needs runtime initialization.  */
+   && (!VAR_P (decl) || !DECL_TEMPLATE_INSTANTIATED (decl)))
   cp_apply_type_quals_to_decl (type_quals, decl);
 
 return decl;
--- gcc/testsuite/g++.dg/cpp1y/var-templ87.C.jj 2024-02-19 19:21:49.668129195 +0100
+++ gcc/testsuite/g++.dg/cpp1y/var-templ87.C    2024-02-19 19:21:42.218232862 +0100
@@ -0,0 +1,43 @@
+// PR c++/113976
+// { dg-do run { target c++14 } }
+
+int
+foo ()
+{
+  return 42;
+}
+
+template <int N>
+const int a = foo ();
+const int *b = &a <0>;
+template <int N>
+const int c = foo ();
+template const int c <0>;
+template <int N>
+const int d = foo ();
+const int *e = &d <0>;
+template const int d <0>;
+template <int N>
+const int f = foo ();
+template const int f <0>;
+const int *g = &f <0>;
+struct S { int a, b; };
+template <int N>
+const S h = { 42, foo () };
+const S *i = &h <0>;
+template <int N>
+const S j =  { 42, foo () };
+template const S j <0>;
+template <int N>
+const S k =  { 42, foo () };
+const S *l = &k <0>;
+template const S k <0>;
+template <int N>
+const S m =  { 42, foo () };
+template const S m <0>;
+const S *n = &m <0>;
+
+int
+main ()
+{
+}

Jakub



Re: [Committed] analyzer: Fix maybe_undo_optimize_bit_field_compare vs non-scalar types [PR113983]

2024-02-20 Thread Jakub Jelinek
On Mon, Feb 19, 2024 at 01:33:33PM -0800, Andrew Pinski wrote:
>   * gcc.dg/analyzer/torture/vector-extract-1.c: New test.

The testcase fails on i686-linux with
.../gcc/testsuite/gcc.dg/analyzer/torture/vector-extract-1.c:11:1: warning: MMX 
vector return without MMX enabled changes the ABI [-Wpsabi]
Added -Wno-psabi to silence the warning.

Tested with
make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32/-march=i686,-m64\} 
analyzer-torture.exp=vector-extract-1.c'
and committed to trunk as obvious.

2024-02-20  Jakub Jelinek  

PR analyzer/113983
* gcc.dg/analyzer/torture/vector-extract-1.c: Add -Wno-psabi as
dg-additional-options.

--- gcc/testsuite/gcc.dg/analyzer/torture/vector-extract-1.c.jj 2024-02-20 10:26:02.459258772 +0100
+++ gcc/testsuite/gcc.dg/analyzer/torture/vector-extract-1.c    2024-02-20 10:30:39.157416005 +0100
@@ -1,4 +1,5 @@
 /* PR analyzer/113983  */
+/* { dg-additional-options "-Wno-psabi" } */
 
 /* maybe_undo_optimize_bit_field_compare used to ICE on this
because it was not checking for only integer types. */

Jakub



Re: [PATCH] bpf: add inline memmove and memcpy expansion

2024-02-20 Thread Jose E. Marchesi


Hi David.
Thanks for the patch.

See a couple of comments regarding error handling below.

> BPF programs are not typically linked, which means we cannot fall back
> on library calls to implement __builtin_{memmove,memcpy} and should
> always expand them inline if possible.
>
> GCC already successfully expands these builtins inline in many cases,
> but failed to do so for a few simple cases involving overlapping
> memmove in the kernel BPF selftests and was instead emitting a libcall.
>
> This patch implments a simple inline expansion of memcpy and memmove in

s/implments/implements/.

> the BPF backend in a verifier-friendly way, with the caveat that the
> size must be an integer constant, which is also required by clang.
>
> Tested for bpf-unknown-none on x86_64-linux-gnu host.
>
> Also tested against the BPF verifier by compiling and loading a test
> program with overlapping memmove (essentially the memmove-1.c test)
> which failed before due to a libcall, and now successfully loads and
> passes the verifier.
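
The general shape of such an expansion (copy in the widest chunks first, then
mop up a 4-, 2- and 1-byte tail, and walk backwards when the destination
overlaps the source from above) can be modelled in plain C++ as follows.  This
is only an illustrative sketch of the technique, not the backend code:
chunked_move and copy_chunk are hypothetical names, the 8-byte chunk width is
an assumption, and the real expansion emits RTL moves rather than calling
memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy WIDTH bytes through a temporary, like a load into a register
// followed by a store.
static void
copy_chunk (unsigned char *dst, const unsigned char *src, std::size_t width)
{
  assert (width <= 8);
  unsigned char tmp[8];
  std::memcpy (tmp, src, width);
  std::memcpy (dst, tmp, width);
}

// Move N bytes from SRC to DST.  BACKWARD selects a high-to-low walk,
// which is what an overlapping move with DST above SRC needs.
void
chunked_move (unsigned char *dst, const unsigned char *src, std::size_t n,
              bool backward)
{
  const std::size_t iters = n / 8;
  const std::size_t remainder = n % 8;
  const std::size_t tail[] = { 4, 2, 1 };

  if (!backward)
    {
      std::size_t off = 0;
      for (std::size_t i = 0; i < iters; i++, off += 8)
        copy_chunk (dst + off, src + off, 8);
      for (std::size_t width : tail)
        if (remainder & width)
          {
            copy_chunk (dst + off, src + off, width);
            off += width;
          }
    }
  else
    {
      std::size_t off = n;
      for (std::size_t i = 0; i < iters; i++)
        {
          off -= 8;
          copy_chunk (dst + off, src + off, 8);
        }
      for (std::size_t width : tail)
        if (remainder & width)
          {
            off -= width;
            copy_chunk (dst + off, src + off, width);
          }
    }
}
```

Because the size has to be a compile-time constant in the patch, both loops
here would be fully unrolled in the real expansion; the model keeps them as
loops only for brevity.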
>
> gcc/
>
>   * config/bpf/bpf-protos.h (bpf_expand_cpymem): New.
>   * config/bpf/bpf.cc: (emit_move_loop, CPYMEM_EXPAND_ERR)
>   (bpf_expand_cpymem): New.
>   * config/bpf/bpf.md: (cpymemdi, movmemdi): New define_expands.
>
> gcc/testsuite/
>
>   * gcc.target/bpf/memcpy-1.c: New test.
>   * gcc.target/bpf/memmove-1.c: New test.
>   * gcc.target/bpf/memmove-2.c: New test.
> ---
>  gcc/config/bpf/bpf-protos.h  |   2 +
>  gcc/config/bpf/bpf.cc| 129 +++
>  gcc/config/bpf/bpf.md|  36 +++
>  gcc/testsuite/gcc.target/bpf/memcpy-1.c  |  26 +
>  gcc/testsuite/gcc.target/bpf/memmove-1.c |  46 
>  gcc/testsuite/gcc.target/bpf/memmove-2.c |  23 
>  6 files changed, 262 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/bpf/memcpy-1.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/memmove-1.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/memmove-2.c
>
> diff --git a/gcc/config/bpf/bpf-protos.h b/gcc/config/bpf/bpf-protos.h
> index 46d950bd990..366acb87ae4 100644
> --- a/gcc/config/bpf/bpf-protos.h
> +++ b/gcc/config/bpf/bpf-protos.h
> @@ -35,4 +35,6 @@ const char *bpf_add_core_reloc (rtx *operands, const char 
> *templ);
>  class gimple_opt_pass;
>  gimple_opt_pass *make_pass_lower_bpf_core (gcc::context *ctxt);
>  
> +bool bpf_expand_cpymem (rtx *, bool);
> +
>  #endif /* ! GCC_BPF_PROTOS_H */
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index d6ca47eeecb..c90d29d12ff 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -1184,6 +1184,135 @@ bpf_use_by_pieces_infrastructure_p (unsigned 
> HOST_WIDE_INT size,
>  #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
>bpf_use_by_pieces_infrastructure_p
>  
> +/* Helper for bpf_expand_cpymem.  Emit an unrolled loop moving the bytes
> +   from SRC to DST.  */
> +
> +static void
> +emit_move_loop (rtx src, rtx dst, machine_mode mode, int offset, int inc,
> +                unsigned iters, unsigned remainder)
> +{
> +  rtx reg = gen_reg_rtx (mode);
> +
> +  /* First copy in chunks as large as alignment permits.  */
> +  for (unsigned int i = 0; i < iters; i++)
> +    {
> +      emit_move_insn (reg, adjust_address (src, mode, offset));
> +      emit_move_insn (adjust_address (dst, mode, offset), reg);
> +      offset += inc;
> +    }
> +
> +  /* Handle remaining bytes which might be smaller than the chunks
> +     used above.  */
> +  if (remainder & 4)
> +    {
> +      emit_move_insn (reg, adjust_address (src, SImode, offset));
> +      emit_move_insn (adjust_address (dst, SImode, offset), reg);
> +      offset += (inc < 0 ? -4 : 4);
> +      remainder -= 4;
> +    }
> +  if (remainder & 2)
> +    {
> +      emit_move_insn (reg, adjust_address (src, HImode, offset));
> +      emit_move_insn (adjust_address (dst, HImode, offset), reg);
> +      offset += (inc < 0 ? -2 : 2);
> +      remainder -= 2;
> +    }
> +  if (remainder & 1)
> +    {
> +      emit_move_insn (reg, adjust_address (src, QImode, offset));
> +      emit_move_insn (adjust_address (dst, QImode, offset), reg);
> +    }
> +}
> +
> +/* Error if we cannot completely expand the memcpy/memmove inline,
> +   unless we are building libgcc.  */
> +#define CPYMEM_EXPAND_ERR(LOC, MSG)                            \
> +  do                                                           \
> +    {                                                          \
> +      if (flag_building_libgcc)                                \
> +        warning (LOC, MSG " for BPF memcpy/memmove expansion; " \
> +                 "a libcall will be emitted");                 \
> +      else                                                     \
> +        error (MSG " for BPF memcpy/memmove expansion");       \
> +    } while (0);

For this error I would suggest to use a less "internal" jargon to 
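
For readers following the quoted emit_move_loop helper, here is a minimal
plain-C sketch of the same decomposition: full-width chunks first, then at
most one 4-, 2- and 1-byte move for the tail.  It is only an illustration
and not part of the patch.  The names copy_chunk and unrolled_copy are
hypothetical, the sketch handles the forward direction only, and it moves
bytes with memcpy instead of emitting RTL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one load/store pair of WIDTH bytes at OFFSET.  */
static void
copy_chunk (unsigned char *dst, const unsigned char *src, size_t offset,
            size_t width)
{
  memcpy (dst + offset, src + offset, width);
}

/* Copy SIZE bytes forward in CHUNK-byte pieces (CHUNK is assumed to be
   1, 2, 4 or 8), then finish the tail with at most one 4-, 2- and 1-byte
   move.  Returns the number of moves performed.  */
static unsigned
unrolled_copy (unsigned char *dst, const unsigned char *src, size_t size,
               size_t chunk)
{
  size_t iters = size / chunk;
  size_t remainder = size % chunk;
  size_t offset = 0;
  unsigned moves = 0;

  /* First copy in chunks as large as the assumed alignment permits.  */
  for (size_t i = 0; i < iters; i++)
    {
      copy_chunk (dst, src, offset, chunk);
      offset += chunk;
      moves++;
    }

  /* The remainder is smaller than CHUNK, so its set bits say which
     narrower moves are still needed.  */
  if (remainder & 4)
    {
      copy_chunk (dst, src, offset, 4);
      offset += 4;
      moves++;
    }
  if (remainder & 2)
    {
      copy_chunk (dst, src, offset, 2);
      offset += 2;
      moves++;
    }
  if (remainder & 1)
    {
      copy_chunk (dst, src, offset, 1);
      moves++;
    }
  return moves;
}
```

With a 15-byte copy and 8-byte chunks this gives one full chunk plus a
7-byte tail, which is handled as one 4-byte, one 2-byte and one 1-byte move.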

Re: [PATCH] RISC-V: Set require-effective-target rv64 for PR113742

2024-02-20 Thread Monk Chiang
Hi Edwin,
I think you can just replace it with:
/* { dg-options "-O2 -finstrument-functions -mabi=lp64d -march=rv64gc -mtune=sifive-p600-series" } */

On Thu, Feb 15, 2024 at 7:43 PM Robin Dapp wrote:

> > Ah oops I glanced over the /* { dg-do compile } */ part. It should be
> > fine to add '-march=rv64gc' instead then?
>
> Hmm it's a bit tricky.  So generally -mcpu=sifive-p670 includes rv64
> but it does not override a previously specified -march=rv32 (that might
> have been added by the test harness or the test target).  It looks
> like it does override a (build option and thus not directly specified
> when compiling) --with-arch=rv32.
>
> For now I'd stick with something like -march=rv64gc -mtune=sifive-p670
> (but please check if the original problem does occur with this).
> While you're at it you could delete the redundant '/' in the first
> line.
>
> In general it's a bit counterintuitive that a test specifying a
> particular CPU (that supports several extensions) might have
> those overridden when e.g. testing on an rv32 target not supporting
> those.  We also do not support cpu names in the march string
> so there is no nice way of overriding a previously specified -march.
>
> Kito: Any idea regarding this?  I read in your commit message that
> mcpu has lower precedence than march.  Right now that allows us to
> somewhat silently remove architecture options that are specified
> last on the command line.
>
> aarch64 warns in case something is in conflict, maybe we should do
> that as well?
>
> At least I find it a bit annoying that we don't have a way of
> saying:
> "This test always needs to be compiled with all arch features of
> cpu = ..." and rather need to specify -march=rv64gcv_z..._z...
>
> Without having thought this through, can't mcpu have roughly the
> same precedence as march, so that we let the one specified last
> "win" in case of conflicts?  Possibly with an exception for
> the 32/64 bit.  Does LLVM not have this problem?
>
> Regards
>  Robin
>
>

