[PATCH] attribs: Fix and refactor diag_attr_exclusions

2024-05-16 Andrew Carlotti
The existing implementation of this function was convoluted, and had
multiple control flow errors that became apparent to me while reading
the code:

1. The initial early return only checked the properties of the first
exclusion in the list, when these properties could be different for
subsequent exclusions.

2. excl was not reset within the outer loop, so the inner loop body
would only execute during the first iteration of the outer loop.  This
effectively meant that the value of attrs[1] was ignored.

3. The function called itself recursively twice, with both last_decl and
TREE_TYPE (last_decl) as parameters. The second recursive call should
have been redundant, since attrs[1] = TREE_TYPE (last_decl) during the
first recursive call.
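The second problem is easiest to see in a standalone model of the old loop shape. This sketch assumes hypothetical `exclusion` and attribute-list types, not GCC's tree or attribute_spec types. The only thing it carries over from the real function is where `excl` is initialised.

```cpp
// Hypothetical, simplified model of the old diag_attr_exclusions loops.
// These are not GCC's tree or attribute_spec types.
struct exclusion
{
  const char *name;  // a null name terminates the list
};

// constexpr string equality, so the example can be checked at compile time.
constexpr bool str_eq (const char *a, const char *b)
{
  while (*a && *a == *b)
    ++a, ++b;
  return *a == *b;
}

// True if NAME appears in the null-terminated list ATTRS.
constexpr bool has_attr (const char *const *attrs, const char *name)
{
  for (; *attrs; ++attrs)
    if (str_eq (*attrs, name))
      return true;
  return false;
}

// Old shape: EXCL is initialised once, outside the outer loop, so after
// the first non-null list has been scanned the inner loop never runs again.
constexpr bool old_find (const exclusion *exclude,
                         const char *const *const attrs[2])
{
  const exclusion *excl = exclude;
  bool found = false;
  for (unsigned i = 0; i != 2; ++i)
    {
      if (!attrs[i])
        continue;
      for (; excl->name; ++excl)
        found |= has_attr (attrs[i], excl->name);
    }
  return found;
}

// Fixed shape: restart the walk over the exclusions for every list.
constexpr bool new_find (const exclusion *exclude,
                         const char *const *const attrs[2])
{
  bool found = false;
  for (unsigned i = 0; i != 2; ++i)
    {
      if (!attrs[i])
        continue;
      for (const exclusion *excl = exclude; excl->name; ++excl)
        found |= has_attr (attrs[i], excl->name);
    }
  return found;
}
```

Suppose the decl carries "hot" and its type carries "cold", and "cold" is the excluded name. In that case old_find reports no conflict, because the type's list (attrs[1]) is never compared. new_find does find it.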

This patch eliminates the early return, and combines the checks with
those present within the inner loop.  It also fixes the inner loop
initialisation, and modifies the outer loop to iterate over nodes
instead of their attributes. This latter change allows the recursion to
be eliminated, by extending the new nodes array to include last_decl
(and its type) as well.
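The restructured control flow can be sketched with hypothetical stand-in types in place of GCC's tree nodes. The decl/type applicability test is a deliberate simplification of the excl->function/variable/type checks. As the patch describes, that test sits inside the inner loop. All nodes to examine are collected up front, so no recursion is needed.

```cpp
// Hypothetical stand-ins for tree nodes and exclusions, for illustration only.
struct node
{
  bool is_decl;               // stands in for DECL_P
  const char *const *attrs;   // DECL_ATTRIBUTES or TYPE_ATTRIBUTES
  const node *type;           // stands in for TREE_TYPE; may be null
};

struct exclusion
{
  const char *name;  // a null name terminates the list
  bool on_decl;      // simplified: exclusion applies to declarations
  bool on_type;      // simplified: exclusion applies to types
};

constexpr bool str_eq (const char *a, const char *b)
{
  while (*a && *a == *b)
    ++a, ++b;
  return *a == *b;
}

// True if NAME appears in the null-terminated list ATTRS (which may be null).
constexpr bool has_attr (const char *const *attrs, const char *name)
{
  for (; attrs && *attrs; ++attrs)
    if (str_eq (*attrs, name))
      return true;
  return false;
}

constexpr bool find_exclusion (const node *last_decl, const node *base_node,
                               const exclusion *exclude)
{
  // Collect every node to examine up front instead of recursing.
  const node *nodes[4] = { base_node, nullptr, nullptr, nullptr };
  if (base_node->is_decl)
    nodes[1] = base_node->type;
  if (last_decl)
    {
      nodes[2] = last_decl;
      if (last_decl->is_decl)
        nodes[3] = last_decl->type;
    }

  bool found = false;
  for (const node *n : nodes)
    {
      if (!n)
        continue;
      // The exclusion list is walked from the start for every node, and
      // the per-exclusion applicability check happens inside this loop.
      for (const exclusion *excl = exclude; excl->name; ++excl)
        {
          if (n->is_decl ? !excl->on_decl : !excl->on_type)
            continue;
          found |= has_attr (n->attrs, excl->name);
        }
    }
  return found;
}
```

In this model a conflict carried by the previous declaration is reported through nodes[2], without a second call to the function.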

This patch provides an alternative fix for PR114634, although I wasn't
aware of that issue until rebasing on top of Jakub's fix.

I am not aware of any other compiler bugs resulting from these issues.
However, if the exclusions for target_clones were listed in the opposite
order, then it would have broken detection of the always_inline
exclusion on aarch64 (where TARGET_HAS_FMV_TARGET_ATTRIBUTE is false).

Is this ok for master?

gcc/ChangeLog:

* attribs.cc (diag_attr_exclusions): Fix and refactor.


diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index 3ab0b0fd87a4404a593b2de365ea5226e31fe24a..431dd4255e68e92dd8d10bbb21ea079e50811faa 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -433,84 +433,69 @@ get_attribute_namespace (const_tree attr)
or a TYPE.  */
 
 static bool
-diag_attr_exclusions (tree last_decl, tree node, tree attrname,
+diag_attr_exclusions (tree last_decl, tree base_node, tree attrname,
  const attribute_spec *spec)
 {
-  const attribute_spec::exclusions *excl = spec->exclude;
 
-  tree_code code = TREE_CODE (node);
+  /* BASE_NODE is either the current decl to which the attribute is being
+ applied, or its type.  For the former, consider the attributes on both the
+ decl and its type.  Check both LAST_DECL and its type as well.  */
 
-  if ((code == FUNCTION_DECL && !excl->function
-   && (!excl->type || !spec->affects_type_identity))
-  || (code == VAR_DECL && !excl->variable
- && (!excl->type || !spec->affects_type_identity))
-  || (((code == TYPE_DECL || RECORD_OR_UNION_TYPE_P (node)) && !excl->type)))
-return false;
+  tree nodes[4] = { NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE };
 
-  /* True if an attribute that's mutually exclusive with ATTRNAME
- has been found.  */
-  bool found = false;
+  nodes[0] = base_node;
+  if (DECL_P (base_node))
+    nodes[1] = TREE_TYPE (base_node);
 
-  if (last_decl && last_decl != node && TREE_TYPE (last_decl) != node)
+  if (last_decl)
 {
-  /* Check both the last DECL and its type for conflicts with
-the attribute being added to the current decl or type.  */
-  found |= diag_attr_exclusions (last_decl, last_decl, attrname, spec);
-  tree decl_type = TREE_TYPE (last_decl);
-  found |= diag_attr_exclusions (last_decl, decl_type, attrname, spec);
+  nodes[2] = last_decl;
+  if (DECL_P (last_decl))
+ nodes[3] = TREE_TYPE (last_decl);
 }
 
-  /* NODE is either the current DECL to which the attribute is being
- applied or its TYPE.  For the former, consider the attributes on
- both the DECL and its type.  */
-  tree attrs[2];
-
-  if (DECL_P (node))
-{
-  attrs[0] = DECL_ATTRIBUTES (node);
-  if (TREE_TYPE (node))
-   attrs[1] = TYPE_ATTRIBUTES (TREE_TYPE (node));
-  else
-   /* TREE_TYPE can be NULL e.g. while processing attributes on
-  enumerators.  */
-   attrs[1] = NULL_TREE;
-}
-  else
-{
-  attrs[0] = TYPE_ATTRIBUTES (node);
-  attrs[1] = NULL_TREE;
-}
+  /* True if an attribute that's mutually exclusive with ATTRNAME
+ has been found.  */
+  bool found = false;
 
   /* Iterate over the mutually exclusive attribute names and verify
  that the symbol doesn't contain it.  */
-  for (unsigned i = 0; i != ARRAY_SIZE (attrs); ++i)
+  for (unsigned i = 0; i != ARRAY_SIZE (nodes); ++i)
 {
-  if (!attrs[i])
+  tree node = nodes[i];
+
+  if (!node)
continue;
 
-  for ( ; excl->name; ++excl)
+  tree attr;
+  if (DECL_P (node))
+   attr = DECL_ATTRIBUTES (node);
+  else
+   attr = TYPE_ATTRIBUTES (node);
+
+  tree_code code = TREE_CODE (node);
+
+  for (auto excl = spec->exclude; excl->name; ++excl)
{
  /* Avoid checking the attribute against itself.  */
  if (is_attribute_p (excl->name, 

[PATCH 12/12] aarch64: Extend aarch64_feature_flags to 128 bits

2024-05-14 Andrew Carlotti
Replace the existing typedef with a new class containing two private
uint64_t members.

Most of the preparatory work was carried out in previous commits.  The
most notable remaining changes are the addition of the get_isa_mode and
with_isa_mode functions for conversion to or from aarch64_isa_mode
types, and the use of a 'save' member function from within
aarch64_set_asm_isa_flags, to avoid needing to expose the uint64_t
members.
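A cut-down sketch of the two-word class and the isa-mode helpers described above follows. Several details are assumptions for illustration: the class name, NUM_ISA_MODES = 3 (matching the "bottom three isa mode bits" mentioned in PATCH 06/12), and the reduced operator set. The mode mask is built from uint64_t (1) so the shift happens in 64 bits.

```cpp
#include <cstdint>

// Assumption for this sketch: three ISA-mode bits at the bottom of word 0.
constexpr unsigned NUM_ISA_MODES = 3;
typedef uint64_t isa_mode;

// Hypothetical, simplified analogue of a 128-bit feature mask.
class feature_flags
{
private:
  uint64_t flags0;
  uint64_t flags1;

public:
  constexpr feature_flags (uint64_t f0, uint64_t f1)
    : flags0 (f0), flags1 (f1) {}

  // Write both halves to separately stored option variables, so the
  // private members never need to be exposed.
  void save (uint64_t *save0, uint64_t *save1) const
  {
    *save0 = flags0;
    *save1 = flags1;
  }

  // Extract only the ISA-mode bits.
  constexpr isa_mode get_isa_mode () const
  {
    return flags0 & ((uint64_t (1) << NUM_ISA_MODES) - 1);
  }

  // Replace the ISA-mode bits, keeping every other feature bit.
  constexpr feature_flags with_isa_mode (isa_mode mode) const
  {
    return feature_flags ((flags0 & ~((uint64_t (1) << NUM_ISA_MODES) - 1))
                          | mode, flags1);
  }

  constexpr feature_flags operator| (feature_flags other) const
  {
    return feature_flags (flags0 | other.flags0, flags1 | other.flags1);
  }

  constexpr bool operator== (feature_flags other) const
  {
    return flags0 == other.flags0 && flags1 == other.flags1;
  }
};
```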

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_set_asm_isa_flags): Use new flags.save function.
* config/aarch64/aarch64-opts.h
(class aarch64_feature_flags): New class.
(aarch64_feature_flags_from_index): Update to handle 128 bits.
(AARCH64_NO_FEATURES): Pass a second constructor parameter.
* config/aarch64/aarch64.cc
(aarch64_guard_switch_pstate_sm): Extract isa mode explicitly.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(aarch64_set_current_function): Set/extract isa mode explicitly.
* config/aarch64/aarch64.h
(aarch64_get_asm_isa_flags): Use new option struct member.
(aarch64_get_isa_flags): Use new option struct member.
(aarch64_asm_isa_flags): Use second global variable.
(aarch64_isa_flags): Ditto.
(AARCH64_FL_ISA_MODES): Pass a second constructor parameter.
(AARCH64_FL_DEFAULT_ISA_MODE): Ditto.
(AARCH64_ISA_MODE): Extract isa mode explicitly.
* config/aarch64/aarch64.opt
(aarch64_asm_isa_flags_1): Add a second uint64_t for bitmask.
(aarch64_isa_flags_1): Ditto.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
index 9f583bb80456709e0028c358a1bad23ad59f20f4..a84650086ba9a1054f3ba15022567a00b7fb4313 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -67,18 +67,18 @@ static const struct default_options aarch_option_optimization_table[] =
   };
 
 
-/* Set OPTS->x_aarch64_asm_isa_flags_0 to FLAGS and update
-   OPTS->x_aarch64_isa_flags_0 accordingly.  */
+/* Set OPTS->x_aarch64_asm_isa_flags_<0..n> to FLAGS and update
+   OPTS->x_aarch64_isa_flags_<0..n> accordingly.  */
 void
 aarch64_set_asm_isa_flags (gcc_options *opts, aarch64_feature_flags flags)
 {
-  opts->x_aarch64_asm_isa_flags_0 = flags;
+  flags.save(&opts->x_aarch64_asm_isa_flags_0, &opts->x_aarch64_asm_isa_flags_1);
   if (opts->x_target_flags & MASK_GENERAL_REGS_ONLY)
 {
   constexpr auto flags_mask = ~feature_deps::get_flags_off (AARCH64_FL_FP);
   flags &= flags_mask;
 }
-  opts->x_aarch64_isa_flags_0 = flags;
+  flags.save(&opts->x_aarch64_isa_flags_0, &opts->x_aarch64_isa_flags_1);
 }
 
 /* Implement TARGET_HANDLE_OPTION.
diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index 80926a008aa2ed7dffa79aaa425dd3d7fc9d2581..7571385740d5271ab99bcc3380899a550788592d 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -25,17 +25,110 @@
 #ifndef USED_FOR_TARGET
 typedef uint64_t aarch64_isa_mode;
 
-typedef uint64_t aarch64_feature_flags;
-
 constexpr unsigned int AARCH64_NUM_ISA_MODES = (0
 #define DEF_AARCH64_ISA_MODE(IDENT) + 1
 #include "aarch64-isa-modes.def"
 );
 
+class aarch64_feature_flags
+{
+private:
+  uint64_t flags0;
+  uint64_t flags1;
+
+public:
+  constexpr aarch64_feature_flags (uint64_t flags0_m, uint64_t flags1_m)
+: flags0 (flags0_m), flags1 (flags1_m) {}
+  aarch64_feature_flags () = default;
+
+  void save(uint64_t *save0, uint64_t *save1)
+{
+  *save0 = flags0;
+  *save1 = flags1;
+}
+
+  constexpr aarch64_isa_mode get_isa_mode ()
+{
+  return flags0 & ((1 << AARCH64_NUM_ISA_MODES) - 1);
+}
+
+  constexpr aarch64_feature_flags with_isa_mode (const aarch64_isa_mode mode) const
+{
+  return aarch64_feature_flags ((flags0 & ~((1 << AARCH64_NUM_ISA_MODES) - 1)) | mode,
+   flags1);
+}
+
+  constexpr aarch64_feature_flags operator&(const aarch64_feature_flags other) const
+{
+  return aarch64_feature_flags (flags0 & other.flags0,
+   flags1 & other.flags1);
+}
+
+  aarch64_feature_flags operator&=(const aarch64_feature_flags other)
+{
+  flags0 &= other.flags0;
+  flags1 &= other.flags1;
+  return *this;
+}
+
+  constexpr aarch64_feature_flags operator|(const aarch64_feature_flags other) const
+{
+  return aarch64_feature_flags (flags0 | other.flags0,
+   flags1 | other.flags1);
+}
+
+  aarch64_feature_flags operator|=(const aarch64_feature_flags other)
+{
+  flags0 |= other.flags0;
+  flags1 |= other.flags1;
+  return *this;
+}
+
+  constexpr aarch64_feature_flags operator^(const aarch64_feature_flags other) const
+{
+  return aarch64_feature_flags (flags0 ^ other.flags0,
+

[PATCH 09/12] aarch64: Assign flags to local constexpr variable

2024-05-14 Andrew Carlotti
This guarantees that the constant values are actually evaluated at
compile time.

In previous testing, I have observed GCC failing to evaluate and inline
these constant values, which exposed a separate bug in which some of the
required symbols from feature_deps were missing.  Richard Sandiford has
since fixed that bug, but we still want to ensure we get the benefits of
compile-time evaluation here.
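The effect of the constexpr local is shown in the sketch below. get_flags_off and the FP/SIMD/SVE/CRC dependency table are hypothetical stand-ins for feature_deps, not the real aarch64 definitions. The point is that a constexpr variable must be initialised by a constant expression. The compiler therefore has to compute the mask during translation, and it rejects the code if it cannot.

```cpp
#include <cstdint>

// Hypothetical feature bits and dependencies, for illustration only.
constexpr uint64_t FP   = uint64_t (1) << 0;
constexpr uint64_t SIMD = uint64_t (1) << 1;  // requires FP
constexpr uint64_t SVE  = uint64_t (1) << 2;  // requires SIMD
constexpr uint64_t CRC  = uint64_t (1) << 3;  // independent

struct dep
{
  uint64_t feature;
  uint64_t requires_mask;
};

constexpr dep deps[] = { { SIMD, FP }, { SVE, SIMD }, { CRC, 0 } };

// Everything that must be turned off when OFF is turned off: iterate to a
// fixed point, adding each feature that requires something already off.
constexpr uint64_t get_flags_off (uint64_t off)
{
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (const dep &d : deps)
        if ((d.requires_mask & off) && !(d.feature & off))
          {
            off |= d.feature;
            changed = true;
          }
    }
  return off;
}

constexpr uint64_t strip_fp (uint64_t flags)
{
  // A constexpr local must be initialised by a constant expression, so
  // this mask is guaranteed to be computed at compile time.
  constexpr uint64_t flags_mask = ~get_flags_off (FP);
  return flags & flags_mask;
}
```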

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_set_asm_isa_flags): Make constant explicitly constexpr.
* config/aarch64/aarch64.cc
(aarch64_override_options_internal): Ditto.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
index 2f437b82a24c16d9f808a4367ce2a281a49a77ee..9f583bb80456709e0028c358a1bad23ad59f20f4 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -74,7 +74,10 @@ aarch64_set_asm_isa_flags (gcc_options *opts, aarch64_feature_flags flags)
 {
   opts->x_aarch64_asm_isa_flags_0 = flags;
   if (opts->x_target_flags & MASK_GENERAL_REGS_ONLY)
-flags &= ~feature_deps::get_flags_off (AARCH64_FL_FP);
+{
+  constexpr auto flags_mask = ~feature_deps::get_flags_off (AARCH64_FL_FP);
+  flags &= flags_mask;
+}
   opts->x_aarch64_isa_flags_0 = flags;
 }
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index eef0905069232bacc59d574cad0f6edbaf062387..69c3b257982b4a0e282cbf7486802b147d166945 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18305,7 +18305,8 @@ aarch64_override_options_internal (struct gcc_options *opts)
  " option %<-march%>, or by using the %"
  " attribute or pragma", "sme");
   opts->x_target_flags &= ~MASK_GENERAL_REGS_ONLY;
-  auto new_flags = isa_flags | feature_deps::SME ().enable;
+  constexpr auto flags_enable_sme = feature_deps::SME ().enable;
+  auto new_flags = isa_flags | flags_enable_sme;
   aarch64_set_asm_isa_flags (opts, new_flags);
 }
 


[RFC 11/12] Add explicit bool casts to .md condition users

2024-05-14 Andrew Carlotti
This patch is one way to fix some issues I discovered when disallowing
implicit casts to bool from aarch64_feature_flags (in a later patch).
That in turn was necessary to prohibit accidental conversion of an
aarch64_feature_flags value to an integer by first implicitly casting to
a bool (and thus setting the resulting integer value to 0 or 1).
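The sketch below uses a hypothetical two-word flags type, not the real class from the later patch, to show what an explicit conversion to bool changes. The type can still be tested in `if`, `!`, `&&` and `?:` (contextual conversion). However, it no longer converts silently to bool or to an integer. A `return` from a bool function is copy-initialisation, which ignores explicit conversion functions. That is why the patch adds (bool) casts in places such as aarch64_function_value_regno_p.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical two-word flags type, showing the effect of making the
// conversion to bool explicit.
class flags128
{
  uint64_t lo, hi;

public:
  constexpr flags128 (uint64_t l, uint64_t h) : lo (l), hi (h) {}

  // Usable in contextual conversions, but never an implicit step on the
  // way to an integer.
  constexpr explicit operator bool () const { return lo != 0 || hi != 0; }
};

// `return f;` would not compile here: returning is copy-initialisation,
// which does not consider explicit conversion functions.
constexpr bool any_set (flags128 f)
{
  return (bool) f;
}

// The condition of ?: is contextually converted, so no cast is needed.
constexpr int as_flag (flags128 f)
{
  return f ? 1 : 0;
}
```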

Most of the uses of TARGET_ macros occur indirectly in middle end code,
via their use in instruction pattern conditions.  There are also a few
uses in aarch64 backend code, which are also changed in this patch.

The documentation on instruction patterns [1] doesn't explicitly say
that the condition must be a bool.  If we want to assume this, I think
we should update the documentation, and ideally enforce type consistency
within the compiler.

The code generated in genconditions.cc by write_one_condition already
includes an assumption that casting a condition's value to an int is
valid (i.e. that it does not invoke undefined behaviour, and does not
change the result obtained when later converting it to a boolean
result).  Fortunately, for aarch64 at least, this assumption only needs
to hold when the original constant is a compile time constant, whereas
all our problematic usage involves comparisons against the runtime
feature mask.
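The risk in converting a condition straight to a narrower integer is that a wide feature mask can lose its only set bit. Converting through bool, as the modified genconditions output does, cannot lose it. This illustration assumes a hypothetical mask whose only set bit is bit 40. It uses uint32_t so that the truncation is well defined in every C++ dialect.

```cpp
#include <cstdint>

// Hypothetical condition value: a single feature bit above bit 31.
constexpr uint64_t cond = uint64_t (1) << 40;

// Narrowing the value directly keeps only the low 32 bits, so a "true"
// condition can become 0.
constexpr uint32_t narrow_direct (uint64_t c)
{
  return static_cast<uint32_t> (c);
}

// Converting to bool first, as the (int) (bool) form does, always yields
// 0 or 1 and preserves the truth value.
constexpr int narrow_via_bool (uint64_t c)
{
  return static_cast<int> (static_cast<bool> (c));
}
```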

If the use of non-bool instruction pattern conditions should be
disallowed, then it would be straightforward to fix the type mismatches
in the aarch64 backend, by adding explicit bool casts to all of the
TARGET_* macros.  Indeed, I think that would be a better approach to
fixing this issue.  However, I felt it would be more useful to first
investigate and demonstrate the downstream impact of these type issues.

Note that this patch doesn't compile without the subsequent patch,
due to ambiguous calls to aarch64_def_or_undef(int, ...).  I expect to
replace this patch with one that avoids the issue, so it isn't worth
meddling with the next patch in the series just to make this RFC compile
by itself.

[1] https://gcc.gnu.org/onlinedocs/gccint/Patterns.html


diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
index b6f25e4db3c06a1addc09a47335fe5184cb4a100..0cfcac6ba6b1e0ae7cdc0fb864eb28ec7de78605 100644
--- a/gcc/c-family/c-cppbuiltin.cc
+++ b/gcc/c-family/c-cppbuiltin.cc
@@ -1506,7 +1506,7 @@ c_cpp_builtins (cpp_reader *pfile)
 
 #ifdef HAVE_adddf3
  builtin_define_with_int_value ("__LIBGCC_HAVE_HWDBL__",
-HAVE_adddf3);
+(bool) HAVE_adddf3);
 #endif
}
 
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index fe1a20e4e546a68e5f7eddff3bbb0d3e831fbd9b..de4b383cda92c160bd706f9085999daac5d8313a 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -47,6 +47,12 @@ aarch64_def_or_undef (bool def_p, const char *macro, cpp_reader *pfile)
 cpp_undef (pfile, macro);
 }
 
+static void
+aarch64_def_or_undef (aarch64_feature_flags def_p, const char *macro, cpp_reader *pfile)
+{
+  aarch64_def_or_undef ((bool) def_p, macro, pfile);
+}
+
 /* Define the macros that we always expect to have on AArch64.  */
 
 static void
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 69c3b257982b4a0e282cbf7486802b147d166945..052cf297e7672abf015a085ab357836cb3b235e4 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -6561,10 +6561,10 @@ aarch64_function_value_regno_p (const unsigned int regno)
   /* Up to four fp/simd registers can return a function value, e.g. a
  homogeneous floating-point aggregate having four members.  */
   if (regno >= V0_REGNUM && regno < V0_REGNUM + HA_MAX_NUM_FLDS)
-return TARGET_FLOAT;
+return (bool) TARGET_FLOAT;
 
   if (regno >= P0_REGNUM && regno < P0_REGNUM + HA_MAX_NUM_FLDS)
-return TARGET_SVE;
+return (bool) TARGET_SVE;
 
   return false;
 }
diff --git a/gcc/genconditions.cc b/gcc/genconditions.cc
index 13963dc3ff46aa250c39ce80d0b92356390e41ff..3aee4428ff7ff5c97260f56a5f6b0fffa4e95fc2 100644
--- a/gcc/genconditions.cc
+++ b/gcc/genconditions.cc
@@ -140,9 +140,9 @@ write_one_condition (void **slot, void * ARG_UNUSED (dummy))
   putchar (*p);
 }
 
-  fputs ("\",\n__builtin_constant_p ", stdout);
+  fputs ("\",\n__builtin_constant_p ((bool)", stdout);
   rtx_reader_ptr->print_c_condition (test->expr);
-  fputs ("\n? (int) ", stdout);
+  fputs (")\n? (int) (bool)", stdout);
   rtx_reader_ptr->print_c_condition (test->expr);
   fputs ("\n: -1 },\n", stdout);
   return 1;
diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
index d8682b2a9ad56a0a62b4407741c695489c72795b..0d9cf0de8b93da5884a352858b343f81644f9d3f 100644
--- a/gcc/genopinit.cc
+++ b/gcc/genopinit.cc
@@ -386,7 +386,7 @@ main (int argc, const char **argv)
  unsigned end = MIN (patterns.length (),
  (i + 1) * patterns_per_function);
  for (j = 

[PATCH 10/12] aarch64: Add aarch64_feature_flags_from_index macro

2024-05-14 Andrew Carlotti
When aarch64_feature_flags grows to 128 bits, constructing a mask with a
specific indexed value set will become more complicated.  Extract this
operation into a separate macro, and preemptively annotate the feature
masks as possibly unused.
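Once the mask spans two 64-bit words, a single-bit mask has to be built by first choosing the word and then shifting within it. The sketch below shows one possible shape. It is a constexpr function over a hypothetical two-word struct rather than the macro, and the actual 128-bit update in this series may look different.

```cpp
#include <cstdint>

// Illustrative two-word mask; not the real aarch64_feature_flags class.
struct flags_pair
{
  uint64_t word0;
  uint64_t word1;
};

// Build a mask with only bit INDEX set.  Shifting a 64-bit value by 64 or
// more is undefined, so choose the word first.  INDEX must be below 128.
constexpr flags_pair flags_from_index (unsigned index)
{
  return index < 64
    ? flags_pair { uint64_t (1) << index, 0 }
    : flags_pair { 0, uint64_t (1) << (index - 64) };
}
```

Because the function is constexpr, an out-of-range index used in a constant expression is rejected at compile time instead of silently producing a wrong mask.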

gcc/ChangeLog:

* config/aarch64/aarch64-opts.h
(aarch64_feature_flags_from_index): New macro.
* config/aarch64/aarch64.h
(AARCH64_FL_##IDENT): Mark as maybe unused, and use new macro.


diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index c2d68716857b49db8f9c1393f11b3377f51fb60c..80926a008aa2ed7dffa79aaa425dd3d7fc9d2581 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -32,6 +32,9 @@ constexpr unsigned int AARCH64_NUM_ISA_MODES = (0
 #include "aarch64-isa-modes.def"
 );
 
+#define aarch64_feature_flags_from_index(index) \
+  (aarch64_feature_flags (uint64_t (1) << index))
+
 #define AARCH64_NO_FEATURES aarch64_feature_flags (0)
 #endif
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index af256c581aedc04e4194ac0158380fcdb8b65594..dd3437214e1597f03ac947a09c124ea0b04e27e8 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -185,8 +185,8 @@ enum class aarch64_feature : unsigned char {
 
 /* Define unique flags for each of the above.  */
 #define HANDLE(IDENT) \
-  constexpr auto AARCH64_FL_##IDENT \
-= aarch64_feature_flags (1) << int (aarch64_feature::IDENT);
+  constexpr auto AARCH64_FL_##IDENT ATTRIBUTE_UNUSED \
+= aarch64_feature_flags_from_index (int (aarch64_feature::IDENT));
 #define DEF_AARCH64_ISA_MODE(IDENT) HANDLE (IDENT)
 #define AARCH64_OPT_EXTENSION(A, IDENT, C, D, E, F) HANDLE (IDENT)
 #define AARCH64_ARCH(A, B, IDENT, D, E) HANDLE (IDENT)


[PATCH 08/12] aarch64: Decouple feature flag option storage type

2024-05-14 Andrew Carlotti
The awk scripts that process the .opt files are relatively fragile and
only handle a limited set of data types correctly.  The unrecognised
aarch64_feature_flags type is handled as a uint64_t, which happens to be
correct for now.  However, that assumption will change when we extend
the mask to 128 bits.

This patch changes the option members to use uint64_t types, and adds a
"_0" suffix to the names (both for future extensibility, and to allow
the original name to be used for the full aarch64_feature_flags mask
within generator files).
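The storage split can be modelled with hypothetical stand-ins. The option struct holds only a plain uint64_t word (a type the .opt processing handles), and the typed mask is unwrapped on store and rebuilt on load. None of the names below are the real GCC declarations.

```cpp
#include <cstdint>

// Illustrative typed mask wrapper.
class feature_flags
{
  uint64_t bits;

public:
  constexpr explicit feature_flags (uint64_t b) : bits (b) {}
  constexpr uint64_t raw () const { return bits; }
};

// Option storage only ever sees plain uint64_t words.
struct options
{
  // The "_0" suffix leaves room for an "_1" word when the mask grows.
  uint64_t x_isa_flags_0 = 0;
};

// Store: unwrap into the raw storage word.
constexpr void set_isa_flags (options &opts, feature_flags flags)
{
  opts.x_isa_flags_0 = flags.raw ();
}

// Load: rebuild the typed value from the raw storage word.
constexpr feature_flags get_isa_flags (const options &opts)
{
  return feature_flags (opts.x_isa_flags_0);
}

// Usage: a round trip through the raw storage preserves the mask.
constexpr uint64_t round_trip (uint64_t bits)
{
  options opts {};
  set_isa_flags (opts, feature_flags (bits));
  return get_isa_flags (opts).raw ();
}
```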

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_set_asm_isa_flags): Reorder, and add suffix to names.
* config/aarch64/aarch64.h
(aarch64_get_asm_isa_flags): Add "_0" suffix.
(aarch64_get_isa_flags): Ditto.
(aarch64_asm_isa_flags): Redefine using renamed uint64_t value.
(aarch64_isa_flags): Ditto.
* config/aarch64/aarch64.opt
(aarch64_asm_isa_flags): Rename to...
(aarch64_asm_isa_flags_0): ...this, and change to uint64_t.
(aarch64_isa_flags): Rename to...
(aarch64_isa_flags_0): ...this, and change to uint64_t.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
index e08a0fc86590b35a595a305599dfb919f83d6906..2f437b82a24c16d9f808a4367ce2a281a49a77ee 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -66,15 +66,16 @@ static const struct default_options aarch_option_optimization_table[] =
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
-/* Set OPTS->x_aarch64_asm_isa_flags to FLAGS and update
-   OPTS->x_aarch64_isa_flags accordingly.  */
+
+/* Set OPTS->x_aarch64_asm_isa_flags_0 to FLAGS and update
+   OPTS->x_aarch64_isa_flags_0 accordingly.  */
 void
 aarch64_set_asm_isa_flags (gcc_options *opts, aarch64_feature_flags flags)
 {
-  opts->x_aarch64_asm_isa_flags = flags;
-  opts->x_aarch64_isa_flags = flags;
+  opts->x_aarch64_asm_isa_flags_0 = flags;
   if (opts->x_target_flags & MASK_GENERAL_REGS_ONLY)
-opts->x_aarch64_isa_flags &= ~feature_deps::get_flags_off (AARCH64_FL_FP);
+flags &= ~feature_deps::get_flags_off (AARCH64_FL_FP);
+  opts->x_aarch64_isa_flags_0 = flags;
 }
 
 /* Implement TARGET_HANDLE_OPTION.
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 49bdc7565cd5ca80fbe2d4abf30aae12841c340f..af256c581aedc04e4194ac0158380fcdb8b65594 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -23,13 +23,18 @@
 #define GCC_AARCH64_H
 
 #define aarch64_get_asm_isa_flags(opts) \
-  (aarch64_feature_flags ((opts)->x_aarch64_asm_isa_flags))
+  (aarch64_feature_flags ((opts)->x_aarch64_asm_isa_flags_0))
 #define aarch64_get_isa_flags(opts) \
-  (aarch64_feature_flags ((opts)->x_aarch64_isa_flags))
+  (aarch64_feature_flags ((opts)->x_aarch64_isa_flags_0))
 
 /* Make these flags read-only so that all uses go via
aarch64_set_asm_isa_flags.  */
-#ifndef GENERATOR_FILE
+#ifdef GENERATOR_FILE
+#undef aarch64_asm_isa_flags
+#define aarch64_asm_isa_flags (aarch64_feature_flags (aarch64_asm_isa_flags_0))
+#undef aarch64_isa_flags
+#define aarch64_isa_flags (aarch64_feature_flags (aarch64_isa_flags_0))
+#else
 #undef aarch64_asm_isa_flags
 #define aarch64_asm_isa_flags (aarch64_get_asm_isa_flags (&global_options))
 #undef aarch64_isa_flags
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 6356c419399bd324929cd599e5a4b926b0383469..45aab49de27bdfa0fb3f67ec06c7dcf0ac242fb3 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -31,10 +31,10 @@ TargetVariable
 enum aarch64_arch selected_arch = aarch64_no_arch
 
 TargetVariable
-aarch64_feature_flags aarch64_asm_isa_flags = 0
+uint64_t aarch64_asm_isa_flags_0 = 0
 
 TargetVariable
-aarch64_feature_flags aarch64_isa_flags = 0
+uint64_t aarch64_isa_flags_0 = 0
 
 TargetVariable
 unsigned aarch_enable_bti = 2


[PATCH 07/12] aarch64: Define aarch64_get_{asm_|}isa_flags

2024-05-14 Andrew Carlotti
Building an aarch64_feature_flags value from data within a gcc_options
or cl_target_option struct will get more complicated in a later commit.
Use a macro to avoid doing this manually in more than one location.
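The accessor can be one macro because gcc_options and cl_target_option both carry a member with the same name, even though the two structs are unrelated. The sketch below uses hypothetical miniatures of those structs. A function template would work equally well; the macro simply mirrors the approach taken in the patch.

```cpp
#include <cstdint>

// Hypothetical miniatures of two unrelated structs that carry the same
// option member.
struct gcc_options_model
{
  uint64_t x_isa_flags;
};

struct cl_target_option_model
{
  uint64_t x_isa_flags;
};

class feature_flags
{
  uint64_t bits;

public:
  constexpr explicit feature_flags (uint64_t b) : bits (b) {}
  constexpr uint64_t raw () const { return bits; }
};

// The single place that knows how to build the typed value from option
// storage.  It accepts a pointer to either struct; if the storage layout
// changes, only this definition needs updating.
#define get_isa_flags(opts) (feature_flags ((opts)->x_isa_flags))
```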

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_handle_option): Use new macro.
* config/aarch64/aarch64.cc
(aarch64_override_options_internal): Ditto.
(aarch64_option_print): Ditto.
(aarch64_set_current_function): Ditto.
(aarch64_can_inline_p): Ditto.
(aarch64_declare_function_name): Ditto.
(aarch64_start_file): Ditto.
* config/aarch64/aarch64.h (aarch64_get_asm_isa_flags): New.
(aarch64_get_isa_flags): New.
(aarch64_asm_isa_flags): Use new macro.
(aarch64_isa_flags): Ditto.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
index 162b622564ab543cadfc24a7341f1fc476733f45..e08a0fc86590b35a595a305599dfb919f83d6906 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -111,7 +111,7 @@ aarch64_handle_option (struct gcc_options *opts,
 
 case OPT_mgeneral_regs_only:
   opts->x_target_flags |= MASK_GENERAL_REGS_ONLY;
-  aarch64_set_asm_isa_flags (opts, opts->x_aarch64_asm_isa_flags);
+  aarch64_set_asm_isa_flags (opts, aarch64_get_asm_isa_flags (opts));
   return true;
 
 case OPT_mfix_cortex_a53_835769:
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 773cc12d5a88f774ab78af8a9099312335c19513..49bdc7565cd5ca80fbe2d4abf30aae12841c340f 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -22,15 +22,18 @@
 #ifndef GCC_AARCH64_H
 #define GCC_AARCH64_H
 
+#define aarch64_get_asm_isa_flags(opts) \
+  (aarch64_feature_flags ((opts)->x_aarch64_asm_isa_flags))
+#define aarch64_get_isa_flags(opts) \
+  (aarch64_feature_flags ((opts)->x_aarch64_isa_flags))
+
 /* Make these flags read-only so that all uses go via
aarch64_set_asm_isa_flags.  */
 #ifndef GENERATOR_FILE
 #undef aarch64_asm_isa_flags
-#define aarch64_asm_isa_flags \
-  ((aarch64_feature_flags) global_options.x_aarch64_asm_isa_flags)
+#define aarch64_asm_isa_flags (aarch64_get_asm_isa_flags (&global_options))
 #undef aarch64_isa_flags
-#define aarch64_isa_flags \
-  ((aarch64_feature_flags) global_options.x_aarch64_isa_flags)
+#define aarch64_isa_flags (aarch64_get_isa_flags (&global_options))
 #endif
 
 /* Target CPU builtins.  */
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index b6300fc24c0d674edbb0df8e2d10121f2d39e7d6..eef0905069232bacc59d574cad0f6edbaf062387 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18292,10 +18292,11 @@ aarch64_override_options_internal (struct gcc_options *opts)
   && !fixed_regs[R18_REGNUM])
 error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>");
 
-  if ((opts->x_aarch64_isa_flags & (AARCH64_FL_SM_ON | AARCH64_FL_ZA_ON))
-  && !(opts->x_aarch64_isa_flags & AARCH64_FL_SME))
+  aarch64_feature_flags isa_flags = aarch64_get_isa_flags (opts);
+  if ((isa_flags & (AARCH64_FL_SM_ON | AARCH64_FL_ZA_ON))
+  && !(isa_flags & AARCH64_FL_SME))
 {
-  if (opts->x_aarch64_isa_flags & AARCH64_FL_SM_ON)
+  if (isa_flags & AARCH64_FL_SM_ON)
error ("streaming functions require the ISA extension %qs", "sme");
   else
error ("functions with SME state require the ISA extension %qs",
@@ -18304,8 +18305,7 @@ aarch64_override_options_internal (struct gcc_options *opts)
  " option %<-march%>, or by using the %"
  " attribute or pragma", "sme");
   opts->x_target_flags &= ~MASK_GENERAL_REGS_ONLY;
-  auto new_flags = (opts->x_aarch64_asm_isa_flags
-   | feature_deps::SME ().enable);
+  auto new_flags = isa_flags | feature_deps::SME ().enable;
   aarch64_set_asm_isa_flags (opts, new_flags);
 }
 
@@ -18999,9 +18999,9 @@ aarch64_option_print (FILE *file, int indent, struct cl_target_option *ptr)
   const struct processor *cpu
 = aarch64_get_tune_cpu (ptr->x_selected_tune);
   const struct processor *arch = aarch64_get_arch (ptr->x_selected_arch);
+  aarch64_feature_flags isa_flags = aarch64_get_asm_isa_flags(ptr);
   std::string extension
-= aarch64_get_extension_string_for_isa_flags (ptr->x_aarch64_asm_isa_flags,
- arch->flags);
+= aarch64_get_extension_string_for_isa_flags (isa_flags, arch->flags);
 
   fprintf (file, "%*sselected tune = %s\n", indent, "", cpu->name);
   fprintf (file, "%*sselected arch = %s%s\n", indent, "",
@@ -19061,7 +19061,7 @@ aarch64_set_current_function (tree fndecl)
   auto new_isa_mode = (fndecl
   ? aarch64_fndecl_isa_mode (fndecl)
   : AARCH64_DEFAULT_ISA_MODE);
-  auto isa_flags = TREE_TARGET_OPTION 

[PATCH 06/12] aarch64: Introduce aarch64_isa_mode type

2024-05-14 Andrew Carlotti
Currently there are many places where an aarch64_feature_flags variable
is used, but only the bottom three isa mode bits are set and read.
Using a separate data type for these values makes it clearer that
they're not expected or required to have any of their upper feature bits
set.  It will also make things simpler and more efficient when we extend
aarch64_feature_flags to 128 bits.

This patch uses explicit casts whenever converting from an
aarch64_feature_flags value to an aarch64_isa_mode value.  This isn't
strictly necessary, but serves to highlight the locations where an
explicit conversion will become necessary later.
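At this stage both types are still plain uint64_t, so the compiler does not enforce the distinction; the typedef names and casts only document intent. The sketch below shows such an explicit conversion point. The masking helper is a hypothetical illustration, not a function from the patch, and it assumes that the three ISA-mode bits are the lowest bits.

```cpp
#include <cstdint>

// Both names still denote uint64_t, so the compiler does not tell them
// apart; the names document intent.
typedef uint64_t feature_flags;
typedef uint64_t isa_mode;

// Assumption for this sketch: the ISA-mode bits are the three lowest bits.
constexpr unsigned NUM_ISA_MODES = 3;
constexpr feature_flags ISA_MODE_MASK
  = (feature_flags (1) << NUM_ISA_MODES) - 1;

// Hypothetical helper marking an explicit feature_flags -> isa_mode
// conversion point; only the mode bits are kept.
constexpr isa_mode to_isa_mode (feature_flags flags)
{
  return isa_mode (flags & ISA_MODE_MASK);
}
```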

gcc/ChangeLog:

* config/aarch64/aarch64-opts.h: Add aarch64_isa_mode typedef.
* config/aarch64/aarch64-protos.h
(aarch64_gen_callee_cookie): Use aarch64_isa_mode parameter.
(aarch64_sme_vq_immediate): Ditto.
* config/aarch64/aarch64.cc
(aarch64_fntype_pstate_sm): Use aarch64_isa_mode values.
(aarch64_fntype_pstate_za): Ditto.
(aarch64_fndecl_pstate_sm): Ditto.
(aarch64_fndecl_pstate_za): Ditto.
(aarch64_fndecl_isa_mode): Ditto.
(aarch64_cfun_incoming_pstate_sm): Ditto.
(aarch64_cfun_enables_pstate_sm): Ditto.
(aarch64_call_switches_pstate_sm): Ditto.
(aarch64_gen_callee_cookie): Ditto.
(aarch64_callee_isa_mode): Ditto.
(aarch64_insn_callee_abi): Ditto.
(aarch64_sme_vq_immediate): Ditto.
(aarch64_add_offset_temporaries): Ditto.
(aarch64_add_offset): Ditto.
(aarch64_add_sp): Ditto.
(aarch64_sub_sp): Ditto.
(aarch64_guard_switch_pstate_sm): Ditto.
(aarch64_switch_pstate_sm): Ditto.
(aarch64_init_cumulative_args): Ditto.
(aarch64_allocate_and_probe_stack_space): Ditto.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_start_call_args): Ditto.
(aarch64_expand_call): Ditto.
(aarch64_end_call_args): Ditto.
(aarch64_set_current_function): Ditto, with added conversions.
(aarch64_handle_attr_arch): Avoid macro with changed type.
(aarch64_handle_attr_cpu): Ditto.
(aarch64_handle_attr_isa_flags): Ditto.
(aarch64_switch_pstate_sm_for_landing_pad):
Use aarch64_isa_mode values.
(aarch64_switch_pstate_sm_for_jump): Ditto.
(pass_switch_pstate_sm::gate): Ditto.
* config/aarch64/aarch64.h
(AARCH64_ISA_MODE_{SM_ON|SM_OFF|ZA_ON}): New macros.
(AARCH64_FL_SM_STATE): Mark as possibly unused.
(AARCH64_ISA_MODE_SM_STATE): New aarch64_isa_mode mask.
(AARCH64_DEFAULT_ISA_MODE): New aarch64_isa_mode value.
(AARCH64_FL_DEFAULT_ISA_MODE): Define using above value.
(AARCH64_ISA_MODE): Change type to aarch64_isa_mode.
(arm_pcs): Use aarch64_isa_mode value.


diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index 376d7b5ad25e8838bc83fd9ab1c6f09c6de10835..c2d68716857b49db8f9c1393f11b3377f51fb60c 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -23,6 +23,8 @@
 #define GCC_AARCH64_OPTS_H
 
 #ifndef USED_FOR_TARGET
+typedef uint64_t aarch64_isa_mode;
+
 typedef uint64_t aarch64_feature_flags;
 
 constexpr unsigned int AARCH64_NUM_ISA_MODES = (0
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 4b1fefdd53843e97d3249bfb4d9fed2ffe60f865..585beee44d51275545775420905e7c7b37e2ce5c 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -768,7 +768,7 @@ bool aarch64_constant_address_p (rtx);
 bool aarch64_emit_approx_div (rtx, rtx, rtx);
 bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
 tree aarch64_vector_load_decl (tree);
-rtx aarch64_gen_callee_cookie (aarch64_feature_flags, arm_pcs);
+rtx aarch64_gen_callee_cookie (aarch64_isa_mode, arm_pcs);
 void aarch64_expand_call (rtx, rtx, rtx, bool);
 bool aarch64_expand_cpymem_mops (rtx *, bool);
 bool aarch64_expand_cpymem (rtx *, bool);
@@ -809,7 +809,7 @@ int aarch64_add_offset_temporaries (rtx);
 void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, rtx, rtx, rtx);
 bool aarch64_rdsvl_immediate_p (const_rtx);
 rtx aarch64_sme_vq_immediate (machine_mode mode, HOST_WIDE_INT,
- aarch64_feature_flags);
+ aarch64_isa_mode);
 char *aarch64_output_rdsvl (const_rtx);
 bool aarch64_addsvl_addspl_immediate_p (const_rtx);
 char *aarch64_output_addsvl_addspl (rtx);
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index f4ab220271239ce5a750cf211120d5b37d7f8b27..773cc12d5a88f774ab78af8a9099312335c19513 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -187,7 +187,17 @@ enum class aarch64_feature : unsigned char {
 #include "aarch64-arches.def"
 #undef HANDLE
 
-constexpr auto AARCH64_FL_SM_STATE = AARCH64_FL_SM_ON | 

[PATCH 05/12] aarch64: Eliminate a temporary variable.

2024-05-14 Andrew Carlotti
The name would become misleading in a later commit anyway, and I think
this is marginally more readable.

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_override_options): Remove temporary variable.


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e84151c474029b437ce67eb0cd6fca591a823b82..7b4e625190018dc3f16ef45c6eaf8fd3af10c784 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18817,7 +18817,6 @@ aarch64_override_options (void)
   SUBTARGET_OVERRIDE_OPTIONS;
 #endif
 
-  auto isa_mode = AARCH64_FL_DEFAULT_ISA_MODE;
   if (cpu && arch)
 {
   /* If both -mcpu and -march are specified, warn if they are not
@@ -18840,25 +18839,25 @@ aarch64_override_options (void)
}
 
   selected_arch = arch->arch;
-  aarch64_set_asm_isa_flags (arch_isa | isa_mode);
+  aarch64_set_asm_isa_flags (arch_isa | AARCH64_FL_DEFAULT_ISA_MODE);
 }
   else if (cpu)
 {
   selected_arch = cpu->arch;
-  aarch64_set_asm_isa_flags (cpu_isa | isa_mode);
+  aarch64_set_asm_isa_flags (cpu_isa | AARCH64_FL_DEFAULT_ISA_MODE);
 }
   else if (arch)
 {
   cpu = &all_cores[arch->ident];
   selected_arch = arch->arch;
-  aarch64_set_asm_isa_flags (arch_isa | isa_mode);
+  aarch64_set_asm_isa_flags (arch_isa | AARCH64_FL_DEFAULT_ISA_MODE);
 }
   else
 {
   /* No -mcpu or -march specified, so use the default CPU.  */
   cpu = &all_cores[TARGET_CPU_DEFAULT];
   selected_arch = cpu->arch;
-  aarch64_set_asm_isa_flags (cpu->flags | isa_mode);
+  aarch64_set_asm_isa_flags (cpu->flags | AARCH64_FL_DEFAULT_ISA_MODE);
 }
 
   selected_tune = tune ? tune->ident : cpu->ident;


[PATCH 03/12] aarch64: Don't use 0 for aarch64_feature_flags

2024-05-14 Thread Andrew Carlotti
Replace all uses of 0 in aarch64_feature_flags variable initialisation
with the macro AARCH64_NO_FEATURES, which was previously defined only
locally within aarch64.cc and is now defined in aarch64-opts.h.

This is needed because a later commit will disallow casts to
aarch64_feature_flags from integer types.
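The sketch below shows why a named empty value is needed. It is a minimal sketch under stated assumptions, not GCC's code. Once the flags type is a class whose constructor from an integer is `explicit`, a bare `0` can no longer initialise it, and both arms of a `?:` must have the class type. The class `feature_flags` and the constants `NO_FEATURES` and `FL_CRC` are hypothetical stand-ins for aarch64_feature_flags, AARCH64_NO_FEATURES and AARCH64_FL_CRC.

```cpp
#include <cstdint>

// Hypothetical stand-in for aarch64_feature_flags once it becomes a class.
class feature_flags
{
public:
  // Construction from an integer is explicit, so "feature_flags f = 0;"
  // does not compile and callers must name the value they mean.
  constexpr explicit feature_flags (uint64_t bits) : m_bits (bits) {}

  constexpr bool operator== (feature_flags other) const
  { return m_bits == other.m_bits; }

private:
  uint64_t m_bits;
};

// Hypothetical named "empty set", playing the role of AARCH64_NO_FEATURES.
constexpr feature_flags NO_FEATURES (0);
// Hypothetical single-bit flag; the bit position is illustrative.
constexpr feature_flags FL_CRC (uint64_t (1) << 0);

// Same shape as the explicit_flags initialisation in
// aarch64_get_extension_string_for_isa_flags: both arms of ?: must
// have type feature_flags, so the 0 arm becomes NO_FEATURES.
constexpr feature_flags explicit_flags (bool targeting_v8r)
{
  return !targeting_v8r ? FL_CRC : NO_FEATURES;
}
```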

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(all_extensions): Use AARCH64_NO_FEATURES.
(all_cores): Ditto.
(all_architectures): Ditto.
(aarch64_get_extension_string_for_isa_flags): Ditto.
* config/aarch64/aarch64-feature-deps.h (get_flags): Ditto.
(get_enable): Ditto.
(get_flags_off): Ditto.
* config/aarch64/aarch64-opts.h (AARCH64_NO_FEATURES): Define.
* config/aarch64/aarch64-protos.h: Use AARCH64_NO_FEATURES.
* config/aarch64/aarch64-sve-builtins-sme.def
(REQUIRED_EXTENSIONS): Ditto.
* config/aarch64/aarch64-sve-builtins.cc
(function_groups): Ditto.
* config/aarch64/aarch64-sve-builtins.h
(get_contiguous_base): Ditto.
(sve_switcher): Ditto.
* config/aarch64/aarch64.cc (all_architectures): Ditto.
(all_cores): Ditto.
(AARCH64_NO_FEATURES): Remove superseded #define and #undef.
(aarch64_override_options): Use AARCH64_NO_FEATURES.
(aarch64_process_target_attr): Remove dead initialisation.
* config/aarch64/driver-aarch64.cc
(aarch64_cpu_data): Use AARCH64_NO_FEATURES.
(aarch64_arches): Ditto.
(host_detect_local_cpu): Ditto.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
index 951d041d3109b935e90a7cb5d714940414e81761..162b622564ab543cadfc24a7341f1fc476733f45 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -158,7 +158,8 @@ static constexpr aarch64_option_extension all_extensions[] =
   {NAME, AARCH64_FL_##IDENT, feature_deps::IDENT ().explicit_on, \
feature_deps::get_flags_off (feature_deps::root_off_##IDENT)},
 #include "config/aarch64/aarch64-option-extensions.def"
-  {NULL, 0, 0, 0}
+  {NULL, AARCH64_NO_FEATURES, AARCH64_NO_FEATURES,
+AARCH64_NO_FEATURES}
 };
 
 struct processor_name_to_arch
@@ -183,7 +184,7 @@ static constexpr processor_name_to_arch all_cores[] =
   {NAME, AARCH64_ARCH_##ARCH_IDENT, feature_deps::cpu_##CORE_IDENT},
 #include "config/aarch64/aarch64-cores.def"
   {"generic", AARCH64_ARCH_V8A, feature_deps::V8A ().enable},
-  {"", aarch64_no_arch, 0}
+  {"", aarch64_no_arch, AARCH64_NO_FEATURES}
 };
 
 /* Map architecture revisions to their string representation.  */
@@ -192,7 +193,7 @@ static constexpr arch_to_arch_name all_architectures[] =
 #define AARCH64_ARCH(NAME, B, ARCH_IDENT, D, E)\
   {AARCH64_ARCH_##ARCH_IDENT, NAME, feature_deps::ARCH_IDENT ().enable},
 #include "config/aarch64/aarch64-arches.def"
-  {aarch64_no_arch, "", 0}
+  {aarch64_no_arch, "", AARCH64_NO_FEATURES}
 };
 
 /* Parse the architecture extension string STR and update ISA_FLAGS
@@ -299,14 +300,14 @@ aarch64_get_extension_string_for_isa_flags
  However, assemblers with Armv8-R AArch64 support should not have this
  issue, so we don't need this fix when targeting Armv8-R.  */
   auto explicit_flags = (!(current_flags & AARCH64_FL_V8R)
-? AARCH64_FL_CRC : 0);
+? AARCH64_FL_CRC : AARCH64_NO_FEATURES);
 
   /* Add the features in isa_flags & ~current_flags using the smallest
  possible number of extensions.  We can do this by iterating over the
  array in reverse order, since the array is sorted topologically.
  But in order to make the output more readable, it seems better
  to add the strings in definition order.  */
-  aarch64_feature_flags added = 0;
+  aarch64_feature_flags added = AARCH64_NO_FEATURES;
   auto flags_crypto = AARCH64_FL_AES | AARCH64_FL_SHA2;
   for (unsigned int i = ARRAY_SIZE (all_extensions); i-- > 0; )
 {
diff --git a/gcc/config/aarch64/aarch64-feature-deps.h b/gcc/config/aarch64/aarch64-feature-deps.h
index 79126db88254b89f74a8583d50a77bc27865e265..992e133d76935d411ce4cd39480c07ea18c62ddf 100644
--- a/gcc/config/aarch64/aarch64-feature-deps.h
+++ b/gcc/config/aarch64/aarch64-feature-deps.h
@@ -26,7 +26,7 @@ namespace feature_deps {
 /* Together, these definitions of get_flags take a list of
feature names (representing functions that are defined below)
and return the set of associated flags.  */
-constexpr aarch64_feature_flags get_flags () { return 0; }
+constexpr aarch64_feature_flags get_flags () { return AARCH64_NO_FEATURES; }
 
 template
 constexpr aarch64_feature_flags
@@ -37,7 +37,7 @@ get_flags (T1 i, Ts... args)
 
 /* Like get_flags, but return the transitive closure of those features
and the ones that they rely on.  */
-constexpr aarch64_feature_flags get_enable () { return 0; }
+constexpr aarch64_feature_flags get_enable () { return AARCH64_NO_FEATURES; }
 
 template
 constexpr aarch64_feature_flags
@@ -97,9 +97,10 

[PATCH 04/12] aarch64: Don't compare aarch64_feature_flags to 0.

2024-05-14 Thread Andrew Carlotti
A later commit will disallow such comparisons.  We can instead convert
directly to a boolean value, and make sure all such conversions are
explicit.
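The sketch below shows why `!flags` survives the later change while `flags == 0` does not. An `explicit operator bool` is still used in contextual conversions such as `!`, `if` and `&&`. However, there is no implicit conversion to an integer that `== 0` could use. The type and the flag bit values are hypothetical and do not reproduce GCC's eventual class.

```cpp
#include <cstdint>

// Hypothetical flags type offering only an explicit conversion to bool.
class feature_flags
{
public:
  constexpr explicit feature_flags (uint64_t bits) : m_bits (bits) {}

  constexpr feature_flags operator& (feature_flags other) const
  { return feature_flags (m_bits & other.m_bits); }
  constexpr feature_flags operator| (feature_flags other) const
  { return feature_flags (m_bits | other.m_bits); }

  // Usable by if, !, &&, || and ?: (contextual conversion), but not by
  // "flags == 0", which would need an implicit conversion.
  constexpr explicit operator bool () const { return m_bits != 0; }

private:
  uint64_t m_bits;
};

// Illustrative bit assignments, not GCC's.
constexpr feature_flags FL_SM_ON (uint64_t (1) << 0);
constexpr feature_flags FL_SM_OFF (uint64_t (1) << 1);
constexpr feature_flags FL_SM_STATE = FL_SM_ON | FL_SM_OFF;

// Same shape as the rewritten TARGET_STREAMING_COMPATIBLE test:
// true when neither SM_ON nor SM_OFF is set.
constexpr bool streaming_compatible_p (feature_flags isa_flags)
{
  return !(isa_flags & FL_SM_STATE);
}
```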

TODO: FIX SYSREG GATING.

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins.cc
(check_required_extensions): Replace comparison with 0.
(add_overloaded_function): Ditto.
* config/aarch64/aarch64.cc (aarch64_add_offset): Ditto.
(aarch64_guard_switch_pstate_sm): Ditto.
(aarch64_switch_pstate_sm): Ditto.
(aarch64_need_old_pstate_sm): Ditto.
(aarch64_epilogue_uses): Ditto.
(aarch64_update_ipa_fn_target_info): Ditto.
(aarch64_optimize_mode_switching): Ditto.
(aarch64_mode_entry): Ditto.
(aarch64_mode_exit): Ditto.
(aarch64_valid_sysreg_name_p): Ditto.
(aarch64_retrieve_sysreg): Ditto.
* config/aarch64/aarch64.h (TARGET_STREAMING_COMPATIBLE): Ditto.


diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index d555f350cd79ebed21dab77208b0ce291ab90e79..f033db5b25371d6b20a7c3cc2a4dc5462f8f991a 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -1125,7 +1125,7 @@ check_required_extensions (location_t location, tree fndecl,
   aarch64_feature_flags required_extensions)
 {
   auto missing_extensions = required_extensions & ~aarch64_asm_isa_flags;
-  if (missing_extensions == 0)
+  if (!missing_extensions)
 return check_required_registers (location, fndecl);
 
   if (missing_extensions & AARCH64_FL_SM_OFF)
@@ -1635,8 +1635,8 @@ add_overloaded_function (const function_instance &instance,
   tree id = get_identifier (name);
   if (registered_function **map_value = name_map->get (id))
 gcc_assert ((*map_value)->instance == instance
-   && ((*map_value)->required_extensions
-   & ~required_extensions) == 0);
+   && !((*map_value)->required_extensions
+& ~required_extensions));
   else
 {
   registered_function 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 8eb21cfcfc1e80bef051c571ec7cfae47e3393ed..f4ab220271239ce5a750cf211120d5b37d7f8b27 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -275,7 +275,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 
 /* The current function has a streaming-compatible body.  */
 #define TARGET_STREAMING_COMPATIBLE \
-  ((aarch64_isa_flags & AARCH64_FL_SM_STATE) == 0)
+  (!(aarch64_isa_flags & AARCH64_FL_SM_STATE))
 
 /* PSTATE.ZA is enabled in the current function body.  */
 #define TARGET_ZA (AARCH64_ISA_ZA_ON)
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 582dac5129faccee0db3a68f6bdf866e8b41a059..e84151c474029b437ce67eb0cd6fca591a823b82 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -4649,7 +4649,7 @@ aarch64_add_offset (scalar_int_mode mode, rtx dest, rtx src,
 {
   gcc_assert (offset.coeffs[0] == offset.coeffs[1]);
   rtx offset_rtx;
-  if (force_isa_mode == 0)
+  if (!force_isa_mode)
offset_rtx = gen_int_mode (offset, mode);
   else
offset_rtx = aarch64_sme_vq_immediate (mode, offset.coeffs[0], 0);
@@ -4675,7 +4675,7 @@ aarch64_add_offset (scalar_int_mode mode, rtx dest, rtx src,
   && aarch64_sve_addvl_addpl_immediate_p (poly_offset))
 {
   rtx offset_rtx;
-  if (force_isa_mode == 0)
+  if (!force_isa_mode)
offset_rtx = gen_int_mode (poly_offset, mode);
   else
offset_rtx = aarch64_sme_vq_immediate (mode, factor, 0);
@@ -4759,8 +4759,7 @@ aarch64_add_offset (scalar_int_mode mode, rtx dest, rtx src,
 a shift and add sequence for the multiplication.
 If CNTB << SHIFT is out of range, stick with the current
 shift factor.  */
- if (force_isa_mode == 0
- && IN_RANGE (low_bit, 2, 16 * 16))
+ if (!force_isa_mode && IN_RANGE (low_bit, 2, 16 * 16))
{
  val = gen_int_mode (poly_int64 (low_bit, low_bit), mode);
  shift = 0;
@@ -4900,7 +4899,7 @@ static rtx_insn *
 aarch64_guard_switch_pstate_sm (rtx old_svcr, aarch64_feature_flags local_mode)
 {
   local_mode &= AARCH64_FL_SM_STATE;
-  gcc_assert (local_mode != 0);
+  gcc_assert (local_mode);
   auto already_ok_cond = (local_mode & AARCH64_FL_SM_ON ? NE : EQ);
   auto *label = gen_label_rtx ();
   auto branch = aarch64_gen_test_and_branch (already_ok_cond, old_svcr, 0,
@@ -4923,7 +4922,7 @@ aarch64_switch_pstate_sm (aarch64_feature_flags old_mode,
   gcc_assert (old_mode != new_mode);
 
   if ((new_mode & AARCH64_FL_SM_ON)
-  || (new_mode == 0 && (old_mode & AARCH64_FL_SM_OFF)))
+  || (!new_mode && (old_mode & AARCH64_FL_SM_OFF)))
 emit_insn (gen_aarch64_smstart_sm ());
   else
 emit_insn (gen_aarch64_smstop_sm ());
@@ -7724,7 

[PATCH 01/12] aarch64: Remove unused global aarch64_tune_flags

2024-05-14 Thread Andrew Carlotti
gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_tune_flags): Remove unused global variable.
(aarch64_override_options_internal): Remove dead assignment.


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 662ff5a9b0c715d0cab0ae4ba63af1b3c8ebbd00..4e6ad1023f638c9756ee9503b1ecbd3c1573871a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -349,9 +349,6 @@ static bool aarch64_print_address_internal (FILE*, machine_mode, rtx,
 /* The processor for which instructions should be scheduled.  */
 enum aarch64_processor aarch64_tune = cortexa53;
 
-/* Mask to specify which instruction scheduling options should be used.  */
-uint64_t aarch64_tune_flags = 0;
-
 /* Global flag for PC relative loads.  */
 bool aarch64_pcrelative_literal_loads;
 
@@ -18237,7 +18234,6 @@ void
 aarch64_override_options_internal (struct gcc_options *opts)
 {
   const struct processor *tune = aarch64_get_tune_cpu (opts->x_selected_tune);
-  aarch64_tune_flags = tune->flags;
   aarch64_tune = tune->sched_core;
   /* Make a copy of the tuning parameters attached to the core, which
  we may later overwrite.  */


[PATCH 02/12] aarch64: Move AARCH64_NUM_ISA_MODES definition

2024-05-14 Thread Andrew Carlotti
AARCH64_NUM_ISA_MODES will be used within aarch64-opts.h in a later
commit.

gcc/ChangeLog:

* config/aarch64/aarch64.h (DEF_AARCH64_ISA_MODE): Move to...
* config/aarch64/aarch64-opts.h (DEF_AARCH64_ISA_MODE): ...here.


diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index a05c0d3ded1c69802f15eebb8c150c7dcc62b4ef..06a4fed3833482543891b4f7c778933f7cebd631 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -24,6 +24,11 @@
 
 #ifndef USED_FOR_TARGET
 typedef uint64_t aarch64_feature_flags;
+
+constexpr unsigned int AARCH64_NUM_ISA_MODES = (0
+#define DEF_AARCH64_ISA_MODE(IDENT) + 1
+#include "aarch64-isa-modes.def"
+);
 #endif
 
 /* The various cores that implement AArch64.  */
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 4fa1dfc79065c291ee5c97cc8f641c1f7c9919ec..8eb21cfcfc1e80bef051c571ec7cfae47e3393ed 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -189,11 +189,6 @@ enum class aarch64_feature : unsigned char {
 
 constexpr auto AARCH64_FL_SM_STATE = AARCH64_FL_SM_ON | AARCH64_FL_SM_OFF;
 
-constexpr unsigned int AARCH64_NUM_ISA_MODES = (0
-#define DEF_AARCH64_ISA_MODE(IDENT) + 1
-#include "aarch64-isa-modes.def"
-);
-
 /* The mask of all ISA modes.  */
 constexpr auto AARCH64_FL_ISA_MODES
   = (aarch64_feature_flags (1) << AARCH64_NUM_ISA_MODES) - 1;
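For reference, here is a self-contained sketch of the counting idiom behind AARCH64_NUM_ISA_MODES. Each `.def` entry expands to `+ 1`, so the initialiser becomes `(0 + 1 + ... + 1)`. The mask is then derived from the count, in the same way as AARCH64_FL_ISA_MODES. As assumptions for illustration, an inline list macro stands in for `aarch64-isa-modes.def`, and the mode names in it are illustrative.

```cpp
#include <cstdint>

// Hypothetical inline stand-in for "aarch64-isa-modes.def": each entry
// invokes DEF_ISA_MODE once.  Mode names are illustrative.
#define FOR_EACH_ISA_MODE \
  DEF_ISA_MODE (SM_ON)    \
  DEF_ISA_MODE (SM_OFF)   \
  DEF_ISA_MODE (ZA_ON)

// Counting idiom: every entry expands to "+ 1", giving (0 + 1 + 1 + 1).
#define DEF_ISA_MODE(IDENT) + 1
constexpr unsigned int NUM_ISA_MODES = (0 FOR_EACH_ISA_MODE);
#undef DEF_ISA_MODE

// Mask with the low NUM_ISA_MODES bits set, computed the same way as
// AARCH64_FL_ISA_MODES.
constexpr uint64_t ISA_MODES_MASK = (uint64_t (1) << NUM_ISA_MODES) - 1;
```

Because the count is a plain `constexpr` integer that does not depend on the flag bit definitions, it can be evaluated wherever the `.def` file can be included. This is presumably why it can move into aarch64-opts.h.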


[PATCH 00/12] aarch64: Extend aarch64_feature_flags to 128 bits

2024-05-14 Thread Andrew Carlotti
The end goal of the series is to change the definition of aarch64_feature_flags
from a uint64_t typedef to a class with 128 bits of storage.  This class uses
operator overloading to mimic the existing integer interface as much as
possible, but with added restrictions to facilitate type checking and
extensibility.
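As a rough illustration of the kind of class described above, the sketch below stores 128 bits as two 64-bit words. It overloads the bitwise operators, and its only conversion to a scalar is an explicit one to bool. The name `flags128`, the `bit` helper and the word layout are assumptions for illustration, not the class introduced in patch 12.

```cpp
#include <cstdint>

// Hypothetical 128-bit flag set stored as two 64-bit words.
class flags128
{
public:
  // The empty set.
  constexpr flags128 () : m_lo (0), m_hi (0) {}

  // A set containing only bit N, where 0 <= N < 128.
  static constexpr flags128 bit (unsigned int n)
  {
    return n < 64 ? flags128 (uint64_t (1) << n, 0)
                  : flags128 (0, uint64_t (1) << (n - 64));
  }

  constexpr flags128 operator| (flags128 o) const
  { return flags128 (m_lo | o.m_lo, m_hi | o.m_hi); }
  constexpr flags128 operator& (flags128 o) const
  { return flags128 (m_lo & o.m_lo, m_hi & o.m_hi); }
  constexpr flags128 operator~ () const
  { return flags128 (~m_lo, ~m_hi); }

  // Only explicit (or contextual) conversion to bool is allowed.
  constexpr explicit operator bool () const
  { return m_lo != 0 || m_hi != 0; }

private:
  // Raw-word construction is private, so integers cannot leak in.
  constexpr flags128 (uint64_t lo, uint64_t hi) : m_lo (lo), m_hi (hi) {}

  uint64_t m_lo;
  uint64_t m_hi;
};
```

The two-word constructor is private, so callers can only build values from named bits or from other flag sets. This is one possible way to get the type checking and extensibility mentioned above.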

Patches 01-10 are preliminary enablement work, and have passed regression
testing.  Are these ok for master?

Patch 11 is an RFC, and the only patch that touches the middle end.  I am
seeking clarity on which part(s) of the compiler should be expected to handle
or prevent non-bool types in instruction pattern conditions.  The actual patch
does not compile by itself (though it does in combination with 12/12), but that
is not important to the questions I'm asking.

Patch 12 is then a small patch that actually replaces the uint64_t typedef with
a class.  I think this patch is fine in its current form, but it depends on a
resolution to the issues in patch 11/12 first.


[PING gcc-14?][PATCH v2] docs: Update function multiversioning documentation

2024-05-06 Thread Andrew Carlotti
Is this patch ok? I was hoping to get it merged before 14.1 releases, if it's
not yet too late for that.

On Tue, Apr 30, 2024 at 05:10:45PM +0100, Andrew Carlotti wrote:
> Add target_version attribute to Common Function Attributes and update
> target and target_clones documentation.  Move shared detail and examples
> to the Function Multiversioning page.  Add target-specific details to
> target-specific pages.
> 
> ---
> 
> Changes since v1:
> - Various typo fixes.
> - Reordered content in 'Function multiversioning' section to put implementation
>   details at the end (as suggested in review).
> - Dropped links to outdated wiki page, and a couple of other unhelpful
>   sentences that the previous version preserved.
> 
> I've built and rechecked the info output.  Ok for master?  And is this ok for
> the GCC-14 branch too?
> 
> gcc/ChangeLog:
> 
>   * doc/extend.texi (Common Function Attributes): Update target
>   and target_clones documentation, and add target_version.
>   (AArch64 Function Attributes): Add ACLE reference and list
>   supported features.
>   (PowerPC Function Attributes): List supported features.
>   (x86 Function Attributes): Mention function multiversioning.
>   (Function Multiversioning): Update, and move shared detail here.
> 
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index e290265d68d33f86a7e7ee9882cc0fd6bed00143..fefac70b5fffc350bf23db74a8fc88fa3bb99bd5 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -4178,17 +4178,16 @@ and @option{-Wanalyzer-tainted-size}.
>  Multiple target back ends implement the @code{target} attribute
>  to specify that a function is to
>  be compiled with different target options than specified on the
> -command line.  The original target command-line options are ignored.
> -One or more strings can be provided as arguments.
> -Each string consists of one or more comma-separated suffixes to
> -the @code{-m} prefix jointly forming the name of a machine-dependent
> -option.  @xref{Submodel Options,,Machine-Dependent Options}.
> -
> +command line.  One or more strings can be provided as arguments.
>  The @code{target} attribute can be used for instance to have a function
>  compiled with a different ISA (instruction set architecture) than the
> -default.  @samp{#pragma GCC target} can be used to specify target-specific
> -options for more than one function.  @xref{Function Specific Option Pragmas},
> -for details about the pragma.
> +default.
> +
> +The options supported by the @code{target} attribute are specific to each
> +target; refer to @ref{x86 Function Attributes}, @ref{PowerPC Function
> +Attributes}, @ref{ARM Function Attributes}, @ref{AArch64 Function Attributes},
> +@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
> +for details.
>  
>  For instance, on an x86, you could declare one function with the
>  @code{target("sse4.1,arch=core2")} attribute and another with
> @@ -4211,39 +4210,26 @@ multiple options is equivalent to separating the option suffixes with
>  a comma (@samp{,}) within a single string.  Spaces are not permitted
>  within the strings.
>  
> -The options supported are specific to each target; refer to @ref{x86
> -Function Attributes}, @ref{PowerPC Function Attributes},
> -@ref{ARM Function Attributes}, @ref{AArch64 Function Attributes},
> -@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
> -for details.
> +@samp{#pragma GCC target} can be used to specify target-specific
> +options for more than one function.  @xref{Function Specific Option Pragmas},
> +for details about the pragma.
> +
> +On x86, the @code{target} attribute can also be used to create multiple
> +versions of a function, compiled with different target-specific options.
> +@xref{Function Multiversioning} for more details.
>  
>  @cindex @code{target_clones} function attribute
>  @item target_clones (@var{options})
>  The @code{target_clones} attribute is used to specify that a function
> -be cloned into multiple versions compiled with different target options
> -than specified on the command line.  The supported options and restrictions
> -are the same as for @code{target} attribute.
> -
> -For instance, on an x86, you could compile a function with
> -@code{target_clones("sse4.1,avx")}.  GCC creates two function clones,
> -one compiled with @option{-msse4.1} and another with @option{-mavx}.
> -
> -On a PowerPC, you can compile a function with
> -@code{target_clones("cpu=power9,default")}.  GCC will create two
> -function clones, one compiled with @option{-mcpu=power9} and another
> -with the default options.  GCC must be configured 

[PATCH v2] docs: Update function multiversioning documentation

2024-04-30 Thread Andrew Carlotti
Add target_version attribute to Common Function Attributes and update
target and target_clones documentation.  Move shared detail and examples
to the Function Multiversioning page.  Add target-specific details to
target-specific pages.
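The sketch below shows `target_clones` on x86-64, for readers new to the attribute being documented. The clone list `"avx2", "default"` is arbitrary. The guard (x86-64, a GNU-compatible compiler, glibc) is a conservative assumption so the sketch also builds elsewhere. It is not a statement of exactly where the attribute is supported.

```cpp
#include <cstdio>  // Included so that glibc's __GLIBC__ macro is defined.

// Apply the attribute only where the clone resolver (an ifunc) is
// assumed to be available; elsewhere sum is an ordinary function.
#if defined(__x86_64__) && defined(__GNUC__) && defined(__GLIBC__)
#define SUM_CLONES __attribute__ ((target_clones ("avx2", "default")))
#else
#define SUM_CLONES
#endif

// When the attribute applies, one clone is compiled per listed option,
// and a resolver selects a clone at load time based on the running CPU.
SUM_CLONES int sum (const int *v, int n)
{
  int total = 0;
  for (int i = 0; i < n; i++)
    total += v[i];
  return total;
}
```

Callers simply call `sum`; whether the AVX2 clone or the default clone runs is decided by the resolver, not by the caller.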

---

Changes since v1:
- Various typo fixes.
- Reordered content in 'Function multiversioning' section to put implementation
  details at the end (as suggested in review).
- Dropped links to outdated wiki page, and a couple of other unhelpful
  sentences that the previous version preserved.

I've built and rechecked the info output.  Ok for master?  And is this ok for
the GCC-14 branch too?

gcc/ChangeLog:

* doc/extend.texi (Common Function Attributes): Update target
and target_clones documentation, and add target_version.
(AArch64 Function Attributes): Add ACLE reference and list
supported features.
(PowerPC Function Attributes): List supported features.
(x86 Function Attributes): Mention function multiversioning.
(Function Multiversioning): Update, and move shared detail here.


diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e290265d68d33f86a7e7ee9882cc0fd6bed00143..fefac70b5fffc350bf23db74a8fc88fa3bb99bd5 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -4178,17 +4178,16 @@ and @option{-Wanalyzer-tainted-size}.
 Multiple target back ends implement the @code{target} attribute
 to specify that a function is to
 be compiled with different target options than specified on the
-command line.  The original target command-line options are ignored.
-One or more strings can be provided as arguments.
-Each string consists of one or more comma-separated suffixes to
-the @code{-m} prefix jointly forming the name of a machine-dependent
-option.  @xref{Submodel Options,,Machine-Dependent Options}.
-
+command line.  One or more strings can be provided as arguments.
 The @code{target} attribute can be used for instance to have a function
 compiled with a different ISA (instruction set architecture) than the
-default.  @samp{#pragma GCC target} can be used to specify target-specific
-options for more than one function.  @xref{Function Specific Option Pragmas},
-for details about the pragma.
+default.
+
+The options supported by the @code{target} attribute are specific to each
+target; refer to @ref{x86 Function Attributes}, @ref{PowerPC Function
+Attributes}, @ref{ARM Function Attributes}, @ref{AArch64 Function Attributes},
+@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
+for details.
 
 For instance, on an x86, you could declare one function with the
 @code{target("sse4.1,arch=core2")} attribute and another with
@@ -4211,39 +4210,26 @@ multiple options is equivalent to separating the option suffixes with
 a comma (@samp{,}) within a single string.  Spaces are not permitted
 within the strings.
 
-The options supported are specific to each target; refer to @ref{x86
-Function Attributes}, @ref{PowerPC Function Attributes},
-@ref{ARM Function Attributes}, @ref{AArch64 Function Attributes},
-@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
-for details.
+@samp{#pragma GCC target} can be used to specify target-specific
+options for more than one function.  @xref{Function Specific Option Pragmas},
+for details about the pragma.
+
+On x86, the @code{target} attribute can also be used to create multiple
+versions of a function, compiled with different target-specific options.
+@xref{Function Multiversioning} for more details.
 
 @cindex @code{target_clones} function attribute
 @item target_clones (@var{options})
 The @code{target_clones} attribute is used to specify that a function
-be cloned into multiple versions compiled with different target options
-than specified on the command line.  The supported options and restrictions
-are the same as for @code{target} attribute.
-
-For instance, on an x86, you could compile a function with
-@code{target_clones("sse4.1,avx")}.  GCC creates two function clones,
-one compiled with @option{-msse4.1} and another with @option{-mavx}.
-
-On a PowerPC, you can compile a function with
-@code{target_clones("cpu=power9,default")}.  GCC will create two
-function clones, one compiled with @option{-mcpu=power9} and another
-with the default options.  GCC must be configured to use GLIBC 2.23 or
-newer in order to use the @code{target_clones} attribute.
-
-It also creates a resolver function (see
-the @code{ifunc} attribute above) that dynamically selects a clone
-suitable for current architecture.  The resolver is created only if there
-is a usage of a function with @code{target_clones} attribute.
-
-Note that any subsequent call of a function without @code{target_clone}
-from a @code{target_clone} caller will not lead to copying
-(target clone) of the called function.
-If you want to enforce such behaviour,
-we recommend declaring the calling function with the @code{flatten} attribute?
+should be cloned into multiple versions 

Re: [PATCH] docs: Update function multiversioning documentation

2024-04-30 Thread Andrew Carlotti
On Fri, Apr 12, 2024 at 05:41:11PM +0100, Richard Sandiford wrote:
> Hi Andrew,
> 
> Thanks for doing this.  I think it improves the organisation of the
> FMV documentation and adds some details that were previously missing.
> 
> I've made some suggestions below, but documentation is subjective
> and I realise that not everyone will agree with them.
> 
> I've also added Sandra to cc: in case she has time to help with this.
> [original patch: 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649071.html]
> 
> Andrew Carlotti  writes:
> > Add target_version attribute to Common Function Attributes and update
> > target and target_clones documentation.  Move shared detail and examples
> > to the Function Multiversioning page.  Add target-specific details to
> > target-specific pages.
> >
> > ---
> >
> > I've built and checked the info and dvi outputs.  Ok for master?
> >
> > gcc/ChangeLog:
> >
> > * doc/extend.texi (Common Function Attributes): Update target
> > and target_clones documentation, and add target_version.
> > (AArch64 Function Attributes): Add ACLE reference and list
> > supported features.
> > (PowerPC Function Attributes): List supported features.
> > (x86 Function Attributes): Mention function multiversioning.
> > (Function Multiversioning): Update, and move shared detail here.
> >
> >
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index 7b54a241a7bfde03ce86571be9486b30bcea6200..78cc7ad2903b61a06b618b82ba7ad52ed42d944a 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -4178,18 +4178,27 @@ and @option{-Wanalyzer-tainted-size}.
> >  Multiple target back ends implement the @code{target} attribute
> >  to specify that a function is to
> >  be compiled with different target options than specified on the
> > -command line.  The original target command-line options are ignored.
> > -One or more strings can be provided as arguments.
> > -Each string consists of one or more comma-separated suffixes to
> > -the @code{-m} prefix jointly forming the name of a machine-dependent
> > -option.  @xref{Submodel Options,,Machine-Dependent Options}.
> > -
> > +command line.  One or more strings can be provided as arguments.
> > +The attribute may override the original target command-line options, or it may
> > +be combined with them in a target-specific manner.
> 
> It's hard to tell from this what the conditions for "may" are,
> e.g. whether it depends on the arguments, on the back end, or both.
> Could you add a bit more text to clarify (even if it's just a forward
> reference)?

I think it's better just to drop this sentence and leave it to the
target-specific documentation to cover this.

> With that extra text, and perhaps without, I think it's clearer to
> say this after...
> 
> >  The @code{target} attribute can be used for instance to have a function
> >  compiled with a different ISA (instruction set architecture) than the
> > -default.  @samp{#pragma GCC target} can be used to specify target-specific
> > +default.
> 
> ...this.  I.e.:
> 
>   Multiple target back ends implement [...] command-line.  
>   The @code{target} attribute can be used [...] the default.
> 
>   
> 
> > +
> > +@samp{#pragma GCC target} can be used to specify target-specific
> >  options for more than one function.  @xref{Function Specific Option Pragmas},
> >  for details about the pragma.
> >  
> > +On x86, the @code{target} attribute can also be used to create multiple
> > +versions of a function, compiled with different target-specific options.
> > +@xref{Function Multiversioning} for more details.
> 
> It might be clearer to put this at the end, since the rest of the section
> goes back to talking about the non-FMV usage.  Perhaps the same goes for
> the pragma part.

Agreed - I've reordered this.

> 
> Also, how about saying that, on AArch64, the equivalent functionality
> is provided by the target_version attribute?

After reordering, this paragraph immediately precedes the short descriptions of
target_clones and target_version, with the latter explicitly referring to
AArch64.  I don't think another mention of target_version is necessary.

> > +
> > +The options supported by the @code{target} attribute are specific to each
> > +target; refer to @ref{x86 Function Attributes}, @ref{PowerPC Function
> > +Attributes}, @ref{ARM Function Attributes}, @ref{AArch64 Function Attributes},
> > +@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
> > 

Re: [PATCH] aarch64: Add rcpc3 dependency on rcpc2 and rcpc

2024-04-12 Thread Andrew Carlotti
On Fri, Apr 12, 2024 at 06:00:24PM +0100, Andrew Carlotti wrote:
> On Fri, Apr 12, 2024 at 04:49:03PM +0100, Richard Sandiford wrote:
> > Andrew Carlotti  writes:
> > > We don't yet have a separate feature flag for FEAT_LRCPC2 (and adding
> > > one will require extending the feature bitmask).  Instead, make the
> > > FEAT_LRCPC patterns available when either armv8.4-a or +rcpc3 is
> > > specified.  On the other hand, we already have a +rcpc flag, so this
> > > dependency can be specified directly.
> > >
> > > The cpunative test needed updating because it used an invalid Features
> > > list, since lrcpc3 requires both ilrcpc and lrcpc to be present.
> > > Without this change, host_detect_local_cpu would return the architecture
> > > string 'armv8-a+dotprod+crc+crypto+rcpc3+norcpc'.
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/aarch64/aarch64-option-extensions.def: Add RCPC to
> > >   RCPC3 dependencies.
> > >   * config/aarch64/aarch64.h (AARCH64_ISA_RCPC8_4): Add test for
> > >   RCPC3 bit
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/aarch64/cpunative/info_24: Include lrcpc and ilrcpc.
> > >
> > > ---
> > >
> > > Bootstrapped and regression tested on aarch64.  I also verified that the
> > > atomic-store.c and ldapr-sext.c tests would pass when replacing 'armv8.4-a'
> > > with 'armv8-a+rcpc3'.
> > >
> > > Ok for master?
> > >
> > >
> > > diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
> > > index 3155eccd39c8e6825b7fc2bb0d0514c2e7e559bf..42ec0eec31e2ddb0cc6f83fdbaf0fd4eac5ca7f4 100644
> > > --- a/gcc/config/aarch64/aarch64-option-extensions.def
> > > +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> > > @@ -153,7 +153,7 @@ AARCH64_FMV_FEATURE("fp16fml", FP16FML, (F16FML))
> > >  
> > >  AARCH64_OPT_FMV_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
> > >  
> > > -AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (), (), (), "lrcpc3")
> > > +AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (RCPC), (), (), "lrcpc3")
> > >  
> > >  AARCH64_OPT_FMV_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
> > >  
> > > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> > > index 45e901cda644dbe4eaae709e685954f1a6f7dbcf..5870e3f812f6cb0674488b8e17ab7278003d2d54 100644
> > > --- a/gcc/config/aarch64/aarch64.h
> > > +++ b/gcc/config/aarch64/aarch64.h
> > > @@ -242,7 +242,8 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
> > >  #define AARCH64_ISA_SHA3 (aarch64_isa_flags & AARCH64_FL_SHA3)
> > >  #define AARCH64_ISA_F16FML (aarch64_isa_flags & AARCH64_FL_F16FML)
> > >  #define AARCH64_ISA_RCPC (aarch64_isa_flags & AARCH64_FL_RCPC)
> > > -#define AARCH64_ISA_RCPC8_4 (aarch64_isa_flags & AARCH64_FL_V8_4A)
> > > +#define AARCH64_ISA_RCPC8_4 (aarch64_isa_flags \
> > > + & (AARCH64_FL_V8_4A | AARCH64_FL_RCPC3))
> > 
> > It looks like the effect of these two changes is that:
> > 
> > * armv9-a+rcpc3+norcpc leaves TARGET_RCPC2 true and TARGET_RCPC and
> >   TARGET_RCPC3 false.
> > 
> > * armv8-a+rcpc3+norcpc correctly leaves all three false.
> > 
> > If we add the RCPC3->RCPC dependency then I think we should also
> > require FL_RCPC alongside FL_V8_4A.  I.e.:
> > 
> > #define AARCH64_ISA_RCPC8_4 (AARCH64_ISA_RCPC \
> >  && (aarch64_isa_flags \
> >  & (AARCH64_FL_V8_4A | AARCH64_FL_RCPC3)))
> 
> Good spot! I'll go with the following instead (for formatting reasons), if it
> passes testing:
> 
> #define AARCH64_ISA_RCPC8_4 ((AARCH64_ISA_RCPC && AARCH_ISA_V8_4A) \
>   || (aarch64_isa_flags & AARCH64_FL_RCPC3))

I missed the 64 in AARCH64_ISA_V8_4A.  The corrected version passed testing and
is now merged.
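Richard's two cases can be checked with a small truth-table sketch. The bit values are hypothetical stand-ins for the AARCH64_FL_* flags, and armv9-a+rcpc3+norcpc is modelled as leaving only the V8_4A bit set among these three (+norcpc removes RCPC and, through the new dependency, RCPC3). rcpc2_as_posted mirrors the macro in the original patch; rcpc2_revised mirrors the corrected form that uses AARCH64_ISA_V8_4A.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for AARCH64_FL_V8_4A, AARCH64_FL_RCPC and
   AARCH64_FL_RCPC3; the real bit positions differ.  */
#define FL_V8_4A (UINT64_C(1) << 0)
#define FL_RCPC  (UINT64_C(1) << 1)
#define FL_RCPC3 (UINT64_C(1) << 2)

/* RCPC2 test as first posted: true if either V8_4A or RCPC3 is set.  */
static bool rcpc2_as_posted(uint64_t isa)
{
  return (isa & (FL_V8_4A | FL_RCPC3)) != 0;
}

/* Revised test: V8_4A only counts while RCPC is still enabled.  */
static bool rcpc2_revised(uint64_t isa)
{
  return ((isa & FL_RCPC) && (isa & FL_V8_4A)) || (isa & FL_RCPC3);
}
```

With only V8_4A left, the posted macro still reports RCPC2 while the revised one does not, which is the armv9-a+rcpc3+norcpc inconsistency Richard pointed out.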

> > OK with that change, thanks.
> > 
> > Richard
> > 
> > 
> > >  #define AARCH64_ISA_RNG (aarch64_isa_flags & AARCH64_FL_RNG)
> > >  #define AARCH64_ISA_V8_5A   (aarch64_isa_flags & AARCH64_FL_V

Re: [PATCH] aarch64: Add rcpc3 dependency on rcpc2 and rcpc

2024-04-12 Thread Andrew Carlotti
On Fri, Apr 12, 2024 at 04:49:03PM +0100, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > We don't yet have a separate feature flag for FEAT_LRCPC2 (and adding
> > one will require extending the feature bitmask).  Instead, make the
> > FEAT_LRCPC patterns available when either armv8.4-a or +rcpc3 is
> > specified.  On the other hand, we already have a +rcpc flag, so this
> > dependency can be specified directly.
> >
> > The cpunative test needed updating because it used an invalid Features
> > list, since lrcpc3 requires both ilrcpc and lrcpc to be present.
> > Without this change, host_detect_local_cpu would return the architecture
> > string 'armv8-a+dotprod+crc+crypto+rcpc3+norcpc'.
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-option-extensions.def: Add RCPC to
> > RCPC3 dependencies.
> > * config/aarch64/aarch64.h (AARCH64_ISA_RCPC8_4): Add test for
> > RCPC3 bit
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/cpunative/info_24: Include lrcpc and ilrcpc.
> >
> > ---
> >
> > Bootstrapped and regression tested on aarch64.  I also verified that the
> > atomic-store.c and ldapr-sext.c tests would pass when replacing 'armv8.4-a'
> > with 'armv8-a+rcpc3'.
> >
> > Ok for master?
> >
> >
> > diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
> > index 3155eccd39c8e6825b7fc2bb0d0514c2e7e559bf..42ec0eec31e2ddb0cc6f83fdbaf0fd4eac5ca7f4 100644
> > --- a/gcc/config/aarch64/aarch64-option-extensions.def
> > +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> > @@ -153,7 +153,7 @@ AARCH64_FMV_FEATURE("fp16fml", FP16FML, (F16FML))
> >  
> >  AARCH64_OPT_FMV_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
> >  
> > -AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (), (), (), "lrcpc3")
> > +AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (RCPC), (), (), "lrcpc3")
> >  
> >  AARCH64_OPT_FMV_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
> >  
> > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> > index 45e901cda644dbe4eaae709e685954f1a6f7dbcf..5870e3f812f6cb0674488b8e17ab7278003d2d54 100644
> > --- a/gcc/config/aarch64/aarch64.h
> > +++ b/gcc/config/aarch64/aarch64.h
> > @@ -242,7 +242,8 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
> >  #define AARCH64_ISA_SHA3  (aarch64_isa_flags & AARCH64_FL_SHA3)
> >  #define AARCH64_ISA_F16FML (aarch64_isa_flags & AARCH64_FL_F16FML)
> >  #define AARCH64_ISA_RCPC  (aarch64_isa_flags & AARCH64_FL_RCPC)
> > -#define AARCH64_ISA_RCPC8_4   (aarch64_isa_flags & AARCH64_FL_V8_4A)
> > +#define AARCH64_ISA_RCPC8_4   (aarch64_isa_flags \
> > +   & (AARCH64_FL_V8_4A | AARCH64_FL_RCPC3))
> 
> It looks like the effect of these two changes is that:
> 
> * armv9-a+rcpc3+norcpc leaves TARGET_RCPC2 true and TARGET_RCPC and
>   TARGET_RCPC3 false.
> 
> * armv8-a+rcpc3+norcpc correctly leaves all three false.
> 
> If we add the RCPC3->RCPC dependency then I think we should also
> require FL_RCPC alongside FL_V8_4A.  I.e.:
> 
> #define AARCH64_ISA_RCPC8_4   (AARCH64_ISA_RCPC \
>                              && (aarch64_isa_flags \
>                                  & (AARCH64_FL_V8_4A | AARCH64_FL_RCPC3)))

Good spot! I'll go with the following instead (for formatting reasons), if it
passes testing:

#define AARCH64_ISA_RCPC8_4 ((AARCH64_ISA_RCPC && AARCH_ISA_V8_4A) \
                             || (aarch64_isa_flags & AARCH64_FL_RCPC3))

> OK with that change, thanks.
> 
> Richard
> 
> 
> >  #define AARCH64_ISA_RNG   (aarch64_isa_flags & AARCH64_FL_RNG)
> >  #define AARCH64_ISA_V8_5A (aarch64_isa_flags & AARCH64_FL_V8_5A)
> >  #define AARCH64_ISA_TME   (aarch64_isa_flags & AARCH64_FL_TME)
> > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_24 b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
> > index 8d3c16a10910af977c560782f9d659c0e51286fd..3c64e00ca3a416ef565bc0b4a5b3e5bd9cfc41bc 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
> > @@ -1,8 +1,8 @@
> >  processor  : 0
> >  BogoMIPS   : 100.00
> > -Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp lrcpc3
> > +Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp lrcpc ilrcpc lrcpc3
> >  CPU implementer: 0xfe
> >  CPU architecture: 8
> >  CPU variant: 0x0
> >  CPU part   : 0xd08
> > -CPU revision   : 2
> > \ No newline at end of file
> > +CPU revision   : 2


[PATCH] aarch64: Add rcpc3 dependency on rcpc2 and rcpc

2024-04-12 Thread Andrew Carlotti
We don't yet have a separate feature flag for FEAT_LRCPC2 (and adding
one will require extending the feature bitmask).  Instead, make the
FEAT_LRCPC patterns available when either armv8.4-a or +rcpc3 is
specified.  On the other hand, we already have a +rcpc flag, so this
dependency can be specified directly.

The cpunative test needed updating because it used an invalid Features
list, since lrcpc3 requires both ilrcpc and lrcpc to be present.
Without this change, host_detect_local_cpu would return the architecture
string 'armv8-a+dotprod+crc+crypto+rcpc3+norcpc'.
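To see why the new RCPC3 -> RCPC dependency makes +rcpc3+norcpc collapse, here is a toy model of how +ext and +noext modifiers propagate through a dependency table. It is a sketch under stated assumptions, not GCC's implementation: the flag values and the helpers apply_on and apply_off are hypothetical, and only the RCPC/RCPC3 pair is modelled. Enabling a feature pulls in everything it requires; disabling a feature also drops everything that requires it.

```c
#include <stddef.h>

/* Hypothetical flag bits; GCC's real AARCH64_FL_* values differ.  */
enum { FL_RCPC = 1u << 0, FL_RCPC3 = 1u << 1 };

/* Toy dependency table: each feature and the features it requires.
   The RCPC3 entry models the (RCPC) dependency added by this patch.  */
static const struct { unsigned feature, needs; } deps[] = {
  { FL_RCPC,  0 },
  { FL_RCPC3, FL_RCPC },
};
#define NDEPS (sizeof deps / sizeof deps[0])

/* "+ext": enable FEATURE and everything it transitively requires.  */
static unsigned apply_on(unsigned flags, unsigned feature)
{
  unsigned add = feature, prev;
  do {
    prev = add;
    for (size_t i = 0; i < NDEPS; i++)
      if (add & deps[i].feature)
        add |= deps[i].needs;
  } while (add != prev);
  return flags | add;
}

/* "+noext": disable FEATURE and everything that transitively requires it.  */
static unsigned apply_off(unsigned flags, unsigned feature)
{
  unsigned drop = feature, prev;
  do {
    prev = drop;
    for (size_t i = 0; i < NDEPS; i++)
      if (deps[i].needs & drop)
        drop |= deps[i].feature;
  } while (drop != prev);
  return flags & ~drop;
}
```

In this model, applying +rcpc3 and then +norcpc leaves neither flag set, so a Features list containing lrcpc3 without lrcpc can no longer be described by an architecture string.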

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def: Add RCPC to
RCPC3 dependencies.
* config/aarch64/aarch64.h (AARCH64_ISA_RCPC8_4): Add test for
RCPC3 bit

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/info_24: Include lrcpc and ilrcpc.

---

Bootstrapped and regression tested on aarch64.  I also verified that the
atomic-store.c and ldapr-sext.c tests would pass when replacing 'armv8.4-a'
with 'armv8-a+rcpc3'.

Ok for master?


diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 3155eccd39c8e6825b7fc2bb0d0514c2e7e559bf..42ec0eec31e2ddb0cc6f83fdbaf0fd4eac5ca7f4 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -153,7 +153,7 @@ AARCH64_FMV_FEATURE("fp16fml", FP16FML, (F16FML))
 
 AARCH64_OPT_FMV_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
 
-AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (), (), (), "lrcpc3")
+AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (RCPC), (), (), "lrcpc3")
 
 AARCH64_OPT_FMV_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 45e901cda644dbe4eaae709e685954f1a6f7dbcf..5870e3f812f6cb0674488b8e17ab7278003d2d54 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -242,7 +242,8 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 #define AARCH64_ISA_SHA3  (aarch64_isa_flags & AARCH64_FL_SHA3)
 #define AARCH64_ISA_F16FML (aarch64_isa_flags & AARCH64_FL_F16FML)
 #define AARCH64_ISA_RCPC  (aarch64_isa_flags & AARCH64_FL_RCPC)
-#define AARCH64_ISA_RCPC8_4   (aarch64_isa_flags & AARCH64_FL_V8_4A)
+#define AARCH64_ISA_RCPC8_4   (aarch64_isa_flags \
+   & (AARCH64_FL_V8_4A | AARCH64_FL_RCPC3))
 #define AARCH64_ISA_RNG   (aarch64_isa_flags & AARCH64_FL_RNG)
 #define AARCH64_ISA_V8_5A (aarch64_isa_flags & AARCH64_FL_V8_5A)
 #define AARCH64_ISA_TME   (aarch64_isa_flags & AARCH64_FL_TME)
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_24 b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
index 8d3c16a10910af977c560782f9d659c0e51286fd..3c64e00ca3a416ef565bc0b4a5b3e5bd9cfc41bc 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
@@ -1,8 +1,8 @@
 processor  : 0
 BogoMIPS   : 100.00
-Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp lrcpc3
+Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp lrcpc ilrcpc lrcpc3
 CPU implementer: 0xfe
 CPU architecture: 8
 CPU variant: 0x0
 CPU part   : 0xd08
-CPU revision   : 2
\ No newline at end of file
+CPU revision   : 2


[PATCH] aarch64: Enable +cssc for armv8.9-a

2024-04-12 Thread Andrew Carlotti
FEAT_CSSC is mandatory in the architecture from Armv8.9.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def: Add CSSC to V8_9A
dependencies.

---

Bootstrapped and regression tested on aarch64.  Ok for master?


diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index 9bec30e9203bac01155281ef3474846c402bb29e..4634b272e28006b5c6c2d6705a2f1010cbd9ab9b 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -39,7 +39,7 @@ AARCH64_ARCH("armv8.5-a", generic_armv8_a,   V8_5A, 8,  (V8_4A, SB, SSBS
 AARCH64_ARCH("armv8.6-a", generic_armv8_a,   V8_6A, 8,  (V8_5A, I8MM, BF16))
 AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A))
 AARCH64_ARCH("armv8.8-a", generic_armv8_a,   V8_8A, 8,  (V8_7A, MOPS))
-AARCH64_ARCH("armv8.9-a", generic_armv8_a,   V8_9A, 8,  (V8_8A))
+AARCH64_ARCH("armv8.9-a", generic_armv8_a,   V8_9A, 8,  (V8_8A, CSSC))
 AARCH64_ARCH("armv8-r",   generic_armv8_a,   V8R  , 8,  (V8_4A))
 AARCH64_ARCH("armv9-a",   generic_armv9_a,   V9A  , 9,  (V8_5A, SVE2))
 AARCH64_ARCH("armv9.1-a", generic_armv9_a,   V9_1A, 9,  (V8_6A, V9A))
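FEAT_CSSC adds scalar forms of ABS, CNT, CTZ and the SMAX/SMIN/UMAX/UMIN family. One way to observe the effect of this patch is to compile functions like the ones below with -O2 -march=armv8.9-a and inspect the assembly. Whether a particular GCC version actually selects the CSSC instructions for each function is an assumption to verify; the patch itself only changes which features armv8.9-a implies. The function names are hypothetical, and __builtin_ctzll and __builtin_popcountll are GCC/Clang builtins.

```c
#include <stdint.h>

/* Count trailing zeros (X must be non-zero).  Without CSSC, AArch64
   typically needs RBIT+CLZ; CSSC provides a single CTZ.  */
unsigned trailing_zeros(uint64_t x)
{
  return (unsigned) __builtin_ctzll(x);
}

/* Population count.  Without CSSC this typically goes through the SIMD
   CNT instruction; CSSC provides a scalar CNT.  */
unsigned bit_count(uint64_t x)
{
  return (unsigned) __builtin_popcountll(x);
}

/* Signed maximum.  Without CSSC this is typically CMP+CSEL; CSSC
   provides SMAX.  */
int64_t max64(int64_t a, int64_t b)
{
  return a > b ? a : b;
}
```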


Re: [PATCH] docs: Update function multiversioning documentation

2024-04-12 Thread Andrew Carlotti
Resending to CC some relevant reviewers.

I'll remove "memtag", "ssbs" and "ls64" from the AArch64 feature list before
committing, following changes to my recent AArch64 patch series.

On Tue, Apr 09, 2024 at 02:35:48PM +0100, Andrew Carlotti wrote:
> Add target_version attribute to Common Function Attributes and update
> target and target_clones documentation.  Move shared detail and examples
> to the Function Multiversioning page.  Add target-specific details to
> target-specific pages.
> 
> ---
> 
> I've built and checked the info and dvi outputs.  Ok for master?
> 
> gcc/ChangeLog:
> 
>   * doc/extend.texi (Common Function Attributes): Update target
>   and target_clones documentation, and add target_version.
>   (AArch64 Function Attributes): Add ACLE reference and list
>   supported features.
>   (PowerPC Function Attributes): List supported features.
>   (x86 Function Attributes): Mention function multiversioning.
>   (Function Multiversioning): Update, and move shared detail here.
> 
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 7b54a241a7bfde03ce86571be9486b30bcea6200..78cc7ad2903b61a06b618b82ba7ad52ed42d944a 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -4178,18 +4178,27 @@ and @option{-Wanalyzer-tainted-size}.
>  Multiple target back ends implement the @code{target} attribute
>  to specify that a function is to
>  be compiled with different target options than specified on the
> -command line.  The original target command-line options are ignored.
> -One or more strings can be provided as arguments.
> -Each string consists of one or more comma-separated suffixes to
> -the @code{-m} prefix jointly forming the name of a machine-dependent
> -option.  @xref{Submodel Options,,Machine-Dependent Options}.
> -
> +command line.  One or more strings can be provided as arguments.
> +The attribute may override the original target command-line options, or it may
> +be combined with them in a target-specific manner.
>  The @code{target} attribute can be used for instance to have a function
>  compiled with a different ISA (instruction set architecture) than the
> -default.  @samp{#pragma GCC target} can be used to specify target-specific
> +default.
> +
> +@samp{#pragma GCC target} can be used to specify target-specific
>  options for more than one function.  @xref{Function Specific Option Pragmas},
>  for details about the pragma.
>  
> +On x86, the @code{target} attribute can also be used to create multiple
> +versions of a function, compiled with different target-specific options.
> +@xref{Function Multiversioning} for more details.
> +
> +The options supported by the @code{target} attribute are specific to each
> +target; refer to @ref{x86 Function Attributes}, @ref{PowerPC Function
> +Attributes}, @ref{ARM Function Attributes}, @ref{AArch64 Function Attributes},
> +@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
> +for details.
> +
>  For instance, on an x86, you could declare one function with the
>  @code{target("sse4.1,arch=core2")} attribute and another with
>  @code{target("sse4a,arch=amdfam10")}.  This is equivalent to
> @@ -4211,39 +4220,18 @@ multiple options is equivalent to separating the option suffixes with
>  a comma (@samp{,}) within a single string.  Spaces are not permitted
>  within the strings.
>  
> -The options supported are specific to each target; refer to @ref{x86
> -Function Attributes}, @ref{PowerPC Function Attributes},
> -@ref{ARM Function Attributes}, @ref{AArch64 Function Attributes},
> -@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
> -for details.
> -
>  @cindex @code{target_clones} function attribute
>  @item target_clones (@var{options})
>  The @code{target_clones} attribute is used to specify that a function
> -be cloned into multiple versions compiled with different target options
> -than specified on the command line.  The supported options and restrictions
> -are the same as for @code{target} attribute.
> -
> -For instance, on an x86, you could compile a function with
> -@code{target_clones("sse4.1,avx")}.  GCC creates two function clones,
> -one compiled with @option{-msse4.1} and another with @option{-mavx}.
> -
> -On a PowerPC, you can compile a function with
> -@code{target_clones("cpu=power9,default")}.  GCC will create two
> -function clones, one compiled with @option{-mcpu=power9} and another
> -with the default options.  GCC must be configured to use GLIBC 2.23 or
> -newer in order to use the @code{target_clones} attribute.
> -
> -It also creates a resolver function (see
> -the @code{i
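A minimal usage sketch of the target_clones attribute discussed in this documentation patch. It assumes an x86-64 GNU/Linux target with ifunc support; on other targets the preprocessor guard falls back to a single ordinary definition so the example still builds. sum_array is a hypothetical function. Per the documentation, GCC is expected to emit a default clone, an AVX2 clone, and a resolver that selects one suitable for the running CPU.

```c
/* Request a default clone and an AVX2 clone on x86-64 GNU/Linux only;
   elsewhere this is an ordinary function.  */
#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__)
__attribute__((target_clones("default", "avx2")))
#endif
int sum_array(const int *a, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}
```

Callers use sum_array normally; the choice between clones happens in the resolver, not at the call site.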

[committed 5/5 v2] aarch64: Remove FMV features whose names may change

2024-04-11 Thread Andrew Carlotti
Some architecture features have been combined under a single command
line flag, but have been assigned multiple FMV feature names with the
command line flag name enabling only a subset of these features in
the FMV specification.  I've proposed reallocating names in the FMV
specification to match the command line flags [1], but for GCC 14 we'll
just remove them from the FMV feature list.

[1] https://github.com/ARM-software/acle/pull/315

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def:
Remove "memtag", "memtag2", "ssbs", "ssbs2", "ls64", "ls64_v"
and "ls64_accdata" FMV features.


diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 54bbf9c41e794786dffd69dd103fcbbca0a49f1f..3155eccd39c8e6825b7fc2bb0d0514c2e7e559bf 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -194,17 +194,13 @@ AARCH64_FMV_FEATURE("sve2-sm4", SVE_SM4, (SVE2_SM4))
 
 AARCH64_OPT_FMV_EXTENSION("sme", SME, (BF16, SVE2), (), (), "sme")
 
-AARCH64_OPT_FMV_EXTENSION("memtag", MEMTAG, (), (), (), "")
-
-AARCH64_FMV_FEATURE("memtag2", MEMTAG2, (MEMTAG))
+AARCH64_OPT_EXTENSION("memtag", MEMTAG, (), (), (), "")
 
 AARCH64_OPT_FMV_EXTENSION("sb", SB, (), (), (), "sb")
 
 AARCH64_OPT_FMV_EXTENSION("predres", PREDRES, (), (), (), "")
 
-AARCH64_OPT_FMV_EXTENSION("ssbs", SSBS, (), (), (), "ssbs")
-
-AARCH64_FMV_FEATURE("ssbs2", SSBS2, (SSBS))
+AARCH64_OPT_EXTENSION("ssbs", SSBS, (), (), (), "ssbs")
 
 AARCH64_OPT_EXTENSION("profile", PROFILE, (), (), (), "")
 
@@ -214,12 +210,6 @@ AARCH64_OPT_EXTENSION("pauth", PAUTH, (), (), (), "paca pacg")
 
 AARCH64_OPT_EXTENSION("ls64", LS64, (), (), (), "")
 
-AARCH64_FMV_FEATURE("ls64", LS64, ())
-
-AARCH64_FMV_FEATURE("ls64_v", LS64_V, ())
-
-AARCH64_FMV_FEATURE("ls64_accdata", LS64_ACCDATA, (LS64))
-
 AARCH64_OPT_EXTENSION("sme-f64f64", SME_F64F64, (SME), (), (), "")
 
 AARCH64_FMV_FEATURE("sme-f64f64", SME_F64, (SME_F64F64))


[committed 3/5] aarch64: Fix typo and make rdma/rdm alias for FMV

2024-04-11 Thread Andrew Carlotti
gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def:
Fix "rmd"->"rdm", and add FMV to "rdma".
* config/aarch64/aarch64.cc (FEAT_RDMA): Define as FEAT_RDM.


diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 0078dd092884a94d2a339b5238b8d19747ff9fa1..b7b307b24eadd83a6d083955f5b30814b7212712 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -117,9 +117,10 @@ AARCH64_OPT_FMV_EXTENSION("sm4", SM4, (SIMD), (), (), "sm3 sm4")
 
 /* An explicit +rdma implies +simd, but +rdma+nosimd still enables scalar
    RDMA instructions.  */
-AARCH64_OPT_EXTENSION("rdma", RDMA, (), (SIMD), (), "asimdrdm")
+AARCH64_OPT_FMV_EXTENSION("rdma", RDMA, (), (SIMD), (), "asimdrdm")
 
-AARCH64_FMV_FEATURE("rmd", RDM, (RDMA))
+/* rdm is an alias for rdma.  */
+AARCH64_FMV_FEATURE("rdm", RDM, (RDMA))
 
 AARCH64_OPT_FMV_EXTENSION("crc", CRC, (), (), (), "crc32")
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 4f708213551523b4af966ac8521754df5eb6bf3c..e6703e06d55d33163da5abb7ea518671e76c26d3 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -19657,6 +19657,10 @@ typedef struct
 #define AARCH64_FMV_FEATURE(NAME, FEAT_NAME, C) \
   {NAME, 1ULL << FEAT_##FEAT_NAME, ::feature_deps::fmv_deps_##FEAT_NAME},
 
+/* The "rdma" alias uses a different FEAT_NAME to avoid a duplicate
+   feature_deps name.  */
+#define FEAT_RDMA FEAT_RDM
+
 /* FMV features are listed in priority order, to make it easier to sort target
    strings.  */
 static aarch64_fmv_feature_datum aarch64_fmv_feature_data[] = {
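The #define FEAT_RDMA FEAT_RDM trick works because the result of token pasting is rescanned for further macro replacement: FEAT_##FEAT_NAME with FEAT_NAME = RDMA becomes FEAT_RDMA, which then expands to FEAT_RDM, while other pasted names such as fmv_deps_RDMA stay distinct. The sketch below reproduces only the mask part, with a hypothetical two-entry enum and a hypothetical feature_mask lookup helper; it is not GCC's table.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical feature indices; GCC's real FEAT_* enum is much longer.  */
enum { FEAT_RNG, FEAT_RDM, FEAT_MAX };

/* "rdma" reuses the RDM bit: after pasting, FEAT_RDMA is rescanned
   and replaced by FEAT_RDM.  */
#define FEAT_RDMA FEAT_RDM

struct fmv_feature { const char *name; uint64_t mask; };

#define FMV_FEATURE(NAME, FEAT_NAME) { NAME, 1ULL << FEAT_##FEAT_NAME },

static const struct fmv_feature features[] = {
  FMV_FEATURE("rng", RNG)
  FMV_FEATURE("rdma", RDMA)
  FMV_FEATURE("rdm", RDM)
};

/* Look up a feature mask by name; returns 0 for unknown names.  */
static uint64_t feature_mask(const char *name)
{
  for (size_t i = 0; i < sizeof features / sizeof features[0]; i++)
    if (strcmp(features[i].name, name) == 0)
      return features[i].mask;
  return 0;
}
```

Both spellings map to the same bit, and the old "rmd" typo is simply unknown.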


[committed 4/5] aarch64: Remove unsupported FMV features

2024-04-11 Thread Andrew Carlotti
It currently isn't possible to support function multiversioning features
properly in GCC without also enabling the extension in the command line
options (with the exception of features such as "rpres" that do not
require assembler support).  We therefore remove unsupported features
from GCC's list of FMV features.

Some of these features ("fcma", "jscvt", "frintts", "flagm2", "wfxt",
"rcpc2", and perhaps "dpb" and "dpb2") will be added back in the future
once support for the command line option has been added.

The rest of the removed features I have proposed removing from the ACLE
specification as well, since it doesn't seem worthwhile to include support
for them; see the ACLE pull request for more detailed justification:
https://github.com/ARM-software/acle/pull/315

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def:
Remove "flagm2", "sha1", "pmull", "dit", "dpb", "dpb2", "jscvt",
"fcma", "rcpc2", "frintts", "dgh", "ebf16", "sve-bf16",
"sve-ebf16", "sve-i8mm", "sve2-pmull128", "memtag3", "bti" and
"wfxt" entries.


diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index b7b307b24eadd83a6d083955f5b30814b7212712..54bbf9c41e794786dffd69dd103fcbbca0a49f1f 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -103,8 +103,6 @@ AARCH64_OPT_FMV_EXTENSION("rng", RNG, (), (), (), "rng")
 
 AARCH64_OPT_FMV_EXTENSION("flagm", FLAGM, (), (), (), "flagm")
 
-AARCH64_FMV_FEATURE("flagm2", FLAGM2, (FLAGM))
-
 AARCH64_OPT_FMV_EXTENSION("lse", LSE, (), (), (), "atomics")
 
 AARCH64_OPT_FMV_EXTENSION("fp", FP, (), (), (), "fp")
@@ -124,16 +122,12 @@ AARCH64_FMV_FEATURE("rdm", RDM, (RDMA))
 
 AARCH64_OPT_FMV_EXTENSION("crc", CRC, (), (), (), "crc32")
 
-AARCH64_FMV_FEATURE("sha1", SHA1, ())
-
 AARCH64_OPT_FMV_EXTENSION("sha2", SHA2, (SIMD), (), (), "sha1 sha2")
 
 AARCH64_FMV_FEATURE("sha3", SHA3, (SHA3))
 
 AARCH64_OPT_FMV_EXTENSION("aes", AES, (SIMD), (), (), "aes")
 
-AARCH64_FMV_FEATURE("pmull", PMULL, ())
-
 /* +nocrypto disables AES, SHA2 and SM4, and anything that depends on them
    (such as SHA3 and the SVE2 crypto extensions).  */
 AARCH64_OPT_EXTENSION("crypto", CRYPTO, (AES, SHA2), (), (AES, SHA2, SM4),
@@ -157,44 +151,20 @@ AARCH64_OPT_EXTENSION("fp16fml", F16FML, (), (F16), (), "asimdfhm")
 
 AARCH64_FMV_FEATURE("fp16fml", FP16FML, (F16FML))
 
-AARCH64_FMV_FEATURE("dit", DIT, ())
-
-AARCH64_FMV_FEATURE("dpb", DPB, ())
-
-AARCH64_FMV_FEATURE("dpb2", DPB2, ())
-
-AARCH64_FMV_FEATURE("jscvt", JSCVT, ())
-
-AARCH64_FMV_FEATURE("fcma", FCMA, (SIMD))
-
 AARCH64_OPT_FMV_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
 
-AARCH64_FMV_FEATURE("rcpc2", RCPC2, (RCPC))
-
 AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (), (), (), "lrcpc3")
 
-AARCH64_FMV_FEATURE("frintts", FRINTTS, ())
-
-AARCH64_FMV_FEATURE("dgh", DGH, ())
-
 AARCH64_OPT_FMV_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
 
 /* An explicit +bf16 implies +simd, but +bf16+nosimd still enables scalar BF16
    instructions.  */
 AARCH64_OPT_FMV_EXTENSION("bf16", BF16, (FP), (SIMD), (), "bf16")
 
-AARCH64_FMV_FEATURE("ebf16", EBF16, (BF16))
-
 AARCH64_FMV_FEATURE("rpres", RPRES, ())
 
 AARCH64_OPT_FMV_EXTENSION("sve", SVE, (SIMD, F16), (), (), "sve")
 
-AARCH64_FMV_FEATURE("sve-bf16", SVE_BF16, (SVE, BF16))
-
-AARCH64_FMV_FEATURE("sve-ebf16", SVE_EBF16, (SVE, BF16))
-
-AARCH64_FMV_FEATURE("sve-i8mm", SVE_I8MM, (SVE, I8MM))
-
 AARCH64_OPT_EXTENSION("f32mm", F32MM, (SVE), (), (), "f32mm")
 
 AARCH64_FMV_FEATURE("f32mm", SVE_F32MM, (F32MM))
@@ -209,8 +179,6 @@ AARCH64_OPT_EXTENSION("sve2-aes", SVE2_AES, (SVE2, AES), (), (), "sveaes")
 
 AARCH64_FMV_FEATURE("sve2-aes", SVE_AES, (SVE2_AES))
 
-AARCH64_FMV_FEATURE("sve2-pmull128", SVE_PMULL128, (SVE2))
-
 AARCH64_OPT_EXTENSION("sve2-bitperm", SVE2_BITPERM, (SVE2), (), (),
  "svebitperm")
 
@@ -230,8 +198,6 @@ AARCH64_OPT_FMV_EXTENSION("memtag", MEMTAG, (), (), (), "")
 
 AARCH64_FMV_FEATURE("memtag2", MEMTAG2, (MEMTAG))
 
-AARCH64_FMV_FEATURE("memtag3", MEMTAG3, (MEMTAG))
-
 AARCH64_OPT_FMV_EXTENSION("sb", SB, (), (), (), "sb")
 
 AARCH64_OPT_FMV_EXTENSION("predres", PREDRES, (), (), (), "")
@@ -240,8 +206,6 @@ AARCH64_OPT_FMV_EXTENSION("ssbs", SSBS, (), (), (), "ssbs")
 
 AARCH64_FMV_FEATURE("ssbs2", SSBS2, (SSBS))
 
-AARCH64_FMV_FEATURE("bti", BTI, ())
-
 AARCH64_OPT_EXTENSION("profile", PROFILE, (), (), (), "")
 
 AARCH64_OPT_EXTENSION("tme", TME, (), (), (), "")
@@ -256,8 +220,6 @@ AARCH64_FMV_FEATURE("ls64_v", LS64_V, ())
 
 AARCH64_FMV_FEATURE("ls64_accdata", LS64_ACCDATA, (LS64))
 
-AARCH64_FMV_FEATURE("wfxt", WFXT, ())
-
 AARCH64_OPT_EXTENSION("sme-f64f64", SME_F64F64, (SME), (), (), "")
 
 AARCH64_FMV_FEATURE("sme-f64f64", SME_F64, (SME_F64F64))


[committed 2/5 v2] aarch64: Fix FMV array iteration bounds

2024-04-11 Thread Andrew Carlotti
There was an assumption in some places that the aarch64_fmv_feature_data
array contained FEAT_MAX elements.  While this assumption held up till
now, it is safer and more flexible to use the array size directly.

Also fix the lower bound in compare_feature_masks to use ">=0" instead
of ">0", and add a test using the features at index 0 and 1. However,
the test already passed, because the earlier popcount check makes it
impossible to reach the loop if the masks differ in exactly one
location.
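The argument for why the new test already passed can be checked with a sketch. It assumes a hypothetical four-entry table where entry i owns bit i and later entries have higher priority, and that a mask with more features is preferred. The popcount-then-loop structure follows this commit message, but the code is not GCC's. When the popcounts are equal and the masks differ, each mask has at least one bit the other lacks, so the XOR has at least two bits set and the downward loop always stops above index 0.

```c
#include <stdint.h>

/* Hypothetical priority table: entry i owns bit i; later = higher.  */
static const uint64_t feature_masks[] = { 1u << 0, 1u << 1, 1u << 2, 1u << 3 };
#define NUM_FEATURES ((int) (sizeof feature_masks / sizeof feature_masks[0]))

static int popcount64(uint64_t x)
{
  int n = 0;
  for (; x; x &= x - 1)
    n++;
  return n;
}

/* Return 1 if MASK1 is preferred, -1 if MASK2 is, 0 if they are equal.  */
static int compare_masks(uint64_t mask1, uint64_t mask2)
{
  int pop1 = popcount64(mask1), pop2 = popcount64(mask2);
  if (pop1 != pop2)
    return pop1 > pop2 ? 1 : -1;

  uint64_t diff = mask1 ^ mask2;
  if (diff == 0)
    return 0;
  /* Equal popcounts imply DIFF has two or more bits set, so a higher
     index always decides first; i >= 0 is still the correct bound.  */
  for (int i = NUM_FEATURES - 1; i >= 0; i--)
    if (diff & feature_masks[i])
      return (mask1 & feature_masks[i]) ? 1 : -1;
  return 0;
}
```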

gcc/ChangeLog:

* config/aarch64/aarch64.cc (compare_feature_masks):
Use ARRAY_SIZE and >=0 for iteration bounds.
(aarch64_mangle_decl_assembler_name): Use ARRAY_SIZE.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-1.C: New test.


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 1ea84c8bd7386e399f6ffa3a5e36408cf8831fc6..4f708213551523b4af966ac8521754df5eb6bf3c 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -19700,7 +19700,7 @@ aarch64_parse_fmv_features (const char *str, aarch64_feature_flags *isa_flags,
   if (len == 0)
     return AARCH_PARSE_MISSING_ARG;
 
-  static const int num_features = ARRAY_SIZE (aarch64_fmv_feature_data);
+  int num_features = ARRAY_SIZE (aarch64_fmv_feature_data);
   int i;
   for (i = 0; i < num_features; i++)
{
@@ -19899,7 +19899,8 @@ compare_feature_masks (aarch64_fmv_feature_mask mask1,
   auto diff_mask = mask1 ^ mask2;
   if (diff_mask == 0ULL)
 return 0;
-  for (int i = FEAT_MAX - 1; i > 0; i--)
+  int num_features = ARRAY_SIZE (aarch64_fmv_feature_data);
+  for (int i = num_features - 1; i >= 0; i--)
 {
   auto bit_mask = aarch64_fmv_feature_data[i].feature_mask;
   if (diff_mask & bit_mask)
@@ -19982,7 +19983,8 @@ aarch64_mangle_decl_assembler_name (tree decl, tree id)
 
   name += "._";
 
-  for (int i = 0; i < FEAT_MAX; i++)
+  int num_features = ARRAY_SIZE (aarch64_fmv_feature_data);
+  for (int i = 0; i < num_features; i++)
{
  if (feature_mask & aarch64_fmv_feature_data[i].feature_mask)
{
diff --git a/gcc/testsuite/g++.target/aarch64/mv-1.C b/gcc/testsuite/g++.target/aarch64/mv-1.C
new file mode 100644
index ..b4b0e5e3fea0ed481de1918d30cd8248bb4f7ab5
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/mv-1.C
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+
+__attribute__((target_version("default")))
+int foo ()
+{
+  return 1;
+}
+
+__attribute__((target_version("rng")))
+int foo ()
+{
+  return 1;
+}
+
+__attribute__((target_version("flagm")))
+int foo ()
+{
+  return 1;
+}
+
+__attribute__((target_version("rng+flagm")))
+int foo ()
+{
+  return 1;
+}
+
+int bar()
+{
+  return foo ();
+}
+
+/* Check usage of the first two FMV features, in case of off-by-one errors.  */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.default:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\._Mrng:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\._MrngMflagm:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\._Mflagm:\n" 1 } } */
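The mangled names checked by mv-1.C follow from the loop in aarch64_mangle_decl_assembler_name: the suffix starts with "._" and features are appended in table order, so "rng+flagm" yields _MrngMflagm regardless of how the attribute string was written. Below is a sketch of that naming scheme with a hypothetical two-entry table and a hypothetical mangle_version helper. The "M" prefix and the ".default" suffix are taken from the expected names in the test, not from the GCC source shown above.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table in priority order: the first two FMV features.  */
static const struct { const char *name; uint64_t mask; } fmv_data[] = {
  { "rng",   1u << 0 },
  { "flagm", 1u << 1 },
};

/* Write "<base>._M<feat>M<feat>..." into OUT, visiting features in table
   order.  A zero mask denotes the "default" version.  */
static void mangle_version(const char *base, uint64_t mask,
                           char *out, size_t size)
{
  if (mask == 0)
    {
      snprintf(out, size, "%s.default", base);
      return;
    }
  snprintf(out, size, "%s._", base);
  for (size_t i = 0; i < sizeof fmv_data / sizeof fmv_data[0]; i++)
    if (mask & fmv_data[i].mask)
      {
        size_t len = strlen(out);
        snprintf(out + len, size - len, "M%s", fmv_data[i].name);
      }
}
```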


[committed 1/5 v2] aarch64: Reorder FMV feature priorities

2024-04-11 Thread Andrew Carlotti
Some higher priority FMV features were dependent subsets of lower
priority features.  Fix this, using the new priorities specified in
https://github.com/ARM-software/acle/pull/279.
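The invariant behind this reordering can be stated as a check: if feature A implies feature B, then B must appear earlier (with lower priority) in the list. This sketch assumes, as the downward loop in compare_feature_masks suggests, that later entries have higher priority. The tables are simplified from the .def entries (dotprod requires simd, simd requires fp), and priorities_consistent is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One FMV table entry: NAME plus a mask of the table indices of the
   features it implies (excluding itself).  */
struct fmv_entry
{
  const char *name;
  uint64_t deps;
};

/* Return true if every entry appears after all of its dependencies,
   i.e. no feature has a lower priority than something it implies.  */
static bool priorities_consistent(const struct fmv_entry *table, size_t n)
{
  for (size_t i = 0; i < n; i++)
    for (size_t j = i; j < n; j++)
      if (table[i].deps & (UINT64_C(1) << j))
        return false;
  return true;
}
```

Before this patch, the FMV entries for "fp" and "simd" came after "dotprod", which requires simd. That is exactly the kind of inversion this check rejects.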

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def: Reorder FMV entries.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/native_cpu_21.c: Reorder features.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.


diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index aa3cd99f791c83c5b15291503f3375a7cf2732cd..0078dd092884a94d2a339b5238b8d19747ff9fa1 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -99,17 +99,17 @@ AARCH64_OPT_EXTENSION(NAME, IDENT, REQUIRES, EXPLICIT_ON, EXPLICIT_OFF, \
 AARCH64_FMV_FEATURE(NAME, IDENT, (IDENT))
 
 
-AARCH64_OPT_EXTENSION("fp", FP, (), (), (), "fp")
-
-AARCH64_OPT_EXTENSION("simd", SIMD, (FP), (), (), "asimd")
-
 AARCH64_OPT_FMV_EXTENSION("rng", RNG, (), (), (), "rng")
 
 AARCH64_OPT_FMV_EXTENSION("flagm", FLAGM, (), (), (), "flagm")
 
 AARCH64_FMV_FEATURE("flagm2", FLAGM2, (FLAGM))
 
-AARCH64_FMV_FEATURE("fp16fml", FP16FML, (F16FML))
+AARCH64_OPT_FMV_EXTENSION("lse", LSE, (), (), (), "atomics")
+
+AARCH64_OPT_FMV_EXTENSION("fp", FP, (), (), (), "fp")
+
+AARCH64_OPT_FMV_EXTENSION("simd", SIMD, (FP), (), (), "asimd")
 
 AARCH64_OPT_FMV_EXTENSION("dotprod", DOTPROD, (SIMD), (), (), "asimddp")
 
@@ -121,12 +121,6 @@ AARCH64_OPT_EXTENSION("rdma", RDMA, (), (SIMD), (), "asimdrdm")
 
 AARCH64_FMV_FEATURE("rmd", RDM, (RDMA))
 
-AARCH64_OPT_FMV_EXTENSION("lse", LSE, (), (), (), "atomics")
-
-AARCH64_FMV_FEATURE("fp", FP, (FP))
-
-AARCH64_FMV_FEATURE("simd", SIMD, (SIMD))
-
 AARCH64_OPT_FMV_EXTENSION("crc", CRC, (), (), (), "crc32")
 
 AARCH64_FMV_FEATURE("sha1", SHA1, ())
@@ -160,6 +154,8 @@ AARCH64_FMV_FEATURE("fp16", FP16, (F16))
    -march=armv8.4-a+nofp16+fp16 enables F16 but not F16FML.  */
 AARCH64_OPT_EXTENSION("fp16fml", F16FML, (), (F16), (), "asimdfhm")
 
+AARCH64_FMV_FEATURE("fp16fml", FP16FML, (F16FML))
+
 AARCH64_FMV_FEATURE("dit", DIT, ())
 
 AARCH64_FMV_FEATURE("dpb", DPB, ())
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
index 920e1d65711cbcb77b07441597180c0159ccabf9..1d90e9ec9d971ae0f085fd832099058488c817b8 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
@@ -7,7 +7,7 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+flagm\+dotprod\+rdma\+lse\+crc\+fp16fml\+rcpc\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\n} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+flagm\+lse\+dotprod\+rdma\+crc\+fp16fml\+rcpc\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\n} } } */
 
 /* Check that an Armv8-A core doesn't fall apart on extensions without midr
    values.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
index 416a29b514ab7599a7092e26e3716ec8a50cc895..17050a0b72c98ecfd87ec5f7f522cce4db9efc16 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
@@ -7,7 +7,7 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+flagm\+dotprod\+rdma\+lse\+crc\+fp16fml\+rcpc\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\+pauth\n} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+flagm\+lse\+dotprod\+rdma\+crc\+fp16fml\+rcpc\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\+pauth\n} } } */
 
 /* Check that an Armv8-A core doesn't fall apart on extensions without midr
    values and that it enables optional features.  */


[committed 0/5 v2] aarch64: FMV feature list fixes

2024-04-11 Thread Andrew Carlotti
This includes the following changes from v1:

1/5: Add missing testcase update. (I misread my test diffs, but Linaro
precommit picked this up as well.)
2/5: Address review comments and add a testcase.
5/5: Remove "memtag", "ssbs" and "ls64" instead of renaming them.

Bootstrapped and regression tested as a series, and committed to master.

I'll remove "memtag", "ssbs" and "ls64" from my documentation patch before
committing that.


Re: [PATCH 0/5] aarch64: FMV feature list fixes

2024-04-10 Thread Andrew Carlotti
On Wed, Apr 10, 2024 at 07:51:44PM +0100, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > On Wed, Apr 10, 2024 at 05:42:05PM +0100, Richard Sandiford wrote:
> >> Andrew Carlotti  writes:
> >> > On Tue, Apr 09, 2024 at 04:43:16PM +0100, Richard Sandiford wrote:
> >> >> Andrew Carlotti  writes:
> > >> >> > The first three patches are trivial changes to the feature list to reflect
> > >> >> > recent changes in the ACLE.  Patch 4 removes most of the FMV multiversioning
> > >> >> > features that don't work at the moment, and should be entirely uncontroversial.
> > >> >> >
> > >> >> > Patch 5 handles the remaining cases, where there's an inconsistency in how
> > >> >> > features are named in the current FMV specification compared to the existing
> > >> >> > command line options.  It might be better to instead preserve the "memtag2",
> > >> >> > "ssbs2" and "ls64_accdata" names for now; I'd be happy to commit either
> > >> >> > version.
> >> >> 
> >> >> Yeah, I suppose patch 5 leaves things in a somewhat awkward state,
> >> >> since e.g.:
> >> >> 
> >> >> -AARCH64_OPT_FMV_EXTENSION("memtag", MEMTAG, (), (), (), "")
> >> >> +AARCH64_OPT_EXTENSION("memtag", MEMTAG, (), (), (), "")
> >> >>  
> >> >> -AARCH64_FMV_FEATURE("memtag2", MEMTAG2, (MEMTAG))
> >> >> +AARCH64_FMV_FEATURE("memtag", MEMTAG2, (MEMTAG))
> >> >> 
> >> >> seems to drop "memtag2" and FEAT_MEMTAG, but keep "memtag" and
> >> >> FEAT_MEMTAG2.  Is that right?
> >> >
> >> > That's deliberate. The FEAT_MEMTAG bit in __aarch64_cpu_features is defined to
> >> > match the definition of FEAT_MTE in the architecture, and likewise for
> >> > FEAT_MEMTAG2/FEAT_MTE2.  However, in Binutils the "+memtag" extension enables
> >> > both FEAT_MTE and FEAT_MTE2 instructions (although none of the FEAT_MTE2
> >> > instructions can be generated from GCC without inline assembly).  The FMV
> >> > specification in the ACLE currently uses names "memtag" and "memtag2" that
> >> > match the architecture names, but arguably don't match the command line
> >> > extension names.  I'm advocating for that to change to match the extension
> >> > names in command line options.
> >> 
> >> Hmm, ok.  I agree it makes sense for the user-visible FMV names to match
> >> the command line.  But shouldn't __aarch64_cpu_features either (a) use exactly
> >> the same names as the architecture or (b) use exactly the same names as the
> >> command-line (mangled where necessary)?  It seems that we're instead
> >> using a third convention that doesn't exactly match the other two.
> >
> > I agree that the name isn't one I would choose now, but I don't think it matters much that it's inconsistent.
> 
> I kind-of think it does though.  Given...
> 
> >> That is, I can see the rationale for "memtag" => FEAT_MTE2 and
> >> "memtag" => FEAT_MEMTAG.  It just seems odd to have "memtag" => FEAT_MEMTAG2
> >> (where MEMTAG2 is an alias of MTE2).
> >> 
> >> How much leeway do we have to change the __aarch64_cpu_features names?
> >> Is it supposed to be a public API (as opposed to ABI)?
> >
> > I think we're designing it to be capable of being a public API, but we haven't
> > yet made it one.  That's partly why I've kept the enum value names the same as
> > in LLVM so far.
> 
> ...this, I don't want to sleep-walk into a situation where we have
> one naming convention for the architecture, one for the attributes,
> and a third one for the API.  If we're not in a position to commit
> to a consistent naming scheme for the API by GCC 14 then it might be
> better to remove the FMV features in 5/5 for GCC 14 and revisit in GCC 15.
> 
> A patch to do that is pre-approved if you agree (but please say
> if you don't).

I'm happy to remove those features for GCC 14 (pending agreement on the
attribute names in particular), but I don't think that does anything to solve
the enum names issue.  I'll remove the names from my FMV documentation patch as
well.

> Thanks,
> Richard


Re: [PATCH 0/5] aarch64: FMV feature list fixes

2024-04-10 Thread Andrew Carlotti
On Wed, Apr 10, 2024 at 05:42:05PM +0100, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > On Tue, Apr 09, 2024 at 04:43:16PM +0100, Richard Sandiford wrote:
> >> Andrew Carlotti  writes:
> >> > The first three patches are trivial changes to the feature list to reflect
> >> > recent changes in the ACLE.  Patch 4 removes most of the FMV multiversioning
> >> > features that don't work at the moment, and should be entirely uncontroversial.
> >> >
> >> > Patch 5 handles the remaining cases, where there's an inconsistency in how
> >> > features are named in the current FMV specification compared to the existing
> >> > command line options.  It might be better to instead preserve the "memtag2",
> >> > "ssbs2" and "ls64_accdata" names for now; I'd be happy to commit either
> >> > version.
> >> 
> >> Yeah, I suppose patch 5 leaves things in a somewhat awkward state,
> >> since e.g.:
> >> 
> >> -AARCH64_OPT_FMV_EXTENSION("memtag", MEMTAG, (), (), (), "")
> >> +AARCH64_OPT_EXTENSION("memtag", MEMTAG, (), (), (), "")
> >>  
> >> -AARCH64_FMV_FEATURE("memtag2", MEMTAG2, (MEMTAG))
> >> +AARCH64_FMV_FEATURE("memtag", MEMTAG2, (MEMTAG))
> >> 
> >> seems to drop "memtag2" and FEAT_MEMTAG, but keep "memtag" and
> >> FEAT_MEMTAG2.  Is that right?
> >
> > That's deliberate. The FEAT_MEMTAG bit in __aarch64_cpu_features is defined to
> > match the definition of FEAT_MTE in the architecture, and likewise for
> > FEAT_MEMTAG2/FEAT_MTE2.  However, in Binutils the "+memtag" extension enables
> > both FEAT_MTE and FEAT_MTE2 instructions (although none of the FEAT_MTE2
> > instructions can be generated from GCC without inline assembly).  The FMV
> > specification in the ACLE currently uses names "memtag" and "memtag2" that
> > match the architecture names, but arguably don't match the command line
> > extension names.  I'm advocating for that to change to match the extension
> > names in command line options.
> 
> Hmm, ok.  I agree it makes sense for the user-visible FMV names to match
> the command line.  But shouldn't __aarch64_cpu_features either (a) use exactly
> the same names as the architecture or (b) use exactly the same names as the
> command-line (mangled where necessary)?  It seems that we're instead
> using a third convention that doesn't exactly match the other two.

I agree that the name isn't one I would choose now, but I don't think it matters much that it's inconsistent.

> That is, I can see the rationale for "memtag" => FEAT_MTE2 and
> "memtag" => FEAT_MEMTAG.  It just seems odd to have "memtag" => FEAT_MEMTAG2
> (where MEMTAG2 is an alias of MTE2).
> 
> How much leeway do we have to change the __aarch64_cpu_features names?
> Is it supposed to be a public API (as opposed to ABI)?

I think we're designing it to be capable of being a public API, but we haven't
yet made it one.  That's partly why I've kept the enum value names the same as
in LLVM so far.

> > The LS64 example is definitely an inconsistency, since GCC uses "+ls64" to
> > enable intrinsics for all of the FEAT_LS64/FEAT_LS64_V/FEAT_LS64_ACCDATA
> > intrinsics.
> 
> Ok, thanks.  If we go for option (a) above then I agree that the ls64
> change is correct.  If we go for option (b) then I suppose it should
> stay as LS64.
> 
> > There were similar issues with "sha1", "pmull" and "sve2-pmull128", but in
> > these cases their presence architecturally is implied by the presence of the
> > features checked for "sha2", "aes" and "sve2-aes" so it's fine to just delete
> > the ones without command line flags.
> >
> >> Apart from that and the comment on patch 2, the series looks good to me.
> >> 
> >> While rechecking aarch64-option-extensions.def against the ACLE list:
> >> it seems that the .def doesn't treat mops as an FMV feature.  Is that
> >> deliberate?
> >
> > "mops" was added to the ACLE list later, and libgcc doesn't yet support
> > detecting it.  I didn't think it was sensible to add new FMV feature support at
> > this stage.
> 
> Ah, ok, makes sense.
> 
> Richard


Re: [PATCH 2/5] aarch64: Don't use FEAT_MAX as array length

2024-04-10 Thread Andrew Carlotti
On Tue, Apr 09, 2024 at 04:33:10PM +0100, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > There was an assumption in some places that the aarch64_fmv_feature_data
> > array contained FEAT_MAX elements.  While this assumption held up till
> > now, it is safer and more flexible to use the array size directly.
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.cc (compare_feature_masks):
> > Use ARRAY_SIZE to determine iteration bounds.
> > (aarch64_mangle_decl_assembler_name): Ditto.
> >
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 1ea84c8bd7386e399f6ffa3a5e36408cf8831fc6..5de842fcc212c78beba1fa99639e79562d718579 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -19899,7 +19899,8 @@ compare_feature_masks (aarch64_fmv_feature_mask mask1,
> >auto diff_mask = mask1 ^ mask2;
> >if (diff_mask == 0ULL)
> >  return 0;
> > -  for (int i = FEAT_MAX - 1; i > 0; i--)
> > +  static const int num_features = ARRAY_SIZE (aarch64_fmv_feature_data);
> 
> There doesn't seem any need for this to be static (or const).  Same for
> the second hunk.

Agreed - I'll fix that, and the other instance I added in a previous patch.

I originally copied this pattern from my driver-aarch64.c:252, which was added
by Kyrill back in 2015.

> > +  for (int i = num_features - 1; i > 0; i--)
> 
> Pre-existing, but is > 0 rather than >= 0 deliberate?  Shouldn't we look
> at index 0 as well?

That was probably left over from when "default" was handled as part of the
list.  I think a different instance of this mistake was mentioned in a previous
review.  I'll fix this mistake and add a test.

> LGTM otherwise.
> 
> Thanks,
> Richard
> 
> >  {
> >auto bit_mask = aarch64_fmv_feature_data[i].feature_mask;
> >if (diff_mask & bit_mask)
> > @@ -19982,7 +19983,8 @@ aarch64_mangle_decl_assembler_name (tree decl, tree id)
> >  
> >name += "._";
> >  
> > -  for (int i = 0; i < FEAT_MAX; i++)
> > +  static const int num_features = ARRAY_SIZE (aarch64_fmv_feature_data);
> > +  for (int i = 0; i < num_features; i++)
> > {
> >   if (feature_mask & aarch64_fmv_feature_data[i].feature_mask)
> > {
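To illustrate the `i > 0` question above: the loop walks the feature table from the highest-priority entry downwards, so a condition of `i > 0` never examines entry 0, and two masks that differ only in that entry's bit fall out of the loop. The following is a simplified, hypothetical C++ stand-in, not GCC's code: the three-entry table is an assumption, it omits anything the real function checks before this loop, and a plain `assert` stands in for `gcc_unreachable ()`.

```cpp
#include <cassert>
#include <cstdint>

struct feature_datum
{
  const char *name;
  uint64_t feature_mask;
};

// Hypothetical three-entry table in ascending priority order.
static const feature_datum feature_data[] = {
  {"rng", 1ULL << 0},
  {"flagm", 1ULL << 1},
  {"lse", 1ULL << 2},
};

// Return 1 if MASK1 wins, -1 if MASK2 wins and 0 if they are equal,
// deciding on the highest-priority feature present in only one mask.
int compare_feature_masks (uint64_t mask1, uint64_t mask2)
{
  uint64_t diff_mask = mask1 ^ mask2;
  if (diff_mask == 0)
    return 0;
  int num_features = sizeof (feature_data) / sizeof (feature_data[0]);
  // ">= 0" so that entry 0 is examined as well; with "> 0", masks that
  // differ only in entry 0's bit would reach the assertion below.
  for (int i = num_features - 1; i >= 0; i--)
    {
      uint64_t bit_mask = feature_data[i].feature_mask;
      if (diff_mask & bit_mask)
        return (mask1 & bit_mask) ? 1 : -1;
    }
  // Stand-in for gcc_unreachable (): every bit should have a table entry.
  assert (false && "differing bit has no table entry");
  return 0;
}
```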


Re: [PATCH 0/5] aarch64: FMV feature list fixes

2024-04-10 Thread Andrew Carlotti
On Tue, Apr 09, 2024 at 04:43:16PM +0100, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > The first three patches are trivial changes to the feature list to reflect
> > recent changes in the ACLE.  Patch 4 removes most of the FMV multiversioning
> > features that don't work at the moment, and should be entirely uncontroversial.
> >
> > Patch 5 handles the remaining cases, where there's an inconsistency in how
> > features are named in the current FMV specification compared to the existing
> > command line options.  It might be better to instead preserve the "memtag2",
> > "ssbs2" and "ls64_accdata" names for now; I'd be happy to commit either
> > version.
> 
> Yeah, I suppose patch 5 leaves things in a somewhat awkward state,
> since e.g.:
> 
> -AARCH64_OPT_FMV_EXTENSION("memtag", MEMTAG, (), (), (), "")
> +AARCH64_OPT_EXTENSION("memtag", MEMTAG, (), (), (), "")
>  
> -AARCH64_FMV_FEATURE("memtag2", MEMTAG2, (MEMTAG))
> +AARCH64_FMV_FEATURE("memtag", MEMTAG2, (MEMTAG))
> 
> seems to drop "memtag2" and FEAT_MEMTAG, but keep "memtag" and
> FEAT_MEMTAG2.  Is that right?

That's deliberate. The FEAT_MEMTAG bit in __aarch64_cpu_features is defined to
match the definition of FEAT_MTE in the architecture, and likewise for
FEAT_MEMTAG2/FEAT_MTE2.  However, in Binutils the "+memtag" extension enables
both FEAT_MTE and FEAT_MTE2 instructions (although none of the FEAT_MTE2
instructions can be generated from GCC without inline assembly).  The FMV
specification in the ACLE currently uses names "memtag" and "memtag2" that
match the architecture names, but arguably don't match the command line
extension names.  I'm advocating for that to change to match the extension
names in command line options.

The LS64 example is definitely an inconsistency, since GCC uses "+ls64" to
enable intrinsics for all of the FEAT_LS64/FEAT_LS64_V/FEAT_LS64_ACCDATA
intrinsics.

There were similar issues with "sha1", "pmull" and "sve2-pmull128", but in
these cases their presence architecturally is implied by the presence of the
features checked for "sha2", "aes" and "sve2-aes" so it's fine to just delete
the ones without command line flags.

> Apart from that and the comment on patch 2, the series looks good to me.
> 
> While rechecking aarch64-option-extensions.def against the ACLE list:
> it seems that the .def doesn't treat mops as an FMV feature.  Is that
> deliberate?

"mops" was added to the ACLE list later, and libgcc doesn't yet support
detecting it.  I didn't think it was sensible to add new FMV feature support at
this stage.

> Thanks,
> Richard


[PATCH] docs: Update function multiversioning documentation

2024-04-09 Thread Andrew Carlotti
Add target_version attribute to Common Function Attributes and update
target and target_clones documentation.  Move shared detail and examples
to the Function Multiversioning page.  Add target-specific details to
target-specific pages.

---

I've built and checked the info and dvi outputs.  Ok for master?

gcc/ChangeLog:

* doc/extend.texi (Common Function Attributes): Update target
and target_clones documentation, and add target_version.
(AArch64 Function Attributes): Add ACLE reference and list
supported features.
(PowerPC Function Attributes): List supported features.
(x86 Function Attributes): Mention function multiversioning.
(Function Multiversioning): Update, and move shared detail here.


diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7b54a241a7bfde03ce86571be9486b30bcea6200..78cc7ad2903b61a06b618b82ba7ad52ed42d944a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -4178,18 +4178,27 @@ and @option{-Wanalyzer-tainted-size}.
 Multiple target back ends implement the @code{target} attribute
 to specify that a function is to
 be compiled with different target options than specified on the
-command line.  The original target command-line options are ignored.
-One or more strings can be provided as arguments.
-Each string consists of one or more comma-separated suffixes to
-the @code{-m} prefix jointly forming the name of a machine-dependent
-option.  @xref{Submodel Options,,Machine-Dependent Options}.
-
+command line.  One or more strings can be provided as arguments.
+The attribute may override the original target command-line options, or it may
+be combined with them in a target-specific manner.
 The @code{target} attribute can be used for instance to have a function
 compiled with a different ISA (instruction set architecture) than the
-default.  @samp{#pragma GCC target} can be used to specify target-specific
+default.
+
+@samp{#pragma GCC target} can be used to specify target-specific
 options for more than one function.  @xref{Function Specific Option Pragmas},
 for details about the pragma.
 
+On x86, the @code{target} attribute can also be used to create multiple
+versions of a function, compiled with different target-specific options.
+@xref{Function Multiversioning} for more details.
+
+The options supported by the @code{target} attribute are specific to each
+target; refer to @ref{x86 Function Attributes}, @ref{PowerPC Function
+Attributes}, @ref{ARM Function Attributes}, @ref{AArch64 Function Attributes},
+@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
+for details.
+
 For instance, on an x86, you could declare one function with the
 @code{target("sse4.1,arch=core2")} attribute and another with
 @code{target("sse4a,arch=amdfam10")}.  This is equivalent to
@@ -4211,39 +4220,18 @@ multiple options is equivalent to separating the option suffixes with
 a comma (@samp{,}) within a single string.  Spaces are not permitted
 within the strings.
 
-The options supported are specific to each target; refer to @ref{x86
-Function Attributes}, @ref{PowerPC Function Attributes},
-@ref{ARM Function Attributes}, @ref{AArch64 Function Attributes},
-@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
-for details.
-
 @cindex @code{target_clones} function attribute
 @item target_clones (@var{options})
 The @code{target_clones} attribute is used to specify that a function
-be cloned into multiple versions compiled with different target options
-than specified on the command line.  The supported options and restrictions
-are the same as for @code{target} attribute.
-
-For instance, on an x86, you could compile a function with
-@code{target_clones("sse4.1,avx")}.  GCC creates two function clones,
-one compiled with @option{-msse4.1} and another with @option{-mavx}.
-
-On a PowerPC, you can compile a function with
-@code{target_clones("cpu=power9,default")}.  GCC will create two
-function clones, one compiled with @option{-mcpu=power9} and another
-with the default options.  GCC must be configured to use GLIBC 2.23 or
-newer in order to use the @code{target_clones} attribute.
-
-It also creates a resolver function (see
-the @code{ifunc} attribute above) that dynamically selects a clone
-suitable for current architecture.  The resolver is created only if there
-is a usage of a function with @code{target_clones} attribute.
-
-Note that any subsequent call of a function without @code{target_clone}
-from a @code{target_clone} caller will not lead to copying
-(target clone) of the called function.
-If you want to enforce such behaviour,
-we recommend declaring the calling function with the @code{flatten} attribute?
+should be cloned into multiple versions compiled with different target options
+than specified on the command line.  @xref{Function Multiversioning} for more
+details.
+
+@cindex @code{target_version} function attribute
+@item target_version (@var{options})
+The 

[PATCH 5/5] aarch64: Combine some FMV features

2024-04-09 Thread Andrew Carlotti
Some architecture features are enabled together by a single command line
flag, but the FMV specification assigns them multiple FMV feature names,
and the name matching the command line flag covers only a subset of these
features.  Remove the unsupported FMV subfeatures, and
rename the remaining features with the corresponding command line flag
names.  This change is also proposed in the specification:
https://github.com/ARM-software/acle/pull/315

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def:
Combine "memtag2" into "memtag", "ssbs2" into "ssbs", and
"ls64_v and ls64_accdata" into "ls64".


diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 54bbf9c41e794786dffd69dd103fcbbca0a49f1f..164ee3b8194396e66a61f43d45c199c523d2e7cf 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -194,17 +194,17 @@ AARCH64_FMV_FEATURE("sve2-sm4", SVE_SM4, (SVE2_SM4))
 
 AARCH64_OPT_FMV_EXTENSION("sme", SME, (BF16, SVE2), (), (), "sme")
 
-AARCH64_OPT_FMV_EXTENSION("memtag", MEMTAG, (), (), (), "")
+AARCH64_OPT_EXTENSION("memtag", MEMTAG, (), (), (), "")
 
-AARCH64_FMV_FEATURE("memtag2", MEMTAG2, (MEMTAG))
+AARCH64_FMV_FEATURE("memtag", MEMTAG2, (MEMTAG))
 
 AARCH64_OPT_FMV_EXTENSION("sb", SB, (), (), (), "sb")
 
 AARCH64_OPT_FMV_EXTENSION("predres", PREDRES, (), (), (), "")
 
-AARCH64_OPT_FMV_EXTENSION("ssbs", SSBS, (), (), (), "ssbs")
+AARCH64_OPT_EXTENSION("ssbs", SSBS, (), (), (), "ssbs")
 
-AARCH64_FMV_FEATURE("ssbs2", SSBS2, (SSBS))
+AARCH64_FMV_FEATURE("ssbs", SSBS2, (SSBS))
 
 AARCH64_OPT_EXTENSION("profile", PROFILE, (), (), (), "")
 
@@ -214,11 +214,7 @@ AARCH64_OPT_EXTENSION("pauth", PAUTH, (), (), (), "paca pacg")
 
 AARCH64_OPT_EXTENSION("ls64", LS64, (), (), (), "")
 
-AARCH64_FMV_FEATURE("ls64", LS64, ())
-
-AARCH64_FMV_FEATURE("ls64_v", LS64_V, ())
-
-AARCH64_FMV_FEATURE("ls64_accdata", LS64_ACCDATA, (LS64))
+AARCH64_FMV_FEATURE("ls64", LS64_ACCDATA, (LS64))
 
 AARCH64_OPT_EXTENSION("sme-f64f64", SME_F64F64, (SME), (), (), "")
 


[PATCH 3/5] aarch64: Fix typo and make rdma/rdm alias for FMV

2024-04-09 Thread Andrew Carlotti
gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def:
Fix "rmd"->"rdm", and add FMV to "rdma".
* config/aarch64/aarch64.cc (FEAT_RDMA): Define as FEAT_RDM.


diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 0078dd092884a94d2a339b5238b8d19747ff9fa1..b7b307b24eadd83a6d083955f5b30814b7212712 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -117,9 +117,10 @@ AARCH64_OPT_FMV_EXTENSION("sm4", SM4, (SIMD), (), (), "sm3 sm4")
 
 /* An explicit +rdma implies +simd, but +rdma+nosimd still enables scalar
RDMA instructions.  */
-AARCH64_OPT_EXTENSION("rdma", RDMA, (), (SIMD), (), "asimdrdm")
+AARCH64_OPT_FMV_EXTENSION("rdma", RDMA, (), (SIMD), (), "asimdrdm")
 
-AARCH64_FMV_FEATURE("rmd", RDM, (RDMA))
+/* rdm is an alias for rdma.  */
+AARCH64_FMV_FEATURE("rdm", RDM, (RDMA))
 
 AARCH64_OPT_FMV_EXTENSION("crc", CRC, (), (), (), "crc32")
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 5de842fcc212c78beba1fa99639e79562d718579..b5c51b2cf1dd2f15e0ee1ac21d7959507c05c298 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -19657,6 +19657,10 @@ typedef struct
 #define AARCH64_FMV_FEATURE(NAME, FEAT_NAME, C) \
   {NAME, 1ULL << FEAT_##FEAT_NAME, ::feature_deps::fmv_deps_##FEAT_NAME},
 
+/* The "rdma" alias uses a different FEAT_NAME to avoid a duplicate
+   feature_deps name.  */
+#define FEAT_RDMA FEAT_RDM
+
 /* FMV features are listed in priority order, to make it easier to sort target
strings.  */
 static aarch64_fmv_feature_datum aarch64_fmv_feature_data[] = {
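The `#define FEAT_RDMA FEAT_RDM` alias works because the `FEAT_##FEAT_NAME` paste in `AARCH64_FMV_FEATURE` produces the token `FEAT_RDMA`, which the preprocessor rescans and replaces with `FEAT_RDM`, so "rdma" and "rdm" end up with the same bit. Below is a reduced, hypothetical X-macro sketch of that expansion; `FEATURE_LIST`, `MAKE_ENTRY`, the three-value enum and `mask_for_name` are illustrative only. As the patch's comment notes, the real macro also pastes `fmv_deps_##FEAT_NAME`, which is why "rdma" needs its own `RDMA` name rather than reusing `RDM`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

enum feature_id { FEAT_RNG, FEAT_RDM, FEAT_CRC, FEAT_MAX };

// Alias, as in the patch: pasting FEAT_ with RDMA yields FEAT_RDMA,
// which is then rescanned and replaced by FEAT_RDM.
#define FEAT_RDMA FEAT_RDM

// Hypothetical stand-in for the .def file entries.
#define FEATURE_LIST(X) \
  X ("rng", RNG)        \
  X ("rdma", RDMA)      \
  X ("rdm", RDM)        \
  X ("crc", CRC)

struct feature_datum
{
  const char *name;
  uint64_t feature_mask;
};

#define MAKE_ENTRY(NAME, FEAT_NAME) {NAME, 1ULL << FEAT_##FEAT_NAME},
static const feature_datum feature_data[] = { FEATURE_LIST (MAKE_ENTRY) };
#undef MAKE_ENTRY

// Look up the feature bit for an FMV name; 0 if the name is unknown.
uint64_t mask_for_name (const char *name)
{
  for (const feature_datum &d : feature_data)
    if (std::strcmp (d.name, name) == 0)
      return d.feature_mask;
  return 0;
}
```

With the alias in place, the old misspelling "rmd" is simply unknown, while both spellings of the real feature resolve to one bit.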


[PATCH 4/5] aarch64: Remove unsupported FMV features

2024-04-09 Thread Andrew Carlotti
It currently isn't possible to support function multiversioning features
properly in GCC without also enabling the extension in the command line
options (with the exception of features such as "rpres" that do not
require assembler support).  We therefore remove unsupported features
from GCC's list of FMV features.

Some of these features ("fcma", "jscvt", "frintts", "flagm2", "wfxt",
"rcpc2", and perhaps "dpb" and "dpb2") will be added back in the future
once support for the command line option has been added.

The rest of the removed features I have proposed removing from the ACLE
specification as well, since it doesn't seem worthwhile to include support
for them; see the ACLE pull request for more detailed justification:
https://github.com/ARM-software/acle/pull/315

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def:
Remove "flagm2", "sha1", "pmull", "dit", "dpb", "dpb2", "jscvt",
"fcma", "rcpc2", "frintts", "dgh", "ebf16", "sve-bf16",
"sve-ebf16", "sve-i8mm", "sve2-pmull128", "memtag3", "bti" and
"wfxt" entries.


diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index b7b307b24eadd83a6d083955f5b30814b7212712..54bbf9c41e794786dffd69dd103fcbbca0a49f1f 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -103,8 +103,6 @@ AARCH64_OPT_FMV_EXTENSION("rng", RNG, (), (), (), "rng")
 
 AARCH64_OPT_FMV_EXTENSION("flagm", FLAGM, (), (), (), "flagm")
 
-AARCH64_FMV_FEATURE("flagm2", FLAGM2, (FLAGM))
-
 AARCH64_OPT_FMV_EXTENSION("lse", LSE, (), (), (), "atomics")
 
 AARCH64_OPT_FMV_EXTENSION("fp", FP, (), (), (), "fp")
@@ -124,16 +122,12 @@ AARCH64_FMV_FEATURE("rdm", RDM, (RDMA))
 
 AARCH64_OPT_FMV_EXTENSION("crc", CRC, (), (), (), "crc32")
 
-AARCH64_FMV_FEATURE("sha1", SHA1, ())
-
 AARCH64_OPT_FMV_EXTENSION("sha2", SHA2, (SIMD), (), (), "sha1 sha2")
 
 AARCH64_FMV_FEATURE("sha3", SHA3, (SHA3))
 
 AARCH64_OPT_FMV_EXTENSION("aes", AES, (SIMD), (), (), "aes")
 
-AARCH64_FMV_FEATURE("pmull", PMULL, ())
-
 /* +nocrypto disables AES, SHA2 and SM4, and anything that depends on them
(such as SHA3 and the SVE2 crypto extensions).  */
 AARCH64_OPT_EXTENSION("crypto", CRYPTO, (AES, SHA2), (), (AES, SHA2, SM4),
@@ -157,44 +151,20 @@ AARCH64_OPT_EXTENSION("fp16fml", F16FML, (), (F16), (), "asimdfhm")
 
 AARCH64_FMV_FEATURE("fp16fml", FP16FML, (F16FML))
 
-AARCH64_FMV_FEATURE("dit", DIT, ())
-
-AARCH64_FMV_FEATURE("dpb", DPB, ())
-
-AARCH64_FMV_FEATURE("dpb2", DPB2, ())
-
-AARCH64_FMV_FEATURE("jscvt", JSCVT, ())
-
-AARCH64_FMV_FEATURE("fcma", FCMA, (SIMD))
-
 AARCH64_OPT_FMV_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
 
-AARCH64_FMV_FEATURE("rcpc2", RCPC2, (RCPC))
-
 AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (), (), (), "lrcpc3")
 
-AARCH64_FMV_FEATURE("frintts", FRINTTS, ())
-
-AARCH64_FMV_FEATURE("dgh", DGH, ())
-
 AARCH64_OPT_FMV_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
 
 /* An explicit +bf16 implies +simd, but +bf16+nosimd still enables scalar BF16
instructions.  */
 AARCH64_OPT_FMV_EXTENSION("bf16", BF16, (FP), (SIMD), (), "bf16")
 
-AARCH64_FMV_FEATURE("ebf16", EBF16, (BF16))
-
 AARCH64_FMV_FEATURE("rpres", RPRES, ())
 
 AARCH64_OPT_FMV_EXTENSION("sve", SVE, (SIMD, F16), (), (), "sve")
 
-AARCH64_FMV_FEATURE("sve-bf16", SVE_BF16, (SVE, BF16))
-
-AARCH64_FMV_FEATURE("sve-ebf16", SVE_EBF16, (SVE, BF16))
-
-AARCH64_FMV_FEATURE("sve-i8mm", SVE_I8MM, (SVE, I8MM))
-
 AARCH64_OPT_EXTENSION("f32mm", F32MM, (SVE), (), (), "f32mm")
 
 AARCH64_FMV_FEATURE("f32mm", SVE_F32MM, (F32MM))
@@ -209,8 +179,6 @@ AARCH64_OPT_EXTENSION("sve2-aes", SVE2_AES, (SVE2, AES), (), (), "sveaes")
 
 AARCH64_FMV_FEATURE("sve2-aes", SVE_AES, (SVE2_AES))
 
-AARCH64_FMV_FEATURE("sve2-pmull128", SVE_PMULL128, (SVE2))
-
 AARCH64_OPT_EXTENSION("sve2-bitperm", SVE2_BITPERM, (SVE2), (), (),
  "svebitperm")
 
@@ -230,8 +198,6 @@ AARCH64_OPT_FMV_EXTENSION("memtag", MEMTAG, (), (), (), "")
 
 AARCH64_FMV_FEATURE("memtag2", MEMTAG2, (MEMTAG))
 
-AARCH64_FMV_FEATURE("memtag3", MEMTAG3, (MEMTAG))
-
 AARCH64_OPT_FMV_EXTENSION("sb", SB, (), (), (), "sb")
 
 AARCH64_OPT_FMV_EXTENSION("predres", PREDRES, (), (), (), "")
@@ -240,8 +206,6 @@ AARCH64_OPT_FMV_EXTENSION("ssbs", SSBS, (), (), (), "ssbs")
 
 AARCH64_FMV_FEATURE("ssbs2", SSBS2, (SSBS))
 
-AARCH64_FMV_FEATURE("bti", BTI, ())
-
 AARCH64_OPT_EXTENSION("profile", PROFILE, (), (), (), "")
 
 AARCH64_OPT_EXTENSION("tme", TME, (), (), (), "")
@@ -256,8 +220,6 @@ AARCH64_FMV_FEATURE("ls64_v", LS64_V, ())
 
 AARCH64_FMV_FEATURE("ls64_accdata", LS64_ACCDATA, (LS64))
 
-AARCH64_FMV_FEATURE("wfxt", WFXT, ())
-
 AARCH64_OPT_EXTENSION("sme-f64f64", SME_F64F64, (SME), (), (), "")
 
 AARCH64_FMV_FEATURE("sme-f64f64", SME_F64, (SME_F64F64))


[PATCH 2/5] aarch64: Don't use FEAT_MAX as array length

2024-04-09 Thread Andrew Carlotti
There was an assumption in some places that the aarch64_fmv_feature_data
array contained FEAT_MAX elements.  While this assumption held up till
now, it is safer and more flexible to use the array size directly.
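A minimal sketch of why the table length, not `FEAT_MAX`, should bound these loops. It assumes, hypothetically, a `FEAT_*` enum that keeps a value whose table entry has been removed (for instance, if entries are dropped from the table while the enum keeps its values); the enum, the table, `count_features` and the local `ARRAY_SIZE` definition are illustrative, not GCC's own code.

```cpp
#include <cassert>
#include <cstdint>

// Local definition in the usual sizeof form: the element count of a
// true array (it must not be applied to a pointer).
#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

// Hypothetical enum that still has a value (FEAT_FLAGM2) whose table
// entry has been removed, so FEAT_MAX no longer equals the table length.
enum feature_id { FEAT_RNG, FEAT_FLAGM, FEAT_FLAGM2, FEAT_LSE, FEAT_MAX };

struct feature_datum
{
  const char *name;
  uint64_t feature_mask;
};

static const feature_datum feature_data[] = {
  {"rng", 1ULL << FEAT_RNG},
  {"flagm", 1ULL << FEAT_FLAGM},
  {"lse", 1ULL << FEAT_LSE},
};

// A loop bounded by FEAT_MAX (4) would read past the end of this
// 3-element table; this compile-time check documents the mismatch.
static_assert (ARRAY_SIZE (feature_data) < FEAT_MAX,
               "table is shorter than the enum in this example");

// Count the table entries whose bit is set in MASK, bounding the loop
// by the table's real length.
int count_features (uint64_t mask)
{
  int num_features = ARRAY_SIZE (feature_data);
  int count = 0;
  for (int i = 0; i < num_features; i++)
    if (mask & feature_data[i].feature_mask)
      count++;
  return count;
}
```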

gcc/ChangeLog:

* config/aarch64/aarch64.cc (compare_feature_masks):
Use ARRAY_SIZE to determine iteration bounds.
(aarch64_mangle_decl_assembler_name): Ditto.


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 1ea84c8bd7386e399f6ffa3a5e36408cf8831fc6..5de842fcc212c78beba1fa99639e79562d718579 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -19899,7 +19899,8 @@ compare_feature_masks (aarch64_fmv_feature_mask mask1,
   auto diff_mask = mask1 ^ mask2;
   if (diff_mask == 0ULL)
 return 0;
-  for (int i = FEAT_MAX - 1; i > 0; i--)
+  static const int num_features = ARRAY_SIZE (aarch64_fmv_feature_data);
+  for (int i = num_features - 1; i > 0; i--)
 {
   auto bit_mask = aarch64_fmv_feature_data[i].feature_mask;
   if (diff_mask & bit_mask)
@@ -19982,7 +19983,8 @@ aarch64_mangle_decl_assembler_name (tree decl, tree id)
 
   name += "._";
 
-  for (int i = 0; i < FEAT_MAX; i++)
+  static const int num_features = ARRAY_SIZE (aarch64_fmv_feature_data);
+  for (int i = 0; i < num_features; i++)
{
  if (feature_mask & aarch64_fmv_feature_data[i].feature_mask)
{


[PATCH 1/5] aarch64: Reorder FMV feature priorities

2024-04-09 Thread Andrew Carlotti
Some higher priority FMV features were dependent subsets of lower
priority features.  Fix this, using the new priorities specified in
https://github.com/ARM-software/acle/pull/279.
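As a rough illustration of the ordering rule this patch restores: if feature A depends on feature B, then A should have a higher priority than B. The sketch assumes, as the `compare_feature_masks` loop elsewhere in this thread suggests, that later entries in the list have higher priority; the `fmv_entry` type and the `first_misordered` helper are hypothetical, and dependencies that are not in the list are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified FMV list entry: a name plus the names of the
// features it depends on.  Entries are assumed to be in ascending
// priority order (later entries win).
struct fmv_entry
{
  std::string name;
  std::vector<std::string> deps;
};

// Return the index of the first entry that depends on a feature listed
// after it (i.e. a dependency with higher priority than the dependent
// feature), or -1 if the ordering is consistent.
int first_misordered (const std::vector<fmv_entry> &list)
{
  for (std::size_t i = 0; i < list.size (); i++)
    for (const std::string &dep : list[i].deps)
      for (std::size_t j = i + 1; j < list.size (); j++)
        if (list[j].name == dep)
          return static_cast<int> (i);
  return -1;
}
```

Under this model the old placement of "fp" and "simd" after "dotprod" is flagged, since "simd" (a subset of "dotprod") outranked it; the reordered list passes.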

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def: Reorder FMV entries.


diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index aa3cd99f791c83c5b15291503f3375a7cf2732cd..0078dd092884a94d2a339b5238b8d19747ff9fa1 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -99,17 +99,17 @@ AARCH64_OPT_EXTENSION(NAME, IDENT, REQUIRES, EXPLICIT_ON, EXPLICIT_OFF, \
 AARCH64_FMV_FEATURE(NAME, IDENT, (IDENT))
 
 
-AARCH64_OPT_EXTENSION("fp", FP, (), (), (), "fp")
-
-AARCH64_OPT_EXTENSION("simd", SIMD, (FP), (), (), "asimd")
-
 AARCH64_OPT_FMV_EXTENSION("rng", RNG, (), (), (), "rng")
 
 AARCH64_OPT_FMV_EXTENSION("flagm", FLAGM, (), (), (), "flagm")
 
 AARCH64_FMV_FEATURE("flagm2", FLAGM2, (FLAGM))
 
-AARCH64_FMV_FEATURE("fp16fml", FP16FML, (F16FML))
+AARCH64_OPT_FMV_EXTENSION("lse", LSE, (), (), (), "atomics")
+
+AARCH64_OPT_FMV_EXTENSION("fp", FP, (), (), (), "fp")
+
+AARCH64_OPT_FMV_EXTENSION("simd", SIMD, (FP), (), (), "asimd")
 
 AARCH64_OPT_FMV_EXTENSION("dotprod", DOTPROD, (SIMD), (), (), "asimddp")
 
@@ -121,12 +121,6 @@ AARCH64_OPT_EXTENSION("rdma", RDMA, (), (SIMD), (), "asimdrdm")
 
 AARCH64_FMV_FEATURE("rmd", RDM, (RDMA))
 
-AARCH64_OPT_FMV_EXTENSION("lse", LSE, (), (), (), "atomics")
-
-AARCH64_FMV_FEATURE("fp", FP, (FP))
-
-AARCH64_FMV_FEATURE("simd", SIMD, (SIMD))
-
 AARCH64_OPT_FMV_EXTENSION("crc", CRC, (), (), (), "crc32")
 
 AARCH64_FMV_FEATURE("sha1", SHA1, ())
@@ -160,6 +154,8 @@ AARCH64_FMV_FEATURE("fp16", FP16, (F16))
-march=armv8.4-a+nofp16+fp16 enables F16 but not F16FML.  */
 AARCH64_OPT_EXTENSION("fp16fml", F16FML, (), (F16), (), "asimdfhm")
 
+AARCH64_FMV_FEATURE("fp16fml", FP16FML, (F16FML))
+
 AARCH64_FMV_FEATURE("dit", DIT, ())
 
 AARCH64_FMV_FEATURE("dpb", DPB, ())


[PATCH 0/5] aarch64: FMV feature list fixes

2024-04-09 Thread Andrew Carlotti
The first three patches are trivial changes to the feature list to reflect
recent changes in the ACLE.  Patch 4 removes most of the FMV multiversioning
features that don't work at the moment, and should be entirely uncontroversial.

Patch 5 handles the remaining cases, where there's an inconsistency in how
features are named in the current FMV specification compared to the existing
command line options.  It might be better to instead preserve the "memtag2",
"ssbs2" and "ls64_accdata" names for now; I'd be happy to commit either
version.

Bootstrapped and regression tested on aarch64. Ok for master?


[committed] aarch64: Fix function multiversioning mangling

2024-02-06 Thread Andrew Carlotti
It would be neater if the middle end for target_clones used a target
hook for version name mangling, so we only do version name mangling
once.  However, that would require more intrusive refactoring that will
have to wait till Stage 1.

I've made the changes Richard Sandiford requested, and merged the new tests
into this patch. I'd have sent this sooner, but my initial testing failed due
to a broken master.  This is now successfully bootstrapped, regression tested
and pushed to master.
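A simplified, hypothetical sketch of the naming scheme in the diff below: the default version's name gets a `.default` suffix, any other version gets `._` followed by `M<feature>` for each selected feature in table order, and `get_suffixed_assembler_name` strips a trailing `.default` before appending a new suffix. The four-entry table and the `mangle_version`/`suffixed_name` names are assumptions for illustration; the real code works on GCC `tree` identifiers rather than `std::string` arguments.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct feature_datum
{
  const char *name;
  uint64_t feature_mask;
};

// Hypothetical table in priority order.
static const feature_datum feature_data[] = {
  {"rng", 1ULL << 0},
  {"flagm", 1ULL << 1},
  {"lse", 1ULL << 2},
  {"sve", 1ULL << 3},
};

// Mirror of the suffix scheme in aarch64_mangle_decl_assembler_name.
std::string mangle_version (const std::string &base, uint64_t feature_mask)
{
  if (feature_mask == 0)
    return base + ".default";
  std::string name = base + "._";
  for (const feature_datum &d : feature_data)
    if (feature_mask & d.feature_mask)
      {
        name += "M";
        name += d.name;
      }
  return name;
}

// Mirror of get_suffixed_assembler_name: strip a trailing ".default"
// (if present) from the default version's name, then append SUFFIX.
std::string suffixed_name (std::string name, const char *suffix)
{
  const std::string dflt = ".default";
  auto size = name.size ();
  if (size >= dflt.size ()
      && name.compare (size - dflt.size (), dflt.size (), dflt) == 0)
    name.resize (size - dflt.size ());
  return name + suffix;
}
```

For example, with this table a mask containing the "rng" and "sve" bits turns `_Z3foov` into `_Z3foov._MrngMsve`, and `suffixed_name ("_Z3foov.default", ".resolver")` gives `_Z3foov.resolver` (`.resolver` is only an example suffix here).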

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_mangle_decl_assembler_name):
Move before new caller, and add ".default" suffix.
(get_suffixed_assembler_name): New.
(make_resolver_func): Use get_suffixed_assembler_name.
(aarch64_generate_version_dispatcher_body): Redo name mangling.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-symbols1.C: New test.
* g++.target/aarch64/mv-symbols2.C: Ditto.
* g++.target/aarch64/mv-symbols3.C: Ditto.
* g++.target/aarch64/mv-symbols4.C: Ditto.
* g++.target/aarch64/mv-symbols5.C: Ditto.
* g++.target/aarch64/mvc-symbols1.C: Ditto.
* g++.target/aarch64/mvc-symbols2.C: Ditto.
* g++.target/aarch64/mvc-symbols3.C: Ditto.
* g++.target/aarch64/mvc-symbols4.C: Ditto.


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 4556b8dd5045cc992f9e392e0dff903267adca0e..356695feb06257a477c72eb359c7628f8ecea963 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -19870,6 +19870,62 @@ build_ifunc_arg_type ()
   return pointer_type;
 }
 
+/* Implement TARGET_MANGLE_DECL_ASSEMBLER_NAME, to add function multiversioning
+   suffixes.  */
+
+tree
+aarch64_mangle_decl_assembler_name (tree decl, tree id)
+{
+  /* For function version, add the target suffix to the assembler name.  */
+  if (TREE_CODE (decl) == FUNCTION_DECL
+  && DECL_FUNCTION_VERSIONED (decl))
+{
+  aarch64_fmv_feature_mask feature_mask = get_feature_mask_for_version (decl);
+
+  std::string name = IDENTIFIER_POINTER (id);
+
+  /* For the default version, append ".default".  */
+  if (feature_mask == 0ULL)
+   {
+ name += ".default";
+ return get_identifier (name.c_str());
+   }
+
+  name += "._";
+
+  for (int i = 0; i < FEAT_MAX; i++)
+   {
+ if (feature_mask & aarch64_fmv_feature_data[i].feature_mask)
+   {
+ name += "M";
+ name += aarch64_fmv_feature_data[i].name;
+   }
+   }
+
+  if (DECL_ASSEMBLER_NAME_SET_P (decl))
+   SET_DECL_RTL (decl, NULL);
+
+  id = get_identifier (name.c_str());
+}
+  return id;
+}
+
+/* Return an identifier for the base assembler name of a versioned function.
+   This is computed by taking the default version's assembler name, and
+   stripping off the ".default" suffix if it's already been appended.  */
+
+static tree
+get_suffixed_assembler_name (tree default_decl, const char *suffix)
+{
+  std::string name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (default_decl));
+
+  auto size = name.size ();
+  if (size >= 8 && name.compare (size - 8, 8, ".default") == 0)
+name.resize (size - 8);
+  name += suffix;
+  return get_identifier (name.c_str());
+}
+
 /* Make the resolver function decl to dispatch the versions of
a multi-versioned function,  DEFAULT_DECL.  IFUNC_ALIAS_DECL is
ifunc alias that will point to the created resolver.  Create an
@@ -19883,8 +19939,9 @@ make_resolver_func (const tree default_decl,
 {
   tree decl, type, t;
 
-  /* Create resolver function name based on default_decl.  */
-  tree decl_name = clone_function_name (default_decl, "resolver");
+  /* Create resolver function name based on default_decl.  We need to remove an
+ existing ".default" suffix if this has already been appended.  */
+  tree decl_name = get_suffixed_assembler_name (default_decl, ".resolver");
   const char *resolver_name = IDENTIFIER_POINTER (decl_name);
 
   /* The resolver function should have signature
@@ -20231,6 +20288,28 @@ aarch64_generate_version_dispatcher_body (void *node_p)
   dispatch_function_versions (resolver_decl, _ver_vec, _bb);
   cgraph_edge::rebuild_edges ();
   pop_cfun ();
+
+  /* Fix up symbol names.  First we need to obtain the base name, which may
+ have already been mangled.  */
+  tree base_name = get_suffixed_assembler_name (default_ver_decl, "");
+
+  /* We need to redo the version mangling on the non-default versions for the
+ target_clones case.  Redoing the mangling for the target_version case is
+ redundant but does no harm.  We need to skip the default version, because
+ expand_clones will append ".default" later; fortunately that suffix is the
+ one we want anyway.  */
+  for (versn_info = node_version_info->next->next; versn_info;
+   versn_info = versn_info->next)
+{
+  tree version_decl = versn_info->this_node->decl;
+  tree name = 

[PATCH] aarch64: Fix function multiversioning mangling

2024-01-16 Thread Andrew Carlotti
It would be neater if the middle end for target_clones used a target
hook for version name mangling, so we only do version name mangling
once.  However, that would require more intrusive refactoring that will
have to wait till Stage 1.


This patch builds upon the testsuite additions in patch 1/5 of the
previous series. I could commit just the aarch64 tests for now if that's
preferred. Is this version of the fix ok for master?

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(get_suffixed_assembler_name): New.
(make_resolver_func): Use get_suffixed_assembler_name.
(aarch64_mangle_decl_assembler_name): Add ".default" suffix.
(aarch64_generate_version_dispatcher_body): Redo name mangling.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-symbols1.C: Update for mangling fixes.
* g++.target/aarch64/mv-symbols2.C: Ditto.
* g++.target/aarch64/mv-symbols3.C: Ditto.
* g++.target/aarch64/mv-symbols4.C: Ditto.
* g++.target/aarch64/mv-symbols5.C: Ditto.
* g++.target/aarch64/mvc-symbols1.C: Ditto.
* g++.target/aarch64/mvc-symbols2.C: Ditto.
* g++.target/aarch64/mvc-symbols3.C: Ditto.
* g++.target/aarch64/mvc-symbols4.C: Ditto.


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 7d1f8c65ce41044d6850262300cf08a23d606617..bf698a2c3bb105375a2be37ca032397161bf4334 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -19832,6 +19832,21 @@ build_ifunc_arg_type ()
   return pointer_type;
 }
 
+/* Return an identifier for the base assembler name of a versioned function.
+   This is computed by taking the default version's assembler name, and
+   stripping off the ".default" suffix if it's already been appended.  */
+
+tree get_suffixed_assembler_name (tree default_decl, const char *suffix)
+{
+  std::string name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (default_decl));
+
+  auto size = name.size ();
+  if (size >= 8 && name.compare (size - 8, 8, ".default") == 0)
+name.resize (size - 8);
+  name += suffix;
+  return get_identifier (name.c_str());
+}
+
 /* Make the resolver function decl to dispatch the versions of
a multi-versioned function,  DEFAULT_DECL.  IFUNC_ALIAS_DECL is
ifunc alias that will point to the created resolver.  Create an
@@ -19845,8 +19860,9 @@ make_resolver_func (const tree default_decl,
 {
   tree decl, type, t;
 
-  /* Create resolver function name based on default_decl.  */
-  tree decl_name = clone_function_name (default_decl, "resolver");
+  /* Create resolver function name based on default_decl.  We need to remove an
+ existing ".default" suffix if this has already been appended.  */
+  tree decl_name = get_suffixed_assembler_name (default_decl, ".resolver");
   const char *resolver_name = IDENTIFIER_POINTER (decl_name);
 
   /* The resolver function should have signature
@@ -20137,6 +20153,8 @@ dispatch_function_versions (tree dispatch_decl,
   return 0;
 }
 
+tree aarch64_mangle_decl_assembler_name (tree, tree);
+
 /* Implement TARGET_GENERATE_VERSION_DISPATCHER_BODY.  */
 
 tree
@@ -20193,6 +20211,28 @@ aarch64_generate_version_dispatcher_body (void *node_p)
   dispatch_function_versions (resolver_decl, _ver_vec, _bb);
   cgraph_edge::rebuild_edges ();
   pop_cfun ();
+
+  /* Fix up symbol names.  First we need to obtain the base name, which may
+ have already been mangled.  */
+  tree base_name = get_suffixed_assembler_name (default_ver_decl, "");
+
+  /* We need to redo the version mangling on the non-default versions for the
+ target_clones case.  Redoing the mangling for the target_version case is
+ redundant but does no harm.  We need to skip the default version, because
+ expand_clones will append ".default" later; fortunately that suffix is the
+ one we want anyway.  */
+  for (versn_info = node_version_info->next->next; versn_info;
+   versn_info = versn_info->next)
+{
+  tree version_decl = versn_info->this_node->decl;
+  tree name = aarch64_mangle_decl_assembler_name (version_decl,
+ base_name);
+  symtab->change_decl_assembler_name (version_decl, name);
+}
+
+  /* We also need to use the base name for the ifunc declaration.  */
+  symtab->change_decl_assembler_name (node->decl, base_name);
+
   return resolver_decl;
 }
 
@@ -20317,11 +20357,15 @@ aarch64_mangle_decl_assembler_name (tree decl, tree id)
 {
   aarch64_fmv_feature_mask feature_mask = get_feature_mask_for_version (decl);
 
-  /* No suffix for the default version.  */
+  std::string name = IDENTIFIER_POINTER (id);
+
+  /* For the default version, append ".default".  */
   if (feature_mask == 0ULL)
-   return id;
+   {
+ name += ".default";
+ return get_identifier (name.c_str());
+   }
 
-  std::string name = IDENTIFIER_POINTER (id);
   name += "._";
 
   for (int i = 0; i < 

Re: [PATCH 2/5] tree: Extend DECL_FUNCTION_VERSIONED to an enum

2024-01-16 Thread Andrew Carlotti
On Mon, Jan 15, 2024 at 01:28:04PM +0100, Richard Biener wrote:
> On Mon, Jan 15, 2024 at 12:27 PM Andrew Carlotti
>  wrote:
> >
> > This allows code to determine why a particular function is
> > multiversioned.  For now, this will primarily be used to preserve
> > existing name mangling quirks when subsequent commits change all
> > function multiversioning name mangling to use explicit target hooks.
> > However, this can also be used in future to allow more of the
> > multiversioning logic to be moved out of target hooks, and to allow
> > targets to simultaneously enable multiversioning with both 'target' and
> > 'target_version' attributes.
> 
> Why does module.cc need to stream the bits?  target_clone runs long
> after the FE finished.  Instead I wonder why LTO doesn't stream the bits
> (tree-streamer-{in,out}.cc)?
>
> You have four states but only mention 'target' and 'target_version', what's the
> states actually?  Can you amend the function_version_source enum
> comment accordingly?

All four states are used, although currently not all within a single target,
and in many places you could also work out whether you're in the
"target_clones" case or the "target"/"target_version" case based upon whether
you're in the frontend code or the target_clone pass.  So perhaps the second
bit can be made redundant in most places, but it's useful in some parts of the
code.

The main benefits of this design are:
- that it allows backend target hooks to distinguish between the different
  causes of function multiversioning without having to create separate hooks,
  or pass an extra argument down the call stack;
- that it allows a backend to choose to support multiversioning with both
  "target" and "target_version" attributes (I remember Martin Liška suggested
  supporting "target_version" on x86 as well, and it could provide a way to
  improve how multiversioning works without running into backwards
  compatibility issues for the "target" attribute).
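To illustrate the first benefit: once the attribute is recorded in the decl, a single hook can branch on it rather than needing one hook per attribute. The sketch below is hypothetical. The enum mirrors function_version_source from this series, but the helper and its policy (whether the default version's symbol gets a ".default" suffix) are illustrative assumptions loosely based on behaviour described elsewhere in this thread, not an actual GCC hook.

```cpp
// Mirrors the enum added to tree-core.h in this series.
enum function_version_source
{
  FUNCTION_VERSION_NONE = 0,
  FUNCTION_VERSION_TARGET = 1,
  FUNCTION_VERSION_TARGET_CLONES = 2,
  FUNCTION_VERSION_TARGET_VERSION = 3
};

// Hypothetical hook: one entry point covers every multiversioning attribute
// by switching on the recorded source.  The policy here is illustrative only.
bool
default_version_gets_suffix (function_version_source source)
{
  switch (source)
    {
    case FUNCTION_VERSION_TARGET_CLONES:
    case FUNCTION_VERSION_TARGET_VERSION:
      return true;
    case FUNCTION_VERSION_TARGET:
    case FUNCTION_VERSION_NONE:
      return false;
    }
  return false;
}
```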

> This looks like stage1 material to me.

I considered it as stage 3 material, because it fixes a bug on aarch64 (where
the existing code didn't comply with the ACLE spec, and would have caused
problems when applied to public symbols).  Admittedly, I did miss the end of
stage 3 by a couple of days; I hadn't realised how early stage 4 began, and
spent the last couple of weeks focussing on Binutils work instead.

However, I've now realised that I can fix this bug entirely within the aarch64
backend (by overwriting the incorrect mangling applied in target-agnostic
code).  I'll send out that patch later today, which should hopefully be more
acceptable at this stage.

Thanks,
Andrew

> Thanks,
> Richard.
> 
> > gcc/ChangeLog:
> >
> > * multiple_target.cc (expand_target_clones): Use new enum value.
> > * tree-core.h (enum function_version_source): New enum.
> > (struct tree_function_decl): Extend versioned_function to two
> > bits.
> >
> > gcc/cp/ChangeLog:
> >
> > * decl.cc (maybe_mark_function_versioned): Use new enum value.
> > (duplicate_decls): Preserve DECL_FUNCTION_VERSIONED enum value.
> > * module.cc (trees_out::core_bools): Use two bits for
> > function_decl.versioned_function.
> > (trees_in::core_bools): Ditto.
> >
> >
> > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> > index b10a72a87bf0a1cabab52c1e4b657bc8a379b91e..527931cd90a0a779a508a096b2623351fd65a2e8 100644
> > --- a/gcc/cp/decl.cc
> > +++ b/gcc/cp/decl.cc
> > @@ -1254,7 +1254,10 @@ maybe_mark_function_versioned (tree decl)
> >  {
> >if (!DECL_FUNCTION_VERSIONED (decl))
> >  {
> > -  DECL_FUNCTION_VERSIONED (decl) = 1;
> > +  if (TARGET_HAS_FMV_TARGET_ATTRIBUTE)
> > +   DECL_FUNCTION_VERSIONED (decl) = FUNCTION_VERSION_TARGET;
> > +  else
> > +   DECL_FUNCTION_VERSIONED (decl) = FUNCTION_VERSION_TARGET_VERSION;
> >/* If DECL_ASSEMBLER_NAME has already been set, re-mangle
> >  to include the version marker.  */
> >if (DECL_ASSEMBLER_NAME_SET_P (decl))
> > @@ -3159,7 +3162,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool hiding, bool was_hidden)
> >&& DECL_FUNCTION_VERSIONED (olddecl))
> >  {
> >/* Set the flag for newdecl so that it gets copied to olddecl.  */
> > -  DECL_FUNCTION_VERSIONED (newdecl) = 1;
> > +  DECL_FUNCTION_VERSIONED (newdecl) = DECL_FUNCTION_VERSIONED (olddecl);
> >/* newdecl will be purged after copying to olddecl and is no longer
> >

[PATCH 1/5] testsuite: Add tests for fmv symbol presence and mangling

2024-01-15 Thread Andrew Carlotti
These tests are not intended to designate "correct" behaviour, but are
instead intended to demonstrate current behaviour, and provide a warning
if subsequent patches might lead to compatibility issues for targets
with existing function multiversioning support.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-symbols1.C: New test.
* g++.target/aarch64/mv-symbols2.C: New test.
* g++.target/aarch64/mv-symbols3.C: New test.
* g++.target/aarch64/mv-symbols4.C: New test.
* g++.target/aarch64/mv-symbols5.C: New test.
* g++.target/aarch64/mvc-symbols1.C: New test.
* g++.target/aarch64/mvc-symbols2.C: New test.
* g++.target/aarch64/mvc-symbols3.C: New test.
* g++.target/aarch64/mvc-symbols4.C: New test.
* g++.target/i386/mv-symbols1.C: New test.
* g++.target/i386/mv-symbols2.C: New test.
* g++.target/i386/mv-symbols3.C: New test.
* g++.target/i386/mv-symbols4.C: New test.
* g++.target/i386/mv-symbols5.C: New test.
* g++.target/i386/mvc-symbols1.C: New test.
* g++.target/i386/mvc-symbols2.C: New test.
* g++.target/i386/mvc-symbols3.C: New test.
* g++.target/i386/mvc-symbols4.C: New test.
* g++.target/powerpc/mvc-symbols1.C: New test.
* g++.target/powerpc/mvc-symbols2.C: New test.
* g++.target/powerpc/mvc-symbols3.C: New test.
* g++.target/powerpc/mvc-symbols4.C: New test.


diff --git a/gcc/testsuite/g++.target/aarch64/mv-symbols1.C b/gcc/testsuite/g++.target/aarch64/mv-symbols1.C
new file mode 100644
index ..afbd9cacfc72e89ff4a06e3baae7ccc63ed64fc0
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/mv-symbols1.C
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+
+int foo ()
+{
+  return 1;
+}
+
+__attribute__((target_version("dotprod")))
+int foo ()
+{
+  return 3;
+}
+__attribute__((target_version("sve+sve2")))
+int foo ()
+{
+  return 5;
+}
+
+__attribute__((target_version("sve+sve2")))
+int foo (int)
+{
+  return 6;
+}
+
+__attribute__((target_version("dotprod")))
+int foo (int)
+{
+  return 4;
+}
+
+int foo (int)
+{
+  return 2;
+}
+
+
+int bar()
+{
+  return foo ();
+}
+
+int bar(int x)
+{
+  return foo (x);
+}
+
+/* When updating any of the symbol names in these tests, make sure to also
+   update any tests for their absence in mv-symbolsN.C */
+
+/* { dg-final { scan-assembler-times "\n_Z3foov:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\._Mdotprod:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\._MsveMsve2:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tbl\t_Z7_Z3foovv\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z7_Z3foovv, %gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z7_Z3foovv,_Z3foov\.resolver\n" 1 } } */
+
+/* { dg-final { scan-assembler-times "\n_Z3fooi:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\._Mdotprod:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\._MsveMsve2:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tbl\t_Z7_Z3fooii\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z7_Z3fooii, %gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z7_Z3fooii,_Z3fooi\.resolver\n" 1 } } */
diff --git a/gcc/testsuite/g++.target/aarch64/mv-symbols2.C b/gcc/testsuite/g++.target/aarch64/mv-symbols2.C
new file mode 100644
index ..54d2396f40705b6a6f7839ded78dcfddd911f7dd
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/mv-symbols2.C
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+
+__attribute__((target_version("default")))
+int foo ()
+{
+  return 1;
+}
+
+__attribute__((target_version("dotprod")))
+int foo ()
+{
+  return 3;
+}
+__attribute__((target_version("sve+sve2")))
+int foo ()
+{
+  return 5;
+}
+
+__attribute__((target_version("sve+sve2")))
+int foo (int)
+{
+  return 6;
+}
+
+__attribute__((target_version("dotprod")))
+int foo (int)
+{
+  return 4;
+}
+
+__attribute__((target_version("default")))
+int foo (int)
+{
+  return 2;
+}
+
+/* { dg-final { scan-assembler-times "\n_Z3foov:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\._Mdotprod:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\._MsveMsve2:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.resolver:\n" 0 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z7_Z3foovv, %gnu_indirect_function\n" 0 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z7_Z3foovv,_Z3foov\.resolver\n" 0 } } */
+
+/* { dg-final { scan-assembler-times "\n_Z3fooi:\n" 1 } } */
+/* { dg-final { scan-assembler-times 

[PATCH 4/5] cp: Use get_mangled_id in more places in mangle_decl

2024-01-15 Thread Andrew Carlotti
There's no functional change here, but it makes it clearer that all
three locations should be doing the same thing (aside from changes to
flag_abi_version).

gcc/cp/ChangeLog:

* mangle.cc (mangle_decl): Consistently use get_mangled_id.


diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
index a04bc584586f28cb80d21b5c6d647416aa8843df..9bd684608b9e3378292cdb042184ba603b3d69aa 100644
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -4503,8 +4503,7 @@ mangle_decl (const tree decl)
return;
 
  flag_abi_version = flag_abi_compat_version;
- id2 = mangle_decl_string (decl);
- id2 = targetm.mangle_decl_assembler_name (decl, id2);
+ id2 = get_mangled_id (decl);
  flag_abi_version = save_ver;
 
  if (id2 != id)
@@ -4519,8 +4518,7 @@ mangle_decl (const tree decl)
  || id2 == NULL_TREE)
{
  flag_abi_version = warn_abi_version;
- id2 = mangle_decl_string (decl);
- id2 = targetm.mangle_decl_assembler_name (decl, id2);
+ id2 = get_mangled_id (decl);
}
  flag_abi_version = save_ver;
 


[PATCH 0/5] Fix fmv mangling for AArch64

2024-01-15 Thread Andrew Carlotti
This patch series should have no functional change besides the mangling of some 
symbol names on AArch64.

Patch 1/5 adds lots of tests to verify that existing mangling behaviour on x86 
and PowerPC is unchanged.

Patch 2/5 extends DECL_FUNCTION_VERSIONED to a 2-bit enum.

Patches 3/5 and 4/5 are trivial refactorings.

Patch 5/5 is the only patch with any functional change, and that should be
minimal.  I've bootstrapped and tested the entire series on both AArch64 and
x86.  I've also run the new x86 and PowerPC tests on a cross-compiler (with a
temporary hack to disable ifunc availability checks) to verify that function
multiversioning still works on those platforms, with the symbol mangling
unchanged.

I'm now aware that we have just started Stage 4, and this isn't actually a
regression, but is this still ok for master?



Some other things I previously tried that I couldn't make work:
- I had hoped to create an explicit target hook for the ifunc symbol name
mangling as well, but it turned out to be rather tricky to replicate the
existing double mangling weirdness for x86 (I didn't work out how to convince
the frontend to apply C++ mangling to the new symbol on-demand without breaking
other things).

- It's also awkward to try to access the base assembler name after applying
function version mangling - this is why I resorted to just reversing the
default version mangling in the AArch64 backend.  I tried delaying function
version mangling until after the resolver was generated, but that led to issues
with duplicate comdat group names from make_decl_one_only.

There may be less hacky solutions or workarounds for these issues, but they
would involve a more substantial refactoring and will have to wait until GCC 15
(or later).


[PATCH 5/5] Add target hook for function version name mangling

2024-01-15 Thread Andrew Carlotti
When using "target" or "target_version" attributes, some parts of the
code assume that the default version has no function-specific mangling
while generating names for the resolver and ifunc.  Since aarch64 now
breaks that assumption, we add an explicit workaround for this issue.

Ideally we'd also use a target hook to generate the ifunc name, but it
turns out to be rather tricky to reproduce the existing x86 double
mangling quirk.

There should be no functional change, except on aarch64 where the
mangling is changed to match the latest proposed spec.

gcc/ChangeLog:

* cgraph.h (create_version_clone_with_body): Update comment.
* cgraphclones.cc: Set assembler name after attaching new
  attributes, and use new target hook.
* config/aarch64/aarch64.cc
(make_resolver_func): Change ifunc and resolver assembler names.
(aarch64_mangle_decl_assembler_name): Rename to ...
(aarch64_mangle_function_version_name): ... this, and adjust
mangling for default version.
(TARGET_MANGLE_DECL_ASSEMBLER_NAME): Don't use this hook.
(TARGET_MANGLE_FUNCTION_VERSION_NAME): Use this hook instead.
* config/i386/i386-features.cc
(is_valid_asm_symbol): Copy from multiple_target.cc.
(ix86_mangle_function_version_assembler_name): Rename to ...
(ix86_mangle_function_version_name): ... this, and add different
handling for target clones.
(ix86_mangle_decl_assembler_name): Remove target version mangling.
* config/i386/i386-features.h
(ix86_mangle_function_version_name): New declaration.
* config/i386/i386.cc
(TARGET_MANGLE_FUNCTION_VERSION_NAME): Implement this hook.
* config/rs6000/rs6000.cc
(TARGET_MANGLE_FUNCTION_VERSION_NAME): Implement this hook.
(is_valid_asm_symbol): Copy from multiple_target.cc.
(rs6000_mangle_function_version_name): New hook implementation.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add TARGET_MANGLE_FUNCTION_VERSION_NAME hook.
* multiple_target.cc
(create_dispatcher_calls): Use new target hook for mangling.
(is_valid_asm_symbol): Move helper function to targets.
(create_new_asm_name): Move and inline into target hooks.
(create_target_clone): Use new target hook for mangling, and
pass "target_version" instead of 'name' parameter for dump info.
(expand_target_clones): Use new target hook for name mangling.
* target.def (name): Define mangle_function_version_name hook.

gcc/cp/ChangeLog:

* mangle.cc (get_mangled_id): Call the separate target hook for
  target version mangling.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-symbols1.C: Update for mangling fixes.
* g++.target/aarch64/mv-symbols2.C: Ditto.
* g++.target/aarch64/mv-symbols3.C: Ditto.
* g++.target/aarch64/mv-symbols4.C: Ditto.
* g++.target/aarch64/mv-symbols5.C: Ditto.
* g++.target/aarch64/mvc-symbols1.C: Ditto.
* g++.target/aarch64/mvc-symbols2.C: Ditto.
* g++.target/aarch64/mvc-symbols3.C: Ditto.
* g++.target/aarch64/mvc-symbols4.C: Ditto.


diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 16e2b2d045767206d5ccf12ee226f92ee10511d9..4150c5ea7fce01f49971134a6f8e47cf4e1533b0 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1015,8 +1015,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
  that will promote value of the attribute DECL_FUNCTION_SPECIFIC_TARGET
  of the declaration.
 
- If TARGET_VERSION is set true, use clone_function_name to set new names.
- Otherwise, use clone_function_name_numbered.
+ If TARGET_VERSION is set true, use targetm.mangle_function_version_name
+ to set new names.  Otherwise, use clone_function_name_numbered.
 
  Return the new version's cgraph node.  */
   cgraph_node *create_version_clone_with_body
diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc
index ab9a0fe7ccc5fcf9a0a03363c66016466d39427e..ab8818e7057da3c0bc59f086abcdb5c577d1d935 100644
--- a/gcc/cgraphclones.cc
+++ b/gcc/cgraphclones.cc
@@ -1033,11 +1033,6 @@ cgraph_node::create_version_clone_with_body
   else
 new_decl = copy_node (old_decl);
 
-  /* Generate a new name for the new version. */
-  tree fnname = (target_version ? clone_function_name (old_decl, suffix)
-   : clone_function_name_numbered (old_decl, suffix));
-  DECL_NAME (new_decl) = fnname;
-  SET_DECL_ASSEMBLER_NAME (new_decl, fnname);
   SET_DECL_RTL (new_decl, NULL);
 
   DECL_VIRTUAL_P (new_decl) = 0;
@@ -1065,6 +1060,18 @@ cgraph_node::create_version_clone_with_body
return NULL;
 }
 
+  /* Generate a new name for the new version.  */
+  tree fnname;
+  if (target_version)
+{
+  fnname = DECL_ASSEMBLER_NAME (old_decl);
+  fnname = targetm.mangle_function_version_name (new_decl, fnname);
+}
+  else
+fnname = 

[PATCH 3/5] Change create_version_clone_with_body parameter name

2024-01-15 Thread Andrew Carlotti
The new name better describes where it is used, and will be more
suitable when subsequent commits make further changes to this function.

gcc/ChangeLog:

* cgraph.h (create_version_clone_with_body): Rename parameter
and change default value.
* cgraphclones.cc: Rename parameter.
* multiple_target.cc (create_target_clone): Update for inverted
boolean parameter.
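The inverted sense of the parameter is easy to get wrong, so here is a small hypothetical model (not GCC code; the enum and both functions are illustrative stand-ins for the choice between clone_function_name and clone_function_name_numbered). It shows that callers relying on the default keep numbered names, and that create_target_clone must flip its argument from false to true to keep unnumbered names.

```cpp
// Which clone-naming helper is selected: clone_function_name (unnumbered)
// or clone_function_name_numbered (numbered).
enum class clone_naming
{
  unnumbered,
  numbered
};

// Old convention: VERSION_DECL (default true) selects numbered names.
clone_naming
old_choice (bool version_decl = true)
{
  return version_decl ? clone_naming::numbered : clone_naming::unnumbered;
}

// New convention: TARGET_VERSION (default false) selects unnumbered names.
clone_naming
new_choice (bool target_version = false)
{
  return target_version ? clone_naming::unnumbered : clone_naming::numbered;
}
```

Every call site behaves the same as long as each explicit argument is negated, which is exactly the one-line change to create_target_clone in the diff.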


diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index b4f028b3f3034056de1050ea1ab93a682197d0e1..16e2b2d045767206d5ccf12ee226f92ee10511d9 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1015,8 +1015,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
  that will promote value of the attribute DECL_FUNCTION_SPECIFIC_TARGET
  of the declaration.
 
- If VERSION_DECL is set true, use clone_function_name_numbered for the
- function clone.  Otherwise, use clone_function_name.
+ If TARGET_VERSION is set true, use clone_function_name to set new names.
+ Otherwise, use clone_function_name_numbered.
 
  Return the new version's cgraph node.  */
   cgraph_node *create_version_clone_with_body
@@ -1024,7 +1024,7 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
  vec *tree_map,
  ipa_param_adjustments *param_adjustments,
  bitmap bbs_to_copy, basic_block new_entry_block, const char *clone_name,
- tree target_attributes = NULL_TREE, bool version_decl = true);
+ tree target_attributes = NULL_TREE, bool target_version = false);
 
   /* Insert a new cgraph_function_version_info node into cgraph_fnver_htab
  corresponding to cgraph_node.  */
diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc
index 6d7bc402a29161f473aaa34fb11b24264a7e8b7c..ab9a0fe7ccc5fcf9a0a03363c66016466d39427e 100644
--- a/gcc/cgraphclones.cc
+++ b/gcc/cgraphclones.cc
@@ -1013,7 +1013,7 @@ cgraph_node::create_version_clone_with_body
vec *tree_map,
ipa_param_adjustments *param_adjustments,
bitmap bbs_to_copy, basic_block new_entry_block, const char *suffix,
-   tree target_attributes, bool version_decl)
+   tree target_attributes, bool target_version)
 {
   tree old_decl = decl;
   cgraph_node *new_version_node = NULL;
@@ -1034,8 +1034,8 @@ cgraph_node::create_version_clone_with_body
 new_decl = copy_node (old_decl);
 
   /* Generate a new name for the new version. */
-  tree fnname = (version_decl ? clone_function_name_numbered (old_decl, suffix)
-   : clone_function_name (old_decl, suffix));
+  tree fnname = (target_version ? clone_function_name (old_decl, suffix)
+   : clone_function_name_numbered (old_decl, suffix));
   DECL_NAME (new_decl) = fnname;
   SET_DECL_ASSEMBLER_NAME (new_decl, fnname);
   SET_DECL_RTL (new_decl, NULL);
diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
index 56a1934fe820e91b2fa451dcf6989382c906b98c..5fa13ee78035924e5dbd2aec1dd05192342c1a59 100644
--- a/gcc/multiple_target.cc
+++ b/gcc/multiple_target.cc
@@ -281,7 +281,7 @@ create_target_clone (cgraph_node *node, bool definition, char *name,
 {
   new_node
= node->create_version_clone_with_body (vNULL, NULL, NULL, NULL, NULL,
-   name, attributes, false);
+   name, attributes, true);
   if (new_node == NULL)
return NULL;
   new_node->force_output = true;


[PATCH 2/5] tree: Extend DECL_FUNCTION_VERSIONED to an enum

2024-01-15 Thread Andrew Carlotti
This allows code to determine why a particular function is
multiversioned.  For now, this will primarily be used to preserve
existing name mangling quirks when subsequent commits change all
function multiversioning name mangling to use explicit target hooks.
However, this can also be used in future to allow more of the
multiversioning logic to be moved out of target hooks, and to allow
targets to simultaneously enable multiversioning with both 'target' and
'target_version' attributes.
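Once versioned_function holds a two-bit enum instead of a flag, any bit-by-bit serialisation has to carry both bits, written and read back in the same order; the module.cc hunks in this patch do this with WB and b (). The standalone sketch below models that round trip, assuming a hypothetical std::vector<bool> bit stream in place of GCC's module streamer.

```cpp
#include <cstddef>
#include <vector>

// Mirrors the enum added to tree-core.h in this patch.
enum function_version_source
{
  FUNCTION_VERSION_NONE = 0,
  FUNCTION_VERSION_TARGET = 1,
  FUNCTION_VERSION_TARGET_CLONES = 2,
  FUNCTION_VERSION_TARGET_VERSION = 3
};

// Write the two-bit field low bit first, as trees_out::core_bools does.
void
write_version_source (std::vector<bool> &bits, function_version_source vf)
{
  unsigned v = vf;
  bits.push_back ((v >> 0) & 1);
  bits.push_back ((v >> 1) & 1);
}

// Read the bits back in the same order, as trees_in::core_bools does,
// advancing POS past the two bits consumed.
function_version_source
read_version_source (const std::vector<bool> &bits, std::size_t &pos)
{
  unsigned v = 0;
  v |= unsigned (bits[pos++]) << 0;
  v |= unsigned (bits[pos++]) << 1;
  return function_version_source (v);
}
```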

gcc/ChangeLog:

* multiple_target.cc (expand_target_clones): Use new enum value.
* tree-core.h (enum function_version_source): New enum.
(struct tree_function_decl): Extend versioned_function to two
bits.

gcc/cp/ChangeLog:

* decl.cc (maybe_mark_function_versioned): Use new enum value.
(duplicate_decls): Preserve DECL_FUNCTION_VERSIONED enum value.
* module.cc (trees_out::core_bools): Use two bits for
function_decl.versioned_function.
(trees_in::core_bools): Ditto.


diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index b10a72a87bf0a1cabab52c1e4b657bc8a379b91e..527931cd90a0a779a508a096b2623351fd65a2e8 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -1254,7 +1254,10 @@ maybe_mark_function_versioned (tree decl)
 {
   if (!DECL_FUNCTION_VERSIONED (decl))
 {
-  DECL_FUNCTION_VERSIONED (decl) = 1;
+  if (TARGET_HAS_FMV_TARGET_ATTRIBUTE)
+   DECL_FUNCTION_VERSIONED (decl) = FUNCTION_VERSION_TARGET;
+  else
+   DECL_FUNCTION_VERSIONED (decl) = FUNCTION_VERSION_TARGET_VERSION;
   /* If DECL_ASSEMBLER_NAME has already been set, re-mangle
 to include the version marker.  */
   if (DECL_ASSEMBLER_NAME_SET_P (decl))
@@ -3159,7 +3162,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool hiding, bool was_hidden)
   && DECL_FUNCTION_VERSIONED (olddecl))
 {
   /* Set the flag for newdecl so that it gets copied to olddecl.  */
-  DECL_FUNCTION_VERSIONED (newdecl) = 1;
+  DECL_FUNCTION_VERSIONED (newdecl) = DECL_FUNCTION_VERSIONED (olddecl);
   /* newdecl will be purged after copying to olddecl and is no longer
  a version.  */
   cgraph_node::delete_function_version_by_decl (newdecl);
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index aa75e2809d8fdca14443c6b911bf725f6d286d20..ba60d0753f91ef91d45fb5d62f26118be4e34840 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -5473,7 +5473,11 @@ trees_out::core_bools (tree t)
   WB (t->function_decl.looping_const_or_pure_flag);
 
   WB (t->function_decl.has_debug_args_flag);
-  WB (t->function_decl.versioned_function);
+
+  /* versioned_function is a 2 bit enum.  */
+  unsigned vf = t->function_decl.versioned_function;
+  WB ((vf >> 0) & 1);
+  WB ((vf >> 1) & 1);
 
   /* decl_type is a (misnamed) 2 bit discriminator. */
   unsigned kind = t->function_decl.decl_type;
@@ -5618,7 +5622,12 @@ trees_in::core_bools (tree t)
   RB (t->function_decl.looping_const_or_pure_flag);
   
   RB (t->function_decl.has_debug_args_flag);
-  RB (t->function_decl.versioned_function);
+
+  /* versioned_function is a 2 bit enum.  */
+  unsigned vf = 0;
+  vf |= unsigned (b ()) << 0;
+  vf |= unsigned (b ()) << 1;
+  t->function_decl.versioned_function = function_version_source (vf);
 
   /* decl_type is a (misnamed) 2 bit discriminator. */
   unsigned kind = 0;
diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
index 1fdd279da04a7acc5e8c50f528139f19cadcd5ff..56a1934fe820e91b2fa451dcf6989382c906b98c 100644
--- a/gcc/multiple_target.cc
+++ b/gcc/multiple_target.cc
@@ -383,7 +383,7 @@ expand_target_clones (struct cgraph_node *node, bool definition)
   if (decl1_v == NULL)
 decl1_v = node->insert_new_function_version ();
   before = decl1_v;
-  DECL_FUNCTION_VERSIONED (node->decl) = 1;
+  DECL_FUNCTION_VERSIONED (node->decl) = FUNCTION_VERSION_TARGET_CLONES;
 
   for (i = 0; i < attrnum; i++)
 {
@@ -421,7 +421,8 @@ expand_target_clones (struct cgraph_node *node, bool definition)
 
   before->next = after;
   after->prev = before;
-  DECL_FUNCTION_VERSIONED (new_node->decl) = 1;
+  DECL_FUNCTION_VERSIONED (new_node->decl)
+   = FUNCTION_VERSION_TARGET_CLONES;
 }
 
   XDELETEVEC (attrs);
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 8a89462bd7ecac52fcdc11c0b57ccf7c190572b3..e159d53f9d11ba848c49499aa963daa2fbcbc648 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1955,6 +1955,19 @@ enum function_decl_type
   /* 0 values left */
 };
 
+/* Enumerate function multiversioning attributes.  This is used to record which
+   attribute enabled multiversioning on a function, and allows targets to
+   adjust their behaviour accordingly.  */
+
+enum function_version_source
+{
+  FUNCTION_VERSION_NONE = 0,
+  FUNCTION_VERSION_TARGET = 1,
+  FUNCTION_VERSION_TARGET_CLONES = 2,
+  FUNCTION_VERSION_TARGET_VERSION = 3
+};
+
+
 /* 

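As a rough illustration of why the single WB/RB call in module.cc had to become two: a single boolean can only record whether versioned_function is zero or nonzero, so distinct nonzero sources such as FUNCTION_VERSION_TARGET_CLONES and FUNCTION_VERSION_TARGET_VERSION could not survive a round trip. The sketch below mirrors the two-bit encoding from the module.cc hunk, using a hypothetical bit_stream type in place of GCC's trees_out/trees_in classes; it is illustrative only, not the module streaming code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy of the function_version_source enum from the tree-core.h hunk.
enum function_version_source
{
  FUNCTION_VERSION_NONE = 0,
  FUNCTION_VERSION_TARGET = 1,
  FUNCTION_VERSION_TARGET_CLONES = 2,
  FUNCTION_VERSION_TARGET_VERSION = 3
};

// Hypothetical bit stream standing in for trees_out/trees_in: it records
// bits in order and replays them in the same order.
struct bit_stream
{
  std::vector<bool> bits;
  std::size_t pos = 0;

  void write_bit (bool b) { bits.push_back (b); }
  bool read_bit () { return bits.at (pos++); }
};

// Emit the 2-bit value low bit first, like the two WB calls in the patch.
void
write_version_source (bit_stream &s, function_version_source v)
{
  unsigned vf = v;
  s.write_bit ((vf >> 0) & 1);
  s.write_bit ((vf >> 1) & 1);
}

// Consume the bits in the same order and rebuild the enum value,
// like the two b () calls in trees_in::core_bools.
function_version_source
read_version_source (bit_stream &s)
{
  unsigned vf = 0;
  vf |= unsigned (s.read_bit ()) << 0;
  vf |= unsigned (s.read_bit ()) << 1;
  return function_version_source (vf);
}
```

The bit order itself is arbitrary; what matters is that the writer and reader agree on it, which is why the two hunks in module.cc shift by the same amounts.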
[committed v4 5/5] aarch64: Add function multiversioning support

2023-12-15 Thread Andrew Carlotti
This adds initial support for function multiversioning on aarch64 using
the target_version and target_clones attributes.  This loosely follows
the Beta specification in the ACLE [1], although with some differences
that still need to be resolved (possibly as follow-up patches).
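The selection rule behind a version dispatcher can be sketched roughly as follows. This is a simplified model under stated assumptions: the feature bits, the priority field and resolve() are all hypothetical. Per the ChangeLog below, the real patch instead adds functions such as make_resolver_func and dispatch_function_versions that build an ifunc resolver, using feature masks and priorities from get_feature_mask_for_version and aarch64_compare_version_priority.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bits for this sketch only; the real masks come from
// aarch64-option-extensions.def and cpuinfo.h.
enum : std::uint64_t
{
  FEAT_SIMD = 1u << 0,
  FEAT_SVE = 1u << 1,
  FEAT_SVE2 = 1u << 2,
};

struct version
{
  std::string name;        // e.g. the target_version string
  std::uint64_t required;  // features this version needs; 0 for "default"
  int priority;            // higher values are preferred
};

// Return the name of the highest-priority version whose required features
// are all present in CPU_FEATURES (earlier entries win ties).  Returns an
// empty string if nothing matches, which cannot happen when a "default"
// entry with required == 0 is present.
std::string
resolve (const std::vector<version> &versions, std::uint64_t cpu_features)
{
  const version *best = nullptr;
  for (const version &v : versions)
    if ((v.required & cpu_features) == v.required
        && (best == nullptr || v.priority > best->priority))
      best = &v;
  return best ? best->name : std::string ();
}
```

The "default" version is what keeps the dispatcher total: it has no requirements, so it matches on every CPU and is chosen only when nothing more specific does.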

Existing function multiversioning implementations are broken in various
ways when used across translation units.  This includes placing
resolvers in the wrong translation units, and using symbol mangling that
allows callers to unintentionally bypass the resolver in some circumstances.
Fixing these issues for aarch64 will require modifications to our ACLE
specification.  It will also require further adjustments to existing
middle end code, to facilitate different mangling and resolver
placement while preserving existing target behaviours.

The list of function multiversioning features specified in the ACLE is
also inconsistent with the list of features supported in target option
extensions.  I intend to resolve some or all of these inconsistencies at
a later stage.

The target_version attribute is currently only supported in C++, since
this is the only frontend with existing support for multiversioning
using the target attribute.  On the other hand, this patch happens to
enable multiversioning with the target_clones attribute in Ada and D, as
well as the entire C family, using their existing frontend support.

This patch also does not support the following aspects of the Beta
specification:

- The target_clones attribute should allow an implicit unlisted
  "default" version.
- There should be an option to disable function multiversioning at
  compile time.
- Unrecognised target names in a target_clones attribute should be
  ignored (with an optional warning).  This current patch raises an
  error instead.

[1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning

Committed as approved with the coding convention fix, plus some adjustments to
aarch64-option-extensions.def to accommodate recent changes on master. The
series passed regression testing as a whole post-rebase on aarch64.

gcc/ChangeLog:

* config/aarch64/aarch64-feature-deps.h (fmv_deps_):
Define aarch64_feature_flags mask for each FMV feature.
* config/aarch64/aarch64-option-extensions.def: Use new macros
to define FMV feature extensions.
* config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p):
Check for target_version attribute after processing target
attribute.
(aarch64_fmv_feature_data): New.
(aarch64_parse_fmv_features): New.
(aarch64_process_target_version_attr): New.
(aarch64_option_valid_version_attribute_p): New.
(get_feature_mask_for_version): New.
(compare_feature_masks): New.
(aarch64_compare_version_priority): New.
(build_ifunc_arg_type): New.
(make_resolver_func): New.
(add_condition_to_bb): New.
(dispatch_function_versions): New.
(aarch64_generate_version_dispatcher_body): New.
(aarch64_get_function_versions_dispatcher): New.
(aarch64_common_function_versions): New.
(aarch64_mangle_decl_assembler_name): New.
(TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation.
(TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation.
(TARGET_OPTION_FUNCTION_VERSIONS): New implementation.
(TARGET_COMPARE_VERSION_PRIORITY): New implementation.
(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation.
(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation.
(TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation.
* config/aarch64/aarch64.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE):
Set target macro.
* config/arm/aarch-common.h (enum aarch_parse_opt_result): Add
new value to report duplicate FMV feature.
* common/config/aarch64/cpuinfo.h: New file.

libgcc/ChangeLog:

* config/aarch64/cpuinfo.c (enum CPUFeatures): Move to shared
copy in gcc/common.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/options_set_17.c: Reorder expected flags.
* gcc.target/aarch64/cpunative/native_cpu_0.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_18.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_19.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_20.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_21.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_6.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_7.c: Ditto.


diff --git a/gcc/common/config/aarch64/cpuinfo.h b/gcc/common/config/aarch64/cpuinfo.h
new file mode 100644
index 

[committed v4 4/5] Add support for target_version attribute

2023-12-15 Thread Andrew Carlotti
This patch adds support for the "target_version" attribute to the middle
end and the C++ frontend, which will be used to implement function
multiversioning in the aarch64 backend.

On targets that don't use the "target" attribute for multiversioning,
there is no conflict between the "target" and "target_clones"
attributes.  This patch therefore makes the mutual exclusion in
C-family, D and Ada conditional upon the value of the
expanded_clones_attribute target hook.
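A minimal sketch of what making the exclusion conditional means, assuming a simplified exclusion struct and a hypothetical lookup helper (GCC's real checking happens in diag_attr_exclusions in attribs.cc). The has_fmv_target_attr parameter stands in for TARGET_HAS_FMV_TARGET_ATTRIBUTE, which the diffs in this patch use when building the exclusion tables.

```cpp
#include <cassert>
#include <cstring>

// Simplified copy of attribute_spec::exclusions: the excluded attribute's
// name and whether the exclusion applies to functions, variables and types.
struct exclusion
{
  const char *name;
  bool function;
  bool variable;
  bool type;
};

// Hypothetical helper: does target_clones exclude attribute NAME on a
// function declaration?  The table mirrors attr_target_clones_exclusions.
bool
target_clones_excludes_on_function (const char *name, bool has_fmv_target_attr)
{
  const exclusion table[] = {
    { "always_inline", true, true, true },
    // Only mutually exclusive when "target" itself drives multiversioning.
    { "target", has_fmv_target_attr, has_fmv_target_attr,
      has_fmv_target_attr },
    { nullptr, false, false, false },
  };

  for (const exclusion *e = table; e->name; ++e)
    if (std::strcmp (e->name, name) == 0)
      return e->function;
  return false;
}
```

On a target where "target" does not create versions, the "target" entry stays in the table but is inert, so "target" and "target_clones" can be combined while the unconditional always_inline exclusion still applies.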

The "target_version" attribute is only added to C++ in this patch,
because this is currently the only frontend which supports
multiversioning using the "target" attribute.  Support for the
"target_version" attribute will be extended to C at a later date.

Targets that currently use the "target" attribute for function
multiversioning (i.e. i386 and rs6000) are not affected by this patch.

Committed as approved with adjustments to comments in c-attribs.

gcc/ChangeLog:

* attribs.cc (decl_attributes): Pass attribute name to target.
(is_function_default_version): Update comment to specify
incompatibility with target_version attributes.
* cgraphclones.cc (cgraph_node::create_version_clone_with_body):
Call valid_version_attribute_p for target_version attributes.
* defaults.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE): New macro.
* target.def (valid_version_attribute_p): New hook.
* doc/tm.texi.in: Add new hook.
* doc/tm.texi: Regenerate.
* multiple_target.cc (create_dispatcher_calls): Remove redundant
is_function_default_version check.
(expand_target_clones): Use target macro to pick attribute name.
* targhooks.cc (default_target_option_valid_version_attribute_p):
New.
* targhooks.h (default_target_option_valid_version_attribute_p):
New.
* tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
target_version attributes.

gcc/c-family/ChangeLog:

* c-attribs.cc (attr_target_exclusions): Make
target/target_clones exclusion target-dependent.
(attr_target_clones_exclusions): Ditto, and add target_version.
(attr_target_version_exclusions): New.
(c_common_attribute_table): Add target_version.
(handle_target_version_attribute): New.
(handle_target_attribute): Amend comment.
(handle_target_clones_attribute): Ditto.

gcc/ada/ChangeLog:

* gcc-interface/utils.cc (attr_target_exclusions): Make
target/target_clones exclusion target-dependent.
(attr_target_clones_exclusions): Ditto.

gcc/d/ChangeLog:

* d-attribs.cc (attr_target_exclusions): Make
target/target_clones exclusion target-dependent.
(attr_target_clones_exclusions): Ditto.

gcc/cp/ChangeLog:

* decl2.cc (check_classfn): Update comment to include
target_version attributes.


diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index 3eabbec6bd34116910a0589b4ebf269b916cc607..17f6afd687d1dbd7648d52d86417414b04c0d896 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -146,14 +146,16 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
 
 static const struct attribute_spec::exclusions attr_target_exclusions[] =
 {
-  { "target_clones", true, true, true },
+  { "target_clones", TARGET_HAS_FMV_TARGET_ATTRIBUTE,
+TARGET_HAS_FMV_TARGET_ATTRIBUTE, TARGET_HAS_FMV_TARGET_ATTRIBUTE },
   { NULL, false, false, false },
 };
 
 static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
 {
   { "always_inline", true, true, true },
-  { "target", true, true, true },
+  { "target", TARGET_HAS_FMV_TARGET_ATTRIBUTE, TARGET_HAS_FMV_TARGET_ATTRIBUTE,
+TARGET_HAS_FMV_TARGET_ATTRIBUTE },
   { NULL, false, false, false },
 };
 
diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index 4e313d38f0f0608991c3267f55f43e3f0dd9d74a..0ca2779788569b7a02a79eab4db558df112aff87 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -675,7 +675,8 @@ decl_attributes (tree *node, tree attributes, int flags,
  options to the attribute((target(...))) list.  */
   if (TREE_CODE (*node) == FUNCTION_DECL
   && current_target_pragma
-  && targetm.target_option.valid_attribute_p (*node, NULL_TREE,
+  && targetm.target_option.valid_attribute_p (*node,
+ get_identifier ("target"),
  current_target_pragma, 0))
 {
   tree cur_attr = lookup_attribute ("target", attributes);
@@ -1276,8 +1277,9 @@ make_dispatcher_decl (const tree decl)
   return func_decl;  
 }
 
-/* Returns true if decl is multi-versioned and DECL is the default function,
-   that is it is not tagged with target specific optimization.  */
+/* Returns true if DECL is multi-versioned using the target attribute, and this
+   is the default version.  This function can only be used for targets that do
+   

[PATCH v2] aarch64: Fix +nopredres, +nols64 and +nomops

2023-12-13 Thread Andrew Carlotti
On Sat, Dec 09, 2023 at 07:22:49PM +, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > ...
> 
> This is the only use of native_detect_p, so it'd be good to remove
> the field itself.

Done
 
> > ...
> >
> > @@ -447,6 +451,13 @@ host_detect_local_cpu (int argc, const char **argv)
> >if (tune)
> >  return res;
> >  
> > +  if (!processed_exts)
> > +goto not_found;
> 
> Could you explain this part?  It seems like more of a parsing change
> (i.e. being more strict about what we accept).
> 
> If that's the intention, it probably belongs in:
> 
>   if (n_cores == 0
>   || n_cores > 2
>   || (n_cores == 1 && n_variants != 1)
>   || imp == INVALID_IMP)
> goto not_found;
> 
> But maybe it should be a separate patch.

I added it because I realised that the parsing behaviour didn't make sense in
that case, and my patch happens to change the behaviour as well (the outcome
without the check would be no enabled features, whereas previously it would
enable only the features with no native detection).

I agree that it makes sense to put it with the original check, so I've made 
that change.

> Looks good otherwise, thanks.
> 
> Richard

New patch version below, ok for master?

---

For native cpu feature detection, certain features have no entry in
/proc/cpuinfo, so have to be assumed to be present whenever the detected
cpu is supposed to support that feature.

However, the logic for this was mistakenly implemented by excluding
these features from part of aarch64_get_extension_string_for_isa_flags.
This function is also used elsewhere when canonicalising explicit
feature sets, which may require removing features that are normally
implied by the specified architecture version.

This change reenables generation of +nopredres, +nols64 and +nomops
during canonicalisation, by relocating the misplaced native cpu
detection logic.
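The corrected native-detection rule amounts to a simple bitmask combination. This sketch assumes three inputs: the features whose HWCAP strings were found in /proc/cpuinfo, the features the detected core is expected to implement, and the features with no cpuinfo entry at all (which the driver hunk below starts tracking in unchecked_extension_flags). native_features() is a hypothetical helper, not the driver code.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for aarch64_feature_flags (a 64-bit mask in this sketch).
using feature_flags = std::uint64_t;

//   detected:    features whose HWCAP strings appeared in /proc/cpuinfo
//   cpu_default: features the detected core is expected to implement
//   unchecked:   features with no HWCAP string, which cpuinfo can never
//                report either way
feature_flags
native_features (feature_flags detected, feature_flags cpu_default,
                 feature_flags unchecked)
{
  // Undetectable features are assumed present exactly when the detected
  // core is supposed to support them; all other features must be detected.
  return detected | (cpu_default & unchecked);
}
```

Doing this in the driver keeps aarch64_get_extension_string_for_isa_flags free of any native-detection special case, so canonicalisation can again emit "+no" for these features.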

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(struct aarch64_option_extension): Remove unused field.
(all_extensions): Ditto.
(aarch64_get_extension_string_for_isa_flags): Remove filtering
of features without native detection.
* config/aarch64/driver-aarch64.cc (host_detect_local_cpu):
Explicitly add expected features that lack cpuinfo detection.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/options_set_28.c: New test.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
index c2a6d357c0bc17996a25ea5c3a40f69d745c7931..4d0431d3a2cad5414790646bce0c09877c0366b2 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -149,9 +149,6 @@ struct aarch64_option_extension
   aarch64_feature_flags flags_on;
   /* If this feature is turned off, these bits also need to be turned off.  */
   aarch64_feature_flags flags_off;
-  /* Indicates whether this feature is taken into account during native cpu
- detection.  */
-  bool native_detect_p;
 };
 
 /* ISA extensions in AArch64.  */
@@ -159,10 +156,9 @@ static constexpr aarch64_option_extension all_extensions[] =
 {
 #define AARCH64_OPT_EXTENSION(NAME, IDENT, C, D, E, FEATURE_STRING) \
   {NAME, AARCH64_FL_##IDENT, feature_deps::IDENT ().explicit_on, \
-   feature_deps::get_flags_off (feature_deps::root_off_##IDENT), \
-   FEATURE_STRING[0]},
+   feature_deps::get_flags_off (feature_deps::root_off_##IDENT)},
 #include "config/aarch64/aarch64-option-extensions.def"
-  {NULL, 0, 0, 0, false}
+  {NULL, 0, 0, 0}
 };
 
 struct processor_name_to_arch
@@ -358,8 +354,7 @@ aarch64_get_extension_string_for_isa_flags
/* If either crypto flag needs removing here, then both do.  */
flags = flags_crypto;
 
-  if (opt.native_detect_p
- && (flags & current_flags & ~isa_flags))
+  if (flags & current_flags & ~isa_flags)
{
  current_flags &= ~opt.flags_off;
  outstr += "+no";
diff --git a/gcc/config/aarch64/driver-aarch64.cc b/gcc/config/aarch64/driver-aarch64.cc
index 8e318892b10aa2288421fad418844744a2f5a3b4..c18f065aa41e7328d71b45a53c82a3b703ae44d5 100644
--- a/gcc/config/aarch64/driver-aarch64.cc
+++ b/gcc/config/aarch64/driver-aarch64.cc
@@ -262,6 +262,7 @@ host_detect_local_cpu (int argc, const char **argv)
   unsigned int n_variants = 0;
   bool processed_exts = false;
   aarch64_feature_flags extension_flags = 0;
+  aarch64_feature_flags unchecked_extension_flags = 0;
   aarch64_feature_flags default_flags = 0;
   std::string buf;
   size_t sep_pos = -1;
@@ -348,7 +349,10 @@ host_detect_local_cpu (int argc, const char **argv)
  /* If the feature contains no HWCAPS string then ignore it for the
 auto detection.  */
  if (val.empty ())
-   continue;
+   {
+  

[PATCH v2] aarch64: Fix +nocrypto handling

2023-12-13 Thread Andrew Carlotti
Additionally, replace all checks for the AARCH64_FL_CRYPTO bit with
checks for (AARCH64_FL_AES | AARCH64_FL_SHA2) instead.  The value of the
AARCH64_FL_CRYPTO bit within isa_flags is now ignored, but it is
retained because removing it would make processing the data in
option-extensions.def significantly more complex.
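The flag rules described here, together with the "+nocrypto" special case in the diff below, can be approximated as two predicates. The bit values and function names are hypothetical; the sketch only captures the conditions stated in the patch: crypto counts as present only when both AES and SHA2 are, and "+nocrypto" is preferred only when both need removing and SM4 is not enabled.

```cpp
#include <cassert>
#include <cstdint>

using feature_flags = std::uint64_t;

// Hypothetical bit assignments for this sketch only.
constexpr feature_flags FL_AES = 1u << 0;
constexpr feature_flags FL_SHA2 = 1u << 1;
constexpr feature_flags FL_SM4 = 1u << 2;
constexpr feature_flags FL_CRYPTO_PARTS = FL_AES | FL_SHA2;

// "crypto" counts as enabled only when both constituent features are,
// mirroring TARGET_AES && TARGET_SHA2 for __ARM_FEATURE_CRYPTO.
bool
crypto_enabled (feature_flags isa)
{
  return (isa & FL_CRYPTO_PARTS) == FL_CRYPTO_PARTS;
}

// "+nocrypto" may replace "+noaes" and "+nosha2" only when both features
// need removing (set in CURRENT but not in the requested ISA) and SM4 is
// not enabled in CURRENT.
bool
can_emit_nocrypto (feature_flags current, feature_flags isa)
{
  return (FL_CRYPTO_PARTS & current & ~isa) == FL_CRYPTO_PARTS
         && !(current & FL_SM4);
}
```

Excluding SM4 avoids relying on whether an assembler's "+nocrypto" also turns SM4 off, which is the ambiguity the comment in the diff describes.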

This bug should have been picked up by an existing test, but a missing
newline meant that the pattern incorrectly allowed "+crypto+nocrypto".

Ok for master?

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_get_extension_string_for_isa_flags): Fix generation of
the "+nocrypto" extension.
* config/aarch64/aarch64.h (AARCH64_ISA_CRYPTO): Remove.
(TARGET_CRYPTO): Remove.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Don't use TARGET_CRYPTO.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/options_set_4.c: Add terminating newline.
* gcc.target/aarch64/options_set_27.c: New test.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
index 8fb901029ec2980a048177586b84201b3b398f9e..c2a6d357c0bc17996a25ea5c3a40f69d745c7931 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -311,6 +311,7 @@ aarch64_get_extension_string_for_isa_flags
  But in order to make the output more readable, it seems better
  to add the strings in definition order.  */
   aarch64_feature_flags added = 0;
+  auto flags_crypto = AARCH64_FL_AES | AARCH64_FL_SHA2;
   for (unsigned int i = ARRAY_SIZE (all_extensions); i-- > 0; )
 {
   auto &opt = all_extensions[i];
@@ -320,7 +321,7 @@ aarch64_get_extension_string_for_isa_flags
 per-feature crypto flags.  */
   auto flags = opt.flag_canonical;
   if (flags == AARCH64_FL_CRYPTO)
-   flags = AARCH64_FL_AES | AARCH64_FL_SHA2;
+   flags = flags_crypto;
 
   if ((flags & isa_flags & (explicit_flags | ~current_flags)) == flags)
{
@@ -339,14 +340,32 @@ aarch64_get_extension_string_for_isa_flags
  not have an HWCAPs then it shouldn't be taken into account for feature
  detection because one way or another we can't tell if it's available
  or not.  */
+
   for (auto &opt : all_extensions)
-if (opt.native_detect_p
-   && (opt.flag_canonical & current_flags & ~isa_flags))
-  {
-   current_flags &= ~opt.flags_off;
-   outstr += "+no";
-   outstr += opt.name;
-  }
+{
+  auto flags = opt.flag_canonical;
+  /* As a special case, don't emit "+noaes" or "+nosha2" when we could emit
+"+nocrypto" instead, in order to support assemblers that predate the
+separate per-feature crypto flags.  Only allow "+nocrypto" when "sm4"
+is not already enabled (to avoid depending on whether "+nocrypto"
+also disables "sm4").  */
+  if (flags & flags_crypto
+ && (flags_crypto & current_flags & ~isa_flags) == flags_crypto
+ && !(current_flags & AARCH64_FL_SM4))
+ continue;
+
+  if (flags == AARCH64_FL_CRYPTO)
+   /* If either crypto flag needs removing here, then both do.  */
+   flags = flags_crypto;
+
+  if (opt.native_detect_p
+ && (flags & current_flags & ~isa_flags))
+   {
+ current_flags &= ~opt.flags_off;
+ outstr += "+no";
+ outstr += opt.name;
+   }
+}
 
   return outstr;
 }
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 115a2a8b7568c43a712d819e03147ff84ff182c0..cdc4e453a2054b1a1d2c70bf0b528e497ae0b9ad 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -188,7 +188,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_ILP32, "_ILP32", pfile);
   aarch64_def_or_undef (TARGET_ILP32, "__ILP32__", pfile);
 
-  aarch64_def_or_undef (TARGET_CRYPTO, "__ARM_FEATURE_CRYPTO", pfile);
+  aarch64_def_or_undef (TARGET_AES && TARGET_SHA2, "__ARM_FEATURE_CRYPTO", pfile);
   aarch64_def_or_undef (TARGET_SIMD_RDMA, "__ARM_FEATURE_QRDMX", pfile);
   aarch64_def_or_undef (TARGET_SVE, "__ARM_FEATURE_SVE", pfile);
   cpp_undef (pfile, "__ARM_FEATURE_SVE_BITS");
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 2cd0bc552ebadac06a2838ae2767852c036d0db4..501bb7478a0755fa76c488ec03dcfab6c272851c 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -204,7 +204,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 
 #endif
 
-/* Macros to test ISA flags.  */
+/* Macros to test ISA flags.
+
+   There is intentionally no macro for AARCH64_FL_CRYPTO, since this flag bit
+   is not always set when its constituent features are present.
+   Check (TARGET_AES && TARGET_SHA2) instead.  */
 
 #define AARCH64_ISA_SM_OFF (aarch64_isa_flags & AARCH64_FL_SM_OFF)
 #define AARCH64_ISA_SM_ON  (aarch64_isa_flags & 

[committed v2] aarch64 testsuite: Check entire .arch string

2023-12-13 Thread Andrew Carlotti
Add a terminating newline to various tests, and add missing
extensions to some test strings.  The current output is broken for
options_set_4.c, so this test is left unchanged, to be fixed in a
subsequent patch.
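Why the terminating newline matters can be shown with an approximation of scan-assembler. DejaGnu matches these patterns with Tcl regular expressions, so using std::regex (ECMAScript syntax) here is an assumption that happens to behave the same for these simple patterns. Without \n, a pattern accepts any output line that merely starts with the expected .arch string, including one with extra extensions appended.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Return true if PATTERN matches anywhere in TEXT, roughly how
// scan-assembler searches the generated assembly.
bool
scan_matches (const std::string &pattern, const std::string &text)
{
  return std::regex_search (text, std::regex (pattern));
}
```

For example, against the output line ".arch armv8-a+crc+dotprod+crypto+nocrypto", the unterminated pattern from native_cpu_0.c still matches, while the version ending in \n only accepts the exact string.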

Committed as obvious, with options_set_4.c removed compared to v1.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/native_cpu_18.c: Add \+nopauth\n
* gcc.target/aarch64/options_set_7.c: Add \+crc\n
* gcc.target/aarch64/options_set_8.c: Add \+crc\+nodotprod\n
* gcc.target/aarch64/cpunative/native_cpu_0.c: Add \n
* gcc.target/aarch64/cpunative/native_cpu_1.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_2.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_3.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_4.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_5.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_6.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_7.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_8.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_9.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_10.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_11.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_12.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_14.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_15.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
* gcc.target/aarch64/options_set_1.c: Ditto.
* gcc.target/aarch64/options_set_2.c: Ditto.
* gcc.target/aarch64/options_set_3.c: Ditto.
* gcc.target/aarch64/options_set_5.c: Ditto.
* gcc.target/aarch64/options_set_6.c: Ditto.
* gcc.target/aarch64/options_set_9.c: Ditto.
* gcc.target/aarch64/options_set_11.c: Ditto.
* gcc.target/aarch64/options_set_12.c: Ditto.
* gcc.target/aarch64/options_set_13.c: Ditto.
* gcc.target/aarch64/options_set_14.c: Ditto.
* gcc.target/aarch64/options_set_15.c: Ditto.
* gcc.target/aarch64/options_set_16.c: Ditto.
* gcc.target/aarch64/options_set_17.c: Ditto.
* gcc.target/aarch64/options_set_18.c: Ditto.
* gcc.target/aarch64/options_set_19.c: Ditto.
* gcc.target/aarch64/options_set_20.c: Ditto.
* gcc.target/aarch64/options_set_21.c: Ditto.
* gcc.target/aarch64/options_set_22.c: Ditto.
* gcc.target/aarch64/options_set_23.c: Ditto.
* gcc.target/aarch64/options_set_24.c: Ditto.
* gcc.target/aarch64/options_set_25.c: Ditto.
* gcc.target/aarch64/options_set_26.c: Ditto.


diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
index 8499f87c39b173491a89626af56f4e193b1d12b5..fb5a7a18ad1a2d09ac4b231150a1bd9e72d6fab6 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto\n} } } */
 
 /* Test a normal looking procinfo.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_1.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_1.c
index 2cf0e89994b1cc0dc9fac67f4dc431c003498048..cb50e3b73057994432cc3ed15e3d5b57c7a3cb7b 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_1.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+nosimd} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+nosimd\n} } } */
 
 /* Test one where fp is on by default so turn off simd.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_10.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_10.c
index ddb06b8227576807fe068b76dabed91a0223e4fa..6a524bad371c55fc32698ff0994f4ad431be49ca 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_10.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_10.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+nofp} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+nofp\n} } } */
 
 /* Test one with no entry in feature list.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_11.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_11.c
index 96b9ca434ebbf007ddaa45d55a8c2b8e7a19a715..644f4792275bdd32a9f84241f0c329b046cbd909 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_11.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_11.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 

[committed v2] aarch64: Add missing driver-aarch64 dependencies

2023-12-13 Thread Andrew Carlotti
On Sat, Dec 09, 2023 at 06:42:17PM +, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> The .def files are included in TM_H by:
> 
> TM_H += $(srcdir)/config/aarch64/aarch64-fusion-pairs.def \
>   $(srcdir)/config/aarch64/aarch64-tuning-flags.def \
>   $(srcdir)/config/aarch64/aarch64-option-extensions.def \
>   $(srcdir)/config/aarch64/aarch64-cores.def \
>   $(srcdir)/config/aarch64/aarch64-isa-modes.def \
>   $(srcdir)/config/aarch64/aarch64-arches.def

They are included now, but only because you added them last week.

I've removed them in v2 of the patch, committed as below:

---

gcc/ChangeLog:

* config/aarch64/x-aarch64: Add missing dependencies.


diff --git a/gcc/config/aarch64/x-aarch64 b/gcc/config/aarch64/x-aarch64
index 3cf701a0a01ab00eaaafdfad14bd90ebbb1d498f..ee828c9af53a11885c2bcef8f112c0ebaf161c59 100644
--- a/gcc/config/aarch64/x-aarch64
+++ b/gcc/config/aarch64/x-aarch64
@@ -1,3 +1,5 @@
 driver-aarch64.o: $(srcdir)/config/aarch64/driver-aarch64.cc \
-  $(CONFIG_H) $(SYSTEM_H)
+  $(CONFIG_H) $(SYSTEM_H) $(TM_H) $(CORETYPES_H) \
+  $(srcdir)/config/aarch64/aarch64-protos.h \
+  $(srcdir)/config/aarch64/aarch64-feature-deps.h
$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<


[committed] aarch64 testsuite: Only run aarch64-ssve tests once

2023-12-13 Thread Andrew Carlotti
Results verified by running
`RUNTESTFLAGS="aarch64-ssve.exp=*" make -k -j 56 check-gcc`
before and after the change.  I initially spotted the issue because the tests
were being run a nondeterministic number of times during unrelated regression
testing.

Committed as obvious.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/sve/aarch64-ssve.exp: Only run tests once.


diff --git a/gcc/testsuite/g++.target/aarch64/sve/aarch64-ssve.exp b/gcc/testsuite/g++.target/aarch64/sve/aarch64-ssve.exp
index d6a5a561a33ea98d7363af0cfa4d73955baabd1b..98242a97b46e9793f34a26f4365a3d1f39d58da5 100644
--- a/gcc/testsuite/g++.target/aarch64/sve/aarch64-ssve.exp
+++ b/gcc/testsuite/g++.target/aarch64/sve/aarch64-ssve.exp
@@ -27,6 +27,10 @@ if {![istarget aarch64*-*-*] } {
 
 load_lib gcc-defs.exp
 
+if ![gcc_parallel_test_run_p aarch64-ssve] {
+  return
+}
+
 gcc_parallel_test_enable 0
 
 # Code shared by all tests.


[PATCH v3 5/5] aarch64: Add function multiversioning support

2023-12-06 Thread Andrew Carlotti
This adds initial support for function multiversioning on aarch64 using
the target_version and target_clones attributes.  This loosely follows
the Beta specification in the ACLE [1], although with some differences
that still need to be resolved (possibly as follow-up patches).

Existing function multiversioning implementations are broken in various
ways when used across translation units.  This includes placing
resolvers in the wrong translation units, and using symbol mangling that
allows callers to unintentionally bypass the resolver in some circumstances.
Fixing these issues for aarch64 will require modifications to our ACLE
specification.  It will also require further adjustments to existing
middle end code, to facilitate different mangling and resolver
placement while preserving existing target behaviours.

The list of function multiversioning features specified in the ACLE is
also inconsistent with the list of features supported in target option
extensions.  I intend to resolve some or all of these inconsistencies at
a later stage.

The target_version attribute is currently only supported in C++, since
this is the only frontend with existing support for multiversioning
using the target attribute.  On the other hand, this patch happens to
enable multiversioning with the target_clones attribute in Ada and D, as
well as the entire C family, using their existing frontend support.

This patch also does not support the following aspects of the Beta
specification:

- The target_clones attribute should allow an implicit unlisted
  "default" version.
- There should be an option to disable function multiversioning at
  compile time.
- Unrecognised target names in a target_clones attribute should be
  ignored (with an optional warning).  This current patch raises an
  error instead.

[1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning

---

I believe the support present in this patch correctly handles function
multiversioning within a single translation unit for all features in the ACLE
specification with option extension support.

Is it ok to push this patch in its current state? I'd then continue working on
incremental improvements to the supported feature extensions and the ABI issues
in followup patches, along with corresponding changes and improvements to the
ACLE specification.


gcc/ChangeLog:

* config/aarch64/aarch64-feature-deps.h (fmv_deps_):
Define aarch64_feature_flags mask for each FMV feature.
* config/aarch64/aarch64-option-extensions.def: Use new macros
to define FMV feature extensions.
* config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p):
Check for target_version attribute after processing target
attribute.
(aarch64_fmv_feature_data): New.
(aarch64_parse_fmv_features): New.
(aarch64_process_target_version_attr): New.
(aarch64_option_valid_version_attribute_p): New.
(get_feature_mask_for_version): New.
(compare_feature_masks): New.
(aarch64_compare_version_priority): New.
(build_ifunc_arg_type): New.
(make_resolver_func): New.
(add_condition_to_bb): New.
(dispatch_function_versions): New.
(aarch64_generate_version_dispatcher_body): New.
(aarch64_get_function_versions_dispatcher): New.
(aarch64_common_function_versions): New.
(aarch64_mangle_decl_assembler_name): New.
(TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation.
(TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation.
(TARGET_OPTION_FUNCTION_VERSIONS): New implementation.
(TARGET_COMPARE_VERSION_PRIORITY): New implementation.
(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation.
(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation.
(TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation.
* config/aarch64/aarch64.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE):
Set target macro.
* config/arm/aarch-common.h (enum aarch_parse_opt_result): Add
new value to report duplicate FMV feature.
* common/config/aarch64/cpuinfo.h: New file.

libgcc/ChangeLog:

* config/aarch64/cpuinfo.c (enum CPUFeatures): Move to shared
copy in gcc/common.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/options_set_17.c: Reorder expected flags.
* gcc.target/aarch64/cpunative/native_cpu_0.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_18.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_19.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_20.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_21.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
* 

[PATCH v3 4/5] Add support for target_version attribute

2023-12-06 Thread Andrew Carlotti
This patch adds support for the "target_version" attribute to the middle
end and the C++ frontend, which will be used to implement function
multiversioning in the aarch64 backend.

On targets that don't use the "target" attribute for multiversioning,
there is no conflict between the "target" and "target_clones"
attributes.  This patch therefore makes the mutual exclusion in
C-family, D and Ada conditional upon the value of the
TARGET_HAS_FMV_TARGET_ATTRIBUTE macro.
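To make the conditional exclusion concrete, here is a simplified, self-contained model of an exclusion list whose per-node flags come from the target macro. The struct layout, the default value given to the macro, and the conflicts_on_function helper are illustrative assumptions rather than GCC's actual definitions; only the pattern of initialising the three flags from TARGET_HAS_FMV_TARGET_ATTRIBUTE is taken from the diffs in this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the target macro (an assumption for this sketch):
   0 models a target that does not use "target" for multiversioning.  */
#ifndef TARGET_HAS_FMV_TARGET_ATTRIBUTE
#define TARGET_HAS_FMV_TARGET_ATTRIBUTE 0
#endif

/* Simplified exclusion entry: the excluded attribute's name and whether
   the conflict applies to functions, variables and types.  */
struct excl_entry
{
  const char *name;
  bool function;
  bool variable;
  bool type;
};

/* Exclusions for "target_clones"; the "target" entry is only active when
   the target implements multiversioning with the "target" attribute.  */
static const struct excl_entry target_clones_excl[] =
{
  { "always_inline", true, true, true },
  { "target", TARGET_HAS_FMV_TARGET_ATTRIBUTE, TARGET_HAS_FMV_TARGET_ATTRIBUTE,
    TARGET_HAS_FMV_TARGET_ATTRIBUTE },
  { NULL, false, false, false },
};

/* Return true if attribute OTHER, when applied to a function declaration,
   conflicts with the attribute that owns TABLE.  */
static bool
conflicts_on_function (const struct excl_entry *table, const char *other)
{
  for (const struct excl_entry *e = table; e->name != NULL; e++)
    if (e->function && strcmp (e->name, other) == 0)
      return true;
  return false;
}
```

With the macro set to 0, the "target" entry stays in the list but never matches, so "target" and "target_clones" may be combined, while "always_inline" is still rejected.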

The "target_version" attribute is only added to C++ in this patch,
because this is currently the only frontend which supports
multiversioning using the "target" attribute.  Support for the
"target_version" attribute will be extended to C at a later date.

Targets that currently use the "target" attribute for function
multiversioning (i.e. i386 and rs6000) are not affected by this patch.

Ok for master?

gcc/ChangeLog:

* attribs.cc (decl_attributes): Pass attribute name to target.
(is_function_default_version): Update comment to specify
incompatibility with target_version attributes.
* cgraphclones.cc (cgraph_node::create_version_clone_with_body):
Call valid_version_attribute_p for target_version attributes.
* defaults.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE): New macro.
* target.def (valid_version_attribute_p): New hook.
* doc/tm.texi.in: Add new hook.
* doc/tm.texi: Regenerate.
* multiple_target.cc (create_dispatcher_calls): Remove redundant
is_function_default_version check.
(expand_target_clones): Use target macro to pick attribute name.
* targhooks.cc (default_target_option_valid_version_attribute_p):
New.
* targhooks.h (default_target_option_valid_version_attribute_p):
New.
* tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
target_version attributes.

gcc/c-family/ChangeLog:

* c-attribs.cc (attr_target_exclusions): Make
target/target_clones exclusion target-dependent.
(attr_target_clones_exclusions): Ditto, and add target_version.
(attr_target_version_exclusions): New.
(c_common_attribute_table): Add target_version.
(handle_target_version_attribute): New.

gcc/ada/ChangeLog:

* gcc-interface/utils.cc (attr_target_exclusions): Make
target/target_clones exclusion target-dependent.
(attr_target_clones_exclusions): Ditto.

gcc/d/ChangeLog:

* d-attribs.cc (attr_target_exclusions): Make
target/target_clones exclusion target-dependent.
(attr_target_clones_exclusions): Ditto.

gcc/cp/ChangeLog:

* decl2.cc (check_classfn): Update comment to include
target_version attributes.


diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index f2c504ddf8d3df11abe81aec695c9eea0b39da6c..5d946c33b212c5ea50e7a73524e8c1d062280956 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -145,14 +145,16 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
 
 static const struct attribute_spec::exclusions attr_target_exclusions[] =
 {
-  { "target_clones", true, true, true },
+  { "target_clones", TARGET_HAS_FMV_TARGET_ATTRIBUTE,
+TARGET_HAS_FMV_TARGET_ATTRIBUTE, TARGET_HAS_FMV_TARGET_ATTRIBUTE },
   { NULL, false, false, false },
 };
 
 static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
 {
   { "always_inline", true, true, true },
-  { "target", true, true, true },
+  { "target", TARGET_HAS_FMV_TARGET_ATTRIBUTE, TARGET_HAS_FMV_TARGET_ATTRIBUTE,
+TARGET_HAS_FMV_TARGET_ATTRIBUTE },
   { NULL, false, false, false },
 };
 
diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index c7209c26acc9faf699774b0ef669ec6748b9073d..19cccf2d7ca4fdd6a46a01884393c6779333dbc5 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -657,7 +657,8 @@ decl_attributes (tree *node, tree attributes, int flags,
  options to the attribute((target(...))) list.  */
   if (TREE_CODE (*node) == FUNCTION_DECL
   && current_target_pragma
-  && targetm.target_option.valid_attribute_p (*node, NULL_TREE,
+  && targetm.target_option.valid_attribute_p (*node,
+ get_identifier ("target"),
  current_target_pragma, 0))
 {
   tree cur_attr = lookup_attribute ("target", attributes);
@@ -1241,8 +1242,9 @@ make_dispatcher_decl (const tree decl)
   return func_decl;  
 }
 
-/* Returns true if decl is multi-versioned and DECL is the default function,
-   that is it is not tagged with target specific optimization.  */
+/* Returns true if DECL is multi-versioned using the target attribute, and this
+   is the default version.  This function can only be used for targets that do
+   not support the "target_version" attribute.  */
 
 bool
 is_function_default_version (const tree decl)
diff --git a/gcc/c-family/c-attribs.cc 

[PATCH v3 3/5] ada: Improve attribute exclusion handling

2023-12-06 Thread Andrew Carlotti
Change the handling of some attribute mutual exclusions to use the
generic attribute exclusion lists, and fix some asymmetric exclusions by
adding the exclusions for always_inline after noinline or target_clones.

Aside from the new always_inline exclusions, the only change in
functionality is the choice of warning message displayed.  All warnings
about attribute mutual exclusions now use the same message.
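The asymmetry matters because, as the description above implies, an exclusion that appears in only one attribute's list is caught for one ordering of the attributes but not the other. A hypothetical consistency check over the four Ada lists added here shows what "symmetric" means: for every attribute A whose list names B, B's list must also name A. The table layout and helper names are assumptions made for illustration, not GCC code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: an attribute name and the NULL-terminated list of
   attribute names it excludes (mirroring the Ada lists in this patch).  */
struct attr_excls
{
  const char *name;
  const char *const *excludes;
};

static const char *const always_inline_ex[] = { "noinline", "target_clones", NULL };
static const char *const noinline_ex[] = { "always_inline", NULL };
static const char *const target_ex[] = { "target_clones", NULL };
static const char *const target_clones_ex[] = { "always_inline", "target", NULL };

static const struct attr_excls attrs[] =
{
  { "always_inline", always_inline_ex },
  { "noinline", noinline_ex },
  { "target", target_ex },
  { "target_clones", target_clones_ex },
};

#define N_ATTRS (sizeof attrs / sizeof attrs[0])

/* Return true if attribute A lists B as an exclusion.  Unknown attributes
   are treated as having no exclusions.  */
static bool
excludes (const char *a, const char *b)
{
  for (size_t i = 0; i < N_ATTRS; i++)
    if (strcmp (attrs[i].name, a) == 0)
      for (const char *const *p = attrs[i].excludes; *p != NULL; p++)
        if (strcmp (*p, b) == 0)
          return true;
  return false;
}

/* Count one-directional exclusions (A excludes B, but B does not exclude
   A).  Zero means every conflict is listed on both sides.  */
static size_t
count_asymmetric (void)
{
  size_t count = 0;
  for (size_t i = 0; i < N_ATTRS; i++)
    for (const char *const *p = attrs[i].excludes; *p != NULL; p++)
      if (!excludes (*p, attrs[i].name))
        count++;
  return count;
}
```

Dropping "target_clones" from always_inline_ex, for example, would make count_asymmetric return 1.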

gcc/ada/ChangeLog:

* gcc-interface/utils.cc (attr_noinline_exclusions): New.
(attr_always_inline_exclusions): Ditto.
(attr_target_exclusions): Ditto.
(attr_target_clones_exclusions): Ditto.
(gnat_internal_attribute_table): Add new exclusion lists.
(handle_noinline_attribute): Remove custom exclusion handling.
(handle_target_attribute): Ditto.
(handle_target_clones_attribute): Ditto.


diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index e7b5c7783b1f1c702130c8879c79b7e329764b09..f2c504ddf8d3df11abe81aec695c9eea0b39da6c 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -130,6 +130,32 @@ static const struct attribute_spec::exclusions attr_stack_protect_exclusions[] =
   { NULL, false, false, false },
 };
 
+static const struct attribute_spec::exclusions attr_always_inline_exclusions[] =
+{
+  { "noinline", true, true, true },
+  { "target_clones", true, true, true },
+  { NULL, false, false, false },
+};
+
+static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
+{
+  { "always_inline", true, true, true },
+  { NULL, false, false, false },
+};
+
+static const struct attribute_spec::exclusions attr_target_exclusions[] =
+{
+  { "target_clones", true, true, true },
+  { NULL, false, false, false },
+};
+
+static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
+{
+  { "always_inline", true, true, true },
+  { "target", true, true, true },
+  { NULL, false, false, false },
+};
+
 /* Fake handler for attributes we don't properly support, typically because
they'd require dragging a lot of the common-c front-end circuitry.  */
 static tree fake_attribute_handler (tree *, tree, tree, int, bool *);
@@ -165,7 +191,7 @@ const struct attribute_spec gnat_internal_attribute_table[] =
   { "strub",   0, 1, false, true, false, true,
 handle_strub_attribute, NULL },
   { "noinline", 0, 0,  true,  false, false, false,
-handle_noinline_attribute, NULL },
+handle_noinline_attribute, attr_noinline_exclusions },
   { "noclone",  0, 0,  true,  false, false, false,
 handle_noclone_attribute, NULL },
   { "no_icf",   0, 0,  true,  false, false, false,
@@ -175,7 +201,7 @@ const struct attribute_spec gnat_internal_attribute_table[] =
   { "leaf", 0, 0,  true,  false, false, false,
 handle_leaf_attribute, NULL },
   { "always_inline",0, 0,  true,  false, false, false,
-handle_always_inline_attribute, NULL },
+handle_always_inline_attribute, attr_always_inline_exclusions },
   { "malloc",   0, 0,  true,  false, false, false,
 handle_malloc_attribute, NULL },
   { "type generic", 0, 0,  false, true,  true,  false,
@@ -192,9 +218,9 @@ const struct attribute_spec gnat_internal_attribute_table[] =
   { "simd", 0, 1,  true,  false, false, false,
 handle_simd_attribute, NULL },
   { "target",   1, -1, true,  false, false, false,
-handle_target_attribute, NULL },
+handle_target_attribute, attr_target_exclusions },
   { "target_clones",1, -1, true,  false, false, false,
-handle_target_clones_attribute, NULL },
+handle_target_clones_attribute, attr_target_clones_exclusions },
 
   { "vector_size",  1, 1,  false, true,  false, false,
 handle_vector_size_attribute, NULL },
@@ -6755,16 +6781,7 @@ handle_noinline_attribute (tree *node, tree name,
   int ARG_UNUSED (flags), bool *no_add_attrs)
 {
   if (TREE_CODE (*node) == FUNCTION_DECL)
-{
-  if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
-   {
- warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-  "with attribute %qs", name, "always_inline");
- *no_add_attrs = true;
-   }
-  else
-   DECL_UNINLINABLE (*node) = 1;
-}
+DECL_UNINLINABLE (*node) = 1;
   else
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
@@ -7063,12 +7080,6 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
   warning (OPT_Wattributes, "%qE attribute ignored", name);
   *no_add_attrs = true;
 }
-  else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
-{
-  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-  "with %qs attribute", name, "target_clones");
-  *no_add_attrs = true;
-}
   else if (!targetm.target_option.valid_attribute_p (*node, name, args, flags))
 *no_add_attrs = true;
 
@@ -7096,23 +7107,8 @@ 

[PATCH v3 2/5] c-family: Simplify attribute exclusion handling

2023-12-06 Thread Andrew Carlotti
This patch changes the handling of mutual exclusions involving the
target and target_clones attributes to use the generic attribute
exclusion lists.  Additionally, the duplicate handling for the
always_inline and noinline attribute exclusion is removed.

The only change in functionality is the choice of warning message
displayed - due to either a change in the wording for mutual exclusion
warnings, or a change in the order in which different checks occur.

gcc/c-family/ChangeLog:

* c-attribs.cc (attr_always_inline_exclusions): New.
(attr_target_exclusions): Ditto.
(attr_target_clones_exclusions): Ditto.
(c_common_attribute_table): Add new exclusion lists.
(handle_noinline_attribute): Remove custom exclusion handling.
(handle_always_inline_attribute): Ditto.
(handle_target_attribute): Ditto.
(handle_target_clones_attribute): Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mvc2.C:
* g++.target/i386/mvc3.C:


diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 461732f60f7c4031cc6692000fbdddb9f726a035..b3b41ef123a0f171f57acb1b7f7fdde716428c00 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -214,6 +214,13 @@ static const struct attribute_spec::exclusions attr_inline_exclusions[] =
   ATTR_EXCL (NULL, false, false, false),
 };
 
+static const struct attribute_spec::exclusions attr_always_inline_exclusions[] =
+{
+  ATTR_EXCL ("noinline", true, true, true),
+  ATTR_EXCL ("target_clones", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};
+
 static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
 {
   ATTR_EXCL ("always_inline", true, true, true),
@@ -221,6 +228,19 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
   ATTR_EXCL (NULL, false, false, false),
 };
 
+static const struct attribute_spec::exclusions attr_target_exclusions[] =
+{
+  ATTR_EXCL ("target_clones", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};
+
+static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
+{
+  ATTR_EXCL ("always_inline", true, true, true),
+  ATTR_EXCL ("target", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};
+
 extern const struct attribute_spec::exclusions attr_noreturn_exclusions[] =
 {
   ATTR_EXCL ("alloc_align", true, true, true),
@@ -332,7 +352,7 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_leaf_attribute, NULL },
   { "always_inline",  0, 0, true,  false, false, false,
  handle_always_inline_attribute,
- attr_inline_exclusions },
+ attr_always_inline_exclusions },
   { "gnu_inline", 0, 0, true,  false, false, false,
  handle_gnu_inline_attribute,
  attr_inline_exclusions },
@@ -483,9 +503,11 @@ const struct attribute_spec c_common_attribute_table[] =
   { "error", 1, 1, true,  false, false, false,
  handle_error_attribute, NULL },
   { "target", 1, -1, true, false, false, false,
- handle_target_attribute, NULL },
+ handle_target_attribute,
+ attr_target_exclusions },
   { "target_clones",  1, -1, true, false, false, false,
- handle_target_clones_attribute, NULL },
+ handle_target_clones_attribute,
+ attr_target_clones_exclusions },
   { "optimize",   1, -1, true, false, false, false,
  handle_optimize_attribute, NULL },
   /* For internal use only.  The leading '*' both prevents its usage in
@@ -1397,16 +1419,7 @@ handle_noinline_attribute (tree *node, tree name,
   int ARG_UNUSED (flags), bool *no_add_attrs)
 {
   if (TREE_CODE (*node) == FUNCTION_DECL)
-{
-  if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
-   {
- warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-  "with attribute %qs", name, "always_inline");
- *no_add_attrs = true;
-   }
-  else
-   DECL_UNINLINABLE (*node) = 1;
-}
+DECL_UNINLINABLE (*node) = 1;
   else
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
@@ -1487,22 +1500,9 @@ handle_always_inline_attribute (tree *node, tree name,
 {
   if (TREE_CODE (*node) == FUNCTION_DECL)
 {
-  if (lookup_attribute ("noinline", DECL_ATTRIBUTES (*node)))
-   {
- warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-  "with %qs attribute", name, "noinline");
- *no_add_attrs = true;
-   }
-  else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
-   {
- 

[1/5] aarch64: Add cpu feature detection to libgcc

2023-12-06 Thread Andrew Carlotti
This is added to enable function multiversioning, but can also be used
directly.  The interface is chosen to match that used in LLVM's
compiler-rt, to facilitate cross-compiler compatibility.
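Since the interface can also be used directly, a minimal sketch of the underlying idea may help: each HWCAP bit reported by the kernel is translated into one bit of the features word, indexed by the CPUFeatures enumerators, and bit FEAT_INIT (63, implied by the enum in the patch) marks initialisation as complete. The enumerator values and HWCAP bit positions below are taken from the patch (only a subset); HWCAP2_SVE2 is assumed to have the Linux value 1 << 1, and the function, struct and IFUNC_ARG_HWCAP names are hypothetical. This is not the patch's actual detection logic, which covers many more features.

```c
#include <stddef.h>
#include <stdint.h>

/* Subset of the CPUFeatures enumerators, with the values implied by their
   position in the patch's enum.  */
enum
{
  FEAT_LSE = 7,
  FEAT_FP = 8,
  FEAT_SIMD = 9,
  FEAT_AES = 14,
  FEAT_PMULL = 15,
  FEAT_SVE2 = 36,
  FEAT_INIT = 63
};

/* Same bit positions as the patch's fallback HWCAP definitions.  */
#define HWCAP_FP (1ULL << 0)
#define HWCAP_ASIMD (1ULL << 1)
#define HWCAP_AES (1ULL << 3)
#define HWCAP_PMULL (1ULL << 4)
#define HWCAP_ATOMICS (1ULL << 8)
/* Assumed Linux value for a bit in the second HWCAP word.  */
#define HWCAP2_SVE2 (1ULL << 1)
/* Mirrors _IFUNC_ARG_HWCAP: set when the second resolver argument is valid.  */
#define IFUNC_ARG_HWCAP (1ULL << 62)

/* Hypothetical mirror of __ifunc_arg_t.  */
struct ifunc_arg
{
  unsigned long _size;
  unsigned long _hwcap;
  unsigned long _hwcap2;
};

/* Translate HWCAP words into a feature mask, one bit per enumerator.  */
static uint64_t
features_from_hwcap (uint64_t hwcap, const struct ifunc_arg *arg)
{
  uint64_t hwcap2 = 0;
  uint64_t features = 0;

  /* Only read the second argument if the caller says it is present.  */
  if ((hwcap & IFUNC_ARG_HWCAP) && arg != NULL)
    hwcap2 = arg->_hwcap2;

  if (hwcap & HWCAP_FP)
    features |= 1ULL << FEAT_FP;
  if (hwcap & HWCAP_ASIMD)
    features |= 1ULL << FEAT_SIMD;
  if (hwcap & HWCAP_AES)
    features |= 1ULL << FEAT_AES;
  if (hwcap & HWCAP_PMULL)
    features |= 1ULL << FEAT_PMULL;
  if (hwcap & HWCAP_ATOMICS)
    features |= 1ULL << FEAT_LSE;
  if (hwcap2 & HWCAP2_SVE2)
    features |= 1ULL << FEAT_SVE2;

  /* Mark the mask as initialised.  */
  return features | (1ULL << FEAT_INIT);
}

/* Return 1 if feature FEAT is set in FEATURES, else 0.  */
static int
has_feature (uint64_t features, int feat)
{
  return (int) ((features >> feat) & 1);
}
```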

The content of the patch is derived almost entirely from Pavel's prior
contributions to compiler-rt/lib/builtins/cpu_model.c. I have made minor
changes to align more closely with GCC coding style, and to exclude any code
from other LLVM contributors, and am adding this to GCC with Pavel's approval.

libgcc/ChangeLog:

* config/aarch64/t-aarch64: Include cpuinfo.c.
* config/aarch64/cpuinfo.c: New file.
(__init_cpu_features_constructor): New.
(__init_cpu_features_resolver): New.
(__init_cpu_features): New.

Co-authored-by: Pavel Iliin 


diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
new file mode 100644
index ..634f591c194bc70048f714d7eb0ace1f2f4137ea
--- /dev/null
+++ b/libgcc/config/aarch64/cpuinfo.c
@@ -0,0 +1,500 @@
+/* CPU feature detection for AArch64 architecture.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+  
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#if __has_include()
+#include 
+
+#if __has_include()
+#include 
+#else
+typedef struct __ifunc_arg_t {
+  unsigned long _size;
+  unsigned long _hwcap;
+  unsigned long _hwcap2;
+} __ifunc_arg_t;
+#endif
+
+#if __has_include()
+#include 
+
+/* CPUFeatures must correspond to the same AArch64 features in aarch64.cc  */
+enum CPUFeatures {
+  FEAT_RNG,
+  FEAT_FLAGM,
+  FEAT_FLAGM2,
+  FEAT_FP16FML,
+  FEAT_DOTPROD,
+  FEAT_SM4,
+  FEAT_RDM,
+  FEAT_LSE,
+  FEAT_FP,
+  FEAT_SIMD,
+  FEAT_CRC,
+  FEAT_SHA1,
+  FEAT_SHA2,
+  FEAT_SHA3,
+  FEAT_AES,
+  FEAT_PMULL,
+  FEAT_FP16,
+  FEAT_DIT,
+  FEAT_DPB,
+  FEAT_DPB2,
+  FEAT_JSCVT,
+  FEAT_FCMA,
+  FEAT_RCPC,
+  FEAT_RCPC2,
+  FEAT_FRINTTS,
+  FEAT_DGH,
+  FEAT_I8MM,
+  FEAT_BF16,
+  FEAT_EBF16,
+  FEAT_RPRES,
+  FEAT_SVE,
+  FEAT_SVE_BF16,
+  FEAT_SVE_EBF16,
+  FEAT_SVE_I8MM,
+  FEAT_SVE_F32MM,
+  FEAT_SVE_F64MM,
+  FEAT_SVE2,
+  FEAT_SVE_AES,
+  FEAT_SVE_PMULL128,
+  FEAT_SVE_BITPERM,
+  FEAT_SVE_SHA3,
+  FEAT_SVE_SM4,
+  FEAT_SME,
+  FEAT_MEMTAG,
+  FEAT_MEMTAG2,
+  FEAT_MEMTAG3,
+  FEAT_SB,
+  FEAT_PREDRES,
+  FEAT_SSBS,
+  FEAT_SSBS2,
+  FEAT_BTI,
+  FEAT_LS64,
+  FEAT_LS64_V,
+  FEAT_LS64_ACCDATA,
+  FEAT_WFXT,
+  FEAT_SME_F64,
+  FEAT_SME_I64,
+  FEAT_SME2,
+  FEAT_RCPC3,
+  FEAT_MAX,
+  FEAT_EXT = 62, /* Reserved to indicate presence of additional features field
+   in __aarch64_cpu_features.  */
+  FEAT_INIT  /* Used as flag of features initialization completion.  */
+};
+
+/* Architecture features used in Function Multi Versioning.  */
+struct {
+  unsigned long long features;
+  /* As the number of features grows, new fields could be added.  */
+} __aarch64_cpu_features __attribute__((visibility("hidden"), nocommon));
+
+#ifndef _IFUNC_ARG_HWCAP
+#define _IFUNC_ARG_HWCAP (1ULL << 62)
+#endif
+#ifndef AT_HWCAP
+#define AT_HWCAP 16
+#endif
+#ifndef HWCAP_CPUID
+#define HWCAP_CPUID (1 << 11)
+#endif
+#ifndef HWCAP_FP
+#define HWCAP_FP (1 << 0)
+#endif
+#ifndef HWCAP_ASIMD
+#define HWCAP_ASIMD (1 << 1)
+#endif
+#ifndef HWCAP_AES
+#define HWCAP_AES (1 << 3)
+#endif
+#ifndef HWCAP_PMULL
+#define HWCAP_PMULL (1 << 4)
+#endif
+#ifndef HWCAP_SHA1
+#define HWCAP_SHA1 (1 << 5)
+#endif
+#ifndef HWCAP_SHA2
+#define HWCAP_SHA2 (1 << 6)
+#endif
+#ifndef HWCAP_ATOMICS
+#define HWCAP_ATOMICS (1 << 8)
+#endif
+#ifndef HWCAP_FPHP
+#define HWCAP_FPHP (1 << 9)
+#endif
+#ifndef HWCAP_ASIMDHP
+#define HWCAP_ASIMDHP (1 << 10)
+#endif
+#ifndef HWCAP_ASIMDRDM
+#define HWCAP_ASIMDRDM (1 << 12)
+#endif
+#ifndef HWCAP_JSCVT
+#define HWCAP_JSCVT (1 << 13)
+#endif
+#ifndef HWCAP_FCMA
+#define HWCAP_FCMA (1 << 14)
+#endif
+#ifndef HWCAP_LRCPC
+#define HWCAP_LRCPC (1 << 15)
+#endif
+#ifndef HWCAP_DCPOP
+#define HWCAP_DCPOP (1 << 16)
+#endif
+#ifndef HWCAP_SHA3
+#define HWCAP_SHA3 (1 << 17)
+#endif
+#ifndef HWCAP_SM3
+#define HWCAP_SM3 (1 << 18)
+#endif
+#ifndef HWCAP_SM4
+#define HWCAP_SM4 (1 << 19)

[PATCH v3 0/5] target_version and aarch64 function multiversioning

2023-12-06 Thread Andrew Carlotti
This series adds support for function multiversioning on aarch64.

Patches 1-3 are already approved, with just one minor change from the previous
version of patch 1 suggested by Richard Sandiford.

Patches 4-5 are updated based on Richard's reviews.  The only major change is
replacing the EXPANDED_CLONES_ATTRIBUTE target hook with the
TARGET_HAS_FMV_TARGET_ATTRIBUTE macro.  I've also reorganised
dispatch_function_versions and aarch64_mangle_decl_assembler_name, along with
several other minor fixes.

The updated series passes regression testing on aarch64 for both C and C++.
The previous version passed testing on x86; I haven't retested it since.

Ok for master?

Thanks,
Andrew


aarch64: Fix +nopredres, +nols64 and +nomops

2023-12-05 Thread Andrew Carlotti
For native cpu feature detection, certain features have no entry in
/proc/cpuinfo, so have to be assumed to be present whenever the detected
cpu is supposed to support that feature.

However, the logic for this was mistakenly implemented by excluding
these features from part of aarch64_get_extension_string_for_isa_flags.
This function is also used elsewhere when canonicalising explicit
feature sets, which may require removing features that are normally
implied by the specified architecture version.

This change reenables generation of +nopredres, +nols64 and +nomops
during canonicalisation, by relocating the misplaced native cpu
detection logic.
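The relocated logic amounts to plain bitmask arithmetic: features with no /proc/cpuinfo entry are recorded separately while parsing, and afterwards only those the detected CPU is expected to support by default are added. The sketch below uses hypothetical bit assignments and a hypothetical helper name; its final expression mirrors the "extension_flags |= unchecked_extension_flags & default_flags;" line added to driver-aarch64.cc.

```c
#include <stdint.h>

typedef uint64_t feature_flags;

/* Hypothetical feature bits, for illustration only.  */
#define FL_CRC (1ULL << 0)
#define FL_SVE (1ULL << 1)
#define FL_PREDRES (1ULL << 2) /* No /proc/cpuinfo entry.  */
#define FL_LS64 (1ULL << 3)    /* No /proc/cpuinfo entry.  */

/* DETECTED holds features confirmed via /proc/cpuinfo, UNCHECKED the
   features that cannot be verified there, and DEFAULTS the features the
   detected CPU is supposed to support.  */
static feature_flags
native_extension_flags (feature_flags detected, feature_flags unchecked,
                        feature_flags defaults)
{
  /* Assume an unverifiable feature is present only if the CPU's default
     feature set includes it.  */
  return detected | (unchecked & defaults);
}
```

Because this assumption is now made in the driver, the canonicalisation code no longer has to skip such features, so an explicit "+nopredres" is preserved.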

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_get_extension_string_for_isa_flags): Remove filtering
of features without native detection.
* config/aarch64/driver-aarch64.cc (host_detect_local_cpu):
Explicitly add expected features that lack cpuinfo detection.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/options_set_29.c: New test.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
index ee2ea7eae105d19ec906ef8d25d3a237fbeac4b4..37e60d6083e290b18b1f4c6274123b0a58de5476 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -357,8 +357,7 @@ aarch64_get_extension_string_for_isa_flags
   }
 
   for (auto  : all_extensions)
-if (opt.native_detect_p
-   && (opt.flag_canonical != AARCH64_FL_CRYPTO)
+if ((opt.flag_canonical != AARCH64_FL_CRYPTO)
&& (opt.flag_canonical & current_flags & ~isa_flags))
   {
current_flags &= ~opt.flags_off;
diff --git a/gcc/config/aarch64/driver-aarch64.cc b/gcc/config/aarch64/driver-aarch64.cc
index 8e318892b10aa2288421fad418844744a2f5a3b4..470c19b650f1ae953918eaeddbf0f768c12a99d9 100644
--- a/gcc/config/aarch64/driver-aarch64.cc
+++ b/gcc/config/aarch64/driver-aarch64.cc
@@ -262,6 +262,7 @@ host_detect_local_cpu (int argc, const char **argv)
   unsigned int n_variants = 0;
   bool processed_exts = false;
   aarch64_feature_flags extension_flags = 0;
+  aarch64_feature_flags unchecked_extension_flags = 0;
   aarch64_feature_flags default_flags = 0;
   std::string buf;
   size_t sep_pos = -1;
@@ -348,7 +349,10 @@ host_detect_local_cpu (int argc, const char **argv)
  /* If the feature contains no HWCAPS string then ignore it for the
 auto detection.  */
  if (val.empty ())
-   continue;
+   {
+ unchecked_extension_flags |= aarch64_extensions[i].flag;
+ continue;
+   }
 
  bool enabled = true;
 
@@ -447,6 +451,13 @@ host_detect_local_cpu (int argc, const char **argv)
   if (tune)
 return res;
 
+  if (!processed_exts)
+goto not_found;
+
+  /* Add any features that should be present, but can't be verified using
+ the /proc/cpuinfo "Features" list.  */
+  extension_flags |= unchecked_extension_flags & default_flags;
+
   {
 std::string extension
   = aarch64_get_extension_string_for_isa_flags (extension_flags,
diff --git a/gcc/testsuite/gcc.target/aarch64/options_set_29.c b/gcc/testsuite/gcc.target/aarch64/options_set_29.c
new file mode 100644
index ..01bb73c02e232bdfeca5f16dad3fa2a6484843d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/options_set_29.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv9.3-a+nopredres+nols64+nomops" } */
+
+int main ()
+{
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {\.arch armv9\.3\-a\+crc\+nopredres\+nols64\+nomops\n} 1 } } */
+
+/* Checking if enabling default features drops the superfluous bits.   */


aarch64: Fix +nocrypto handling

2023-12-05 Thread Andrew Carlotti
Additionally, replace all checks for the AARCH64_FL_CRYPTO bit with
checks for (AARCH64_FL_AES | AARCH64_FL_SHA2) instead.  The value of the
AARCH64_FL_CRYPTO bit within isa_flags is now ignored, but it is
retained because removing it would make processing the data in
option-extensions.def significantly more complex.
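A compact sketch of the two rules involved, using hypothetical flag values: the legacy "crypto" feature is treated as present exactly when both AES and SHA2 are (which is how __ARM_FEATURE_CRYPTO is now defined in aarch64-c.cc), and "+nocrypto" is emitted in place of "+noaes+nosha2" only under the conditions stated in the new comment in aarch64-common.cc: both features are being removed, "simd" is enabled, and "sm4" is not. The helper names are illustrative, not GCC functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only.  */
#define FL_SIMD (1ULL << 0)
#define FL_AES (1ULL << 1)
#define FL_SHA2 (1ULL << 2)
#define FL_SM4 (1ULL << 3)

static const uint64_t flags_crypto = FL_AES | FL_SHA2;

/* "crypto" counts as enabled only if all of its constituent features are.  */
static bool
crypto_enabled (uint64_t isa_flags)
{
  return (isa_flags & flags_crypto) == flags_crypto;
}

/* CURRENT is the feature set implied so far by the generated string and
   ISA the requested set.  Return true if "+nocrypto" should be emitted.  */
static bool
use_nocrypto (uint64_t current, uint64_t isa)
{
  return (flags_crypto & current & ~isa) == flags_crypto
         && (current & FL_SIMD) != 0
         && (current & FL_SM4) == 0;
}
```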

Ok for master?

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_get_extension_string_for_isa_flags): Fix generation of
the "+nocrypto" extension.
* config/aarch64/aarch64.h (AARCH64_ISA_CRYPTO): Remove.
(TARGET_CRYPTO): Remove.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Don't use TARGET_CRYPTO.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/options_set_27.c: New test.
* gcc.target/aarch64/options_set_28.c: New test.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
index 20bc4e1291bba9b73798398fea659f1154afa205..6d12454143cd64ebaafa7f5e6c23869ee0bfa543 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -310,6 +310,7 @@ aarch64_get_extension_string_for_isa_flags
  But in order to make the output more readable, it seems better
  to add the strings in definition order.  */
   aarch64_feature_flags added = 0;
+  auto flags_crypto = AARCH64_FL_AES | AARCH64_FL_SHA2;
   for (unsigned int i = ARRAY_SIZE (all_extensions); i-- > 0; )
 {
   auto  = all_extensions[i];
@@ -319,7 +320,7 @@ aarch64_get_extension_string_for_isa_flags
 per-feature crypto flags.  */
   auto flags = opt.flag_canonical;
   if (flags == AARCH64_FL_CRYPTO)
-   flags = AARCH64_FL_AES | AARCH64_FL_SHA2;
+   flags = flags_crypto;
 
   if ((flags & isa_flags & (explicit_flags | ~current_flags)) == flags)
{
@@ -337,9 +338,27 @@ aarch64_get_extension_string_for_isa_flags
   /* Remove the features in current_flags & ~isa_flags.  If the feature does
  not have an HWCAPs then it shouldn't be taken into account for feature
  detection because one way or another we can't tell if it's available
- or not.  */
+ or not.
+
+ As a special case, emit "+nocrypto" instead of "+noaes+nosha2", in order
+ to support assemblers that predate the separate per-feature crypto flags.
+ Only use "+nocrypto" when "simd" is enabled (to avoid redundant feature
+ removal), and when "sm4" is not already enabled (to avoid depending on
+ whether "+nocrypto" also disables "sm4").  */
+  for (auto  : all_extensions)
+if ((opt.flag_canonical == AARCH64_FL_CRYPTO)
+   && ((flags_crypto & current_flags & ~isa_flags) == flags_crypto)
+   && (current_flags & AARCH64_FL_SIMD)
+   && !(current_flags & AARCH64_FL_SM4))
+  {
+   current_flags &= ~opt.flags_off;
+   outstr += "+no";
+   outstr += opt.name;
+  }
+
   for (auto  : all_extensions)
 if (opt.native_detect_p
+   && (opt.flag_canonical != AARCH64_FL_CRYPTO)
&& (opt.flag_canonical & current_flags & ~isa_flags))
   {
current_flags &= ~opt.flags_off;
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index ab8844f6049dc95b97648b651bfcd3a4ccd3ca0b..4f9ee01d52f3ac42f95edbb030bdb2d09fc36d16 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -140,7 +140,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_ILP32, "_ILP32", pfile);
   aarch64_def_or_undef (TARGET_ILP32, "__ILP32__", pfile);
 
-  aarch64_def_or_undef (TARGET_CRYPTO, "__ARM_FEATURE_CRYPTO", pfile);
+  aarch64_def_or_undef (TARGET_AES && TARGET_SHA2, "__ARM_FEATURE_CRYPTO", pfile);
   aarch64_def_or_undef (TARGET_SIMD_RDMA, "__ARM_FEATURE_QRDMX", pfile);
   aarch64_def_or_undef (TARGET_SVE, "__ARM_FEATURE_SVE", pfile);
   cpp_undef (pfile, "__ARM_FEATURE_SVE_BITS");
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 1ac298926ce1606a87bcdcaf691f182ca416d600..d3613a0a42b7b6d2c4452739841b133014909a39 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -177,10 +177,13 @@ enum class aarch64_feature : unsigned char {
 
 #endif
 
-/* Macros to test ISA flags.  */
+/* Macros to test ISA flags.
+
+   There is intentionally no macro for AARCH64_FL_CRYPTO, since this flag bit
+   is not always set when its constituent features are present.
+   Check (TARGET_AES && TARGET_SHA2) instead.  */
 
 #define AARCH64_ISA_CRC (aarch64_isa_flags & AARCH64_FL_CRC)
-#define AARCH64_ISA_CRYPTO (aarch64_isa_flags & AARCH64_FL_CRYPTO)
 #define AARCH64_ISA_FP (aarch64_isa_flags & AARCH64_FL_FP)
 #define AARCH64_ISA_SIMD   (aarch64_isa_flags & AARCH64_FL_SIMD)
 #define AARCH64_ISA_LSE   (aarch64_isa_flags & AARCH64_FL_LSE)
@@ -223,9 +226,6 @@ enum class aarch64_feature : unsigned char {
 #define 

aarch64 testsuite: Check entire .arch string

2023-12-05 Thread Andrew Carlotti
Add a terminating newline to various tests, and add missing
extensions to some test strings.
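The terminating newline matters because scan-assembler performs an unanchored search, so a pattern such as \.arch armv8-a\+nosimd also matches output that carries further extensions after it. The sketch below uses strstr as a simplified stand-in for that unanchored match (DejaGnu actually uses Tcl regular expressions), and the assembly strings are made-up examples.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Unanchored search for NEEDLE anywhere in ASSEMBLY, loosely analogous to
   how an unanchored scan-assembler pattern behaves.  */
static bool
scan (const char *assembly, const char *needle)
{
  return strstr (assembly, needle) != NULL;
}

/* Made-up compiler output: the exact expected string, and one with an
   extra, unexpected extension appended.  */
static const char exact_output[] = "\t.arch armv8-a+nosimd\n";
static const char extra_output[] = "\t.arch armv8-a+nosimd+sve\n";
```

Without the newline, both outputs satisfy the check; with it, only the exact string does.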

Obvious change, so I'll push it once my other option handling patches are
approved (if no one objects).

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/native_cpu_18.c: Add \+nopauth\n
* gcc.target/aarch64/options_set_7.c: Add \+crc\n
* gcc.target/aarch64/options_set_8.c: Add \+crc\+nodotprod\n
* gcc.target/aarch64/cpunative/native_cpu_0.c: Add \n
* gcc.target/aarch64/cpunative/native_cpu_1.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_2.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_3.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_4.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_5.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_6.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_7.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_8.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_9.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_10.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_11.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_12.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_14.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_15.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
* gcc.target/aarch64/options_set_1.c: Ditto.
* gcc.target/aarch64/options_set_2.c: Ditto.
* gcc.target/aarch64/options_set_3.c: Ditto.
* gcc.target/aarch64/options_set_4.c: Ditto.
* gcc.target/aarch64/options_set_5.c: Ditto.
* gcc.target/aarch64/options_set_6.c: Ditto.
* gcc.target/aarch64/options_set_9.c: Ditto.
* gcc.target/aarch64/options_set_11.c: Ditto.
* gcc.target/aarch64/options_set_12.c: Ditto.
* gcc.target/aarch64/options_set_13.c: Ditto.
* gcc.target/aarch64/options_set_14.c: Ditto.
* gcc.target/aarch64/options_set_15.c: Ditto.
* gcc.target/aarch64/options_set_16.c: Ditto.
* gcc.target/aarch64/options_set_17.c: Ditto.
* gcc.target/aarch64/options_set_18.c: Ditto.
* gcc.target/aarch64/options_set_19.c: Ditto.
* gcc.target/aarch64/options_set_20.c: Ditto.
* gcc.target/aarch64/options_set_21.c: Ditto.
* gcc.target/aarch64/options_set_22.c: Ditto.
* gcc.target/aarch64/options_set_23.c: Ditto.
* gcc.target/aarch64/options_set_24.c: Ditto.
* gcc.target/aarch64/options_set_25.c: Ditto.
* gcc.target/aarch64/options_set_26.c: Ditto.


diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
index 8499f87c39b173491a89626af56f4e193b1d12b5..fb5a7a18ad1a2d09ac4b231150a1bd9e72d6fab6 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto\n} } } */
 
 /* Test a normal looking procinfo.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_1.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_1.c
index 2cf0e89994b1cc0dc9fac67f4dc431c003498048..cb50e3b73057994432cc3ed15e3d5b57c7a3cb7b 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_1.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+nosimd} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+nosimd\n} } } */
 
 /* Test one where fp is on by default so turn off simd.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_10.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_10.c
index ddb06b8227576807fe068b76dabed91a0223e4fa..6a524bad371c55fc32698ff0994f4ad431be49ca 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_10.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_10.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+nofp} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+nofp\n} } } */
 
 /* Test one with no entry in feature list.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_11.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_11.c
index 96b9ca434ebbf007ddaa45d55a8c2b8e7a19a715..644f4792275bdd32a9f84241f0c329b046cbd909 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_11.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_11.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { 

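The test changes above append `\n` to each `scan-assembler` pattern so that the expected `.arch` string must end the line, instead of merely being a prefix of a longer extension list. The sketch below illustrates the difference using POSIX `regcomp`/`regexec` as a stand-in for the Tcl regex engine that DejaGnu actually uses (an assumption: the two engines differ in details, but `\.` and `\+` both match literal characters in each). In the C string, `\n` places a literal newline in the pattern; in the `dg-final` directive, the regex engine itself interprets `\n` as a newline. The assembler lines are hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the extended regular expression PATTERN matches anywhere
   in TEXT.  Like scan-assembler, this is an unanchored search over the
   whole output, not a full-string match.  */
static bool
scan_matches (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}

/* Hypothetical assembler output: one line with an unexpected extra
   extension, and one line that is exactly what the test expects.  */
static const char asm_extra[] = "\t.arch armv8-a+nosimd+sve\n";
static const char asm_exact[] = "\t.arch armv8-a+nosimd\n";

/* Old style: a prefix match, so it also accepts asm_extra.  */
static const char old_pattern[] = "\\.arch armv8-a\\+nosimd";
/* New style: the trailing newline pins the end of the extension list.  */
static const char new_pattern[] = "\\.arch armv8-a\\+nosimd\n";
```

With the old pattern, a regression that emits an extra `+sve` would still pass; with the new one, only the exact extension list is accepted.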
[PATCH] aarch64: Add missing driver-aarch64 dependencies

2023-12-05 Thread Andrew Carlotti
Ok for master?

gcc/ChangeLog:

* config/aarch64/x-aarch64: Add missing dependencies.


diff --git a/gcc/config/aarch64/x-aarch64 b/gcc/config/aarch64/x-aarch64
index 3cf701a0a01ab00eaaafdfad14bd90ebbb1d498f..6fd638faaab7cb5bb2309d36d6dea2adf1fb8d32 100644
--- a/gcc/config/aarch64/x-aarch64
+++ b/gcc/config/aarch64/x-aarch64
@@ -1,3 +1,7 @@
 driver-aarch64.o: $(srcdir)/config/aarch64/driver-aarch64.cc \
-  $(CONFIG_H) $(SYSTEM_H)
+  $(CONFIG_H) $(SYSTEM_H) $(TM_H) $(CORETYPES_H) \
+  $(srcdir)/config/aarch64/aarch64-protos.h \
+  $(srcdir)/config/aarch64/aarch64-feature-deps.h \
+  $(srcdir)/config/aarch64/aarch64-cores.def \
+  $(srcdir)/config/aarch64/aarch64-arches.def
$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<


Re: [PATCH v2 5/5] aarch64: Add function multiversioning support

2023-12-04 Thread Andrew Carlotti
On Fri, Nov 24, 2023 at 04:22:54PM +, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > This adds initial support for function multiversioning on aarch64 using
> > the target_version and target_clones attributes.  This loosely follows
> > the Beta specification in the ACLE [1], although with some differences
> > that still need to be resolved (possibly as follow-up patches).
> >
> > Existing function multiversioning implementations are broken in various
> > ways when used across translation units.  This includes placing
> > resolvers in the wrong translation units, and using symbol mangling that
> > causes callers to unintentionally bypass the resolver in some circumstances.
> > Fixing these issues for aarch64 will require modifications to our ACLE
> > specification.  It will also require further adjustments to existing
> > middle end code, to facilitate different mangling and resolver
> > placement while preserving existing target behaviours.
> >
> > The list of function multiversioning features specified in the ACLE is
> > also inconsistent with the list of features supported in target option
> > extensions.  I intend to resolve some or all of these inconsistencies at
> > a later stage.
> >
> > The target_version attribute is currently only supported in C++, since
> > this is the only frontend with existing support for multiversioning
> > using the target attribute.  On the other hand, this patch happens to
> > enable multiversioning with the target_clones attribute in Ada and D, as
> > well as the entire C family, using their existing frontend support.
> >
> > This patch also does not support the following aspects of the Beta
> > specification:
> >
> > - The target_clones attribute should allow an implicit unlisted
> >   "default" version.
> > - There should be an option to disable function multiversioning at
> >   compile time.
> > - Unrecognised target names in a target_clones attribute should be
> >   ignored (with an optional warning).  This current patch raises an
> >   error instead.
> >
> > [1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
> >
> > ---
> >
> > I believe the support present in this patch correctly handles function
> > multiversioning within a single translation unit for all features in the ACLE
> > specification with option extension support.
> >
> > Is it ok to push this patch in its current state? I'd then continue working on
> > incremental improvements to the supported feature extensions and the ABI issues
> > in followup patches, along with corresponding changes and improvements to
> > the ACLE specification.
> >
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-feature-deps.h (fmv_deps_):
> > Define aarch64_feature_flags mask for each FMV feature.
> > * config/aarch64/aarch64-option-extensions.def: Use new macros
> > to define FMV feature extensions.
> > * config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p):
> > Check for target_version attribute after processing target
> > attribute.
> > (aarch64_fmv_feature_data): New.
> > (aarch64_parse_fmv_features): New.
> > (aarch64_process_target_version_attr): New.
> > (aarch64_option_valid_version_attribute_p): New.
> > (get_feature_mask_for_version): New.
> > (compare_feature_masks): New.
> > (aarch64_compare_version_priority): New.
> > (build_ifunc_arg_type): New.
> > (make_resolver_func): New.
> > (add_condition_to_bb): New.
> > (compare_feature_version_info): New.
> > (dispatch_function_versions): New.
> > (aarch64_generate_version_dispatcher_body): New.
> > (aarch64_get_function_versions_dispatcher): New.
> > (aarch64_common_function_versions): New.
> > (aarch64_mangle_decl_assembler_name): New.
> > (TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation.
> > (TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation.
> > (TARGET_OPTION_FUNCTION_VERSIONS): New implementation.
> > (TARGET_COMPARE_VERSION_PRIORITY): New implementation.
> > (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation.
> > (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation.
> > (TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation.
> > * config/arm/aarch-common.h (enum aarch_parse_opt_result): Add
> >   new value to report duplicate FMV feature.
> > * common/config/aarch64/

Re: [PATCH v2 4/5] Add support for target_version attribute

2023-12-04 Thread Andrew Carlotti
On Wed, Nov 29, 2023 at 05:53:56PM +, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > This patch adds support for the "target_version" attribute to the middle
> > end and the C++ frontend, which will be used to implement function
> > multiversioning in the aarch64 backend.
> >
> > On targets that don't use the "target" attribute for multiversioning,
> > there is no conflict between the "target" and "target_clones"
> > attributes.  This patch therefore makes the mutual exclusion in
> > C-family, D and Ada conditional upon the value of the
> > expanded_clones_attribute target hook.
> >
> > The "target_version" attribute is only added to C++ in this patch,
> > because this is currently the only frontend which supports
> > multiversioning using the "target" attribute.  Support for the
> > "target_version" attribute will be extended to C at a later date.
> >
> > Targets that currently use the "target" attribute for function
> > multiversioning (i.e. i386 and rs6000) are not affected by this patch.
> >
> > Ok for master?
> >
> > gcc/ChangeLog:
> >
> > * attribs.cc (decl_attributes): Pass attribute name to target.
> > (is_function_default_version): Update comment to specify
> > incompatibility with target_version attributes.
> > * cgraphclones.cc (cgraph_node::create_version_clone_with_body):
> > Call valid_version_attribute_p for target_version attributes.
> > * target.def (valid_version_attribute_p): New hook.
> > (expanded_clones_attribute): New hook.
> > * doc/tm.texi.in: Add new hooks.
> > * doc/tm.texi: Regenerate.
> > * multiple_target.cc (create_dispatcher_calls): Remove redundant
> > is_function_default_version check.
> > (expand_target_clones): Use target hook for attribute name.
> > * targhooks.cc (default_target_option_valid_version_attribute_p):
> > New.
> > * targhooks.h (default_target_option_valid_version_attribute_p):
> > New.
> > * tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
> > target_version attributes.
> >
> > gcc/c-family/ChangeLog:
> >
> > * c-attribs.cc (CLONES_USES_TARGET): New macro.
> > (attr_target_exclusions): Use new macro.
> > (attr_target_clones_exclusions): Ditto, and add target_version.
> > (attr_target_version_exclusions): New.
> > (c_common_attribute_table): Add target_version.
> > (handle_target_version_attribute): New.
> >
> > gcc/ada/ChangeLog:
> >
> > * gcc-interface/utils.cc (CLONES_USES_TARGET): New macro.
> > (attr_target_exclusions): Use new macro.
> > (attr_target_clones_exclusions): Ditto.
> >
> > gcc/d/ChangeLog:
> >
> > * d-attribs.cc (CLONES_USES_TARGET): New macro.
> > (attr_target_exclusions): Use new macro.
> > (attr_target_clones_exclusions): Ditto.
> >
> > gcc/cp/ChangeLog:
> >
> > * decl2.cc (check_classfn): Update comment to include
> > target_version attributes.
> >
> >
> > diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
> > index e33a63948cebdeafc3abcdd539a35141969ad978..8850943cb3326568b4679a73405f50487aa1b7c6 100644
> > --- a/gcc/ada/gcc-interface/utils.cc
> > +++ b/gcc/ada/gcc-interface/utils.cc
> > @@ -143,16 +143,21 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
> >{ NULL, false, false, false },
> >  };
> >  
> > +#define CLONES_USES_TARGET \
> > +  (strcmp (targetm.target_option.expanded_clones_attribute, \
> > +  "target") == 0)
> > +
> 
> Sorry for the slower review on this part.  I was hoping inspiration
> would strike for a way to resolve this, but it hasn't, so:
> 
> The codebase usually avoids static variables that need dynamic
> initialisation.  So although macros are not the preferred way of
> doing things, I think one is probably appropriate here.  How about:
> 
>   TARGET_HAS_FMV_TARGET_ATTRIBUTE
> 
> with the default being true, and with AArch64 defining it to false?
> 
> This would replace the expanded_clones_attribute hook, with:
> 
>   const char *new_attr_name = targetm.target_option.expanded_clones_attribute;
> 
> becoming:
> 
>   const char *new_attr_name = (TARGET_HAS_FMV_TARGET_ATTRIBUTE
>  ? "target" : "target_version");
> 
> I realise this is anything but elegant, but I thi

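Richard's suggestion can be sketched as follows: a target-independent default of 1 that a target header (AArch64 in this proposal) would override with 0, giving an integer constant expression that can appear directly in a static exclusion table. `struct exclusion` here is a simplified, hypothetical stand-in for `attribute_spec::exclusions`, and `expanded_clones_attribute` is a hypothetical wrapper around the expression quoted in the review. By contrast, an initializer built from `strcmp` on a hook value is not a constant expression: in C it would not compile at file scope, and in C++ it would need exactly the dynamic initialisation the review wants to avoid.

```c
#include <stdbool.h>
#include <stddef.h>

/* A target that does not use "target" for multiversioning would define
   this to 0 before this point; every other target gets the default.  */
#ifndef TARGET_HAS_FMV_TARGET_ATTRIBUTE
#define TARGET_HAS_FMV_TARGET_ATTRIBUTE 1
#endif

/* Simplified stand-in for attribute_spec::exclusions.  */
struct exclusion
{
  const char *name;
  bool function, variable, type;
};

/* The macro expands to an integer constant, so this table is statically
   initialised; no code runs at startup to fill it in.  */
static const struct exclusion attr_target_clones_exclusions[] =
{
  { "always_inline", true, true, true },
  { "target", TARGET_HAS_FMV_TARGET_ATTRIBUTE,
    TARGET_HAS_FMV_TARGET_ATTRIBUTE, TARGET_HAS_FMV_TARGET_ATTRIBUTE },
  { NULL, false, false, false },
};

/* Attribute name that target_clones expands into, as in the review.  */
static const char *
expanded_clones_attribute (void)
{
  return TARGET_HAS_FMV_TARGET_ATTRIBUTE ? "target" : "target_version";
}
```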
Re: [PATCH v2[1/5] aarch64: Add cpu feature detection to libgcc

2023-12-04 Thread Andrew Carlotti
On Mon, Nov 20, 2023 at 03:46:06PM +, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > This is added to enable function multiversioning, but can also be used
> > directly.  The interface is chosen to match that used in LLVM's
> > compiler-rt, to facilitate cross-compiler compatibility.
> >
> > The content of the patch is derived almost entirely from Pavel's prior
> > contributions to compiler-rt/lib/builtins/cpu_model.c. I have made minor
> > changes to align more closely with GCC coding style, and to exclude any code
> > from other LLVM contributors, and am adding this to GCC with Pavel's approval.
> >
> > libgcc/ChangeLog:
> >
> > * config/aarch64/t-aarch64: Include cpuinfo.c
> > * config/aarch64/cpuinfo.c: New file
> > (__init_cpu_features_constructor) New.
> > (__init_cpu_features_resolver) New.
> > (__init_cpu_features) New.
> 
> OK on the basis that you mentioned in the covering note: we can deal
> with fixes incrementally.  One question though...
> >
> > Co-authored-by: Pavel Iliin 
> >
> >
> > diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
> > new file mode 100644
> > index ..0888ca4ed058430f524b99cb0e204bd996fa0e55
> > --- /dev/null
> > +++ b/libgcc/config/aarch64/cpuinfo.c
> > @@ -0,0 +1,502 @@
> > +/* CPU feature detection for AArch64 architecture.
> > +   Copyright (C) 2023 Free Software Foundation, Inc.
> > +
> > +   This file is part of GCC.
> > +
> > +   This file is free software; you can redistribute it and/or modify it
> > +   under the terms of the GNU General Public License as published by the
> > +   Free Software Foundation; either version 3, or (at your option) any
> > +   later version.
> > +
> > +   This file is distributed in the hope that it will be useful, but
> > +   WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   General Public License for more details.
> > +
> > +   Under Section 7 of GPL version 3, you are granted additional
> > +   permissions described in the GCC Runtime Library Exception, version
> > +   3.1, as published by the Free Software Foundation.
> > +  
> > +   You should have received a copy of the GNU General Public License and
> > +   a copy of the GCC Runtime Library Exception along with this program;
> > +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> > +   <http://www.gnu.org/licenses/>.  */
> > +
> > +#if defined(__has_include)
> 
> Is this protecting against a known condition?  libgcc has to be built
> with the associated version of GCC, so it might be better to drop the
> #if and get a noisy failure if something unexpected happens.  That can
> be part of 5/5 though.
> 
> Thanks,
> Richard

I don't know that this is required, so I'll assume it isn't.  I'll drop it in
the next version of this patch.

> > +#if __has_include()
> > +#include 
> > +
> > +#if __has_include()
> > +#include 
> > +#else
> > +typedef struct __ifunc_arg_t {
> > +  unsigned long _size;
> > +  unsigned long _hwcap;
> > +  unsigned long _hwcap2;
> > +} __ifunc_arg_t;
> > +#endif
> > +
> > +#if __has_include()
> > +#include 
> > +
> > +/* CPUFeatures must correspond to the same AArch64 features in aarch64.cc  */
> > +enum CPUFeatures {
> > +  FEAT_RNG,
> > +  FEAT_FLAGM,
> > +  FEAT_FLAGM2,
> > +  FEAT_FP16FML,
> > +  FEAT_DOTPROD,
> > +  FEAT_SM4,
> > +  FEAT_RDM,
> > +  FEAT_LSE,
> > +  FEAT_FP,
> > +  FEAT_SIMD,
> > +  FEAT_CRC,
> > +  FEAT_SHA1,
> > +  FEAT_SHA2,
> > +  FEAT_SHA3,
> > +  FEAT_AES,
> > +  FEAT_PMULL,
> > +  FEAT_FP16,
> > +  FEAT_DIT,
> > +  FEAT_DPB,
> > +  FEAT_DPB2,
> > +  FEAT_JSCVT,
> > +  FEAT_FCMA,
> > +  FEAT_RCPC,
> > +  FEAT_RCPC2,
> > +  FEAT_FRINTTS,
> > +  FEAT_DGH,
> > +  FEAT_I8MM,
> > +  FEAT_BF16,
> > +  FEAT_EBF16,
> > +  FEAT_RPRES,
> > +  FEAT_SVE,
> > +  FEAT_SVE_BF16,
> > +  FEAT_SVE_EBF16,
> > +  FEAT_SVE_I8MM,
> > +  FEAT_SVE_F32MM,
> > +  FEAT_SVE_F64MM,
> > +  FEAT_SVE2,
> > +  FEAT_SVE_AES,
> > +  FEAT_SVE_PMULL128,
> > +  FEAT_SVE_BITPERM,
> > +  FEAT_SVE_SHA3,
> > +  FEAT_SVE_SM4,
> > +  FEAT_SME,
> > +  FEAT_MEMTAG,
> > +  FEAT_MEMTAG2,
> > +  FEA

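The fallback `__ifunc_arg_t` definition in the quoted cpuinfo.c exists because an AArch64 ifunc resolver may receive a pointer to such a structure as its second argument. The sketch below shows one defensive way to read `_hwcap2` from it. It assumes glibc's convention that bit 62 of the first resolver argument (`_IFUNC_ARG_HWCAP` in `<sys/ifunc.h>`) signals that the second argument is valid. `sketch_ifunc_arg`, `SKETCH_IFUNC_ARG_HWCAP` and `sketch_get_hwcap2` are hypothetical local copies and names, used so the sketch compiles on any host. It is not the code in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the fallback definition quoted above.  */
typedef struct sketch_ifunc_arg
{
  unsigned long _size;
  unsigned long _hwcap;
  unsigned long _hwcap2;
} sketch_ifunc_arg;

/* Assumption: mirrors glibc's _IFUNC_ARG_HWCAP on AArch64.  */
#define SKETCH_IFUNC_ARG_HWCAP (UINT64_C (1) << 62)

/* Return the AT_HWCAP2 bits if the caller supplied them, or 0 if the
   resolver was invoked with the older one-argument convention or with a
   structure too small to contain _hwcap2.  */
static unsigned long
sketch_get_hwcap2 (uint64_t hwcap, const sketch_ifunc_arg *arg)
{
  if (!(hwcap & SKETCH_IFUNC_ARG_HWCAP) || arg == NULL)
    return 0;
  if (arg->_size < offsetof (sketch_ifunc_arg, _hwcap2) + sizeof arg->_hwcap2)
    return 0;
  return arg->_hwcap2;
}
```

Checking `_size` before reading a field lets the structure grow in later C library versions without older resolvers reading past its end.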
Re: [PATCH v2 3/5] ada: Improve attribute exclusion handling

2023-11-17 Thread Andrew Carlotti
On Fri, Nov 17, 2023 at 11:45:16AM +0100, Marc Poulhiès wrote:
> 
> Hello,
> 
> > I haven't managed to test the Ada frontend, but this patch (and the following
> 
> I don't have an aarch64 setup to test, but I may be able to help with the
> issue preventing you from testing. Can you elaborate what is the problem?
> 
> Marc

I only really got as far as trying to configure a build environment, which
failed with 'configure: error: GNAT is required to build ada'.  I have no prior
Ada experience, and I couldn't work out how to get any relevant test code to
compile on Compiler Explorer.  I therefore decided it wasn't worth me spending
more effort trying to test from Ada a small change to some code that is
effectively front-end independent, but just happens to be added to a limited
subset of front ends.

It's probably sufficient to simply test that the Ada changes can be built for
any target, since I'd be surprised if I've managed to copy this code from C++
in a way that breaks functionality without obviously breaking the build.



[PATCH v2 5/5] aarch64: Add function multiversioning support

2023-11-16 Thread Andrew Carlotti
This adds initial support for function multiversioning on aarch64 using
the target_version and target_clones attributes.  This loosely follows
the Beta specification in the ACLE [1], although with some differences
that still need to be resolved (possibly as follow-up patches).

Existing function multiversioning implementations are broken in various
ways when used across translation units.  This includes placing
resolvers in the wrong translation units, and using symbol mangling that
causes callers to unintentionally bypass the resolver in some circumstances.
Fixing these issues for aarch64 will require modifications to our ACLE
specification.  It will also require further adjustments to existing
middle end code, to facilitate different mangling and resolver
placement while preserving existing target behaviours.

The list of function multiversioning features specified in the ACLE is
also inconsistent with the list of features supported in target option
extensions.  I intend to resolve some or all of these inconsistencies at
a later stage.

The target_version attribute is currently only supported in C++, since
this is the only frontend with existing support for multiversioning
using the target attribute.  On the other hand, this patch happens to
enable multiversioning with the target_clones attribute in Ada and D, as
well as the entire C family, using their existing frontend support.

This patch also does not support the following aspects of the Beta
specification:

- The target_clones attribute should allow an implicit unlisted
  "default" version.
- There should be an option to disable function multiversioning at
  compile time.
- Unrecognised target names in a target_clones attribute should be
  ignored (with an optional warning).  This current patch raises an
  error instead.

[1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning

---

I believe the support present in this patch correctly handles function
multiversioning within a single translation unit for all features in the ACLE
specification with option extension support.

Is it ok to push this patch in its current state? I'd then continue working on
incremental improvements to the supported feature extensions and the ABI issues
in followup patches, along with corresponding changes and improvements to
the ACLE specification.


gcc/ChangeLog:

* config/aarch64/aarch64-feature-deps.h (fmv_deps_):
Define aarch64_feature_flags mask for each FMV feature.
* config/aarch64/aarch64-option-extensions.def: Use new macros
to define FMV feature extensions.
* config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p):
Check for target_version attribute after processing target
attribute.
(aarch64_fmv_feature_data): New.
(aarch64_parse_fmv_features): New.
(aarch64_process_target_version_attr): New.
(aarch64_option_valid_version_attribute_p): New.
(get_feature_mask_for_version): New.
(compare_feature_masks): New.
(aarch64_compare_version_priority): New.
(build_ifunc_arg_type): New.
(make_resolver_func): New.
(add_condition_to_bb): New.
(compare_feature_version_info): New.
(dispatch_function_versions): New.
(aarch64_generate_version_dispatcher_body): New.
(aarch64_get_function_versions_dispatcher): New.
(aarch64_common_function_versions): New.
(aarch64_mangle_decl_assembler_name): New.
(TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation.
(TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation.
(TARGET_OPTION_FUNCTION_VERSIONS): New implementation.
(TARGET_COMPARE_VERSION_PRIORITY): New implementation.
(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation.
(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation.
(TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation.
* config/arm/aarch-common.h (enum aarch_parse_opt_result): Add
  new value to report duplicate FMV feature.
* common/config/aarch64/cpuinfo.h: New file.

libgcc/ChangeLog:

* config/aarch64/cpuinfo.c (enum CPUFeatures): Move to shared
  copy in gcc/common

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/options_set_17.c: Reorder expected flags.
* gcc.target/aarch64/cpunative/native_cpu_0.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_18.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_19.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_20.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_21.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_6.c: 

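The dispatcher generated by this patch (`dispatch_function_versions`, ordered by `aarch64_compare_version_priority`) selects one version at load time whose required features are all available. The portable sketch below shows only that selection step. It assumes the versions are already sorted from highest to lowest priority and that each version's requirements fit in a 64-bit feature mask. The feature bits, version names and ordering are hypothetical. The real priority rules are those defined by the ACLE, not by this code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; cpuinfo.c uses its CPUFeatures enum.  */
enum { SKETCH_FEAT_SIMD = 0, SKETCH_FEAT_SVE = 1, SKETCH_FEAT_SVE2 = 2 };
#define SKETCH_BIT(f) (UINT64_C (1) << (f))

typedef int (*impl_fn) (int);

static int impl_default (int x) { return x; }
static int impl_simd (int x) { return x + 1; }
static int impl_sve2 (int x) { return x + 2; }

struct version
{
  uint64_t required;	/* Features that must all be present.  */
  impl_fn fn;
};

/* Sorted from highest to lowest priority.  The default version requires
   nothing, so the search always ends at it.  */
static const struct version versions[] =
{
  { SKETCH_BIT (SKETCH_FEAT_SVE) | SKETCH_BIT (SKETCH_FEAT_SVE2), impl_sve2 },
  { SKETCH_BIT (SKETCH_FEAT_SIMD), impl_simd },
  { 0, impl_default },
};

/* Return the first version whose required features are a subset of
   AVAILABLE, as a resolver would.  */
static impl_fn
select_version (uint64_t available)
{
  for (size_t i = 0; i < sizeof versions / sizeof versions[0]; i++)
    if ((versions[i].required & ~available) == 0)
      return versions[i].fn;
  return impl_default;
}
```

Because a version that needs SVE2 is only chosen when every bit in its mask is set, a CPU reporting SVE2 without SVE falls through to a lower-priority version.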
[PATCH v2 4/5] Add support for target_version attribute

2023-11-16 Thread Andrew Carlotti
This patch adds support for the "target_version" attribute to the middle
end and the C++ frontend, which will be used to implement function
multiversioning in the aarch64 backend.

On targets that don't use the "target" attribute for multiversioning,
there is no conflict between the "target" and "target_clones"
attributes.  This patch therefore makes the mutual exclusion in
C-family, D and Ada conditional upon the value of the
expanded_clones_attribute target hook.

The "target_version" attribute is only added to C++ in this patch,
because this is currently the only frontend which supports
multiversioning using the "target" attribute.  Support for the
"target_version" attribute will be extended to C at a later date.

Targets that currently use the "target" attribute for function
multiversioning (i.e. i386 and rs6000) are not affected by this patch.

Ok for master?

gcc/ChangeLog:

* attribs.cc (decl_attributes): Pass attribute name to target.
(is_function_default_version): Update comment to specify
incompatibility with target_version attributes.
* cgraphclones.cc (cgraph_node::create_version_clone_with_body):
Call valid_version_attribute_p for target_version attributes.
* target.def (valid_version_attribute_p): New hook.
(expanded_clones_attribute): New hook.
* doc/tm.texi.in: Add new hooks.
* doc/tm.texi: Regenerate.
* multiple_target.cc (create_dispatcher_calls): Remove redundant
is_function_default_version check.
(expand_target_clones): Use target hook for attribute name.
* targhooks.cc (default_target_option_valid_version_attribute_p):
New.
* targhooks.h (default_target_option_valid_version_attribute_p):
New.
* tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
target_version attributes.

gcc/c-family/ChangeLog:

* c-attribs.cc (CLONES_USES_TARGET): New macro.
(attr_target_exclusions): Use new macro.
(attr_target_clones_exclusions): Ditto, and add target_version.
(attr_target_version_exclusions): New.
(c_common_attribute_table): Add target_version.
(handle_target_version_attribute): New.

gcc/ada/ChangeLog:

* gcc-interface/utils.cc (CLONES_USES_TARGET): New macro.
(attr_target_exclusions): Use new macro.
(attr_target_clones_exclusions): Ditto.

gcc/d/ChangeLog:

* d-attribs.cc (CLONES_USES_TARGET): New macro.
(attr_target_exclusions): Use new macro.
(attr_target_clones_exclusions): Ditto.

gcc/cp/ChangeLog:

* decl2.cc (check_classfn): Update comment to include
target_version attributes.


diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index e33a63948cebdeafc3abcdd539a35141969ad978..8850943cb3326568b4679a73405f50487aa1b7c6 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -143,16 +143,21 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
   { NULL, false, false, false },
 };
 
+#define CLONES_USES_TARGET \
+  (strcmp (targetm.target_option.expanded_clones_attribute, \
+  "target") == 0)
+
 static const struct attribute_spec::exclusions attr_target_exclusions[] =
 {
-  { "target_clones", true, true, true },
+  { "target_clones", CLONES_USES_TARGET, CLONES_USES_TARGET,
+CLONES_USES_TARGET },
   { NULL, false, false, false },
 };
 
 static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
 {
   { "always_inline", true, true, true },
-  { "target", true, true, true },
+  { "target", CLONES_USES_TARGET, CLONES_USES_TARGET, CLONES_USES_TARGET },
   { NULL, false, false, false },
 };
 
diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index f9fd258598914ce2112ecaaeaad6c63cd69a44e2..27533023ef5c481ba085c2f0c605dfb992987b3e 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -657,7 +657,8 @@ decl_attributes (tree *node, tree attributes, int flags,
  options to the attribute((target(...))) list.  */
   if (TREE_CODE (*node) == FUNCTION_DECL
   && current_target_pragma
-  && targetm.target_option.valid_attribute_p (*node, NULL_TREE,
+  && targetm.target_option.valid_attribute_p (*node,
+ get_identifier("target"),
  current_target_pragma, 0))
 {
   tree cur_attr = lookup_attribute ("target", attributes);
@@ -1241,8 +1242,9 @@ make_dispatcher_decl (const tree decl)
   return func_decl;  
 }
 
-/* Returns true if decl is multi-versioned and DECL is the default function,
-   that is it is not tagged with target specific optimization.  */
+/* Returns true if DECL is multi-versioned using the target attribute, and this
+   is the default version.  This function can only be used for targets that do
+   not support the "target_version" attribute.  */
 
 bool
 is_function_default_version (const tree decl)
diff --git 

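Each exclusion entry pairs an attribute name with three flags. The flags say whether the exclusion applies when the attribute is on a function, a variable or a type. Setting all three flags to the value of `CLONES_USES_TARGET`, as the diff above does, therefore switches the target/target_clones conflict on or off per target. The simplified model below shows how such a NULL-terminated table might be consulted. `struct exclusion`, `enum decl_kind` and `find_conflict` are hypothetical names. GCC's real check in attribs.cc also issues the diagnostic and handles declaration/type subtleties that this sketch omits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum decl_kind { KIND_FUNCTION, KIND_VARIABLE, KIND_TYPE };

struct exclusion
{
  const char *name;
  bool function, variable, type;
};

/* Return the name of the first attribute in EXISTING (N entries) that the
   NULL-terminated table EXCL forbids combining with the attribute being
   added to an entity of kind KIND, or NULL if there is no conflict.  */
static const char *
find_conflict (const struct exclusion *excl, enum decl_kind kind,
	       const char *const *existing, size_t n)
{
  for (; excl->name != NULL; excl++)
    {
      bool applies = (kind == KIND_FUNCTION ? excl->function
		      : kind == KIND_VARIABLE ? excl->variable
		      : excl->type);
      if (!applies)
	continue;
      for (size_t i = 0; i < n; i++)
	if (strcmp (existing[i], excl->name) == 0)
	  return excl->name;
    }
  return NULL;
}

/* attr_target_exclusions as it would look with CLONES_USES_TARGET true
   (e.g. i386) and false (e.g. aarch64 with this patch).  */
static const struct exclusion target_excl_uses_target[] =
{
  { "target_clones", true, true, true },
  { NULL, false, false, false },
};
static const struct exclusion target_excl_no_target[] =
{
  { "target_clones", false, false, false },
  { NULL, false, false, false },
};
```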
[PATCH v2 3/5] ada: Improve attribute exclusion handling

2023-11-16 Thread Andrew Carlotti
Change the handling of some attribute mutual exclusions to use the
generic attribute exclusion lists, and fix some asymmetric exclusions by
adding the exclusions for always_inline after noinline or target_clones.

Aside from the new always_inline exclusions, the only change in
functionality is the choice of warning message displayed.  All warnings
about attribute mutual exclusions now use the same message.

---

I haven't managed to test the Ada frontend, but this patch (and the following
one) contains only minimal changes to functionality, which I have tested by
copying the code to the C++ frontend and verifying the behaviour of equivalent
changes there.  Is this ok to push without further testing?  If not, then could
someone test this series for me?

gcc/ada/ChangeLog:

* gcc-interface/utils.cc (attr_noinline_exclusions): New.
(attr_always_inline_exclusions): Ditto.
(attr_target_exclusions): Ditto.
(attr_target_clones_exclusions): Ditto.
(gnat_internal_attribute_table): Add new exclusion lists.
(handle_noinline_attribute): Remove custom exclusion handling.
(handle_target_attribute): Ditto.
(handle_target_clones_attribute): Ditto.


diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index 8b2c7f99ef3060603658e438b71a3bfa3ef7f2ac..e33a63948cebdeafc3abcdd539a35141969ad978 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -130,6 +130,32 @@ static const struct attribute_spec::exclusions attr_stack_protect_exclusions[] =
   { NULL, false, false, false },
 };
 
+static const struct attribute_spec::exclusions attr_always_inline_exclusions[] =
+{
+  { "noinline", true, true, true },
+  { "target_clones", true, true, true },
+  { NULL, false, false, false },
+};
+
+static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
+{
+  { "always_inline", true, true, true },
+  { NULL, false, false, false },
+};
+
+static const struct attribute_spec::exclusions attr_target_exclusions[] =
+{
+  { "target_clones", true, true, true },
+  { NULL, false, false, false },
+};
+
+static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
+{
+  { "always_inline", true, true, true },
+  { "target", true, true, true },
+  { NULL, false, false, false },
+};
+
 /* Fake handler for attributes we don't properly support, typically because
they'd require dragging a lot of the common-c front-end circuitry.  */
 static tree fake_attribute_handler (tree *, tree, tree, int, bool *);
@@ -165,7 +191,7 @@ const struct attribute_spec gnat_internal_attribute_table[] =
   { "strub",   0, 1, false, true, false, true,
 handle_strub_attribute, NULL },
   { "noinline", 0, 0,  true,  false, false, false,
-handle_noinline_attribute, NULL },
+handle_noinline_attribute, attr_noinline_exclusions },
   { "noclone",  0, 0,  true,  false, false, false,
 handle_noclone_attribute, NULL },
   { "no_icf",   0, 0,  true,  false, false, false,
@@ -175,7 +201,7 @@ const struct attribute_spec gnat_internal_attribute_table[] =
   { "leaf", 0, 0,  true,  false, false, false,
 handle_leaf_attribute, NULL },
   { "always_inline",0, 0,  true,  false, false, false,
-handle_always_inline_attribute, NULL },
+handle_always_inline_attribute, attr_always_inline_exclusions },
   { "malloc",   0, 0,  true,  false, false, false,
 handle_malloc_attribute, NULL },
   { "type generic", 0, 0,  false, true,  true,  false,
@@ -192,9 +218,9 @@ const struct attribute_spec gnat_internal_attribute_table[] =
   { "simd", 0, 1,  true,  false, false, false,
 handle_simd_attribute, NULL },
   { "target",   1, -1, true,  false, false, false,
-handle_target_attribute, NULL },
+handle_target_attribute, attr_target_exclusions },
   { "target_clones",1, -1, true,  false, false, false,
-handle_target_clones_attribute, NULL },
+handle_target_clones_attribute, attr_target_clones_exclusions },
 
   { "vector_size",  1, 1,  false, true,  false, false,
 handle_vector_size_attribute, NULL },
@@ -6742,16 +6768,7 @@ handle_noinline_attribute (tree *node, tree name,
   int ARG_UNUSED (flags), bool *no_add_attrs)
 {
   if (TREE_CODE (*node) == FUNCTION_DECL)
-{
-  if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
-   {
- warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-  "with attribute %qs", name, "always_inline");
- *no_add_attrs = true;
-   }
-  else
-   DECL_UNINLINABLE (*node) = 1;
-}
+DECL_UNINLINABLE (*node) = 1;
   else
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
@@ -7050,12 +7067,6 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
   warning (OPT_Wattributes, "%qE attribute ignored", name);
   *no_add_attrs = true;
 }
-  else if (lookup_attribute 

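The asymmetry this patch fixes matters because an exclusion list is consulted only for the attribute being added. If A excludes B but B does not exclude A, then applying A after B is diagnosed, but applying B after A is not. The sketch below checks the Ada exclusion lists from this patch for symmetry. It models each list as plain names and ignores the function/variable/type flags. The types and helpers are hypothetical and are not part of GCC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified model: each attribute owns a NULL-terminated list of the
   attribute names it excludes.  */
struct attr_excl
{
  const char *attr;
  const char *const *excludes;
};

/* Names taken from the new Ada exclusion lists in this patch.  */
static const char *const always_inline_ex[] = { "noinline", "target_clones", NULL };
static const char *const noinline_ex[] = { "always_inline", NULL };
static const char *const target_ex[] = { "target_clones", NULL };
static const char *const target_clones_ex[] = { "always_inline", "target", NULL };

static const struct attr_excl ada_table[] =
{
  { "always_inline", always_inline_ex },
  { "noinline", noinline_ex },
  { "target", target_ex },
  { "target_clones", target_clones_ex },
};

#define N_ATTRS (sizeof ada_table / sizeof ada_table[0])

/* Return true if attribute A lists B among its exclusions.  */
static bool
lists (const char *a, const char *b)
{
  for (size_t i = 0; i < N_ATTRS; i++)
    if (strcmp (ada_table[i].attr, a) == 0)
      for (const char *const *e = ada_table[i].excludes; *e; e++)
	if (strcmp (*e, b) == 0)
	  return true;
  return false;
}

/* Return true if every exclusion A -> B has a matching B -> A, so that in
   this model the conflict is caught whichever attribute comes first.  */
static bool
exclusions_symmetric (void)
{
  for (size_t i = 0; i < N_ATTRS; i++)
    for (const char *const *e = ada_table[i].excludes; *e; e++)
      if (!lists (*e, ada_table[i].attr))
	return false;
  return true;
}
```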
[PATCH v2 2/5] c-family: Simplify attribute exclusion handling

2023-11-16 Thread Andrew Carlotti
This patch changes the handling of mutual exclusions involving the
target and target_clones attributes to use the generic attribute
exclusion lists.  Additionally, the duplicate handling for the
always_inline and noinline attribute exclusion is removed.

The only change in functionality is the choice of warning message
displayed - due to either a change in the wording for mutual exclusion
warnings, or a change in the order in which different checks occur.

Ok for master?

gcc/c-family/ChangeLog:

* c-attribs.cc (attr_always_inline_exclusions): New.
(attr_target_exclusions): Ditto.
(attr_target_clones_exclusions): Ditto.
(c_common_attribute_table): Add new exclusion lists.
(handle_noinline_attribute): Remove custom exclusion handling.
(handle_always_inline_attribute): Ditto.
(handle_target_attribute): Ditto.
(handle_target_clones_attribute): Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mvc2.C:
* g++.target/i386/mvc3.C:


diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 461732f60f7c4031cc6692000fbdddb9f726a035..b3b41ef123a0f171f57acb1b7f7fdde716428c00 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -214,6 +214,13 @@ static const struct attribute_spec::exclusions attr_inline_exclusions[] =
   ATTR_EXCL (NULL, false, false, false),
 };
 
+static const struct attribute_spec::exclusions attr_always_inline_exclusions[] =
+{
+  ATTR_EXCL ("noinline", true, true, true),
+  ATTR_EXCL ("target_clones", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};
+
 static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
 {
   ATTR_EXCL ("always_inline", true, true, true),
@@ -221,6 +228,19 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
   ATTR_EXCL (NULL, false, false, false),
 };
 
+static const struct attribute_spec::exclusions attr_target_exclusions[] =
+{
+  ATTR_EXCL ("target_clones", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};
+
+static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
+{
+  ATTR_EXCL ("always_inline", true, true, true),
+  ATTR_EXCL ("target", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};
+
 extern const struct attribute_spec::exclusions attr_noreturn_exclusions[] =
 {
   ATTR_EXCL ("alloc_align", true, true, true),
@@ -332,7 +352,7 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_leaf_attribute, NULL },
   { "always_inline",  0, 0, true,  false, false, false,
  handle_always_inline_attribute,
- attr_inline_exclusions },
+ attr_always_inline_exclusions },
   { "gnu_inline", 0, 0, true,  false, false, false,
  handle_gnu_inline_attribute,
  attr_inline_exclusions },
@@ -483,9 +503,11 @@ const struct attribute_spec c_common_attribute_table[] =
   { "error", 1, 1, true,  false, false, false,
  handle_error_attribute, NULL },
   { "target", 1, -1, true, false, false, false,
- handle_target_attribute, NULL },
+ handle_target_attribute,
+ attr_target_exclusions },
   { "target_clones",  1, -1, true, false, false, false,
- handle_target_clones_attribute, NULL },
+ handle_target_clones_attribute,
+ attr_target_clones_exclusions },
   { "optimize",   1, -1, true, false, false, false,
  handle_optimize_attribute, NULL },
   /* For internal use only.  The leading '*' both prevents its usage in
@@ -1397,16 +1419,7 @@ handle_noinline_attribute (tree *node, tree name,
   int ARG_UNUSED (flags), bool *no_add_attrs)
 {
   if (TREE_CODE (*node) == FUNCTION_DECL)
-{
-  if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
-   {
- warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-  "with attribute %qs", name, "always_inline");
- *no_add_attrs = true;
-   }
-  else
-   DECL_UNINLINABLE (*node) = 1;
-}
+DECL_UNINLINABLE (*node) = 1;
   else
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
@@ -1487,22 +1500,9 @@ handle_always_inline_attribute (tree *node, tree name,
 {
   if (TREE_CODE (*node) == FUNCTION_DECL)
 {
-  if (lookup_attribute ("noinline", DECL_ATTRIBUTES (*node)))
-   {
- warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-  "with %qs attribute", name, "noinline");
- *no_add_attrs = true;
-   }
-  else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))

Re: [PATCH v2 1/5] aarch64: Add cpu feature detection to libgcc

2023-11-16 Thread Andrew Carlotti
This is added to enable function multiversioning, but can also be used
directly.  The interface is chosen to match that used in LLVM's
compiler-rt, to facilitate cross-compiler compatibility.

The content of the patch is derived almost entirely from Pavel's prior
contributions to compiler-rt/lib/builtins/cpu_model.c. I have made minor
changes to align more closely with GCC coding style, and to exclude any code
from other LLVM contributors, and am adding this to GCC with Pavel's approval.
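
To make the HWCAP-to-feature relationship concrete, here is a simplified, portable C sketch that translates AT_HWCAP bits into a feature word laid out like `__aarch64_cpu_features.features`. The enumerator values are the positions from the `CPUFeatures` enum below, and the HWCAP values match the fallback definitions in this file. However, `features_from_hwcap` is a hypothetical helper and its one-to-one mapping is an assumption: it does not reproduce the actual detection logic in cpuinfo.c, which handles many more features.

```c
#include <stddef.h>
#include <stdint.h>

/* Subset of the CPUFeatures enum; each value is the enumerator's
   position in that enum (FEAT_INIT follows FEAT_EXT = 62).  */
enum
{
  FEAT_LSE = 7,
  FEAT_FP = 8,
  FEAT_SIMD = 9,
  FEAT_SHA1 = 11,
  FEAT_SHA2 = 12,
  FEAT_AES = 14,
  FEAT_PMULL = 15,
  FEAT_INIT = 63
};

/* Fallback HWCAP values, as defined in the patch.  */
#ifndef HWCAP_FP
#define HWCAP_FP (1 << 0)
#endif
#ifndef HWCAP_ASIMD
#define HWCAP_ASIMD (1 << 1)
#endif
#ifndef HWCAP_AES
#define HWCAP_AES (1 << 3)
#endif
#ifndef HWCAP_PMULL
#define HWCAP_PMULL (1 << 4)
#endif
#ifndef HWCAP_SHA1
#define HWCAP_SHA1 (1 << 5)
#endif
#ifndef HWCAP_SHA2
#define HWCAP_SHA2 (1 << 6)
#endif
#ifndef HWCAP_ATOMICS
#define HWCAP_ATOMICS (1 << 8)
#endif

/* Hypothetical, simplified translation of AT_HWCAP bits into a
   feature word.  */
static uint64_t
features_from_hwcap (unsigned long hwcap)
{
  static const struct { unsigned long hwcap; int feat; } map[] = {
    { HWCAP_FP, FEAT_FP },     { HWCAP_ASIMD, FEAT_SIMD },
    { HWCAP_AES, FEAT_AES },   { HWCAP_PMULL, FEAT_PMULL },
    { HWCAP_SHA1, FEAT_SHA1 }, { HWCAP_SHA2, FEAT_SHA2 },
    { HWCAP_ATOMICS, FEAT_LSE },   /* LSE atomics.  */
  };
  uint64_t features = 0;
  for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
    if (hwcap & map[i].hwcap)
      features |= UINT64_C (1) << map[i].feat;
  /* Mark initialisation as complete, as the FEAT_INIT bit does.  */
  return features | (UINT64_C (1) << FEAT_INIT);
}
```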

libgcc/ChangeLog:

* config/aarch64/t-aarch64: Include cpuinfo.c.
* config/aarch64/cpuinfo.c: New file.
(__init_cpu_features_constructor): New.
(__init_cpu_features_resolver): New.
(__init_cpu_features): New.

Co-authored-by: Pavel Iliin 


diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
new file mode 100644
index 
..0888ca4ed058430f524b99cb0e204bd996fa0e55
--- /dev/null
+++ b/libgcc/config/aarch64/cpuinfo.c
@@ -0,0 +1,502 @@
+/* CPU feature detection for AArch64 architecture.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+  
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#if defined(__has_include)
+#if __has_include()
+#include 
+
+#if __has_include()
+#include 
+#else
+typedef struct __ifunc_arg_t {
+  unsigned long _size;
+  unsigned long _hwcap;
+  unsigned long _hwcap2;
+} __ifunc_arg_t;
+#endif
+
+#if __has_include()
+#include 
+
+/* CPUFeatures must correspond to the same AArch64 features in aarch64.cc  */
+enum CPUFeatures {
+  FEAT_RNG,
+  FEAT_FLAGM,
+  FEAT_FLAGM2,
+  FEAT_FP16FML,
+  FEAT_DOTPROD,
+  FEAT_SM4,
+  FEAT_RDM,
+  FEAT_LSE,
+  FEAT_FP,
+  FEAT_SIMD,
+  FEAT_CRC,
+  FEAT_SHA1,
+  FEAT_SHA2,
+  FEAT_SHA3,
+  FEAT_AES,
+  FEAT_PMULL,
+  FEAT_FP16,
+  FEAT_DIT,
+  FEAT_DPB,
+  FEAT_DPB2,
+  FEAT_JSCVT,
+  FEAT_FCMA,
+  FEAT_RCPC,
+  FEAT_RCPC2,
+  FEAT_FRINTTS,
+  FEAT_DGH,
+  FEAT_I8MM,
+  FEAT_BF16,
+  FEAT_EBF16,
+  FEAT_RPRES,
+  FEAT_SVE,
+  FEAT_SVE_BF16,
+  FEAT_SVE_EBF16,
+  FEAT_SVE_I8MM,
+  FEAT_SVE_F32MM,
+  FEAT_SVE_F64MM,
+  FEAT_SVE2,
+  FEAT_SVE_AES,
+  FEAT_SVE_PMULL128,
+  FEAT_SVE_BITPERM,
+  FEAT_SVE_SHA3,
+  FEAT_SVE_SM4,
+  FEAT_SME,
+  FEAT_MEMTAG,
+  FEAT_MEMTAG2,
+  FEAT_MEMTAG3,
+  FEAT_SB,
+  FEAT_PREDRES,
+  FEAT_SSBS,
+  FEAT_SSBS2,
+  FEAT_BTI,
+  FEAT_LS64,
+  FEAT_LS64_V,
+  FEAT_LS64_ACCDATA,
+  FEAT_WFXT,
+  FEAT_SME_F64,
+  FEAT_SME_I64,
+  FEAT_SME2,
+  FEAT_RCPC3,
+  FEAT_MAX,
+  FEAT_EXT = 62, /* Reserved to indicate presence of additional features field
+   in __aarch64_cpu_features.  */
+  FEAT_INIT  /* Used as flag of features initialization completion.  */
+};
+
+/* Architecture features used in Function Multi Versioning.  */
+struct {
+  unsigned long long features;
+  /* As the feature set grows, new fields could be added.  */
+} __aarch64_cpu_features __attribute__((visibility("hidden"), nocommon));
+
+#ifndef _IFUNC_ARG_HWCAP
+#define _IFUNC_ARG_HWCAP (1ULL << 62)
+#endif
+#ifndef AT_HWCAP
+#define AT_HWCAP 16
+#endif
+#ifndef HWCAP_CPUID
+#define HWCAP_CPUID (1 << 11)
+#endif
+#ifndef HWCAP_FP
+#define HWCAP_FP (1 << 0)
+#endif
+#ifndef HWCAP_ASIMD
+#define HWCAP_ASIMD (1 << 1)
+#endif
+#ifndef HWCAP_AES
+#define HWCAP_AES (1 << 3)
+#endif
+#ifndef HWCAP_PMULL
+#define HWCAP_PMULL (1 << 4)
+#endif
+#ifndef HWCAP_SHA1
+#define HWCAP_SHA1 (1 << 5)
+#endif
+#ifndef HWCAP_SHA2
+#define HWCAP_SHA2 (1 << 6)
+#endif
+#ifndef HWCAP_ATOMICS
+#define HWCAP_ATOMICS (1 << 8)
+#endif
+#ifndef HWCAP_FPHP
+#define HWCAP_FPHP (1 << 9)
+#endif
+#ifndef HWCAP_ASIMDHP
+#define HWCAP_ASIMDHP (1 << 10)
+#endif
+#ifndef HWCAP_ASIMDRDM
+#define HWCAP_ASIMDRDM (1 << 12)
+#endif
+#ifndef HWCAP_JSCVT
+#define HWCAP_JSCVT (1 << 13)
+#endif
+#ifndef HWCAP_FCMA
+#define HWCAP_FCMA (1 << 14)
+#endif
+#ifndef HWCAP_LRCPC
+#define HWCAP_LRCPC (1 << 15)
+#endif
+#ifndef HWCAP_DCPOP
+#define HWCAP_DCPOP (1 << 16)
+#endif
+#ifndef HWCAP_SHA3
+#define HWCAP_SHA3 (1 << 17)
+#endif
+#ifndef HWCAP_SM3
+#define HWCAP_SM3 (1 << 18)
+#endif
+#ifndef HWCAP_SM4

[PATCH v2 0/5] target_version and aarch64 function multiversioning

2023-11-16 Thread Andrew Carlotti
This series adds support for function multiversioning on aarch64.

Patch 1/5 is a repost of my copy of Pavel's aarch64 cpu feature detection code
to libgcc. This is slightly refactored in a later patch, but I've preserved
this patch as-is to make the attribution clearer.

Patches 2/5 and 3/5 are minor cleanups in the c-family and Ada attribute
exclusion handling, to support further tweaks to attribute exclusion handling
for c-family, Ada and D in patch 4.

Patch 4/5 adds support for the target_version attribute to the middle end and
C++ frontend, but should otherwise have no functional changes.

Patch 5/5 uses this support to implement function multiversioning in aarch64.

I plan to improve the existing documentation and tests, including covering the
new functionality, in subsequent commits (perhaps after fixing some of the
current ABI issues).

I'm happy with the state of patches 2-4. Patches 1 and 5 have various
outstanding issues, most of which require fixes to the ACLE as well.  It might
be best to push these patches in something like their current form, and then
push incremental fixes once we've agreed on the relevant specification changes.

The series passes regression testing on both x86 and aarch64 for C and C++. I
haven't got an Ada or D compiler on my build machine, so I haven't tested these
languages; however, I tested using the same code and making equivalent changes
in the C++ frontend, to verify their (minimal) impact upon attribute processing
functionality.

Thanks,
Andrew


Re: [2/4] aarch64: Fix tme intrinsic availability

2023-11-10 Thread Andrew Carlotti
On Fri, Nov 10, 2023 at 10:34:29AM +, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > The availability of tme intrinsics was previously gated at both
> > initialisation time (using global target options) and usage time
> > (accounting for function-specific target options).  This patch removes
> > the check at initialisation time, and also moves the intrinsics out of
> > the header file to allow for better error messages (matching the
> > existing error messages for SVE intrinsics).
> >
> > gcc/ChangeLog:
> >
> > PR target/112108
> > * config/aarch64/aarch64-builtins.cc (aarch64_init_tme_builtins):
> > (aarch64_general_init_builtins): Remove feature check.
> > (aarch64_check_general_builtin_call): New.
> > (aarch64_expand_builtin_tme): Check feature availability.
> > * config/aarch64/aarch64-c.cc (aarch64_check_builtin_call): Add
> > check for non-SVE builtins.
> > * config/aarch64/aarch64-protos.h (aarch64_check_general_builtin_call):
> > New prototype.
> > * config/aarch64/arm_acle.h (__tstart, __tcommit, __tcancel)
> > (__ttest): Remove.
> > (_TMFAILURE_*): Define unconditionally.
> 
> My main concern with this is that it makes the functions available
> even without including the header file.  That's fine from a namespace
> pollution PoV, since the functions are in the implementation namespace.
> But it might reduce code portability if GCC allows these ACLE functions
> to be used without including the header file, while other compilers
> require the header file.
> 
> For LS64 we instead used a pragma to trigger the definition of the
> functions (requiring aarch64_general_simulate_builtin rather than
> aarch64_general_add_builtin).  I think it'd be better to do the same here.

Good point - this is also the same as some simd intrinsic stuff I changed last
year.  I'll fix this in an updated patch, which will then also need a slightly
different version for backporting.

> > gcc/testsuite/ChangeLog:
> >
> > PR target/112108
> > * gcc.target/aarch64/acle/tme_guard-1.c: New test.
> > * gcc.target/aarch64/acle/tme_guard-2.c: New test.
> > * gcc.target/aarch64/acle/tme_guard-3.c: New test.
> > * gcc.target/aarch64/acle/tme_guard-4.c: New test.
> >
> >
> > diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> > index 11a9ba2256f105d8cb9cdc4d6decb5b2be3d69af..ac0259a892e16adb5b241032ac3df1e7ab5370ef 100644
> > --- a/gcc/config/aarch64/aarch64-builtins.cc
> > +++ b/gcc/config/aarch64/aarch64-builtins.cc
> > @@ -1765,19 +1765,19 @@ aarch64_init_tme_builtins (void)
> >  = build_function_type_list (void_type_node, uint64_type_node, NULL);
> >  
> >aarch64_builtin_decls[AARCH64_TME_BUILTIN_TSTART]
> > -= aarch64_general_add_builtin ("__builtin_aarch64_tstart",
> > += aarch64_general_add_builtin ("__tstart",
> >ftype_uint64_void,
> >AARCH64_TME_BUILTIN_TSTART);
> >aarch64_builtin_decls[AARCH64_TME_BUILTIN_TTEST]
> > -= aarch64_general_add_builtin ("__builtin_aarch64_ttest",
> > += aarch64_general_add_builtin ("__ttest",
> >ftype_uint64_void,
> >AARCH64_TME_BUILTIN_TTEST);
> >aarch64_builtin_decls[AARCH64_TME_BUILTIN_TCOMMIT]
> > -= aarch64_general_add_builtin ("__builtin_aarch64_tcommit",
> > += aarch64_general_add_builtin ("__tcommit",
> >ftype_void_void,
> >AARCH64_TME_BUILTIN_TCOMMIT);
> >aarch64_builtin_decls[AARCH64_TME_BUILTIN_TCANCEL]
> > -= aarch64_general_add_builtin ("__builtin_aarch64_tcancel",
> > += aarch64_general_add_builtin ("__tcancel",
> >ftype_void_uint64,
> >AARCH64_TME_BUILTIN_TCANCEL);
> >  }
> > @@ -2034,8 +2034,7 @@ aarch64_general_init_builtins (void)
> >if (!TARGET_ILP32)
> >  aarch64_init_pauth_hint_builtins ();
> >  
> > -  if (TARGET_TME)
> > -aarch64_init_tme_builtins ();
> > +  aarch64_init_tme_builtins ();
> >  
> >if (TARGET_MEMTAG)
> >  aarch64_init_memtag_builtins ();
> > @@ -2137,6 +2136,24 @@ aarch64_check_required_extensions (location_t location, tree fndecl,
> >gcc_unreachable ();
> >  }
> >  
> > +bool aarch64_check_general_builtin_call (l

aarch64: Add cpu feature detection to libgcc

2023-11-09 Thread Andrew Carlotti
This is added to enable function multiversioning, but can also be used
directly.  The interface is chosen to match that used in LLVM's
compiler-rt, to facilitate cross-compiler compatibility.

The content of the patch is derived almost entirely from Pavel's prior
contributions to compiler-rt/lib/builtins/cpu_model.c. I have made minor
changes to align more closely with GCC coding style, and to exclude any code
from other LLVM contributors, and am adding this to GCC with Pavel's approval.
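
As an illustration of how a feature word like this might be consumed, the sketch below models a resolver that initialises the features once, using the FEAT_INIT bit as the completion flag that the enum comment describes, and then picks the highest-priority implementation available. It is a portable C model only: `cpu_features_model`, `init_cpu_features_model`, `has_feature` and `resolve_impl` are hypothetical; the `detected` argument stands in for real hardware detection; and the SVE-over-SIMD priority is an illustrative choice, not GCC's generated resolver logic.

```c
#include <stdbool.h>

/* Enumerator positions taken from the CPUFeatures enum in this patch.  */
enum { FEAT_SIMD = 9, FEAT_SVE = 30, FEAT_INIT = 63 };

/* Stand-in for __aarch64_cpu_features (hypothetical name).  */
static struct { unsigned long long features; } cpu_features_model;

/* Hypothetical one-shot initialiser; once FEAT_INIT is set, later calls
   leave the recorded features unchanged.  */
static void
init_cpu_features_model (unsigned long long detected)
{
  if (cpu_features_model.features & (1ULL << FEAT_INIT))
    return;
  cpu_features_model.features = detected | (1ULL << FEAT_INIT);
}

static bool
has_feature (int feat)
{
  return (cpu_features_model.features >> feat) & 1;
}

/* Three hypothetical versions of the same function.  */
static int impl_default (void) { return 0; }
static int impl_simd (void) { return 1; }
static int impl_sve (void) { return 2; }

typedef int (*impl_fn) (void);

/* Resolver-style selection: highest-priority available version wins.  */
static impl_fn
resolve_impl (unsigned long long detected)
{
  init_cpu_features_model (detected);
  if (has_feature (FEAT_SVE))
    return impl_sve;
  if (has_feature (FEAT_SIMD))
    return impl_simd;
  return impl_default;
}
```

In this model, a second call that passes SVE still returns the SIMD version when the first call recorded only SIMD, because initialisation has already completed.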

libgcc/ChangeLog:

* config/aarch64/t-aarch64: Include cpuinfo.c.
* config/aarch64/cpuinfo.c: New file.
(__init_cpu_features_constructor): New.
(__init_cpu_features_resolver): New.
(__init_cpu_features): New.

Co-authored-by: Pavel Iliin 


diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
new file mode 100644
index 
..9edbd804145a6e35e2420504f53497a26040
--- /dev/null
+++ b/libgcc/config/aarch64/cpuinfo.c
@@ -0,0 +1,500 @@
+/* CPU feature detection for AArch64 architecture.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+  
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#if defined(__has_include)
+#if __has_include()
+#include 
+
+#if __has_include()
+#include 
+#else
+typedef struct __ifunc_arg_t {
+  unsigned long _size;
+  unsigned long _hwcap;
+  unsigned long _hwcap2;
+} __ifunc_arg_t;
+#endif
+
+#if __has_include()
+#include 
+
+/* CPUFeatures must correspond to the same AArch64 features in aarch64.cc  */
+enum CPUFeatures {
+  FEAT_RNG,
+  FEAT_FLAGM,
+  FEAT_FLAGM2,
+  FEAT_FP16FML,
+  FEAT_DOTPROD,
+  FEAT_SM4,
+  FEAT_RDM,
+  FEAT_LSE,
+  FEAT_FP,
+  FEAT_SIMD,
+  FEAT_CRC,
+  FEAT_SHA1,
+  FEAT_SHA2,
+  FEAT_SHA3,
+  FEAT_AES,
+  FEAT_PMULL,
+  FEAT_FP16,
+  FEAT_DIT,
+  FEAT_DPB,
+  FEAT_DPB2,
+  FEAT_JSCVT,
+  FEAT_FCMA,
+  FEAT_RCPC,
+  FEAT_RCPC2,
+  FEAT_FRINTTS,
+  FEAT_DGH,
+  FEAT_I8MM,
+  FEAT_BF16,
+  FEAT_EBF16,
+  FEAT_RPRES,
+  FEAT_SVE,
+  FEAT_SVE_BF16,
+  FEAT_SVE_EBF16,
+  FEAT_SVE_I8MM,
+  FEAT_SVE_F32MM,
+  FEAT_SVE_F64MM,
+  FEAT_SVE2,
+  FEAT_SVE_AES,
+  FEAT_SVE_PMULL128,
+  FEAT_SVE_BITPERM,
+  FEAT_SVE_SHA3,
+  FEAT_SVE_SM4,
+  FEAT_SME,
+  FEAT_MEMTAG,
+  FEAT_MEMTAG2,
+  FEAT_MEMTAG3,
+  FEAT_SB,
+  FEAT_PREDRES,
+  FEAT_SSBS,
+  FEAT_SSBS2,
+  FEAT_BTI,
+  FEAT_LS64,
+  FEAT_LS64_V,
+  FEAT_LS64_ACCDATA,
+  FEAT_WFXT,
+  FEAT_SME_F64,
+  FEAT_SME_I64,
+  FEAT_SME2,
+  FEAT_RCPC3,
+  FEAT_MAX,
+  FEAT_EXT = 62, /* Reserved to indicate presence of additional features field
+   in __aarch64_cpu_features.  */
+  FEAT_INIT  /* Used as flag of features initialization completion.  */
+};
+
+/* Architecture features used in Function Multi Versioning.  */
+struct {
+  unsigned long long features;
+  /* As the feature set grows, new fields could be added.  */
+} __aarch64_cpu_features __attribute__((visibility("hidden"), nocommon));
+
+#ifndef _IFUNC_ARG_HWCAP
+#define _IFUNC_ARG_HWCAP (1ULL << 62)
+#endif
+#ifndef AT_HWCAP
+#define AT_HWCAP 16
+#endif
+#ifndef HWCAP_CPUID
+#define HWCAP_CPUID (1 << 11)
+#endif
+#ifndef HWCAP_FP
+#define HWCAP_FP (1 << 0)
+#endif
+#ifndef HWCAP_ASIMD
+#define HWCAP_ASIMD (1 << 1)
+#endif
+#ifndef HWCAP_AES
+#define HWCAP_AES (1 << 3)
+#endif
+#ifndef HWCAP_PMULL
+#define HWCAP_PMULL (1 << 4)
+#endif
+#ifndef HWCAP_SHA1
+#define HWCAP_SHA1 (1 << 5)
+#endif
+#ifndef HWCAP_SHA2
+#define HWCAP_SHA2 (1 << 6)
+#endif
+#ifndef HWCAP_ATOMICS
+#define HWCAP_ATOMICS (1 << 8)
+#endif
+#ifndef HWCAP_FPHP
+#define HWCAP_FPHP (1 << 9)
+#endif
+#ifndef HWCAP_ASIMDHP
+#define HWCAP_ASIMDHP (1 << 10)
+#endif
+#ifndef HWCAP_ASIMDRDM
+#define HWCAP_ASIMDRDM (1 << 12)
+#endif
+#ifndef HWCAP_JSCVT
+#define HWCAP_JSCVT (1 << 13)
+#endif
+#ifndef HWCAP_FCMA
+#define HWCAP_FCMA (1 << 14)
+#endif
+#ifndef HWCAP_LRCPC
+#define HWCAP_LRCPC (1 << 15)
+#endif
+#ifndef HWCAP_DCPOP
+#define HWCAP_DCPOP (1 << 16)
+#endif
+#ifndef HWCAP_SHA3
+#define HWCAP_SHA3 (1 << 17)
+#endif
+#ifndef HWCAP_SM3
+#define HWCAP_SM3 (1 << 18)
+#endif
+#ifndef HWCAP_SM4
+#define HWCAP_SM4 (1 << 19)

[4/4] aarch64: Fix ls64 intrinsic availability

2023-11-09 Thread Andrew Carlotti
The availability of ls64 intrinsics and data types was determined
solely by the globally specified architecture features, which did not
reflect any changes specified in target pragmas or attributes.

This patch removes the initialisation-time guards for the intrinsics,
and replaces them with checks at use time. We also get better error
messages when ls64 is not available (matching the existing error
messages for SVE intrinsics).

The data512_t type is made always available; this is consistent with the
present behaviour for Neon fp16/bf16 types.
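
Across this series, each guarded intrinsic is mapped to the extension it needs, either case by case (as for the tme and ls64 builtins) or by a range of builtin codes (as for memtag). The standalone C model below shows that shape. The enum values and FL_* flags are hypothetical stand-ins for the AARCH64_*_BUILTIN_* codes and AARCH64_FL_* flags, and the range check uses strict bounds so that the sentinel codes themselves are excluded.

```c
/* Hypothetical builtin codes, loosely modelled on the series.  */
enum builtin_code
{
  B_TSTART, B_TCOMMIT, B_TTEST, B_TCANCEL,
  B_LD64B, B_ST64B, B_ST64BV, B_ST64BV0,
  B_MEMTAG_START, B_IRG, B_GMI, B_SUBP, B_MEMTAG_END,
  B_OTHER
};

/* Hypothetical extension flags.  */
#define FL_NONE   0u
#define FL_TME    (1u << 0)
#define FL_LS64   (1u << 1)
#define FL_MEMTAG (1u << 2)

/* Extension required by CODE: explicit cases first, then a range check
   for a contiguous block of codes.  */
static unsigned
required_extension (enum builtin_code code)
{
  switch (code)
    {
    case B_TSTART:
    case B_TCOMMIT:
    case B_TTEST:
    case B_TCANCEL:
      return FL_TME;

    case B_LD64B:
    case B_ST64B:
    case B_ST64BV:
    case B_ST64BV0:
      return FL_LS64;

    default:
      break;
    }

  if (code > B_MEMTAG_START && code < B_MEMTAG_END)
    return FL_MEMTAG;

  return FL_NONE;
}

/* Use-time check against the flags enabled for the calling function.  */
static int
call_allowed (enum builtin_code code, unsigned enabled)
{
  unsigned req = required_extension (code);
  return (enabled & req) == req;
}
```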

gcc/ChangeLog:

PR target/112108
* config/aarch64/aarch64-builtins.cc (handle_arm_acle_h): Remove
feature check at initialisation.
(aarch64_check_general_builtin_call): Check ls64 intrinsics.
(aarch64_expand_builtin_ls64): Add feature check.
* config/aarch64/arm_acle.h (data512_t): Make always available.

gcc/testsuite/ChangeLog:

PR target/112108
* gcc.target/aarch64/acle/ls64_guard-1.c: New test.
* gcc.target/aarch64/acle/ls64_guard-2.c: New test.
* gcc.target/aarch64/acle/ls64_guard-3.c: New test.
* gcc.target/aarch64/acle/ls64_guard-4.c: New test.


diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 503d8ad98d7de959d8c7c78cef575d29e2132f78..9fd0d5c362815c25793bc04a1d82e32bd30bbc22 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -1943,8 +1943,7 @@ aarch64_init_data_intrinsics (void)
 void
 handle_arm_acle_h (void)
 {
-  if (TARGET_LS64)
-aarch64_init_ls64_builtins ();
+  aarch64_init_ls64_builtins ();
 }
 
 /* Initialize fpsr fpcr getters and setters.  */
@@ -2148,6 +2147,13 @@ bool aarch64_check_general_builtin_call (location_t location,
return aarch64_check_required_extensions (location, fndecl,
  AARCH64_FL_TME, false);
 
+  case AARCH64_LS64_BUILTIN_LD64B:
+  case AARCH64_LS64_BUILTIN_ST64B:
+  case AARCH64_LS64_BUILTIN_ST64BV:
+  case AARCH64_LS64_BUILTIN_ST64BV0:
+   return aarch64_check_required_extensions (location, fndecl,
+ AARCH64_FL_LS64, false);
+
   default:
break;
 }
@@ -2630,6 +2636,11 @@ aarch64_expand_builtin_ls64 (int fcode, tree exp, rtx target)
 {
   expand_operand ops[3];
 
+  tree fndecl = aarch64_builtin_decls[fcode];
+  if (!aarch64_check_required_extensions (EXPR_LOCATION (exp), fndecl,
+ AARCH64_FL_LS64, false))
+return target;
+
   switch (fcode)
 {
 case AARCH64_LS64_BUILTIN_LD64B:
diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
index 57f16603d22cec81002b00b94afe1201c83b4b94..e7aae7e5278691508086e6438b57b8a6fb6df554 100644
--- a/gcc/config/aarch64/arm_acle.h
+++ b/gcc/config/aarch64/arm_acle.h
@@ -235,9 +235,7 @@ __crc32d (uint32_t __a, uint64_t __b)
 #define _TMFAILURE_INT0x0080u
 #define _TMFAILURE_TRIVIAL0x0100u
 
-#ifdef __ARM_FEATURE_LS64
 typedef __arm_data512_t data512_t;
-#endif
 
 #pragma GCC push_options
 #pragma GCC target ("+nothing+rng")
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-1.c b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-1.c
new file mode 100644
index 
..7dfc193a2934c994220280990316027c07e75ac4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv8.6-a" } */
+
+#include 
+
+data512_t foo (void * p)
+{
+  return __arm_ld64b (p); /* { dg-error {ACLE function '__arm_ld64b' requires ISA extension 'ls64'} } */
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-2.c b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-2.c
new file mode 100644
index 
..3ede05a81f026f8606ee2c9cd56f15ce45caa1c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv8.6-a" } */
+
+#include 
+
+#pragma GCC target("arch=armv8-a+ls64")
+data512_t foo (void * p)
+{
+  return __arm_ld64b (p);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-3.c b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-3.c
new file mode 100644
index 
..e0fccdad7bec4aa522fb709d010289fd02f91d05
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-3.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv8-a+ls64 -mgeneral-regs-only" } */
+
+#include 
+
+data512_t foo (void * p)
+{
+  return __arm_ld64b (p);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-4.c b/gcc/testsuite/gcc.target/aarch64/acle/ls64_guard-4.c
new file mode 100644
index 

[3/4] aarch64: Fix memtag intrinsic availability

2023-11-09 Thread Andrew Carlotti
The availability of memtag intrinsics and data types was determined
solely by the globally specified architecture features, which did not
reflect any changes specified in target pragmas or attributes.

This patch removes the initialisation-time guards for the intrinsics,
and replaces them with checks at use time. It also removes the macro
indirection from the header file - this simplifies the header, and
allows the missing extension error reporting to find the user-facing
intrinsic names.
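
The renaming relies on the preprocessor's stringizing operator: `#N` turns the macro argument into a string literal, and adjacent string literals are concatenated, so the registered builtin name (and hence the name printed for the function in diagnostics) becomes the ACLE intrinsic name. A reduced C illustration follows. `OLD_MEMTAG_NAME`, `NEW_MEMTAG_NAME` and `is_user_facing` are hypothetical and cover only the name-building part of AARCH64_INIT_MEMTAG_BUILTINS_DECL.

```c
#include <string.h>

/* Name construction before and after the patch: #N stringizes the
   argument and the two literals are concatenated at compile time.  */
#define OLD_MEMTAG_NAME(N) "__builtin_aarch64_memtag_" #N
#define NEW_MEMTAG_NAME(N) "__arm_mte_" #N

/* Hypothetical helper: does a registered builtin name match the
   user-facing ACLE intrinsic name?  */
static int
is_user_facing (const char *registered, const char *intrinsic)
{
  return strcmp (registered, intrinsic) == 0;
}
```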

gcc/ChangeLog:

PR target/112108
* config/aarch64/aarch64-builtins.cc (aarch64_init_memtag_builtins):
Replace internal builtin names with intrinsic names.
(aarch64_general_init_builtins): Remove feature check.
(aarch64_check_general_builtin_call): Check memtag intrinsics.
(aarch64_expand_builtin_memtag): Add feature check.
* config/aarch64/arm_acle.h (__arm_mte_create_random_tag)
(__arm_mte_exclude_tag, __arm_mte_ptrdiff)
(__arm_mte_increment_tag, __arm_mte_set_tag, __arm_mte_get_tag):
Remove.

gcc/testsuite/ChangeLog:

PR target/112108
* gcc.target/aarch64/acle/memtag_guard-1.c: New test.
* gcc.target/aarch64/acle/memtag_guard-2.c: New test.
* gcc.target/aarch64/acle/memtag_guard-3.c: New test.
* gcc.target/aarch64/acle/memtag_guard-4.c: New test.


diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index ac0259a892e16adb5b241032ac3df1e7ab5370ef..503d8ad98d7de959d8c7c78cef575d29e2132f78 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -1813,7 +1813,7 @@ aarch64_init_memtag_builtins (void)
 
 #define AARCH64_INIT_MEMTAG_BUILTINS_DECL(F, N, I, T) \
   aarch64_builtin_decls[AARCH64_MEMTAG_BUILTIN_##F] \
-= aarch64_general_add_builtin ("__builtin_aarch64_memtag_"#N, \
+= aarch64_general_add_builtin ("__arm_mte_"#N, \
   T, AARCH64_MEMTAG_BUILTIN_##F); \
   aarch64_memtag_builtin_data[AARCH64_MEMTAG_BUILTIN_##F - \
  AARCH64_MEMTAG_BUILTIN_START - 1] = \
@@ -1821,19 +1821,19 @@ aarch64_init_memtag_builtins (void)
 
   fntype = build_function_type_list (ptr_type_node, ptr_type_node,
 uint64_type_node, NULL);
-  AARCH64_INIT_MEMTAG_BUILTINS_DECL (IRG, irg, irg, fntype);
+  AARCH64_INIT_MEMTAG_BUILTINS_DECL (IRG, create_random_tag, irg, fntype);
 
   fntype = build_function_type_list (uint64_type_node, ptr_type_node,
 uint64_type_node, NULL);
-  AARCH64_INIT_MEMTAG_BUILTINS_DECL (GMI, gmi, gmi, fntype);
+  AARCH64_INIT_MEMTAG_BUILTINS_DECL (GMI, exclude_tag, gmi, fntype);
 
   fntype = build_function_type_list (ptrdiff_type_node, ptr_type_node,
 ptr_type_node, NULL);
-  AARCH64_INIT_MEMTAG_BUILTINS_DECL (SUBP, subp, subp, fntype);
+  AARCH64_INIT_MEMTAG_BUILTINS_DECL (SUBP, ptrdiff, subp, fntype);
 
   fntype = build_function_type_list (ptr_type_node, ptr_type_node,
 unsigned_type_node, NULL);
-  AARCH64_INIT_MEMTAG_BUILTINS_DECL (INC_TAG, inc_tag, addg, fntype);
+  AARCH64_INIT_MEMTAG_BUILTINS_DECL (INC_TAG, increment_tag, addg, fntype);
 
   fntype = build_function_type_list (void_type_node, ptr_type_node, NULL);
   AARCH64_INIT_MEMTAG_BUILTINS_DECL (SET_TAG, set_tag, stg, fntype);
@@ -2036,8 +2036,7 @@ aarch64_general_init_builtins (void)
 
   aarch64_init_tme_builtins ();
 
-  if (TARGET_MEMTAG)
-aarch64_init_memtag_builtins ();
+  aarch64_init_memtag_builtins ();
 
   if (in_lto_p)
 handle_arm_acle_h ();
@@ -2152,6 +2151,12 @@ bool aarch64_check_general_builtin_call (location_t location,
   default:
break;
 }
+
+  if (fcode >= AARCH64_MEMTAG_BUILTIN_START
+  && fcode <= AARCH64_MEMTAG_BUILTIN_END)
+   return aarch64_check_required_extensions (location, fndecl,
+ AARCH64_FL_MEMTAG, false);
+
   return true;
 }
 
@@ -2716,6 +2721,11 @@ aarch64_expand_builtin_memtag (int fcode, tree exp, rtx target)
   return const0_rtx;
 }
 
+  tree fndecl = aarch64_builtin_decls[fcode];
+  if (!aarch64_check_required_extensions (EXPR_LOCATION (exp), fndecl,
+ AARCH64_FL_MEMTAG, false))
+return target;
+
   rtx pat = NULL;
   enum insn_code icode = aarch64_memtag_builtin_data[fcode -
   AARCH64_MEMTAG_BUILTIN_START - 1].icode;
diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
index f4e35d1e12ac9bbcc4f1b75d8e5baad62f8634a0..57f16603d22cec81002b00b94afe1201c83b4b94 100644
--- a/gcc/config/aarch64/arm_acle.h
+++ b/gcc/config/aarch64/arm_acle.h
@@ -257,29 +257,6 @@ __rndrrs (uint64_t *__res)
 
 #pragma GCC pop_options
 
-#pragma GCC push_options
-#pragma GCC target ("+nothing+memtag")
-
-#define __arm_mte_create_random_tag(__ptr, __u64_mask) 

[2/4] aarch64: Fix tme intrinsic availability

2023-11-09 Thread Andrew Carlotti
The availability of tme intrinsics was previously gated at both
initialisation time (using global target options) and usage time
(accounting for function-specific target options).  This patch removes
the check at initialisation time, and also moves the intrinsics out of
the header file to allow for better error messages (matching the
existing error messages for SVE intrinsics).
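
The difference between the two kinds of gating can be modelled in a few lines of portable C. In the sketch below, all names are hypothetical, and real target attributes and pragmas can also remove features, which is not modelled. The old scheme requires the extension both in the global options (for the builtin to be registered at all) and at the call site. The new scheme consults only the flags in effect for the calling function, so a function compiled with an extra target option can use the intrinsic even when the global options lack it.

```c
#include <stdbool.h>

/* Hypothetical extension flag for the sketch.  */
#define FL_TME (1u << 0)

/* Hypothetical per-function context: flags added by a target attribute
   or pragma on top of the global options.  */
struct fn_ctx
{
  unsigned attr_flags;
};

static unsigned
effective_flags (unsigned global_flags, const struct fn_ctx *fn)
{
  return global_flags | fn->attr_flags;
}

/* Initialisation-time gate: depends only on the global options.  */
static bool
registered_at_init (unsigned global_flags)
{
  return (global_flags & FL_TME) != 0;
}

/* Use-time gate: depends on the calling function's options.  */
static bool
allowed_at_use (unsigned global_flags, const struct fn_ctx *fn)
{
  return (effective_flags (global_flags, fn) & FL_TME) != 0;
}

/* Old scheme: both gates apply.  New scheme: only the use-time gate.  */
static bool
old_call_ok (unsigned global_flags, const struct fn_ctx *fn)
{
  return registered_at_init (global_flags) && allowed_at_use (global_flags, fn);
}

static bool
new_call_ok (unsigned global_flags, const struct fn_ctx *fn)
{
  return allowed_at_use (global_flags, fn);
}
```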

gcc/ChangeLog:

PR target/112108
* config/aarch64/aarch64-builtins.cc (aarch64_init_tme_builtins):
(aarch64_general_init_builtins): Remove feature check.
(aarch64_check_general_builtin_call): New.
(aarch64_expand_builtin_tme): Check feature availability.
* config/aarch64/aarch64-c.cc (aarch64_check_builtin_call): Add
check for non-SVE builtins.
* config/aarch64/aarch64-protos.h (aarch64_check_general_builtin_call):
New prototype.
* config/aarch64/arm_acle.h (__tstart, __tcommit, __tcancel)
(__ttest): Remove.
(_TMFAILURE_*): Define unconditionally.

gcc/testsuite/ChangeLog:

PR target/112108
* gcc.target/aarch64/acle/tme_guard-1.c: New test.
* gcc.target/aarch64/acle/tme_guard-2.c: New test.
* gcc.target/aarch64/acle/tme_guard-3.c: New test.
* gcc.target/aarch64/acle/tme_guard-4.c: New test.


diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 11a9ba2256f105d8cb9cdc4d6decb5b2be3d69af..ac0259a892e16adb5b241032ac3df1e7ab5370ef 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -1765,19 +1765,19 @@ aarch64_init_tme_builtins (void)
 = build_function_type_list (void_type_node, uint64_type_node, NULL);
 
   aarch64_builtin_decls[AARCH64_TME_BUILTIN_TSTART]
-= aarch64_general_add_builtin ("__builtin_aarch64_tstart",
+= aarch64_general_add_builtin ("__tstart",
   ftype_uint64_void,
   AARCH64_TME_BUILTIN_TSTART);
   aarch64_builtin_decls[AARCH64_TME_BUILTIN_TTEST]
-= aarch64_general_add_builtin ("__builtin_aarch64_ttest",
+= aarch64_general_add_builtin ("__ttest",
   ftype_uint64_void,
   AARCH64_TME_BUILTIN_TTEST);
   aarch64_builtin_decls[AARCH64_TME_BUILTIN_TCOMMIT]
-= aarch64_general_add_builtin ("__builtin_aarch64_tcommit",
+= aarch64_general_add_builtin ("__tcommit",
   ftype_void_void,
   AARCH64_TME_BUILTIN_TCOMMIT);
   aarch64_builtin_decls[AARCH64_TME_BUILTIN_TCANCEL]
-= aarch64_general_add_builtin ("__builtin_aarch64_tcancel",
+= aarch64_general_add_builtin ("__tcancel",
   ftype_void_uint64,
   AARCH64_TME_BUILTIN_TCANCEL);
 }
@@ -2034,8 +2034,7 @@ aarch64_general_init_builtins (void)
   if (!TARGET_ILP32)
 aarch64_init_pauth_hint_builtins ();
 
-  if (TARGET_TME)
-aarch64_init_tme_builtins ();
+  aarch64_init_tme_builtins ();
 
   if (TARGET_MEMTAG)
 aarch64_init_memtag_builtins ();
@@ -2137,6 +2136,24 @@ aarch64_check_required_extensions (location_t location, tree fndecl,
   gcc_unreachable ();
 }
 
+bool aarch64_check_general_builtin_call (location_t location,
+unsigned int fcode)
+{
+  tree fndecl = aarch64_builtin_decls[fcode];
+  switch (fcode)
+{
+  case AARCH64_TME_BUILTIN_TSTART:
+  case AARCH64_TME_BUILTIN_TCOMMIT:
+  case AARCH64_TME_BUILTIN_TTEST:
+  case AARCH64_TME_BUILTIN_TCANCEL:
+   return aarch64_check_required_extensions (location, fndecl,
+ AARCH64_FL_TME, false);
+
+  default:
+   break;
+}
+  return true;
+}
 
 typedef enum
 {
@@ -2559,6 +2576,11 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode)
 static rtx
 aarch64_expand_builtin_tme (int fcode, tree exp, rtx target)
 {
+  tree fndecl = aarch64_builtin_decls[fcode];
+  if (!aarch64_check_required_extensions (EXPR_LOCATION (exp), fndecl,
+ AARCH64_FL_TME, false))
+return target;
+
   switch (fcode)
 {
 case AARCH64_TME_BUILTIN_TSTART:
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index ab8844f6049dc95b97648b651bfcd3a4ccd3ca0b..6b6bd77e9e66cd2d9a211387e07d3e20d935fb1a 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -339,7 +339,7 @@ aarch64_check_builtin_call (location_t loc, vec arg_loc,
   switch (code & AARCH64_BUILTIN_CLASS)
 {
 case AARCH64_BUILTIN_GENERAL:
-  return true;
+  return aarch64_check_general_builtin_call (loc, subcode);
 
 case AARCH64_BUILTIN_SVE:
   return aarch64_sve::check_builtin_call (loc, arg_loc, subcode,
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 

[1/4] aarch64: Refactor check_required_extensions

2023-11-09 Thread Andrew Carlotti
Move SVE extension checking functionality to aarch64-builtins.cc, so
that it can be shared by non-SVE intrinsics.
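
A simplified standalone model of the shared check follows. It takes plain strings and bitmasks instead of `tree`, `location_t` and AARCH64_FL_* flags, and prints with fprintf rather than error_at. The names echo the patch, but the signatures and the `missing_extension_reports` counter are hypothetical. The model shows the report-once behaviour: after the first missing-extension error, further failures still return false but emit nothing.

```c
#include <stdbool.h>
#include <stdio.h>

/* Mirrors the patch's reported_missing_extension_p flag.  */
static bool reported_missing_extension_p;

/* Hypothetical counter so the report-once behaviour can be observed.  */
static int missing_extension_reports;

static void
report_missing_extension (const char *fndecl, const char *extension)
{
  /* Avoid reporting a slew of messages for a single oversight.  */
  if (reported_missing_extension_p)
    return;
  fprintf (stderr, "error: ACLE function '%s' requires ISA extension '%s'\n",
           fndecl, extension);
  missing_extension_reports++;
  reported_missing_extension_p = true;
}

/* True if every bit of REQUIRED is set in ENABLED; otherwise report
   (at most once per compilation) and return false.  */
static bool
check_required_extensions (const char *fndecl, unsigned enabled,
                           unsigned required, const char *extension)
{
  if ((enabled & required) == required)
    return true;
  report_missing_extension (fndecl, extension);
  return false;
}
```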

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins.cc (check_builtin_call)
(expand_builtin): Update calls to the below.
(report_missing_extension, check_required_registers)
(check_required_extensions): Move out of aarch64_sve namespace,
rename, and move into...
* config/aarch64/aarch64-builtins.cc (aarch64_report_missing_extension)
(aarch64_check_non_general_registers)
(aarch64_check_required_extensions) ...here.
* config/aarch64/aarch64-protos.h (aarch64_check_required_extensions):
Add prototype.


diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 04f59fd9a54306d6422b03e32dce79bc00aed4f8..11a9ba2256f105d8cb9cdc4d6decb5b2be3d69af 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2054,6 +2054,90 @@ aarch64_general_builtin_decl (unsigned code, bool)
   return aarch64_builtin_decls[code];
 }
 
+/* True if we've already complained about attempts to use functions
+   when the required extension is disabled.  */
+static bool reported_missing_extension_p;
+
+/* True if we've already complained about attempts to use functions
+   which require registers that are missing.  */
+static bool reported_missing_registers_p;
+
+/* Report an error against LOCATION that the user has tried to use
+   function FNDECL when extension EXTENSION is disabled.  */
+static void
+aarch64_report_missing_extension (location_t location, tree fndecl,
+                                  const char *extension)
+{
+  /* Avoid reporting a slew of messages for a single oversight.  */
+  if (reported_missing_extension_p)
+    return;
+
+  error_at (location, "ACLE function %qD requires ISA extension %qs",
+            fndecl, extension);
+  inform (location, "you can enable %qs using the command-line"
+          " option %<-march%>, or by using the %"
+          " attribute or pragma", extension);
+  reported_missing_extension_p = true;
+}
+
+/* Check whether non-general registers required by ACLE function fndecl are
+ * available.  Report an error against LOCATION and return false if not.  */
+static bool
+aarch64_check_non_general_registers (location_t location, tree fndecl)
+{
+  /* Avoid reporting a slew of messages for a single oversight.  */
+  if (reported_missing_registers_p)
+    return false;
+
+  if (TARGET_GENERAL_REGS_ONLY)
+    {
+      /* FP/SIMD/SVE registers are not usable when -mgeneral-regs-only option
+         is specified.  */
+      error_at (location,
+                "ACLE function %qD is incompatible with the use of %qs",
+                fndecl, "-mgeneral-regs-only");
+      reported_missing_registers_p = true;
+      return false;
+    }
+
+  return true;
+}
+
+/* Check whether all the AARCH64_FL_* values in REQUIRED_EXTENSIONS are
+   enabled, given that those extensions are required for function FNDECL.
+   Report an error against LOCATION if not.
+   If REQUIRES_NON_GENERAL_REGISTERS is true, then also check whether
+   non-general registers are available.  */
+bool
+aarch64_check_required_extensions (location_t location, tree fndecl,
+                                   aarch64_feature_flags required_extensions,
+                                   bool requires_non_general_registers)
+{
+  auto missing_extensions = required_extensions & ~aarch64_asm_isa_flags;
+  if (missing_extensions == 0)
+    return requires_non_general_registers
+           ? aarch64_check_non_general_registers (location, fndecl)
+           : true;
+
+  static const struct {
+    aarch64_feature_flags flag;
+    const char *name;
+  } extensions[] = {
+#define AARCH64_OPT_EXTENSION(EXT_NAME, IDENT, C, D, E, F) \
+    { AARCH64_FL_##IDENT, EXT_NAME },
+#include "aarch64-option-extensions.def"
+  };
+
+  for (unsigned int i = 0; i < ARRAY_SIZE (extensions); ++i)
+    if (missing_extensions & extensions[i].flag)
+      {
+        aarch64_report_missing_extension (location, fndecl, extensions[i].name);
+        return false;
+      }
+  gcc_unreachable ();
+}
+
+
 typedef enum
 {
   SIMD_ARG_COPY_TO_REG,
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 60a55f4bc1956786ea687fc7cad7ec9e4a84e1f0..30726140a6945dcb86b787f8f47952810d99379f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -988,6 +988,9 @@ tree aarch64_general_builtin_rsqrt (unsigned int);
 void handle_arm_acle_h (void);
 void handle_arm_neon_h (void);
 
+bool aarch64_check_required_extensions (location_t, tree,
+                                        aarch64_feature_flags, bool = true);
+
 namespace aarch64_sve {
   void init_builtins ();
   void handle_arm_sve_h ();
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 

[0/4] aarch64: Fix intrinsic availability [PR112108]

2023-11-09 Thread Andrew Carlotti
This series of patches fixes issues with some intrinsics being incorrectly
gated by global target options, instead of just using function-specific target
options.  These issues have been present since the +tme, +memtag and +ls64
intrinsics were introduced.
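
The following toy C++ model illustrates the distinction. It is not GCC's implementation. Under global gating, an intrinsic is usable only if the command-line options enable its extension. Under function-specific gating, the check uses the flags in effect for the calling function. As a result, a function compiled with, say, `__attribute__((target("+tme")))` can use the TME intrinsics even when `-march` does not include +tme. The type `function_context`, the flag constants and both helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits (GCC uses AARCH64_FL_TME and friends).
using feature_flags = std::uint64_t;
constexpr feature_flags FL_TME = feature_flags{1} << 0;
constexpr feature_flags FL_MEMTAG = feature_flags{1} << 1;

// Hypothetical model of the function containing an intrinsic call: the ISA
// flags in effect for it, including any target attribute or pragma.
struct function_context
{
  feature_flags isa_flags;
};

// Old behaviour (modelled): availability depends only on the global
// command-line flags, whatever the calling function enables.
bool
usable_with_global_gating (feature_flags global_flags,
                           const function_context &, feature_flags required)
{
  return (required & ~global_flags) == 0;
}

// Fixed behaviour (modelled): availability depends on the calling
// function's own flags.
bool
usable_with_function_gating (const function_context &caller,
                             feature_flags required)
{
  return (required & ~caller.isa_flags) == 0;
}
```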

Bootstrapped and regression tested on aarch64.  Ok to merge?

Also, ok for backports to all open affected versions (with regression tests)?
I believe the first three patches will apply cleanly back to GCC 11.

The ls64 intrinsics were only added in GCC 12, and have recently had their
implementation changed, so I'll send a separate backport patch for approval
once this series is merged.


Re: [1/3] Add support for target_version attribute

2023-11-03 Thread Andrew Carlotti
On Thu, Oct 26, 2023 at 07:41:09PM +0100, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > This patch adds support for the "target_version" attribute to the middle
> > end and the C++ frontend, which will be used to implement function
> > multiversioning in the aarch64 backend.
> >
> > Note that C++ is currently the only frontend which supports
> > multiversioning using the "target" attribute, whereas the
> > "target_clones" attribute is additionally supported in C, D and Ada.
> > Support for the target_version attribute will be extended to C at a
> > later date.
> >
> > Targets that currently use the "target" attribute for function
> > multiversioning (i.e. i386 and rs6000) are not affected by this patch.
> >
> >
> > I could have implemented the target hooks slightly differently, by reusing the
> > valid_attribute_p hook and adding attribute name checks to each backend
> > implementation (c.f. the aarch64 implementation in patch 2/3).  Would this be
> > preferable?
> 
> Having as much as possible in target-independent code seems better
> to me FWIW.  On that basis:
> 
> >
> > Otherwise, is this ok for master?
> >
> >
> > gcc/c-family/ChangeLog:
> >
> > * c-attribs.cc (handle_target_version_attribute): New.
> > (c_common_attribute_table): Add target_version.
> > (handle_target_clones_attribute): Add conflict with
> > target_version attribute.
> >
> > gcc/ChangeLog:
> >
> > * attribs.cc (is_function_default_version): Update comment to
> > specify incompatibility with target_version attributes.
> > * cgraphclones.cc (cgraph_node::create_version_clone_with_body):
> > Call valid_version_attribute_p for target_version attributes.
> > * target.def (valid_version_attribute_p): New hook.
> > (expanded_clones_attribute): New hook.
> > * doc/tm.texi.in: Add new hooks.
> > * doc/tm.texi: Regenerate.
> > * multiple_target.cc (create_dispatcher_calls): Remove redundant
> > is_function_default_version check.
> > (expand_target_clones): Use target hook for attribute name.
> > * targhooks.cc (default_target_option_valid_version_attribute_p):
> > New.
> > * targhooks.h (default_target_option_valid_version_attribute_p):
> > New.
> > * tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
> > target_version attributes.
> >
> > gcc/cp/ChangeLog:
> >
> > * decl2.cc (check_classfn): Update comment to include
> > target_version attributes.
> >
> >
> > diff --git a/gcc/attribs.cc b/gcc/attribs.cc
> > index b1300018d1e8ed8e02ded1ea721dc192a6d32a49..a3c4a81e8582ea4fd06b9518bf51fad7c998ddd6 100644
> > --- a/gcc/attribs.cc
> > +++ b/gcc/attribs.cc
> > @@ -1233,8 +1233,9 @@ make_dispatcher_decl (const tree decl)
> >return func_decl;  
> >  }
> >  
> > -/* Returns true if decl is multi-versioned and DECL is the default function,
> > -   that is it is not tagged with target specific optimization.  */
> > +/* Returns true if DECL is multi-versioned using the target attribute, and this
> > +   is the default version.  This function can only be used for targets that do
> > +   not support the "target_version" attribute.  */
> >  
> >  bool
> >  is_function_default_version (const tree decl)
> > diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> > index 072cfb69147bd6b314459c0bd48a0c1fb92d3e4d..1a224c036277d51ab4dc0d33a403177bd226e48a 100644
> > --- a/gcc/c-family/c-attribs.cc
> > +++ b/gcc/c-family/c-attribs.cc
> > @@ -148,6 +148,7 @@ static tree handle_alloc_align_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_assume_aligned_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_assume_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_target_attribute (tree *, tree, tree, int, bool *);
> > +static tree handle_target_version_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> >  static tree ignore_attribute (tree *, tree, tree, int, bool *);
> > @@ -480,6 +481,8 @@ const struct attribute_spec c_common_attribute_table[] =
> >   handle_error_attribute, NULL },
> > 

Re: [1/3] Add support for target_version attribute

2023-10-19 Thread Andrew Carlotti
On Thu, Oct 19, 2023 at 07:04:09AM +, Richard Biener wrote:
> On Wed, 18 Oct 2023, Andrew Carlotti wrote:
> 
> > This patch adds support for the "target_version" attribute to the middle
> > end and the C++ frontend, which will be used to implement function
> > multiversioning in the aarch64 backend.
> > 
> > Note that C++ is currently the only frontend which supports
> > multiversioning using the "target" attribute, whereas the
> > "target_clones" attribute is additionally supported in C, D and Ada.
> > Support for the target_version attribute will be extended to C at a
> > later date.
> > 
> > Targets that currently use the "target" attribute for function
> > multiversioning (i.e. i386 and rs6000) are not affected by this patch.
> > 
> > 
> > I could have implemented the target hooks slightly differently, by reusing the
> > valid_attribute_p hook and adding attribute name checks to each backend
> > implementation (c.f. the aarch64 implementation in patch 2/3).  Would this be
> > preferable?
> > 
> > Otherwise, is this ok for master?
> 
> This lacks user-level documentation in doc/extend.texi (where
> target_clones is documented).

Good point.  I'll add documentation updates as a separate patch in the series
(rather than documenting the state after this patch, in which the attribute is
supported on zero targets).  I think the existing documentation for target and
target_clones needs some improvement as well.

> Was there any discussion/description of why target_clones cannot
> be made work for aarch64?
> 
> Richard.

The second patch in this series does include support for target_clones on
aarch64.  However, the support in that patch is not fully compliant with our
ACLE specification.  I also have some unresolved questions about the
correctness of current function multiversioning implementations using ifuncs
across translation units, which could affect how we want to implement it for
aarch64.

Andrew

> > 
> > gcc/c-family/ChangeLog:
> > 
> > * c-attribs.cc (handle_target_version_attribute): New.
> > (c_common_attribute_table): Add target_version.
> > (handle_target_clones_attribute): Add conflict with
> > target_version attribute.
> > 
> > gcc/ChangeLog:
> > 
> > * attribs.cc (is_function_default_version): Update comment to
> > specify incompatibility with target_version attributes.
> > * cgraphclones.cc (cgraph_node::create_version_clone_with_body):
> > Call valid_version_attribute_p for target_version attributes.
> > * target.def (valid_version_attribute_p): New hook.
> > (expanded_clones_attribute): New hook.
> > * doc/tm.texi.in: Add new hooks.
> > * doc/tm.texi: Regenerate.
> > * multiple_target.cc (create_dispatcher_calls): Remove redundant
> > is_function_default_version check.
> > (expand_target_clones): Use target hook for attribute name.
> > * targhooks.cc (default_target_option_valid_version_attribute_p):
> > New.
> > * targhooks.h (default_target_option_valid_version_attribute_p):
> > New.
> > * tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
> > target_version attributes.
> > 
> > gcc/cp/ChangeLog:
> > 
> > * decl2.cc (check_classfn): Update comment to include
> > target_version attributes.
> > 
> > 
> > diff --git a/gcc/attribs.cc b/gcc/attribs.cc
> > index b1300018d1e8ed8e02ded1ea721dc192a6d32a49..a3c4a81e8582ea4fd06b9518bf51fad7c998ddd6 100644
> > --- a/gcc/attribs.cc
> > +++ b/gcc/attribs.cc
> > @@ -1233,8 +1233,9 @@ make_dispatcher_decl (const tree decl)
> >return func_decl;  
> >  }
> >  
> > -/* Returns true if decl is multi-versioned and DECL is the default function,
> > -   that is it is not tagged with target specific optimization.  */
> > +/* Returns true if DECL is multi-versioned using the target attribute, and this
> > +   is the default version.  This function can only be used for targets that do
> > +   not support the "target_version" attribute.  */
> >  
> >  bool
> >  is_function_default_version (const tree decl)
> > diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> > index 072cfb69147bd6b314459c0bd48a0c1fb92d3e4d..1a224c036277d51ab4dc0d33a403177bd226e48a 100644
> > --- a/gcc/c-family/c-attribs.cc
> > +++ b/gcc/c-family/c-attribs.cc
> > @@ -148,6 +148,7 @@ static tree handle_alloc_align_attribute (tree *, tree, 
>

[3/3] WIP/RFC: Fix name mangling for target_clones

2023-10-18 Thread Andrew Carlotti
This is a partial patch to make the mangling of function version names
for target_clones match those generated using the target or
target_version attributes.  It modifies the name of function versions,
but does not yet rename the resolved symbol, resulting in a duplicate
symbol name (and an error at assembly time).


Is this sort of approach ok?  Should I create an extra target hook to be called
here, so that the target_clones mangling can be target-specific but not
necessarily the same as for target attribute versioning?


diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc
index 8af6b23d8c0306920e0fdcb3559ef047a16689f4..15672c02c6f9d6043a36bf081067f08d1ab834e5 100644
--- a/gcc/cgraphclones.cc
+++ b/gcc/cgraphclones.cc
@@ -1033,11 +1033,6 @@ cgraph_node::create_version_clone_with_body
   else
 new_decl = copy_node (old_decl);
 
-  /* Generate a new name for the new version. */
-  tree fnname = (version_decl ? clone_function_name_numbered (old_decl, suffix)
-   : clone_function_name (old_decl, suffix));
-  DECL_NAME (new_decl) = fnname;
-  SET_DECL_ASSEMBLER_NAME (new_decl, fnname);
   SET_DECL_RTL (new_decl, NULL);
 
   DECL_VIRTUAL_P (new_decl) = 0;
@@ -1065,6 +1060,24 @@ cgraph_node::create_version_clone_with_body
return NULL;
 }
 
+  /* Generate a new name for the new version. */
+  if (version_decl)
+    {
+      tree fnname = (clone_function_name_numbered (old_decl, suffix));
+      DECL_NAME (new_decl) = fnname;
+      SET_DECL_ASSEMBLER_NAME (new_decl, fnname);
+    }
+  else
+    {
+      /* Add target version mangling.  We assume that the target hook will
+         produce the same mangled name as it would have produced if the decl
+         had already been versioned when the hook was previously called.  */
+      tree fnname = DECL_ASSEMBLER_NAME (old_decl);
+      DECL_NAME (new_decl) = fnname;
+      fnname = targetm.mangle_decl_assembler_name (new_decl, fnname);
+      SET_DECL_ASSEMBLER_NAME (new_decl, fnname);
+    }
+
   /* When the old decl was a con-/destructor make sure the clone isn't.  */
   DECL_STATIC_CONSTRUCTOR (new_decl) = 0;
   DECL_STATIC_DESTRUCTOR (new_decl) = 0;
diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
index 3db57c2b13d612a37240d9dcf58ad21b2286633c..d9aec9a5ab532701b4a1877b440f3a553ffa28e2 100644
--- a/gcc/multiple_target.cc
+++ b/gcc/multiple_target.cc
@@ -162,7 +162,12 @@ create_dispatcher_calls (struct cgraph_node *node)
}
 }
 
-  tree fname = clone_function_name (node->decl, "default");
+  /* Add version mangling to default decl name.  We assume that the target
+ hook will produce the same mangled name as it would have produced if the
+ decl had already been versioned when the hook was previously called.  */
+  tree fname = DECL_ASSEMBLER_NAME (node->decl);
+  DECL_NAME (node->decl) = fname;
+  fname = targetm.mangle_decl_assembler_name (node->decl, fname);
   symtab->change_decl_assembler_name (node->decl, fname);
 
   if (node->definition)


[2/3] [aarch64] Add function multiversioning support

2023-10-18 Thread Andrew Carlotti
This adds initial support for function multiversioning on aarch64 using the
target_version and target_clones attributes. This mostly follows the
Beta specification in the ACLE [1], with a few differences that remain to
be fixed:

- Symbol mangling for target_clones differs from that for target_version
  and does not match the mangling specified in the ACLE. This
  inconsistency is also present in i386 and rs6000 mangling.
- The target_clones attribute does not currently support an implicit
  "default" version.
- Unrecognised target names in a target_clones attribute should be
  ignored (with an optional warning), but currently cause an error to be
  raised instead.
- There is no option to disable function multiversioning at compile
  time.
- There is no support for function multiversioning in C, since this is
  not yet enabled in the frontend. On the other hand, this patch
  happens to enable multiversioning in Ada and D as well, using their
  existing frontend support.

This patch relies on adding functionality to libgcc, to support:
- struct { unsigned long long features; } __aarch64_cpu_features;
- void __init_cpu_features (void);
- void __init_cpu_features_resolver (unsigned long hwcap,
 const __ifunc_arg_t *arg);
This support matches the interface currently used in LLVM's compiler-rt,
and will be implemented in a future patch (which will be merged before
merging this patch).

This version of the patch incorrectly uses __init_cpu_features in the
ifunc resolvers, which could lead to invalid library calls at load time.
I will fix this to use __init_cpu_features_resolver in a future version
of the patch.
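
Conceptually, each generated ifunc resolver initialises the CPU feature bits and then returns the highest-priority version whose required features are all present, falling back to the default version. The C++ sketch below models only that selection step. It assumes that the detected features are passed in as a plain bitmask. In the scheme described above, they would instead be read from `__aarch64_cpu_features.features` after `__init_cpu_features_resolver` has run. The `version` struct, the priorities and `resolve` are hypothetical. The patch generates the equivalent checks directly in the resolver body rather than calling a helper like this.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of an FMV ifunc resolver's selection step.  Detected
// CPU features are passed in as a bitmask instead of being read from
// __aarch64_cpu_features.features.
using feature_mask = std::uint64_t;

struct version
{
  const char *name;       // e.g. "sve2" or "default"
  feature_mask required;  // features that must all be present
  int priority;           // higher values are preferred
};

// Return the highest-priority version whose required features are all
// available.  A default version with required == 0 always qualifies, so
// the result is only null if no default version is supplied.
const version *
resolve (const std::vector<version> &versions, feature_mask available)
{
  const version *best = nullptr;
  for (const version &v : versions)
    if ((v.required & ~available) == 0
        && (best == nullptr || v.priority > best->priority))
      best = &v;
  return best;
}
```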

[1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning

gcc/ChangeLog:

* attribs.cc (decl_attributes): Pass attribute name to target
hook.
* config/aarch64/aarch64.cc
(aarch64_process_target_version_attr): New.
(aarch64_option_valid_attribute_p): Add check and support for
target_version attribute.
(enum CPUFeatures): New list of bitmask positions.
(aarch64_fmv_feature_data): New.
(get_feature_bit): New.
(get_feature_mask_for_version): New.
(compare_feature_masks): New.
(aarch64_compare_version_priority): New.
(make_resolver_func): New.
(add_condition_to_bb): New.
(compare_feature_version_info): New.
(dispatch_function_versions): New.
(aarch64_generate_version_dispatcher_body): New.
(aarch64_get_function_versions_dispatcher): New.
(aarch64_common_function_versions): New.
(aarch64_mangle_decl_assembler_name): New.
(TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation.
(TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation.
(TARGET_OPTION_FUNCTION_VERSIONS): New implementation.
(TARGET_COMPARE_VERSION_PRIORITY): New implementation.
(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation.
(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation.
(TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation.


diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index a3c4a81e8582ea4fd06b9518bf51fad7c998ddd6..cc935b502028392ebdc105f940900f01f79196a7 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -657,7 +657,8 @@ decl_attributes (tree *node, tree attributes, int flags,
  options to the attribute((target(...))) list.  */
   if (TREE_CODE (*node) == FUNCTION_DECL
   && current_target_pragma
-  && targetm.target_option.valid_attribute_p (*node, NULL_TREE,
+  && targetm.target_option.valid_attribute_p (*node,
+                                              get_identifier("target"),
                                               current_target_pragma, 0))
 {
   tree cur_attr = lookup_attribute ("target", attributes);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9c3c0e705e2e6ea3b55b4a5f1e7d3360f91eb51d..ca0e2a2507ffdbf99e17b77240504bf2d175b9c0 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -19088,11 +19088,70 @@ aarch64_process_target_attr (tree args)
   return true;
 }
 
+/* Parse the tree in ARGS that contains the target_version attribute
+   information and update the global target options space.  */
+
+bool
+aarch64_process_target_version_attr (tree args)
+{
+  if (TREE_CODE (args) == TREE_LIST)
+    {
+      if (TREE_CHAIN (args))
+        {
+          error ("attribute % has multiple values");
+          return false;
+        }
+      args = TREE_VALUE (args);
+    }
+
+  if (!args || TREE_CODE (args) != STRING_CST)
+    {
+      error ("attribute % argument not a string");
+      return false;
+    }
+
+  const char *str = TREE_STRING_POINTER (args);
+  if (strcmp (str, "default") == 0)
+    return true;
+
+  auto with_plus = std::string ("+") + str;
+  enum aarch_parse_opt_result parse_res;
+  auto isa_flags 

[1/3] Add support for target_version attribute

2023-10-18 Thread Andrew Carlotti
This patch adds support for the "target_version" attribute to the middle
end and the C++ frontend, which will be used to implement function
multiversioning in the aarch64 backend.

Note that C++ is currently the only frontend which supports
multiversioning using the "target" attribute, whereas the
"target_clones" attribute is additionally supported in C, D and Ada.
Support for the target_version attribute will be extended to C at a
later date.

Targets that currently use the "target" attribute for function
multiversioning (i.e. i386 and rs6000) are not affected by this patch.


I could have implemented the target hooks slightly differently, by reusing the
valid_attribute_p hook and adding attribute name checks to each backend
implementation (c.f. the aarch64 implementation in patch 2/3).  Would this be
preferable?

Otherwise, is this ok for master?


gcc/c-family/ChangeLog:

* c-attribs.cc (handle_target_version_attribute): New.
(c_common_attribute_table): Add target_version.
(handle_target_clones_attribute): Add conflict with
target_version attribute.

gcc/ChangeLog:

* attribs.cc (is_function_default_version): Update comment to
specify incompatibility with target_version attributes.
* cgraphclones.cc (cgraph_node::create_version_clone_with_body):
Call valid_version_attribute_p for target_version attributes.
* target.def (valid_version_attribute_p): New hook.
(expanded_clones_attribute): New hook.
* doc/tm.texi.in: Add new hooks.
* doc/tm.texi: Regenerate.
* multiple_target.cc (create_dispatcher_calls): Remove redundant
is_function_default_version check.
(expand_target_clones): Use target hook for attribute name.
* targhooks.cc (default_target_option_valid_version_attribute_p):
New.
* targhooks.h (default_target_option_valid_version_attribute_p):
New.
* tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
target_version attributes.

gcc/cp/ChangeLog:

* decl2.cc (check_classfn): Update comment to include
target_version attributes.


diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index b1300018d1e8ed8e02ded1ea721dc192a6d32a49..a3c4a81e8582ea4fd06b9518bf51fad7c998ddd6 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -1233,8 +1233,9 @@ make_dispatcher_decl (const tree decl)
   return func_decl;  
 }
 
-/* Returns true if decl is multi-versioned and DECL is the default function,
-   that is it is not tagged with target specific optimization.  */
+/* Returns true if DECL is multi-versioned using the target attribute, and this
+   is the default version.  This function can only be used for targets that do
+   not support the "target_version" attribute.  */
 
 bool
 is_function_default_version (const tree decl)
diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 072cfb69147bd6b314459c0bd48a0c1fb92d3e4d..1a224c036277d51ab4dc0d33a403177bd226e48a 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -148,6 +148,7 @@ static tree handle_alloc_align_attribute (tree *, tree, tree, int, bool *);
 static tree handle_assume_aligned_attribute (tree *, tree, tree, int, bool *);
 static tree handle_assume_attribute (tree *, tree, tree, int, bool *);
 static tree handle_target_attribute (tree *, tree, tree, int, bool *);
+static tree handle_target_version_attribute (tree *, tree, tree, int, bool *);
 static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
 static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
 static tree ignore_attribute (tree *, tree, tree, int, bool *);
@@ -480,6 +481,8 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_error_attribute, NULL },
   { "target", 1, -1, true, false, false, false,
  handle_target_attribute, NULL },
+  { "target_version", 1, -1, true, false, false, false,
+ handle_target_version_attribute, NULL },
   { "target_clones",  1, -1, true, false, false, false,
  handle_target_clones_attribute, NULL },
   { "optimize",   1, -1, true, false, false, false,
@@ -5569,6 +5572,45 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
   return NULL_TREE;
 }
 
+/* Handle a "target_version" attribute.  */
+
+static tree
+handle_target_version_attribute (tree *node, tree name, tree args, int flags,
+                                 bool *no_add_attrs)
+{
+  /* Ensure we have a function type.  */
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored", name);
+      *no_add_attrs = true;
+    }
+  else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+               "with %qs 

[0/3] target_version and aarch64 function multiversioning

2023-10-18 Thread Andrew Carlotti
This series adds support for function multiversioning on aarch64.  There are a
few minor issues in patch 2/3 that I intend to fix in future versions or
follow-up patches.  I also have some open questions about the correctness of
existing function multiversioning implementations [1] that could affect some
details of this patch series.

Patches 1/3 and 2/3 both pass regression testing on x86.  Patch 2/3 requires
adding function multiversioning tests to aarch64, which I haven't included yet.
Patch 3/3 demonstrates a potential approach for improving consistency of symbol
naming between target_clones and target/target_version multiversioning, but
would require agreement on how to resolve some of the issues discussed in [1].

Thanks,
Andrew


[1] https://gcc.gnu.org/pipermail/gcc/2023-October/242686.html


aarch64: Replace duplicated selftests

2023-10-18 Thread Andrew Carlotti
Pushed as obvious.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_test_fractional_cost):
Test <= instead of testing < twice.
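
The ordering being tested is easy to check by hand. For non-negative fractions with positive denominators, a/b <= c/d holds exactly when a*d <= c*b. The sketch below is a hypothetical stand-in for `fractional_cost` and is not GCC's implementation. It compares by cross-multiplication, and the assertions in its test mirror the new `<=` selftests. For example, 2 <= 208/104 holds because 2*104 = 208 <= 208, while 2 <= 207/104 fails because 208 > 207.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for fractional_cost: a non-negative value NUM/DEN.
// Comparisons cross-multiply, so no rounding is involved (the products are
// assumed not to overflow, which holds for small values like these).
struct frac
{
  std::uint64_t num, den;

  // Not explicit, so plain integers such as 1 or 2 convert to N/1.
  frac (std::uint64_t n, std::uint64_t d = 1) : num (n), den (d)
  {
    assert (d != 0);
  }
};

// a/b < c/d  <=>  a*d < c*b  (for positive b and d).
inline bool
operator< (const frac &x, const frac &y)
{
  return x.num * y.den < y.num * x.den;
}

// a/b <= c/d  <=>  a*d <= c*b  (for positive b and d).
inline bool
operator<= (const frac &x, const frac &y)
{
  return x.num * y.den <= y.num * x.den;
}
```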


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 2b0de7ca0389be6698c329b54f9501b8ec09183f..9c3c0e705e2e6ea3b55b4a5f1e7d3360f91eb51d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -27529,18 +27529,18 @@ aarch64_test_fractional_cost ()
   ASSERT_EQ (cf (2, 3) * 5, cf (10, 3));
   ASSERT_EQ (14 * cf (11, 21), cf (22, 3));
 
-  ASSERT_TRUE (cf (4, 15) < cf (5, 15));
-  ASSERT_FALSE (cf (5, 15) < cf (5, 15));
-  ASSERT_FALSE (cf (6, 15) < cf (5, 15));
-  ASSERT_TRUE (cf (1, 3) < cf (2, 5));
-  ASSERT_TRUE (cf (1, 12) < cf (1, 6));
-  ASSERT_FALSE (cf (5, 3) < cf (5, 3));
-  ASSERT_TRUE (cf (239, 240) < 1);
-  ASSERT_FALSE (cf (240, 240) < 1);
-  ASSERT_FALSE (cf (241, 240) < 1);
-  ASSERT_FALSE (2 < cf (207, 104));
-  ASSERT_FALSE (2 < cf (208, 104));
-  ASSERT_TRUE (2 < cf (209, 104));
+  ASSERT_TRUE (cf (4, 15) <= cf (5, 15));
+  ASSERT_TRUE (cf (5, 15) <= cf (5, 15));
+  ASSERT_FALSE (cf (6, 15) <= cf (5, 15));
+  ASSERT_TRUE (cf (1, 3) <= cf (2, 5));
+  ASSERT_TRUE (cf (1, 12) <= cf (1, 6));
+  ASSERT_TRUE (cf (5, 3) <= cf (5, 3));
+  ASSERT_TRUE (cf (239, 240) <= 1);
+  ASSERT_TRUE (cf (240, 240) <= 1);
+  ASSERT_FALSE (cf (241, 240) <= 1);
+  ASSERT_FALSE (2 <= cf (207, 104));
+  ASSERT_TRUE (2 <= cf (208, 104));
+  ASSERT_TRUE (2 <= cf (209, 104));
 
   ASSERT_TRUE (cf (4, 15) < cf (5, 15));
   ASSERT_FALSE (cf (5, 15) < cf (5, 15));


Re: [GCC 13 PATCH] aarch64: Remove architecture dependencies from intrinsics

2023-07-20 Thread Andrew Carlotti via Gcc-patches
On Thu, Jul 20, 2023 at 09:37:14AM +0200, Richard Biener wrote:
> On Thu, Jul 20, 2023 at 8:49 AM Richard Sandiford via Gcc-patches
>  wrote:
> >
> > Andrew Carlotti  writes:
> > > Updated patch to fix the fp16 intrinsic pragmas, and pushed to master.
> > > OK to backport to GCC 13?
> >
> > OK, thanks.
> 
> In case you want it in 13.2 please push it really soon, we want to do 13.2 RC1
> today.
> 
> Richard.

Pushed, thanks.

> 
> > Richard
> >
> > > Many intrinsics currently depend on both an architecture version and a
> > > feature, despite the corresponding instructions being available within
> > > GCC at lower architecture versions.
> > >
> > > LLVM has already removed these explicit architecture version
> > > dependences; this patch does the same for GCC. Note that +fp16 does not
> > > imply +simd, so we need to add an explicit +simd for the Neon fp16
> > > intrinsics.
> > >
> > > Binutils did not previously support all of these architecture+feature
> > > combinations, but this problem is already reachable from GCC.  For
> > > example, compiling the test gcc.target/aarch64/usadv16qi-dotprod.c
> > > with -O3 -march=armv8-a+dotprod has resulted in an assembler error since
> > > GCC 10.  This is fixed in Binutils 2.41.
> > >
> > > This patch retains explicit architecture version dependencies for
> > > features that do not currently have a separate feature flag.
> > >
> > > gcc/ChangeLog:
> > >
> > >  * config/aarch64/aarch64.h (TARGET_MEMTAG): Remove armv8.5
> > >  dependency.
> > >  * config/aarch64/arm_acle.h: Remove unnecessary armv8.x
> > >  dependencies from target pragmas.
> > >  * config/aarch64/arm_fp16.h (target): Likewise.
> > >  * config/aarch64/arm_neon.h (target): Likewise.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >  * gcc.target/aarch64/feature-bf16-backport.c: New test.
> > >  * gcc.target/aarch64/feature-dotprod-backport.c: New test.
> > >  * gcc.target/aarch64/feature-fp16-backport.c: New test.
> > >  * gcc.target/aarch64/feature-fp16-scalar-backport.c: New test.
> > >  * gcc.target/aarch64/feature-fp16fml-backport.c: New test.
> > >  * gcc.target/aarch64/feature-i8mm-backport.c: New test.
> > >  * gcc.target/aarch64/feature-memtag-backport.c: New test.
> > >  * gcc.target/aarch64/feature-sha3-backport.c: New test.
> > >  * gcc.target/aarch64/feature-sm4-backport.c: New test.
> > >
> > > ---
> > >
> > > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> > > index a01f1ee99d85917941ffba55bc3b4dcac87b41f6..2b0fc97bb71e9d560ae26035c7d7142682e46c38 100644
> > > --- a/gcc/config/aarch64/aarch64.h
> > > +++ b/gcc/config/aarch64/aarch64.h
> > > @@ -292,7 +292,7 @@ enum class aarch64_feature : unsigned char {
> > >  #define TARGET_RNG (AARCH64_ISA_RNG)
> > >
> > >  /* Memory Tagging instructions optional to Armv8.5 enabled through +memtag.  */
> > > -#define TARGET_MEMTAG (AARCH64_ISA_V8_5A && AARCH64_ISA_MEMTAG)
> > > +#define TARGET_MEMTAG (AARCH64_ISA_MEMTAG)
> > >
> > >  /* I8MM instructions are enabled through +i8mm.  */
> > >  #define TARGET_I8MM (AARCH64_ISA_I8MM)
> > > diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
> > > index 3b6b63e6805432b5f1686745f987c52d2967c7c1..7599a32301dadf80760d3cb40a8685d2e6a476fb 100644
> > > --- a/gcc/config/aarch64/arm_acle.h
> > > +++ b/gcc/config/aarch64/arm_acle.h
> > > @@ -292,7 +292,7 @@ __rndrrs (uint64_t *__res)
> > >  #pragma GCC pop_options
> > >
> > >  #pragma GCC push_options
> > > -#pragma GCC target ("arch=armv8.5-a+memtag")
> > > +#pragma GCC target ("+nothing+memtag")
> > >
> > >  #define __arm_mte_create_random_tag(__ptr, __u64_mask) \
> > >__builtin_aarch64_memtag_irg(__ptr, __u64_mask)
> > > diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h
> > > index 350f8cc33d99e16137e9d70fa7958b10924dc67f..c10f9dcf7e097ded1740955addcd73348649dc56 100644
> > > --- a/gcc/config/aarch64/arm_fp16.h
> > > +++ b/gcc/config/aarch64/arm_fp16.h
> > > @@ -30,7 +30,7 @@
> > >  #include 
> > >
> > >  #pragma GCC push_options
> > > -#pragma GCC target ("arch=armv8.2-a+fp16")

Re: [GCC 13 PATCH] aarch64: Remove architecture dependencies from intrinsics

2023-07-19 Thread Andrew Carlotti via Gcc-patches
On Wed, Jul 19, 2023 at 07:35:26PM +0100, Ramana Radhakrishnan wrote:
> On Wed, Jul 19, 2023 at 5:44 PM Andrew Carlotti via Gcc-patches
>  wrote:
> >
> > Updated patch to fix the fp16 intrinsic pragmas, and pushed to master.
> > OK to backport to GCC 13?
> >
> >
> > Many intrinsics currently depend on both an architecture version and a
> > feature, despite the corresponding instructions being available within
> > GCC at lower architecture versions.
> >
> > LLVM has already removed these explicit architecture version
> > dependences; this patch does the same for GCC. Note that +fp16 does not
> > imply +simd, so we need to add an explicit +simd for the Neon fp16
> > intrinsics.
> >
> > Binutils did not previously support all of these architecture+feature
> > combinations, but this problem is already reachable from GCC.  For
> > example, compiling the test gcc.target/aarch64/usadv16qi-dotprod.c
> > with -O3 -march=armv8-a+dotprod has resulted in an assembler error since
> > GCC 10.  This is fixed in Binutils 2.41.
> 
> Are there any implementations that actually implement v8-a + dotprod?
> As far as I'm aware this was v8.2-A as the base architecture where
> this was allowed. Has this changed recently?
> 
> 
> regards
> Ramana

I don't recall whether there are any physical implementations of DotProd
without Armv8.2, but similar situations have already occurred with other
features.

There are also situations where developers wish to enable only a subset of
available features.  For example, the existing restrictions in GCC have forced
Chromium to disable their memtag support when building with GCC [1]; with this
patch, they will be able to reenable memtag support from GCC 14 (and GCC 13.x
when this is backported).

I don't see any advantages to trying to enforce minimum architecture versions
for features in GCC, except perhaps maintaining the status quo.  But the status
quo is already rather inconsistent, and these changes only make GCC more
permissive (and only for options that currently don't work).
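The "only more permissive" point can be checked mechanically for the TARGET_MEMTAG change. The following is a minimal C sketch, not GCC code: `old_target_memtag`, `new_target_memtag` and `count_newly_enabled` are hypothetical helpers that model the macro before and after the patch, with plain booleans standing in for AARCH64_ISA_V8_5A and AARCH64_ISA_MEMTAG.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: isa_v8_5a models AARCH64_ISA_V8_5A and
   isa_memtag models AARCH64_ISA_MEMTAG.  */

/* TARGET_MEMTAG before the patch.  */
static bool
old_target_memtag (bool isa_v8_5a, bool isa_memtag)
{
  return isa_v8_5a && isa_memtag;
}

/* TARGET_MEMTAG after the patch: the architecture level is ignored.  */
static bool
new_target_memtag (bool isa_v8_5a, bool isa_memtag)
{
  (void) isa_v8_5a;
  return isa_memtag;
}

/* Count configurations accepted by the new predicate but rejected by
   the old one, asserting that nothing previously accepted is lost.  */
static int
count_newly_enabled (void)
{
  int newly_enabled = 0;
  for (int v85 = 0; v85 <= 1; v85++)
    for (int mte = 0; mte <= 1; mte++)
      {
        bool before = old_target_memtag (v85, mte);
        bool after = new_target_memtag (v85, mte);
        assert (!before || after);   /* never less permissive */
        if (after && !before)
          newly_enabled++;
      }
  return newly_enabled;
}
```

In this model the only configuration that changes is +memtag without Armv8.5, which is exactly the combination that previously did not work.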


[1] https://chromium-review.googlesource.com/c/chromium/src/+/3238466


[GCC 13 PATCH] aarch64: Remove architecture dependencies from intrinsics

2023-07-19 Thread Andrew Carlotti via Gcc-patches
Updated patch to fix the fp16 intrinsic pragmas, and pushed to master.
OK to backport to GCC 13?


Many intrinsics currently depend on both an architecture version and a
feature, despite the corresponding instructions being available within
GCC at lower architecture versions.

LLVM has already removed these explicit architecture version
dependences; this patch does the same for GCC. Note that +fp16 does not
imply +simd, so we need to add an explicit +simd for the Neon fp16
intrinsics.

Binutils did not previously support all of these architecture+feature
combinations, but this problem is already reachable from GCC.  For
example, compiling the test gcc.target/aarch64/usadv16qi-dotprod.c
with -O3 -march=armv8-a+dotprod has resulted in an assembler error since
GCC 10.  This is fixed in Binutils 2.41.

This patch retains explicit architecture version dependencies for
features that do not currently have a separate feature flag.

gcc/ChangeLog:

 * config/aarch64/aarch64.h (TARGET_MEMTAG): Remove armv8.5
 dependency.
 * config/aarch64/arm_acle.h: Remove unnecessary armv8.x
 dependencies from target pragmas.
 * config/aarch64/arm_fp16.h (target): Likewise.
 * config/aarch64/arm_neon.h (target): Likewise.

gcc/testsuite/ChangeLog:

 * gcc.target/aarch64/feature-bf16-backport.c: New test.
 * gcc.target/aarch64/feature-dotprod-backport.c: New test.
 * gcc.target/aarch64/feature-fp16-backport.c: New test.
 * gcc.target/aarch64/feature-fp16-scalar-backport.c: New test.
 * gcc.target/aarch64/feature-fp16fml-backport.c: New test.
 * gcc.target/aarch64/feature-i8mm-backport.c: New test.
 * gcc.target/aarch64/feature-memtag-backport.c: New test.
 * gcc.target/aarch64/feature-sha3-backport.c: New test.
 * gcc.target/aarch64/feature-sm4-backport.c: New test.

---

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index a01f1ee99d85917941ffba55bc3b4dcac87b41f6..2b0fc97bb71e9d560ae26035c7d7142682e46c38 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -292,7 +292,7 @@ enum class aarch64_feature : unsigned char {
 #define TARGET_RNG (AARCH64_ISA_RNG)
 
 /* Memory Tagging instructions optional to Armv8.5 enabled through +memtag.  */
-#define TARGET_MEMTAG (AARCH64_ISA_V8_5A && AARCH64_ISA_MEMTAG)
+#define TARGET_MEMTAG (AARCH64_ISA_MEMTAG)
 
 /* I8MM instructions are enabled through +i8mm.  */
 #define TARGET_I8MM (AARCH64_ISA_I8MM)
diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
index 3b6b63e6805432b5f1686745f987c52d2967c7c1..7599a32301dadf80760d3cb40a8685d2e6a476fb 100644
--- a/gcc/config/aarch64/arm_acle.h
+++ b/gcc/config/aarch64/arm_acle.h
@@ -292,7 +292,7 @@ __rndrrs (uint64_t *__res)
 #pragma GCC pop_options
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.5-a+memtag")
+#pragma GCC target ("+nothing+memtag")
 
 #define __arm_mte_create_random_tag(__ptr, __u64_mask) \
   __builtin_aarch64_memtag_irg(__ptr, __u64_mask)
diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h
index 350f8cc33d99e16137e9d70fa7958b10924dc67f..c10f9dcf7e097ded1740955addcd73348649dc56 100644
--- a/gcc/config/aarch64/arm_fp16.h
+++ b/gcc/config/aarch64/arm_fp16.h
@@ -30,7 +30,7 @@
 #include 
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+fp16")
+#pragma GCC target ("+nothing+fp16")
 
 typedef __fp16 float16_t;
 
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 0ace1eeddb97443433c091d2363403fcf2907654..349f3167699447eb397af482eaeadf8a07617025 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -25590,7 +25590,7 @@ __INTERLEAVE_LIST (zip)
 #include "arm_fp16.h"
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+fp16")
+#pragma GCC target ("+nothing+simd+fp16")
 
 /* ARMv8.2-A FP16 one operand vector intrinsics.  */
 
@@ -26753,7 +26753,7 @@ vminnmvq_f16 (float16x8_t __a)
 /* AdvSIMD Dot Product intrinsics.  */
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+dotprod")
+#pragma GCC target ("+nothing+dotprod")
 
 __extension__ extern __inline uint32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
@@ -26844,7 +26844,7 @@ vdotq_laneq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b, const int __index)
 #pragma GCC pop_options
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+sm4")
+#pragma GCC target ("+nothing+sm4")
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
@@ -26911,7 +26911,7 @@ vsm4ekeyq_u32 (uint32x4_t __a, uint32x4_t __b)
 #pragma GCC pop_options
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+sha3")
+#pragma GCC target ("+nothing+sha3")
 
 __extension__ extern __inline uint64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
@@ -27547,7 +27547,7 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
 #pragma GCC pop_options
 
 #pragma GCC 

Re: [PATCH] aarch64: Remove architecture dependencies from intrinsics

2023-06-29 Thread Andrew Carlotti via Gcc-patches
On Tue, Jun 27, 2023 at 07:23:32AM +0100, Richard Sandiford wrote:
> Andrew Carlotti via Gcc-patches  writes:
> > Many intrinsics currently depend on both an architecture version and a
> > feature, despite the corresponding instructions being available within
> > GCC at lower architecture versions.
> >
> > LLVM has already removed these explicit architecture version
> > dependences; this patch does the same for GCC, as well as removing an
> > unnecessary simd dependency for the scalar fp16 intrinsics.
> >
> > Binutils does not support all of these architecture+feature combinations
> > yet, but this is an existing problem that is already reachable from GCC.
> > For example, compiling the test gcc.target/aarch64/usadv16qi-dotprod.c
> > with -O3 -march=armv8-a+dotprod has resulted in an assembler error since
> > GCC 10. I intend to patch this in binutils.
> >
> > This patch retains explicit architecture version dependencies for
> > features that do not currently have a separate feature flag.
> >
> > Ok for master, and backport to GCC 13?
> >
> > gcc/ChangeLog:
> >
> >  * config/aarch64/aarch64.h (TARGET_MEMTAG): Remove armv8.5
> >  dependency.
> >  * config/aarch64/arm_acle.h: Remove unnecessary armv8.x
> >  dependencies from target pragmas.
> >  * config/aarch64/arm_fp16.h (target): Likewise.
> 
> The change to this file is a bit different from the others,
> since it's removing an implicit dependency on +simd, rather
> than a dependency on an architecture level.  I think it'd be
> worth mentioning that explicitly in the changelog.
> 
> OK with that change, thanks.
> 
> (Arguably we should add +nosimd to many of the other pragmas in
> arm_acle.h, but that's logically a separate patch.)
> 
> Richard

Actually, I think I should just remove the +nosimd from the patch, because
+fp16 doesn't enable simd (unlike +bf16, which has simd as an 'explicit on'
implication).

Aside from +bf16, the only other feature with simd as an 'explicit on' is
+rdma. However, there appear to be no non-simd rdma instructions, so
+nothing+rdma+nosimd is effectively the same as +nothing.
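Those implication rules can be illustrated with a small, deliberately simplified C model. It is only a sketch: the feature table, `apply_target_string` and the `F_*` bits are hypothetical, encode only the simd relationships described in this thread (not GCC's full extension table), and ignore `+no...` modifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified feature bits: only the simd
   implications discussed in this thread are modelled.  */
enum
{
  F_SIMD = 1u << 0,
  F_FP16 = 1u << 1,
  F_BF16 = 1u << 2,
  F_RDMA = 1u << 3
};

struct feature
{
  const char *name;
  unsigned bit;
  unsigned implies;   /* features switched on along with this one */
};

static const struct feature features[] = {
  { "simd", F_SIMD, 0 },
  { "fp16", F_FP16, 0 },        /* does not switch on simd */
  { "bf16", F_BF16, F_SIMD },   /* simd is an 'explicit on' implication */
  { "rdma", F_RDMA, F_SIMD },
};

/* Apply a "+nothing+feat1+feat2" style string and return the enabled
   feature set.  Unknown names are ignored in this sketch.  */
static unsigned
apply_target_string (const char *spec)
{
  char buf[128];
  unsigned enabled = 0;

  assert (strlen (spec) < sizeof buf);
  strcpy (buf, spec);
  for (char *tok = strtok (buf, "+"); tok; tok = strtok (NULL, "+"))
    {
      if (strcmp (tok, "nothing") == 0)
        {
          enabled = 0;   /* start again from an empty feature set */
          continue;
        }
      for (size_t i = 0; i < sizeof features / sizeof features[0]; i++)
        if (strcmp (tok, features[i].name) == 0)
          enabled |= features[i].bit | features[i].implies;
    }
  return enabled;
}
```

With this model, "+nothing+fp16" leaves simd off, so appending +nosimd changes nothing, whereas "+nothing+bf16" switches simd on, and a pragma that needs both simd and fp16 has to say "+nothing+simd+fp16".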

> > ...
> >
> > diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h
> > index a8fa4dbbdfe1bab4aa604bb311ef66d4e1de18ac..84b2ed66f9ba19fba6ccd8be33940d7239bfa22e 100644
> > --- a/gcc/config/aarch64/arm_fp16.h
> > +++ b/gcc/config/aarch64/arm_fp16.h
> > @@ -30,7 +30,7 @@
> >  #include 
> >  
> >  #pragma GCC push_options
> > -#pragma GCC target ("arch=armv8.2-a+fp16")
> > +#pragma GCC target ("+nothing+fp16+nosimd")
> >  
> >  typedef __fp16 float16_t;
> >  


[committed] docs: Fix typo

2023-06-26 Thread Andrew Carlotti via Gcc-patches
gcc/ChangeLog:

 * doc/optinfo.texi: Fix "steam" -> "stream".


diff --git a/gcc/doc/optinfo.texi b/gcc/doc/optinfo.texi
index b91bba7bd10470b17ca5190688beee06ad3b87ab..5e8c97ef118786e68b7e46f3c802154cb9b57b83 100644
--- a/gcc/doc/optinfo.texi
+++ b/gcc/doc/optinfo.texi
@@ -100,7 +100,7 @@ that one could also use special file names @code{stdout} and
 respectively.
 
 @item @code{alt_stream}
-This steam is used for printing optimization specific output in
+This stream is used for printing optimization specific output in
 response to the @option{-fopt-info}. Again a file name can be given. If
 the file name is not given, it defaults to @code{stderr}.
 @end table


[PATCH] aarch64: Remove architecture dependencies from intrinsics

2023-06-26 Thread Andrew Carlotti via Gcc-patches
Many intrinsics currently depend on both an architecture version and a
feature, despite the corresponding instructions being available within
GCC at lower architecture versions.

LLVM has already removed these explicit architecture version
dependences; this patch does the same for GCC, as well as removing an
unnecessary simd dependency for the scalar fp16 intrinsics.

Binutils does not support all of these architecture+feature combinations
yet, but this is an existing problem that is already reachable from GCC.
For example, compiling the test gcc.target/aarch64/usadv16qi-dotprod.c
with -O3 -march=armv8-a+dotprod has resulted in an assembler error since
GCC 10. I intend to patch this in binutils.

This patch retains explicit architecture version dependencies for
features that do not currently have a separate feature flag.

Ok for master, and backport to GCC 13?

gcc/ChangeLog:

 * config/aarch64/aarch64.h (TARGET_MEMTAG): Remove armv8.5
 dependency.
 * config/aarch64/arm_acle.h: Remove unnecessary armv8.x
 dependencies from target pragmas.
 * config/aarch64/arm_fp16.h (target): Likewise.
 * config/aarch64/arm_neon.h (target): Likewise.

gcc/testsuite/ChangeLog:

 * gcc.target/aarch64/feature-bf16-backport.c: New test.
 * gcc.target/aarch64/feature-dotprod-backport.c: New test.
 * gcc.target/aarch64/feature-fp16-backport.c: New test.
 * gcc.target/aarch64/feature-fp16-scalar-backport.c: New test.
 * gcc.target/aarch64/feature-fp16fml-backport.c: New test.
 * gcc.target/aarch64/feature-i8mm-backport.c: New test.
 * gcc.target/aarch64/feature-memtag-backport.c: New test.
 * gcc.target/aarch64/feature-sha3-backport.c: New test.
 * gcc.target/aarch64/feature-sm4-backport.c: New test.


diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 7129ed1ff370d597895b3f46b56b1250da7fa190..cdb664eb8f7db820b6b06b2667bfad6dc14cb7a2 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -292,7 +292,7 @@ enum class aarch64_feature : unsigned char {
 #define TARGET_RNG (AARCH64_ISA_RNG)
 
 /* Memory Tagging instructions optional to Armv8.5 enabled through +memtag.  */
-#define TARGET_MEMTAG (AARCH64_ISA_V8_5A && AARCH64_ISA_MEMTAG)
+#define TARGET_MEMTAG (AARCH64_ISA_MEMTAG)
 
 /* I8MM instructions are enabled through +i8mm.  */
 #define TARGET_I8MM (AARCH64_ISA_I8MM)
diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
index e0ac591d2c8d6c4c4c8a074b2d9881c47b1db1ab..87fb42f47c5821adecbb0ea441e0a38c63972e77 100644
--- a/gcc/config/aarch64/arm_acle.h
+++ b/gcc/config/aarch64/arm_acle.h
@@ -325,7 +325,7 @@ __rndrrs (uint64_t *__res)
 #pragma GCC pop_options
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.5-a+memtag")
+#pragma GCC target ("+nothing+memtag")
 
 #define __arm_mte_create_random_tag(__ptr, __u64_mask) \
   __builtin_aarch64_memtag_irg(__ptr, __u64_mask)
diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h
index a8fa4dbbdfe1bab4aa604bb311ef66d4e1de18ac..84b2ed66f9ba19fba6ccd8be33940d7239bfa22e 100644
--- a/gcc/config/aarch64/arm_fp16.h
+++ b/gcc/config/aarch64/arm_fp16.h
@@ -30,7 +30,7 @@
 #include 
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+fp16")
+#pragma GCC target ("+nothing+fp16+nosimd")
 
 typedef __fp16 float16_t;
 
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index eeec9f162e223df8cf7803b3227aef22e94227ac..a078674376af121c36bbebef76631c25a6815b1b 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -25590,7 +25590,7 @@ __INTERLEAVE_LIST (zip)
 #include "arm_fp16.h"
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+fp16")
+#pragma GCC target ("+nothing+fp16")
 
 /* ARMv8.2-A FP16 one operand vector intrinsics.  */
 
@@ -26753,7 +26753,7 @@ vminnmvq_f16 (float16x8_t __a)
 /* AdvSIMD Dot Product intrinsics.  */
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+dotprod")
+#pragma GCC target ("+nothing+dotprod")
 
 __extension__ extern __inline uint32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
@@ -26844,7 +26844,7 @@ vdotq_laneq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b, const int __index)
 #pragma GCC pop_options
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+sm4")
+#pragma GCC target ("+nothing+sm4")
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
@@ -26911,7 +26911,7 @@ vsm4ekeyq_u32 (uint32x4_t __a, uint32x4_t __b)
 #pragma GCC pop_options
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+sha3")
+#pragma GCC target ("+nothing+sha3")
 
 __extension__ extern __inline uint64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
@@ -27547,7 +27547,7 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
 #pragma GCC pop_options
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+fp16fml")
+#pragma GCC 

Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-03-01 Thread Andrew Carlotti via Gcc-patches
On Thu, Feb 23, 2023 at 11:39:51AM -0500, Andrew MacLeod via Gcc-patches wrote:
> 
> 
> Inheriting from operator_mult is also going to be hazardous because it also
> has an op1_range and op2_range...  you should at least define those and
> return VARYING to avoid other issues.  Same thing applies to widen_plus I
> think, and it has relation processing and other things as well.  Your widen
> operands are not what those classes expect, so I think you probably just
> want a fresh range operator.
>
> It also looks like the mult operation is sign/zero extending both upper
> bounds, and neither lower bound..  I think that should be the LH upper
> and lower bound?
>
> I've attached a second patch (newversion.patch) which incorporates my fix,
> the fix to the sign of only op1's bounds, as well as a simplification of
> the classes to not inherit from operator_mult/plus..  I think this still
> does what you want?  and it won't get you into unexpected trouble later :-)
> 
> let me know if this is still doing what you are expecting...
> 
> Andrew
> 

Hi,

This patch still uses the wrong signedness for some of the extensions in
WIDEN_MULT_EXPR. It currently bases its promotion decisions on whether there
is any signed argument, and whether the result is signed - i.e.:

Patch extends as:
UUU    UU
UUS -> USU
USU    SU
USS    SU  wrong
SUU    US  wrong
SUS -> SSU
SSU    SS  wrong
SSS    SS

The documentation in tree.def is unclear about whether the output signedness is
linked to the input signedness, but at least the SSU case seems valid, and is
mishandled here.
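To illustrate why the extension has to follow each operand's own type, here is a minimal C sketch of a widening-multiply range computation. It assumes 8-bit operands whose bounds are given as bit patterns, assumes each operand is extended according to its own signedness, and ignores the result type; `extend8` and `widen_mult_range` are hypothetical helpers, not range-op interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extend an 8-bit pattern to 64 bits according to the signedness of
   the operand's own type.  */
static int64_t
extend8 (uint8_t bits, bool is_signed)
{
  if (is_signed && bits >= 0x80)
    return (int64_t) bits - 0x100;   /* two's complement reinterpretation */
  return (int64_t) bits;
}

struct range
{
  int64_t lo, hi;
};

/* Bounds of op1 * op2 computed in a wider type.  For a signed operand,
   LO and HI must be ordered as signed values.  The product is bilinear,
   so its extremes lie at the corners of the operand ranges.  */
static struct range
widen_mult_range (uint8_t lo1, uint8_t hi1, bool signed1,
                  uint8_t lo2, uint8_t hi2, bool signed2)
{
  int64_t a[2] = { extend8 (lo1, signed1), extend8 (hi1, signed1) };
  int64_t b[2] = { extend8 (lo2, signed2), extend8 (hi2, signed2) };
  struct range r = { INT64_MAX, INT64_MIN };

  assert (a[0] <= a[1] && b[0] <= b[1]);
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++)
      {
        int64_t p = a[i] * b[j];   /* |p| <= 255 * 255, no overflow */
        if (p < r.lo)
          r.lo = p;
        if (p > r.hi)
          r.hi = p;
      }
  return r;
}
```

For two signed operands that are both exactly -1 (0xff), extending each operand by its own type gives the product 1, while extending the second operand as unsigned, as the table above shows the patch doing for the USS case, gives -255.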

I think it would be clearer and simpler to have four (or three) different
versions, one for each combination of signedness of the input operands. This could be
implemented without extra code duplication by creating four different instances
of an operator_widen_mult class (perhaps extending a range_operator_mixed_sign
class), with the signedness indicated by two additional class members.
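The four-instance design could look roughly like the following C analogue. It is a hypothetical sketch: `struct widen_mult_op`, the `op_widen_mult_*` instances and `select_widen_mult` are invented names standing in for a range_operator subclass with two signedness members, not existing GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical C analogue of an operator_widen_mult class that records
   the signedness of each operand in two extra members.  */
struct widen_mult_op
{
  bool op1_signed;
  bool op2_signed;
};

/* One instance per combination of operand signedness.  */
static const struct widen_mult_op op_widen_mult_uu = { false, false };
static const struct widen_mult_op op_widen_mult_us = { false, true };
static const struct widen_mult_op op_widen_mult_su = { true, false };
static const struct widen_mult_op op_widen_mult_ss = { true, true };

/* Select the instance from the two operand types, keeping the operands
   in their original order.  */
static const struct widen_mult_op *
select_widen_mult (bool signed1, bool signed2)
{
  const struct widen_mult_op *op;
  if (signed1)
    op = signed2 ? &op_widen_mult_ss : &op_widen_mult_su;
  else
    op = signed2 ? &op_widen_mult_us : &op_widen_mult_uu;
  assert (op->op1_signed == signed1 && op->op2_signed == signed2);
  return op;
}
```

Choosing the instance directly from the operand types removes the need for the std::swap in the quoted maybe_non_standard. If the operands are still swapped when only op2 is signed, the unsigned-by-signed instance is never selected, which presumably corresponds to the "three" variant.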

The documentation for WIDEN_PLUS_EXPR (and several other expressions added in
the same commit) is completely missing. If the signs are required to be
matching, then this should be clarified; otherwise it would need the same
special handling as WIDEN_MULT_EXPR.

Andrew

> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
> index d9dfdc56939..824e0338f34 100644
> --- a/gcc/gimple-range-op.cc
> +++ b/gcc/gimple-range-op.cc
> @@ -179,6 +179,8 @@ gimple_range_op_handler::gimple_range_op_handler (gimple *s)
>    // statements.
>    if (is_a  (m_stmt))
>      maybe_builtin_call ();
> +  else
> +    maybe_non_standard ();
>  }
>  
>  // Calculate what we can determine of the range of this unary
> @@ -764,6 +766,36 @@ public:
>}
>  } op_cfn_parity;
>  
> +// Set up a gimple_range_op_handler for any nonstandard function which can be
> +// supported via range-ops.
> +
> +void
> +gimple_range_op_handler::maybe_non_standard ()
> +{
> +  if (gimple_code (m_stmt) == GIMPLE_ASSIGN)
> +    switch (gimple_assign_rhs_code (m_stmt))
> +      {
> +      case WIDEN_MULT_EXPR:
> +        {
> +          m_valid = true;
> +          m_op1 = gimple_assign_rhs1 (m_stmt);
> +          m_op2 = gimple_assign_rhs2 (m_stmt);
> +          bool signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED;
> +          bool signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED;
> +          if (signed2 && !signed1)
> +            std::swap (m_op1, m_op2);
> +
> +          if (signed1 || signed2)
> +            m_int = ptr_op_widen_mult_signed;
> +          else
> +            m_int = ptr_op_widen_mult_unsigned;
> +          break;
> +        }
> +      default:
> +        break;
> +      }
> +}
> +
>  // Set up a gimple_range_op_handler for any built in function which can be
>  // supported via range-ops.
>  
> diff --git a/gcc/gimple-range-op.h b/gcc/gimple-range-op.h
> index 743b858126e..1bf63c5ce6f 100644
> --- a/gcc/gimple-range-op.h
> +++ b/gcc/gimple-range-op.h
> @@ -41,6 +41,7 @@ public:
>relation_trio = TRIO_VARYING);
>  private:
>    void maybe_builtin_call ();
> +  void maybe_non_standard ();
>    gimple *m_stmt;
>    tree m_op1, m_op2;
>  };
> diff --git a/gcc/range-op.cc b/gcc/range-op.cc
> index 5c67bce6d3a..7cd19a92d00 100644
> --- a/gcc/range-op.cc
> +++ b/gcc/range-op.cc
> @@ -1556,6 +1556,34 @@ operator_plus::op2_range (irange , tree type,
>    return op1_range (r, type, lhs, op1, rel.swap_op1_op2 ());
>  }
>  
> +class operator_widen_plus : public range_operator
> +{
> +public:
> +  virtual void wi_fold (irange , tree type,
> + const wide_int _lb,
> + const wide_int _ub,
> + const wide_int _lb,
> + const wide_int _ub) const;
> +} op_widen_plus;
> +
> +void
> +operator_widen_plus::wi_fold (irange , tree type,
> + const wide_int _lb, const wide_int _ub,
> + const wide_int _lb, const wide_int _ub) const

Re: [PATCH 9/8] middle-end: Allow build_popcount_expr to use an IFN

2023-01-16 Thread Andrew Carlotti via Gcc-patches
Erm, ignore this - I just rediscovered the approval in a different mail
folder. I forgot that Outlook's automatic email deduplication meant that
messages CC'd to me end up in one of two different folders at random
when I want them in both.


On Mon, Jan 16, 2023 at 02:03:29PM +, Andrew Carlotti via Gcc-patches wrote:
> Hi Richard
> 
> I accidentally pushed this patch earlier in the mistaken belief that
> you'd already approved it. It looks uncontroversial to me - it just adds
> IFN support to build_popcount_expr, analogous to the changes you
> suggested and approved for build_cltz_expr (and adjusts testcases
> accordingly). I might have incorporated it into an earlier patch in this
> series, if I hadn't already pushed that earlier patch.
> 
> Is this OK to leave in master now?
> 
> Thanks,
> Andrew
> 
> On Thu, Dec 22, 2022 at 05:43:21PM +, Andrew Carlotti via Gcc-patches 
> wrote:
> > Bootstrapped and regression tested on aarch64-unknown-linux-gnu and
> > x86_64-pc-linux-gnu - ok to merge?
> > 
> > gcc/ChangeLog:
> > 
> > * tree-ssa-loop-niter.cc (build_popcount_expr): Add IFN support.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/tree-ssa/pr86544.C: Add .POPCOUNT to tree scan regex.
> > * gcc.dg/tree-ssa/popcount.c: Likewise.
> > * gcc.dg/tree-ssa/popcount2.c: Likewise.
> > * gcc.dg/tree-ssa/popcount3.c: Likewise.
> > * gcc.target/aarch64/popcount4.c: Likewise.
> > * gcc.target/i386/pr95771.c: Likewise, and...
> > * gcc.target/i386/pr95771-2.c: ...split int128 test from above,
> > since this would emit just a single IFN if a TI optab is added.
> > 
> > ---
> > 
> > diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr86544.C b/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
> > index ef438916a8019320564f444ace08e2f4b4190684..50befb36bac75de1cfa282e38358278b3288bd1c 100644
> > --- a/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
> > +++ b/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
> > @@ -12,5 +12,5 @@ int PopCount (long b) {
> >  return c;
> >  }
> >  
> > -/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "__builtin_popcount|\\.POPCOUNT" 1 "optimized" } } */
> >  /* { dg-final { scan-tree-dump-times "if" 0 "phiopt4" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
> > index b4694109411a4631697463519acbe7d9df65bf6e..efd906a0f5447f0beb3752eded3756999b02e6e6 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
> > @@ -39,4 +39,4 @@ void PopCount3 (long b1) {
> >    }
> >  }
> >  
> > -/* { dg-final { scan-tree-dump-times "__builtin_popcount" 3 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "__builtin_popcount|\\.POPCOUNT" 3 "optimized" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
> > index ef73e345573de721833e98e89c252640a55f7c60..ae38a329bd4d868a762300d3218d68864c0fc4be 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
> > @@ -26,4 +26,4 @@ int main()
> >    return 0;
> >  }
> >  
> > -/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "__builtin_popcount|\\.POPCOUNT" 1 "optimized" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
> > index ef438916a8019320564f444ace08e2f4b4190684..50befb36bac75de1cfa282e38358278b3288bd1c 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
> > @@ -12,5 +12,5 @@ int PopCount (long b) {
> >  return c;
> >  }
> >  
> > -/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "__builtin_popcount|\\.POPCOUNT" 1 "optimized" } } */
> >  /* { dg-final { scan-tree-dump-times "if" 0 "phiopt4" } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/popcount4.c b/gcc/testsuite/gcc.target/aarch64/popcount4.c
> > index 

Re: [PATCH 9/8] middle-end: Allow build_popcount_expr to use an IFN

2023-01-16 Thread Andrew Carlotti via Gcc-patches
Hi Richard

I accidentally pushed this patch earlier in the mistaken belief that
you'd already approved it. It looks uncontroversial to me - it just adds
IFN support to build_popcount_expr, analogous to the changes you
suggested and approved for build_cltz_expr (and adjusts testcases
accordingly). I might have incorporated it into an earlier patch in this
series, if I hadn't already pushed that earlier patch.

Is this OK to leave in master now?

Thanks,
Andrew

On Thu, Dec 22, 2022 at 05:43:21PM +, Andrew Carlotti via Gcc-patches wrote:
> Bootstrapped and regression tested on aarch64-unknown-linux-gnu and
> x86_64-pc-linux-gnu - ok to merge?
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-loop-niter.cc (build_popcount_expr): Add IFN support.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/tree-ssa/pr86544.C: Add .POPCOUNT to tree scan regex.
>   * gcc.dg/tree-ssa/popcount.c: Likewise.
>   * gcc.dg/tree-ssa/popcount2.c: Likewise.
>   * gcc.dg/tree-ssa/popcount3.c: Likewise.
>   * gcc.target/aarch64/popcount4.c: Likewise.
>   * gcc.target/i386/pr95771.c: Likewise, and...
>   * gcc.target/i386/pr95771-2.c: ...split int128 test from above,
>   since this would emit just a single IFN if a TI optab is added.
> 
> ---
> 
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr86544.C b/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
> index ef438916a8019320564f444ace08e2f4b4190684..50befb36bac75de1cfa282e38358278b3288bd1c 100644
> --- a/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
> @@ -12,5 +12,5 @@ int PopCount (long b) {
>  return c;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "__builtin_popcount|\\.POPCOUNT" 1 "optimized" } } */
>  /* { dg-final { scan-tree-dump-times "if" 0 "phiopt4" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
> index b4694109411a4631697463519acbe7d9df65bf6e..efd906a0f5447f0beb3752eded3756999b02e6e6 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
> @@ -39,4 +39,4 @@ void PopCount3 (long b1) {
>    }
>  }
>  
> -/* { dg-final { scan-tree-dump-times "__builtin_popcount" 3 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "__builtin_popcount|\\.POPCOUNT" 3 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
> index ef73e345573de721833e98e89c252640a55f7c60..ae38a329bd4d868a762300d3218d68864c0fc4be 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
> @@ -26,4 +26,4 @@ int main()
>    return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "__builtin_popcount|\\.POPCOUNT" 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
> index ef438916a8019320564f444ace08e2f4b4190684..50befb36bac75de1cfa282e38358278b3288bd1c 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
> @@ -12,5 +12,5 @@ int PopCount (long b) {
>  return c;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "__builtin_popcount|\\.POPCOUNT" 1 "optimized" } } */
>  /* { dg-final { scan-tree-dump-times "if" 0 "phiopt4" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcount4.c b/gcc/testsuite/gcc.target/aarch64/popcount4.c
> index ee55b2e335223053ca024e95b7a13aa4af32550e..8aa15ff018d4b5fc6bb59e52af20d5c33cea2ee0 100644
> --- a/gcc/testsuite/gcc.target/aarch64/popcount4.c
> +++ b/gcc/testsuite/gcc.target/aarch64/popcount4.c
> @@ -11,4 +11,4 @@ int PopCount (long b) {
>  return c;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "__builtin_popcount" 0 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "__builtin_popcount|\\.POPCOUNT" 0 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr95771-2.c b/gcc/testsuite/gcc.target/i386/pr95771-2.c
> new file mode 100644
> index ..1db9dc94d0b66477667624012221d6844c141a26
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i

[PATCH 9/8] middle-end: Allow build_popcount_expr to use an IFN

2022-12-22 Thread Andrew Carlotti via Gcc-patches
Bootstrapped and regression tested on aarch64-unknown-linux-gnu and
x86_64-pc-linux-gnu - ok to merge?

gcc/ChangeLog:

* tree-ssa-loop-niter.cc (build_popcount_expr): Add IFN support.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/pr86544.C: Add .POPCOUNT to tree scan regex.
* gcc.dg/tree-ssa/popcount.c: Likewise.
* gcc.dg/tree-ssa/popcount2.c: Likewise.
* gcc.dg/tree-ssa/popcount3.c: Likewise.
* gcc.target/aarch64/popcount4.c: Likewise.
* gcc.target/i386/pr95771.c: Likewise, and...
* gcc.target/i386/pr95771-2.c: ...split int128 test from above,
since this would emit just a single IFN if a TI optab is added.
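For reference, the loop in the new pr95771-2.c test is the clear-lowest-set-bit idiom that the test expects to be turned into a popcount. The C sketch below, assuming GCC or Clang on a 64-bit target with `unsigned __int128`, shows that idiom next to one equivalent way of counting a 128-bit value with two 64-bit `__builtin_popcountll` calls. `popcount_loop` and `popcount_halves` are illustrative names, and the second function is not a claim about the exact GIMPLE GCC emits.

```c
#include <assert.h>

/* Clear-lowest-set-bit loop, as in the pr95771-2.c test: each iteration
   clears one set bit, so the trip count is the population count.  */
static int
popcount_loop (unsigned __int128 x)
{
  int i = 0;
  while (x)
    {
      x &= x - 1;
      ++i;
    }
  return i;
}

/* One equivalent formulation without a 128-bit popcount: count each
   64-bit half separately and add the results.  */
static int
popcount_halves (unsigned __int128 x)
{
  unsigned long long lo = (unsigned long long) x;
  unsigned long long hi = (unsigned long long) (x >> 64);
  assert (((unsigned __int128) hi << 64 | lo) == x);
  return __builtin_popcountll (lo) + __builtin_popcountll (hi);
}
```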

---

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr86544.C b/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
index ef438916a8019320564f444ace08e2f4b4190684..50befb36bac75de1cfa282e38358278b3288bd1c 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
@@ -12,5 +12,5 @@ int PopCount (long b) {
 return c;
 }
 
-/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_popcount|\\.POPCOUNT" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "if" 0 "phiopt4" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
index b4694109411a4631697463519acbe7d9df65bf6e..efd906a0f5447f0beb3752eded3756999b02e6e6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
@@ -39,4 +39,4 @@ void PopCount3 (long b1) {
   }
 }
 
-/* { dg-final { scan-tree-dump-times "__builtin_popcount" 3 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_popcount|\\.POPCOUNT" 3 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
index ef73e345573de721833e98e89c252640a55f7c60..ae38a329bd4d868a762300d3218d68864c0fc4be 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
@@ -26,4 +26,4 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_popcount|\\.POPCOUNT" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
index ef438916a8019320564f444ace08e2f4b4190684..50befb36bac75de1cfa282e38358278b3288bd1c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
@@ -12,5 +12,5 @@ int PopCount (long b) {
 return c;
 }
 
-/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_popcount|\\.POPCOUNT" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "if" 0 "phiopt4" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/popcount4.c b/gcc/testsuite/gcc.target/aarch64/popcount4.c
index ee55b2e335223053ca024e95b7a13aa4af32550e..8aa15ff018d4b5fc6bb59e52af20d5c33cea2ee0 100644
--- a/gcc/testsuite/gcc.target/aarch64/popcount4.c
+++ b/gcc/testsuite/gcc.target/aarch64/popcount4.c
@@ -11,4 +11,4 @@ int PopCount (long b) {
 return c;
 }
 
-/* { dg-final { scan-tree-dump-times "__builtin_popcount" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_popcount|\\.POPCOUNT" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr95771-2.c b/gcc/testsuite/gcc.target/i386/pr95771-2.c
new file mode 100644
index ..1db9dc94d0b66477667624012221d6844c141a26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr95771-2.c
@@ -0,0 +1,17 @@
+/* PR tree-optimization/95771 */
+/* { dg-do compile } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-O2 -mpopcnt -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump " = __builtin_popcount| = \\.POPCOUNT" "optimized" } } */
+
+int
+corge (unsigned __int128 x)
+{
+  int i = 0;
+  while (x)
+    {
+      x &= x - 1;
+      ++i;
+    }
+  return i;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr95771.c b/gcc/testsuite/gcc.target/i386/pr95771.c
index d7b67017800b705b9854f561916c20901ea76803..d41be445f4a68613a082b8956fea3ceaf33d7e0f 100644
--- a/gcc/testsuite/gcc.target/i386/pr95771.c
+++ b/gcc/testsuite/gcc.target/i386/pr95771.c
@@ -1,8 +1,7 @@
 /* PR tree-optimization/95771 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -mpopcnt -fdump-tree-optimized" } */
-/* { dg-final { scan-tree-dump-times " = __builtin_popcount" 6 "optimized" { target int128 } } } */
-/* { dg-final { scan-tree-dump-times " = __builtin_popcount" 4 "optimized" { target { ! int128 } } } } */
+/* { dg-final { scan-tree-dump-times " = __builtin_popcount| = \\.POPCOUNT" 4 "optimized" } } */
 
 int
 foo (unsigned char x)
@@ -51,17 +50,3 @@ qux (unsigned long long x)
     }
   return i;
 }
-
-#ifdef __SIZEOF_INT128__
-int
-corge (unsigned 
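
The updated scan patterns combine regex alternation with an escaped dot. Following the usual convention in GCC's dg-final directives, `\\.` reaches the regex engine as `\.`. The pattern therefore matches either the builtin call or a literal `.POPCOUNT`, which is how GIMPLE dumps spell internal function calls. It does not match an arbitrary character followed by POPCOUNT. DejaGnu evaluates these patterns with Tcl's regex engine. As a hedged approximation, the sketch below uses POSIX extended regular expressions from <regex.h>, which should behave the same for a pattern this simple. The helper name and the sample dump lines in the usage note are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if LINE matches the scan pattern used by
   the updated tests, 0 if it does not, and -1 if the pattern fails to
   compile.  */
int
matches_popcount_scan (const char *line)
{
  /* In C source, as in the Tcl string, "\\." is the two characters \.
     which the regex engine reads as a literal dot.  */
  static const char pattern[] = "__builtin_popcount|\\.POPCOUNT";
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

For example, an illustrative dump line such as `  _1 = .POPCOUNT (b_2(D));` matches, and so does one containing `__builtin_popcountl`. A line containing `xPOPCOUNT` does not match, because the dot is escaped. Note that scan-tree-dump-times also counts the matches across the whole dump, which this single-line helper does not attempt.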

[PATCH 6/8 v2] docs: Add popcount, clz and ctz target attributes

2022-12-22 Thread Andrew Carlotti via Gcc-patches
Updated to reflect Sphinx revert; I'll commit this once the
cltz_complement patch is merged.

gcc/ChangeLog:

* doc/sourcebuild.texi: Add missing target attributes.

---

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index ffe69d6fcb9c46cf97ba570e85b56e586a0c9b99..1036b185ee289bbf7883bd14956a41da9a6d677b 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2511,6 +2511,24 @@ Target supports the execution of @code{amx-fp16} instructions.
 @item cell_hw
 Test system can execute AltiVec and Cell PPU instructions.
 
+@item clz
+Target supports a clz optab on int.
+
+@item clzl
+Target supports a clz optab on long.
+
+@item clzll
+Target supports a clz optab on long long.
+
+@item ctz
+Target supports a ctz optab on int.
+
+@item ctzl
+Target supports a ctz optab on long.
+
+@item ctzll
+Target supports a ctz optab on long long.
+
 @item cmpccxadd
 Target supports the execution of @code{cmpccxadd} instructions.
 
@@ -2532,6 +2550,15 @@ Target does not require strict alignment.
 @item pie_copyreloc
 The x86-64 target linker supports PIE with copy reloc.
 
+@item popcount
+Target supports a popcount optab on int.
+
+@item popcountl
+Target supports a popcount optab on long.
+
+@item popcountll
+Target supports a popcount optab on long long.
+
 @item prefetchi
 Target supports the execution of @code{prefetchi} instructions.
 

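These effective-target keywords let a test run only on targets where the matching optab exists. What follows is a hypothetical testcase sketch, modelled on the popcount tests in the first patch. It assumes the `popcountl` keyword documented in the hunk above. The function name is illustrative. Whether the loop is actually converted to a single call depends on the target and the GCC version.

```c
/* Hypothetical testcase sketch, gated on a popcount optab for long.  */
/* { dg-do compile } */
/* { dg-require-effective-target popcountl } */
/* { dg-options "-O2 -fdump-tree-optimized" } */
/* { dg-final { scan-tree-dump-times "__builtin_popcount|\\.POPCOUNT" 1 "optimized" } } */

/* Bit-clearing loop that the popcount idiom recognition is expected to
   replace with a single popcount call or internal function.  */
int
count_bits (unsigned long b)
{
  int c = 0;
  while (b)
    {
      b &= b - 1;
      c++;
    }
  return c;
}
```

On targets without the optab, dg-require-effective-target reports the test as UNSUPPORTED. The scan is therefore not expected to fail there.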
