Re: [PATCH] aarch64: Add SVE instruction types

2023-09-12 Thread Evandro Menezes via Gcc-patches
Hi, Kyrill.

I wonder if the regression that you noticed was the same one that I saw.  Overall, 
thus far, there’s no significant regression that I can attribute to 
scheduling.  However, there is one benchmark, 507.cactuBSSN_r/607.cactuBSSN_s 
in SPEC2017, that regressed by more than 10%.  Upon closer examination, it 
seems that the change in the live ranges led to heavy spilling and to a doubling 
of the stack size.  The spilling looks rather capricious though, as there seem 
to be enough free registers available.

Is this similar to what you observed as well?  I tried to adjust the priority 
of memory ops through TARGET_SCHED_ADJUST_PRIORITY, but it was ineffective.  
I’m a bit at a loss as to what’s going on with the RA at this point.  Any 
pointers?

Thank you,

-- 
Evandro Menezes



> On May 16, 2023, at 03:36, Kyrylo Tkachov wrote:
> 
> Hi Evandro,
>  
> I created a new attribute so I didn’t have to extend the “type” attribute 
> that lives in config/arm/types.md. As that attribute and file live in the 
> arm backend but SVE is AArch64-only, I didn’t want to add logic to the arm 
> backend as it’s not truly shared.
> The granularity has been somewhat subjective. I had looked at the Software 
> Optimisation guides for various SVE and SVE2-capable cores from Arm on 
> developer.arm.com and tried to glean commonalities between different 
> instruction groups.
> I did try writing a model for Neoverse V1 using that classification but I 
> couldn’t spend much time on it and the resulting model didn’t give me much 
> improvement and gave some regressions instead.
> I think that was more down to my rushed model than anything else, though.
>  
> Thanks,
> Kyrill
>  

Re: [PATCH] aarch64: Add SVE instruction types

2023-05-16 Thread Evandro Menezes via Gcc-patches
Hi, Kyrill.

It makes sense.  I could add the classification to a different attribute as you 
did and keep it in aarch64 as well.

I took the same approach, going over several optimization guides for Arm 
processors supporting SVE and figuring out the smallest number of types that 
could cover most variations of resources used.  Methinks that the 
classification in this patch is close to that goal, but feedback is appreciated.

I did observe a meaningful gain in performance.  Of course, wide machines like 
the V1 can handle most instruction sequences thrown at them, but there’s still 
some efficiency left on the table without tailored scheduling, especially 
when recovering from cache or branch misses, when it’s important to quickly 
fill the pipeline back up to its steady state, even though umpteen transistors 
are dedicated to making sure that misses do not happen often.

Thank you,

-- 
Evandro Menezes



> On May 16, 2023, at 03:36, Kyrylo Tkachov wrote:
> 
> Hi Evandro,
>  
> I created a new attribute so I didn’t have to extend the “type” attribute 
> that lives in config/arm/types.md. As that attribute and file live in the 
> arm backend but SVE is AArch64-only, I didn’t want to add logic to the arm 
> backend as it’s not truly shared.
> The granularity has been somewhat subjective. I had looked at the Software 
> Optimisation guides for various SVE and SVE2-capable cores from Arm on 
> developer.arm.com and tried to glean commonalities between different 
> instruction groups.
> I did try writing a model for Neoverse V1 using that classification but I 
> couldn’t spend much time on it and the resulting model didn’t give me much 
> improvement and gave some regressions instead.
> I think that was more down to my rushed model than anything else, though.
>  
> Thanks,
> Kyrill
>  

Re: [PATCH] aarch64: Add SVE instruction types

2023-05-15 Thread Evandro Menezes via Gcc-patches
Hi, Kyrill.

I wasn’t aware of your previous patch.  Could you clarify why you considered 
creating an SVE-specific type attribute instead of reusing the common one?  I 
really liked the iterators that you created; I’d like to use them.

Do you have specific examples that you might want to mention with regard to 
granularity?

Yes, my intent for this patch is to enable modeling the SVE instructions on N1.
The patch that implements it brings some performance improvements, but it’s 
mostly flat, as expected.

Thank you,

-- 
Evandro Menezes



> On May 15, 2023, at 04:49, Kyrylo Tkachov wrote:
> 
> 
> 
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: Monday, May 15, 2023 10:01 AM
>> To: Evandro Menezes via Gcc-patches <gcc-patches@gcc.gnu.org>
>> Cc: evandro+...@gcc.gnu.org; Evandro Menezes <ebah...@icloud.com>;
>> Kyrylo Tkachov <kyrylo.tkac...@arm.com>; Tamar Christina
>> <tamar.christ...@arm.com>
>> Subject: Re: [PATCH] aarch64: Add SVE instruction types
>> 
>> Evandro Menezes via Gcc-patches  writes:
>>> This patch adds the attribute `type` to most SVE1 instructions, as in the
>>> other instructions.
>> 
>> Thanks for doing this.
>> 
>> Could you say what criteria you used for picking the granularity?  Other
>> maintainers might disagree, but personally I'd prefer to distinguish two
>> instructions only if:
>> 
>> (a) a scheduling description really needs to distinguish them or
>> (b) grouping them together would be very artificial (because they're
>>     logically unrelated)
>> 
>> It's always possible to split types later if new scheduling descriptions
>> require it.  Because of that, I don't think we should try to predict ahead
>> of time what future scheduling descriptions will need.
>> 
>> Of course, this depends on having results that show that scheduling
>> makes a significant difference on an SVE core.  I think one of the
>> problems here is that, when a different scheduling model changes the
>> performance of a particular test, it's difficult to tell whether
>> the gain/loss is caused by the model being more/less accurate than
>> the previous one, or if it's due to important "secondary" effects
>> on register live ranges.  Instinctively, I'd have expected these
>> secondary effects to dominate on OoO cores.
> 
> I agree with Richard on these points. The key here is getting the granularity 
> right without having to maintain too many types that aren't useful in the 
> models.
> FWIW I had posted 
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607101.html in 
> November. It adds annotations to SVE2 patterns as well as for base SVE.
> Feel free to reuse it if you'd like.
> I see you had posted a Neoverse V1 scheduling model. Does that give an 
> improvement on SVE code when combined with the scheduling attributes somehow?
> Thanks,
> Kyrill



Re: [PATCH] aarch64: Add SVE instruction types

2023-05-15 Thread Evandro Menezes via Gcc-patches
Hi, Richard.

My criteria were very much (a).  In some cases though, a particular instruction 
could have variations that others in its natural group didn’t, in which case it 
seemed sensible to create a specific description for this instruction, even if 
its base form shares resources with other instructions in its group.

Do you have specific instances in mind?

Thank you,

-- 
Evandro Menezes



> On May 15, 2023, at 04:00, Richard Sandiford wrote:
> 
> Evandro Menezes via Gcc-patches  writes:
>> This patch adds the attribute `type` to most SVE1 instructions, as in the 
>> other instructions.
> 
> Thanks for doing this.
> 
> Could you say what criteria you used for picking the granularity?  Other
> maintainers might disagree, but personally I'd prefer to distinguish two
> instructions only if:
> 
> (a) a scheduling description really needs to distinguish them or
> (b) grouping them together would be very artificial (because they're
>     logically unrelated)
> 
> It's always possible to split types later if new scheduling descriptions
> require it.  Because of that, I don't think we should try to predict ahead
> of time what future scheduling descriptions will need.
> 
> Of course, this depends on having results that show that scheduling
> makes a significant difference on an SVE core.  I think one of the
> problems here is that, when a different scheduling model changes the
> performance of a particular test, it's difficult to tell whether
> the gain/loss is caused by the model being more/less accurate than
> the previous one, or if it's due to important "secondary" effects
> on register live ranges.  Instinctively, I'd have expected these
> secondary effects to dominate on OoO cores.
> 
> Richard


-- 
Evandro Menezes ◊ evan...@yahoo.com ◊ Austin, TX
Άγιος ο Θεός ⁂ ܩܕܝܫܐ ܐܢ̱ܬ ܠܐ ܡܝܘܬܐ ⁂ Sanctus Deus





[PATCH] aarch64: Add SVE instruction types

2023-05-12 Thread Evandro Menezes via Gcc-patches
This patch adds the attribute `type` to most SVE1 instructions, as in the other 
instructions.
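
For context, a scheduling description consumes the `type` attribute through 
`eq_attr` tests in `define_insn_reservation`.  A minimal sketch of that 
mechanism follows; the automaton, the unit, the tune value and the type name 
are all hypothetical placeholders and not names taken from this patch:

```
;; Hypothetical names throughout: "example_core", "example_v0",
;; "examplecore" and "sve_example_arith" are placeholders only.
(define_automaton "example_core")
(define_cpu_unit "example_v0" "example_core")

;; Instructions whose `type` is "sve_example_arith" are given a default
;; latency of 2 cycles and reserve the unit "example_v0" when they issue,
;; but only when tuning for the placeholder core.
(define_insn_reservation "example_sve_arith" 2
  (and (eq_attr "tune" "examplecore")
       (eq_attr "type" "sve_example_arith"))
  "example_v0")
```

Two instructions need distinct `type` values only if some description has to 
give them different latencies or reservations in a construct like this one.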

-- 
Evandro Menezes



0002-aarch64-Add-SVE-instruction-types.patch
Description: Binary data




aarch64: Add scheduling model for Neoverse V1

2023-05-07 Thread Evandro Menezes via Gcc-patches
This patch adds the scheduling model for Neoverse V1, based on the information 
from the “Arm Neoverse V1 Software Optimization Guide” and on static and 
dynamic analysis of internal and public benchmarks.  Results are forthcoming. 

-- 
Evandro Menezes


0001-aarch64-Add-scheduling-model-for-Neoverse-V1.patch
Description: Binary data



Re: [PATCH] aarch64: Add the cost model for Neoverse N1

2023-04-24 Thread Evandro Menezes via Gcc-patches
Sorry, but it seems that, before sending, the email client is stripping leading 
spaces.  I’m attaching the file here.

-- 
Evandro Menezes ◊ evan...@yahoo.com ◊ Austin, TX
Άγιος ο Θεός ⁂ ܩܕܝܫܐ ܐܢ̱ܬ ܠܐ ܡܝܘܬܐ ⁂ Sanctus Deus

> On Apr 24, 2023, at 17:48, Evandro Menezes wrote:
> 
> Hi, Tamar.
> 
> Does this work?
> 
> Thank you,
> 
> -- 
> Evandro Menezes ◊ evan...@yahoo.com ◊ Austin, TX
> Άγιος ο Θεός ⁂ ܩܕܝܫܐ ܐܢ̱ܬ ܠܐ ܡܝܘܬܐ ⁂ Sanctus Deus
> 
>> On Apr 24, 2023, at 12:37, Tamar Christina wrote:
>> 
>> Hi Evandro,
>> 
>> I wanted to give this patch a try, but the diff seems corrupt, the 
>> whitespaces at the start of the context lines seem to have gone missing.
>> 
>> Could you try resending it?
>> 
>> Thanks,
>> Tamar



Re: [PATCH] aarch64: Add the cost model for Neoverse N1

2023-04-24 Thread Evandro Menezes via Gcc-patches
Hi, Tamar.

Does this work?

Thank you,

-- 
Evandro Menezes ◊ evan...@yahoo.com ◊ Austin, TX
Άγιος ο Θεός ⁂ ܩܕܝܫܐ ܐܢ̱ܬ ܠܐ ܡܝܘܬܐ ⁂ Sanctus Deus

> On Apr 24, 2023, at 12:37, Tamar Christina wrote:
> 
> Hi Evandro,
> 
> I wanted to give this patch a try, but the diff seems corrupt, the 
> whitespaces at the start of the context lines seem to have gone missing.
> 
> Could you try resending it?
> 
> Thanks,
> Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def: Use the Neoverse N1 cost model.
* config/aarch64/aarch64.cc
(cortexa76_tunings): Rename variable.
(neoversen1_addrcost_table): New variable.
(neoversen1_vector_cost): Likewise.
(neoversen1_regmove_cost): Likewise.
(neoversen1_advsimd_vector_cost): Likewise.
(neoversen1_scalar_issue_info): Likewise.
(neoversen1_advsimd_issue_info): Likewise.
(neoversen1_vec_issue_info): Likewise.
(neoversen1_vector_cost): Likewise.
(neoversen1_tunings): Likewise.
* config/arm/aarch-cost-tables.h
(neoversen1_extra_costs): New variable.

Signed-off-by: Evandro Menezes 
---
 gcc/config/aarch64/aarch64-cores.def |  20 ++--
 gcc/config/aarch64/aarch64.cc| 155 ---
 gcc/config/arm/aarch-cost-tables.h   | 107 ++
 3 files changed, 259 insertions(+), 23 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 2ec88c98400..e352e4077b1 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -105,17 +105,17 @@ AARCH64_CORE("thunderx2t99",  thunderx2t99,  thunderx2t99, V8_1A,  (CRYPTO), thu
 /* ARM ('A') cores. */
 AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, V8_2A,  (F16, RCPC, DOTPROD), cortexa53, 0x41, 0xd05, -1)
 AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, V8_2A,  (F16, RCPC, DOTPROD), cortexa73, 0x41, 0xd0a, -1)
-AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, V8_2A,  (F16, RCPC, DOTPROD), neoversen1, 0x41, 0xd0b, -1)
-AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS), neoversen1, 0x41, 0xd0e, -1)
-AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS), neoversen1, 0x41, 0xd0d, -1)
-AARCH64_CORE("cortex-a78",  cortexa78, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS, PROFILE), neoversen1, 0x41, 0xd41, -1)
-AARCH64_CORE("cortex-a78ae",  cortexa78ae, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS, PROFILE), neoversen1, 0x41, 0xd42, -1)
-AARCH64_CORE("cortex-a78c",  cortexa78c, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS, PROFILE, FLAGM, PAUTH), neoversen1, 0x41, 0xd4b, -1)
+AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, V8_2A,  (F16, RCPC, DOTPROD), cortexa76, 0x41, 0xd0b, -1)
+AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS), cortexa76, 0x41, 0xd0e, -1)
+AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS), cortexa76, 0x41, 0xd0d, -1)
+AARCH64_CORE("cortex-a78",  cortexa78, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS, PROFILE), cortexa76, 0x41, 0xd41, -1)
+AARCH64_CORE("cortex-a78ae",  cortexa78ae, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS, PROFILE), cortexa76, 0x41, 0xd42, -1)
+AARCH64_CORE("cortex-a78c",  cortexa78c, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS, PROFILE, FLAGM, PAUTH), cortexa76, 0x41, 0xd4b, -1)
 AARCH64_CORE("cortex-a65",  cortexa65, cortexa53, V8_2A,  (F16, RCPC, DOTPROD, SSBS), cortexa73, 0x41, 0xd06, -1)
 AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa53, V8_2A,  (F16, RCPC, DOTPROD, SSBS), cortexa73, 0x41, 0xd43, -1)
-AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS, PROFILE), neoversen1, 0x41, 0xd44, -1)
-AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS, PROFILE, PAUTH), neoversen1, 0x41, 0xd4c, -1)
-AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
+AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS, PROFILE), cortexa76, 0x41, 0xd44, -1)
+AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS, PROFILE, PAUTH), cortexa76, 0x41, 0xd4c, -1)
+AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, PROFILE), cortexa76, 0x41, 0xd0c, -1)
 AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
 AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, V8_2A,  (F16, RCPC, DOTPROD, SSBS), cortexa73, 0x41, 0xd4a, -1)
 
@@ -160,7 +160,7 @@ AARCH64_CORE("cortex-a73.cortex-a53",  cortexa73cortexa53, cortexa53, V8A,  (CRC
 /* ARM DynamIQ big.LITTLE configurations.  */
 
 AARCH64_CORE("cortex-a75.cortex-a55",  cortexa75cortexa55, cortexa53, V8_2A,  (F16, RCPC, DOTPROD), cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd0a, 0xd05), -1)
-AARCH64_CORE("cortex-a76.cortex-a55",

[PATCH] aarch64: Add the cost model for Neoverse N1

2023-04-18 Thread Evandro Menezes via Gcc-patches
This patch adds the cost model for Neoverse N1, based on the information from 
the “Arm Neoverse N1 Software Optimization Guide”.

-- 
Evandro Menezes



gcc/ChangeLog:

   * config/aarch64/aarch64-cores.def: Use the Neoverse N1 cost model.
   * config/aarch64/aarch64.cc
   (cortexa76_tunings): Rename variable.
   (neoversen1_addrcost_table): New variable.
   (neoversen1_vector_cost): Likewise.
   (neoversen1_regmove_cost): Likewise.
   (neoversen1_advsimd_vector_cost): Likewise.
   (neoversen1_scalar_issue_info): Likewise.
   (neoversen1_advsimd_issue_info): Likewise.
   (neoversen1_vec_issue_info): Likewise.
   (neoversen1_vector_cost): Likewise.
   (neoversen1_tunings): Likewise.
   * config/arm/aarch-cost-tables.h
   (neoversen1_extra_costs): New variable.

Signed-off-by: Evandro Menezes 
---
gcc/config/aarch64/aarch64-cores.def |  20 ++--
gcc/config/aarch64/aarch64.cc| 155 ---
gcc/config/arm/aarch-cost-tables.h   | 107 ++
3 files changed, 259 insertions(+), 23 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 2ec88c98400..e352e4077b1 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -105,17 +105,17 @@ AARCH64_CORE("thunderx2t99",  thunderx2t99,  
thunderx2t99, V8_1A,  (CRYPTO), thu
/* ARM ('A') cores. */
AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, V8_2A,  (F16, RCPC, DOTPROD), 
cortexa53, 0x41, 0xd05, -1)
AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, V8_2A,  (F16, RCPC, DOTPROD), 
cortexa73, 0x41, 0xd0a, -1)
-AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD), neoversen1, 0x41, 0xd0b, -1)
-AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS), neoversen1, 0x41, 0xd0e, -1)
-AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS), neoversen1, 0x41, 0xd0d, -1)
-AARCH64_CORE("cortex-a78",  cortexa78, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE), neoversen1, 0x41, 0xd41, -1)
-AARCH64_CORE("cortex-a78ae",  cortexa78ae, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS, PROFILE), neoversen1, 0x41, 0xd42, -1)
-AARCH64_CORE("cortex-a78c",  cortexa78c, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS, PROFILE, FLAGM, PAUTH), neoversen1, 0x41, 0xd4b, -1)
+AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD), cortexa76, 0x41, 0xd0b, -1)
+AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS), cortexa76, 0x41, 0xd0e, -1)
+AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS), cortexa76, 0x41, 0xd0d, -1)
+AARCH64_CORE("cortex-a78",  cortexa78, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE), cortexa76, 0x41, 0xd41, -1)
+AARCH64_CORE("cortex-a78ae",  cortexa78ae, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS, PROFILE), cortexa76, 0x41, 0xd42, -1)
+AARCH64_CORE("cortex-a78c",  cortexa78c, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS, PROFILE, FLAGM, PAUTH), cortexa76, 0x41, 0xd4b, -1)
AARCH64_CORE("cortex-a65",  cortexa65, cortexa53, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS), cortexa73, 0x41, 0xd06, -1)
AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa53, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS), cortexa73, 0x41, 0xd43, -1)
-AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE), neoversen1, 0x41, 0xd44, -1)
-AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE, PAUTH), neoversen1, 0x41, 0xd4c, -1)
-AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, PROFILE), 
neoversen1, 0x41, 0xd0c, -1)
+AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE), cortexa76, 0x41, 0xd44, -1)
+AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE, PAUTH), cortexa76, 0x41, 0xd4c, -1)
+AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, PROFILE), 
cortexa76, 0x41, 0xd0c, -1)
AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS), cortexa73, 0x41, 0xd4a, -1)

@@ -160,7 +160,7 @@ AARCH64_CORE("cortex-a73.cortex-a53",  cortexa73cortexa53, 
cortexa53, V8A,  (CRC
/* ARM DynamIQ big.LITTLE configurations.  */

AARCH64_CORE("cortex-a75.cortex-a55",  cortexa75cortexa55, cortexa53, V8_2A,  
(F16, RCPC, DOTPROD), cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd0a, 0xd05), -1)
-AARCH64_CORE("cortex-a76.cortex-a55",  cortexa76cortexa55, cortexa53, V8_2A,  
(F16, RCPC, DOTPROD), neoversen1, 0x41, AARCH64_BIG_LITTLE (0xd0b, 0xd05), -1)
+AARCH64_CORE("cortex-a76.cortex-a55",  cortexa76cortexa55, cortexa53, V8_2A,  
(F16, RCPC, DOTPROD), corte

[PATCH] aarch64: Add the scheduling model for Neoverse N1

2023-04-18 Thread Evandro Menezes via Gcc-patches
This patch adds the scheduling model for Neoverse N1, based on the information 
from the “Arm Neoverse N1 Software Optimization Guide”.

-- 
Evandro Menezes



gcc/ChangeLog:

* config/aarch64/aarch64-cores.def: Use the Neoverse N1 scheduling 
model.
* config/aarch64/aarch64.md: Include `neoverse-n1.md`.
* config/aarch64/neoverse-n1.md: New file.

Signed-off-by: Evandro Menezes 
---
 gcc/config/aarch64/aarch64-cores.def |   2 +-
 gcc/config/aarch64/aarch64.md|   1 +
 gcc/config/aarch64/neoverse-n1.md| 711 +++
 3 files changed, 713 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/aarch64/neoverse-n1.md

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index e352e4077b1..cc842c4e22c 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -116,7 +116,7 @@ AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa53, V8_2A,  (F16, RCPC, DOTPRO
 AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS, PROFILE), cortexa76, 0x41, 0xd44, -1)
 AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS, PROFILE, PAUTH), cortexa76, 0x41, 0xd4c, -1)
 AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, PROFILE), cortexa76, 0x41, 0xd0c, -1)
-AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
+AARCH64_CORE("neoverse-n1",  neoversen1, neoversen1, V8_2A,  (F16, RCPC, DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
 AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, V8_2A,  (F16, RCPC, DOTPROD, SSBS), cortexa73, 0x41, 0xd4a, -1)
 
 /* Cavium ('C') cores. */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 022eef80bc1..6cb9e31259b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -471,6 +471,7 @@
 (include "../arm/cortex-a57.md")
 (include "../arm/exynos-m1.md")
 (include "falkor.md")
+(include "neoverse-n1.md")
 (include "saphira.md")
 (include "thunderx.md")
 (include "../arm/xgene1.md")
diff --git a/gcc/config/aarch64/neoverse-n1.md b/gcc/config/aarch64/neoverse-n1.md
new file mode 100644
index 000..d66fa10c330
--- /dev/null
+++ b/gcc/config/aarch64/neoverse-n1.md
@@ -0,0 +1,711 @@
+;; Arm Neoverse N1 pipeline description
+;; (Based on the "Arm Neoverse N1 Software Optimization Guide")
+;;
+;; Copyright (C) 2014-2023 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; The Neoverse N1 core is modelled as a multiple issue pipeline that has
+;; the following functional units.
+
+(define_automaton "neoverse_n1")
+
+;; 1 - Two pipelines for integer operations: SX1, SX2.
+
+(define_cpu_unit "neon1_sx1_issue" "neoverse_n1")
+(define_reservation "neon1_sx1" "neon1_sx1_issue")
+
+(define_cpu_unit "neon1_sx2_issue" "neoverse_n1")
+(define_reservation "neon1_sx2" "neon1_sx2_issue")
+
+;; 2 - One pipeline for complex integer operations: MX.
+
+(define_cpu_unit "neon1_mx_issue"
+"neoverse_n1")
+(define_reservation "neon1_mx" "neon1_mx_issue")
+(define_reservation "neon1_m_block" "neon1_mx_issue")
+
+;; 3 - Two asymmetric pipelines for Neon and FP operations: CX1, CX2.
+(define_automaton "neoverse_n1_cx")
+
+(define_cpu_unit "neon1_cx1_issue"
+"neoverse_n1_cx")
+(define_cpu_unit "neon1_cx2_issue"
+"neoverse_n1_cx")
+
+(define_reservation "neon1_cx1" "neon1_cx1_issue")
+(define_reservation "neon1_cx2" "neon1_cx2_issue")
+(define_reservation "neon1_v0_block" "neon1_cx1_issue")
+
+;; 4 - One pipeline for branch operations: BX.
+
+(define_cpu_unit "neon1_bx_issue" "neoverse_n1")
+(define_reservation "neon1_bx" "neon1_bx_issue")
+
+;; 5 - Two pipelines for load and store operations: LS1, LS2.
+
+(define_cpu_unit "neon1_ls1_issue" "neoverse_n1")
+(define_reservation "neon1_ls1" "neon1_ls1_issue")
+
+(define_cpu_unit "neon1_ls2_issue" "neoverse_n1")
+(define_reservation "neon1_ls2" "neon1_ls2_issue")
+
+;; Block all issue queues.
+
+(define_reservation "neon1_block" "neon1_sx1_issue + neon1_sx2_issue
+ + neon1_mx_issue
+ + neon1_cx1_issue + neon1_c

Re: [PATCH] aarch64: Add the cost and scheduling models for Neoverse N1

2023-04-17 Thread Evandro Menezes via Gcc-patches
Hi, Kyrylo.

> On Apr 11, 2023, at 04:41, Kyrylo Tkachov wrote:
> 
>> -----Original Message-----
>> From: Gcc-patches <bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of
>> Evandro Menezes via Gcc-patches
>> Sent: Friday, April 7, 2023 11:34 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Evandro Menezes <ebah...@icloud.com>; Richard Sandiford
>> <richard.sandif...@arm.com>
>> Subject: [PATCH] aarch64: Add the cost and scheduling models for Neoverse
>> N1
>> 
>> This patch adds the cost and scheduling models for Neoverse N1, based on
>> the information from the “Arm Neoverse N1 Software Optimization Guide”.
>> 
> 
> Thank you for working on this. It is true that we haven't added any 
> scheduling models for big cores from Arm for quite a while.

Could you share what motivated y’all not to?

> How has this patch been tested and benchmarked?

I’ve tested it with some small and large benchmarks, for both static and 
dynamic analysis.

> Using numbers from the Software Optimization Guide is certainly the way to 
> go, but we need to ensure that the way GCC uses them actually results in 
> better performance in practice.

Of course.

>> [PATCH] aarch64: Add the cost and scheduling models for Neoverse N1
>> 
>> gcc/ChangeLog:
>> 
>> * config/aarch64/aarch64-cores.def:
>> Use the Neoverse N1 scheduling and cost models, but only for itself.
>> * config/aarch64/aarch64.cc
>> (cortexa76_tunings): Rename variable.
>> (neoversen1_addrcost_table): New variable.
>> (neoversen1_vector_cost): Likewise.
>> (neoversen1_regmove_cost): Likewise.
>> (neoversen1_advsimd_vector_cost): Likewise.
>> (neoversen1_scalar_issue_info): Likewise.
>> (neoversen1_advsimd_issue_info): Likewise.
>> (neoversen1_vec_issue_info): Likewise.
>> (neoversen1_vector_cost): Likewise.
>> (neoversen1_tunings): Likewise.
>> * config/aarch64/aarch64.md: Include `neoverse-n1.md`.
>> * config/aarch64/neoverse-n1.md: New file.
>> * gcc/config/arm/aarch-cost-tables.h
>> (neoversen1_extra_costs): New variable.
>> Signed-off-by: Evandro Menezes 
>> 
>> ---
>> gcc/config/aarch64/aarch64-cores.def |  22 +-
>> gcc/config/aarch64/aarch64.cc        | 155 +-
>> gcc/config/aarch64/aarch64.md        |   1 +
>> gcc/config/aarch64/neoverse-n1.md    | 716 +++
>> gcc/config/arm/aarch-cost-tables.h   | 107
>> 5 files changed, 977 insertions(+), 24 deletions(-)
>> create mode 100644 gcc/config/aarch64/neoverse-n1.md
>> 
>> diff --git a/gcc/config/aarch64/aarch64-cores.def
>> b/gcc/config/aarch64/aarch64-cores.def
>> index 2ec88c98400..cc842c4e22c 100644
>> --- a/gcc/config/aarch64/aarch64-cores.def
>> +++ b/gcc/config/aarch64/aarch64-cores.def
>> @@ -105,18 +105,18 @@ AARCH64_CORE("thunderx2t99",  thunderx2t99,
>> thunderx2t99, V8_1A,  (CRYPTO), thu
>> /* ARM ('A') cores. */
>> AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, V8_2A,  (F16, RCPC,
>> DOTPROD), cortexa53, 0x41, 0xd05, -1)
>> AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, V8_2A,  (F16, RCPC,
>> DOTPROD), cortexa73, 0x41, 0xd0a, -1)
>> -AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, V8_2A,  (F16, RCPC,
>> DOTPROD), neoversen1, 0x41, 0xd0b, -1)
>> -AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, V8_2A,  (F16,
>> RCPC, DOTPROD, SSBS), neoversen1, 0x41, 0xd0e, -1)
>> -AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, V8_2A,  (F16, RCPC,
>> DOTPROD, SSBS), neoversen1, 0x41, 0xd0d, -1)
>> -AARCH64_CORE("cortex-a78",  cortexa78, cortexa57, V8_2A,  (F16, RCPC,
>> DOTPROD, SSBS, PROFILE), neoversen1, 0x41, 0xd41, -1)
>> -AARCH64_CORE("cortex-a78ae",  cortexa78ae, cortexa57, V8_2A,  (F16,
>> RCPC, DOTPROD, SSBS, PROFILE), neoversen1, 0x41, 0xd42, -1)
>> -AARCH64_CORE("cortex-a78c",  cortexa78c, cortexa57, V8_2A,  (F16, RCPC,
>> DOTPROD, SSBS, PROFILE, FLAGM, PAUTH), neoversen1, 0x41, 0xd4b, -1)
>> +AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, V8_2A,  (F16, RCPC,
>> DOTPROD), cortexa76, 0x41, 0xd0b, -1)
>> +AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, V8_2A,  (F16,
>> RCPC, DOTPROD, SSBS), cortexa76, 0x41, 0xd0e, -1)
>> +AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, V8_2A,  (F16, RCPC,
>>

[PATCH] aarch64: Add the cost and scheduling models for Neoverse N1

2023-04-07 Thread Evandro Menezes via Gcc-patches
This patch adds the cost and scheduling models for Neoverse N1, based on the 
information from the "Arm Neoverse N1 Software Optimization Guide".

-- 
Evandro Menezes ◊ evan...@yahoo.com

[PATCH] aarch64: Add the cost and scheduling models for Neoverse N1

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def:
Use the Neoverse N1 scheduling and cost models, but only for itself.
* config/aarch64/aarch64.cc
(cortexa76_tunings): Rename variable.
(neoversen1_addrcost_table): New variable.
(neoversen1_vector_cost): Likewise.
(neoversen1_regmove_cost): Likewise.
(neoversen1_advsimd_vector_cost): Likewise.
(neoversen1_scalar_issue_info): Likewise.
(neoversen1_advsimd_issue_info): Likewise.
(neoversen1_vec_issue_info): Likewise.
(neoversen1_vector_cost): Likewise.
(neoversen1_tunings): Likewise.
* config/aarch64/aarch64.md: Include `neoverse-n1.md`.
* config/aarch64/neoverse-n1.md: New file.
* gcc/config/arm/aarch-cost-tables.h
(neoversen1_extra_costs): New variable.

Signed-off-by: Evandro Menezes 

---
 gcc/config/aarch64/aarch64-cores.def |  22 +-
 gcc/config/aarch64/aarch64.cc        | 155 +-
 gcc/config/aarch64/aarch64.md        |   1 +
 gcc/config/aarch64/neoverse-n1.md    | 716 +++
 gcc/config/arm/aarch-cost-tables.h   | 107
 5 files changed, 977 insertions(+), 24 deletions(-)
 create mode 100644 gcc/config/aarch64/neoverse-n1.md
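
A note on reading the aarch64-cores.def hunk that follows: the ChangeLog says the new models are used by Neoverse N1 "only for itself". In the AARCH64_CORE rows this shows up in two places: the third argument (the scheduler ident) and the sixth (the tuning ident, which the renamed `cortexa76_tunings` now serves for the other cores). The snippet is a minimal C++ sketch of the X-macro pattern, not GCC's actual code: `core_entry`, `cores_after_patch` and `find_core` are hypothetical names, and the macro here deliberately ignores the architecture, feature flags, implementer and variant fields.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, trimmed-down stand-in for one row of the core table.
struct core_entry {
  const char *name;    // name as written in the .def file
  const char *sched;   // scheduler ident: which pipeline model is used
  const char *tuning;  // tuning ident: which cost tables are used
  unsigned part;       // CPU part number
};

// Same argument order as the AARCH64_CORE rows in the diff. The
// parenthesised feature list is a single macro argument, so its commas
// do not split it. Unused arguments are simply dropped.
#define AARCH64_CORE(NAME, IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART, VARIANT) \
  { NAME, #SCHED, #COSTS, PART },

// Two rows copied from the "+" side of the hunk.
static const core_entry cores_after_patch[] = {
  AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, V8_2A,  (F16, RCPC, DOTPROD), cortexa76, 0x41, 0xd0b, -1)
  AARCH64_CORE("neoverse-n1",  neoversen1, neoversen1, V8_2A,  (F16, RCPC, DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
};
#undef AARCH64_CORE

// Linear lookup by name; returns nullptr if the core is not in the table.
const core_entry *find_core(const char *name) {
  assert(name != nullptr);
  for (const core_entry &c : cores_after_patch)
    if (std::strcmp(c.name, name) == 0)
      return &c;
  return nullptr;
}
```

Looking up "neoverse-n1" in this table yields `neoversen1` for both the scheduler and the tuning fields, whereas "cortex-a76" keeps the `cortexa57` scheduler and picks up the `cortexa76` tuning.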

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 2ec88c98400..cc842c4e22c 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -105,18 +105,18 @@ AARCH64_CORE("thunderx2t99",  thunderx2t99,  
thunderx2t99, V8_1A,  (CRYPTO), thu
 /* ARM ('A') cores. */
 AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, V8_2A,  (F16, RCPC, 
DOTPROD), cortexa53, 0x41, 0xd05, -1)
 AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD), cortexa73, 0x41, 0xd0a, -1)
-AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD), neoversen1, 0x41, 0xd0b, -1)
-AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS), neoversen1, 0x41, 0xd0e, -1)
-AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS), neoversen1, 0x41, 0xd0d, -1)
-AARCH64_CORE("cortex-a78",  cortexa78, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE), neoversen1, 0x41, 0xd41, -1)
-AARCH64_CORE("cortex-a78ae",  cortexa78ae, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS, PROFILE), neoversen1, 0x41, 0xd42, -1)
-AARCH64_CORE("cortex-a78c",  cortexa78c, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS, PROFILE, FLAGM, PAUTH), neoversen1, 0x41, 0xd4b, -1)
+AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD), cortexa76, 0x41, 0xd0b, -1)
+AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS), cortexa76, 0x41, 0xd0e, -1)
+AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS), cortexa76, 0x41, 0xd0d, -1)
+AARCH64_CORE("cortex-a78",  cortexa78, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE), cortexa76, 0x41, 0xd41, -1)
+AARCH64_CORE("cortex-a78ae",  cortexa78ae, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS, PROFILE), cortexa76, 0x41, 0xd42, -1)
+AARCH64_CORE("cortex-a78c",  cortexa78c, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS, PROFILE, FLAGM, PAUTH), cortexa76, 0x41, 0xd4b, -1)
 AARCH64_CORE("cortex-a65",  cortexa65, cortexa53, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS), cortexa73, 0x41, 0xd06, -1)
 AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa53, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS), cortexa73, 0x41, 0xd43, -1)
-AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE), neoversen1, 0x41, 0xd44, -1)
-AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE, PAUTH), neoversen1, 0x41, 0xd4c, -1)
-AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, PROFILE), 
neoversen1, 0x41, 0xd0c, -1)
-AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
+AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE), cortexa76, 0x41, 0xd44, -1)
+AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE, PAUTH), cortexa76, 0x41, 0xd4c, -1)
+AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, PROFILE), 
cortexa76, 0x41, 0xd0c, -1)
+AARCH64_CORE("neoverse-n1",  neoversen1, neoversen1, V8_2A,  (F16, RCPC, 
DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
 AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS), cortexa73, 0x41, 0xd4a, -1)
 
 /* Cavium ('C') cores. */
@@ -160,7 +160,7 @@ AARCH64_CORE("cortex-a73.cortex-a53",  cortexa73cortex