RE: Zen4 tuning part 1 - cost tables

2022-12-08 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Official Use Only - General]

Hi Honza,

Thank you for posting the tuning patch.

> -----Original Message-----
> From: Jan Hubicka 
> Sent: Tuesday, December 6, 2022 3:31 PM
> To: gcc-patches@gcc.gnu.org; mjam...@suse.cz; Alexander Monakov
> ; Kumar, Venkataramanan
> ; Joshi, Tejas Sanjay
> 
> Subject: Zen4 tuning part 1 - cost tables
>
> Caution: This message originated from an External Source. Use proper
> caution when opening attachments, clicking links, or responding.
>
>
> Hi
> this patch updates the costs of znver4, mostly based on data measured by Agner
> Fog.
> Compared to previous generations, x87 became a bit slower, which is probably
> not a big deal (and we have minimal benchmarking coverage for it).  One
> interesting improvement is the reduction of FMA cost.  I also updated the costs of
> AVX256 loads/stores based on latencies (not throughput which is twice of
> avx256).
> Overall, AVX512 vectorization seems to noticeably improve some of the TSVC
> benchmarks, but since internally 512-bit vectors are split into 256-bit vectors it is
> somewhat risky and does not win in SPEC scores (mostly by regressing
> benchmarks with loops that have small trip counts, like x264 and exchange), so
> for now I am going to set the AVX256_OPTIMAL tune, but I am still playing with it.
> We have improved since ZNVER1 on choosing vectorization size and also have
> vectorized prologues/epilogues, so it may be possible to make avx512 a small
> win overall.

I also noted improvements to TSVC benchmarks when we enable AVX512
vectorization.  I think we should allow full 512-bit AVX512 vectorization for
znver4.  Even if the 512-bit vectors are broken into two 256-bit vectors, we can
pipeline the higher half immediately in the next cycle.  Also, we have fewer
instructions to decode with AVX512 instructions.  Overall, AVX512 operations
should be better.
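
As a rough illustration of the argument above, the instruction-count side can be sketched in C.  This is only a back-of-the-envelope model, not GCC code: the function names are hypothetical, and it assumes a 512-bit operation simply occupies a 256-bit unit for two back-to-back cycles.

```c
#include <assert.h>
#include <stddef.h>

/* Number of vector instructions needed to process n_elems elements of
   elem_bits each with vectors of vec_bits (a partial vector counts as
   one instruction).  */
size_t
vector_insn_count (size_t n_elems, unsigned elem_bits, unsigned vec_bits)
{
  assert (elem_bits > 0 && vec_bits >= elem_bits);
  size_t lanes = vec_bits / elem_bits;
  return (n_elems + lanes - 1) / lanes;
}

/* Execution-unit cycles consumed on a datapath of datapath_bits: in this
   simplified model each instruction occupies the unit for
   vec_bits / datapath_bits consecutive cycles.  */
size_t
issue_cycles (size_t n_insns, unsigned vec_bits, unsigned datapath_bits)
{
  assert (datapath_bits > 0);
  unsigned passes = vec_bits > datapath_bits ? vec_bits / datapath_bits : 1;
  return n_insns * passes;
}
```

For 1024 floats this gives 64 512-bit instructions versus 128 256-bit instructions, and both variants occupy a 256-bit unit for 128 cycles in this model, so any gain would come from decoding half as many instructions rather than from execution.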

>
> In general I would like to keep cost tables latency based unless we have a
> good reason not to do so.  There are some interesting differences in the
> znver3 tables that I also patched and that seem performance neutral.  I will send
> that separately.
>
> Bootstrapped/regtested x86_64-linux, also benchmarked on SPEC2017 along
> with AVX512 tuning.  I plan to commit it tomorrow unless there are some
> comments.
>
> Honza
>
> * x86-tune-costs.h (znver4_cost): Update costs of FP and SSE moves,
> division, multiplication, gathers, L2 cache size, and more complex
> FP instructions.
> diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
> index f01b8ee9eef..3a6ce02f093 100644
> --- a/gcc/config/i386/x86-tune-costs.h
> +++ b/gcc/config/i386/x86-tune-costs.h
> @@ -1867,9 +1868,9 @@ struct processor_costs znver4_cost = {
>    {8, 8, 8},           /* cost of storing integer
>                            registers.  */
>    2,                   /* cost of reg,reg fld/fst.  */
> -  {6, 6, 16},          /* cost of loading fp registers
> +  {14, 14, 17},        /* cost of loading fp registers
>                            in SFmode, DFmode and XFmode.  */
> -  {8, 8, 16},          /* cost of storing fp registers
> +  {12, 12, 16},        /* cost of storing fp registers
>                            in SFmode, DFmode and XFmode.  */
>    2,                   /* cost of moving MMX register.  */
>    {6, 6},              /* cost of loading MMX registers
> @@ -1878,13 +1879,13 @@ struct processor_costs znver4_cost = {
>                            in SImode and DImode.  */
>    2, 2, 3,             /* cost of moving XMM,YMM,ZMM
>                            register.  */
> -  {6, 6, 6, 6, 12},    /* cost of loading SSE registers
> +  {6, 6, 10, 10, 12},  /* cost of loading SSE registers
>                            in 32,64,128,256 and 512-bit.  */
> -  {8, 8, 8, 8, 16},    /* cost of storing SSE registers
> +  {8, 8, 8, 12, 12},   /* cost of storing SSE registers
>                            in 32,64,128,256 and 512-bit.  */
> -  6, 6,                /* SSE->integer and integer->SSE
> +  6, 8,                /* SSE->integer and integer->SSE
>                            moves.  */
> -  8, 8,              /* mask->integer and integer->mask moves */
> +  8, 8,                /* mask->integer and integer->mask moves */
>    {6, 6, 6},           /* cost of loading mask register
>                            in QImode, HImode, SImode.  */
>    {8, 8, 8},           /* cost if storing mask register
> @@ -1894,6 +1895,7 @@ struct processor_costs znver4_cost = {
>    },
>
>    COSTS_N_INSNS (1),

RE: [PATCH 2/2] i386: correct x87 multiplication modeling in znver.md

2022-11-16 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Official Use Only - General]

Hi,

Thank you for fixing this.

> -----Original Message-----
> From: Alexander Monakov 
> Sent: Tuesday, November 1, 2022 9:57 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Jan Hubička ; Joshi, Tejas Sanjay
> ; Kumar, Venkataramanan
> ; Alexander Monakov
> 
> Subject: [PATCH 2/2] i386: correct x87 multiplication modeling in
> znver.md
>
> All multiplication instructions are fully pipelined, except AVX256
> instructions on Zen 1, which issue over two cycles on a 128-bit unit.
> Correct the model accordingly to reduce combinatorial explosion in
> automaton tables.
>
> Top znver table sizes in insn-automata.o:
>
> Before:
>
> 30056 r znver1_fp_min_issue_delay
> 120224 r znver1_fp_transitions
>
> After:
>
> 6720 r znver1_fp_min_issue_delay
> 53760 r znver1_fp_transitions
>
> gcc/ChangeLog:
>
> PR target/87832
> * config/i386/znver.md: (znver1_fp_op_mul): Correct cycles in
> the reservation.
> (znver1_fp_op_mul_load): Ditto.
> (znver1_mmx_mul): Ditto.
> (znver1_mmx_load): Ditto.
> (znver1_ssemul_ss_ps): Ditto.
> (znver1_ssemul_ss_ps_load): Ditto.
> (znver1_ssemul_avx256_ps): Ditto.
> (znver1_ssemul_avx256_ps_load): Ditto.
> (znver1_ssemul_sd_pd): Ditto.
> (znver1_ssemul_sd_pd_load): Ditto.
> (znver2_ssemul_sd_pd): Ditto.
> (znver2_ssemul_sd_pd_load): Ditto.
> (znver1_ssemul_avx256_pd): Ditto.
> (znver1_ssemul_avx256_pd_load): Ditto.
> (znver1_sseimul): Ditto.
> (znver1_sseimul_avx256): Ditto.
> (znver1_sseimul_load): Ditto.
> (znver1_sseimul_avx256_load): Ditto.
> (znver1_sseimul_di): Ditto.
> (znver1_sseimul_load_di): Ditto.
> ---
>  gcc/config/i386/znver.md | 40 
>  1 file changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/config/i386/znver.md b/gcc/config/i386/znver.md
> index c52f8b532..882f250f1 100644
> --- a/gcc/config/i386/znver.md
> +++ b/gcc/config/i386/znver.md
> @@ -573,13 +573,13 @@ (define_insn_reservation "znver1_fp_op_mul" 5
>  (and (eq_attr "cpu" "znver1,znver2,znver3")
>   (and (eq_attr "type" "fop,fmul")
>(eq_attr "memory" "none")))
> -"znver1-direct,znver1-fp0*5")
> +"znver1-direct,znver1-fp0")
>
>  (define_insn_reservation "znver1_fp_op_mul_load" 12
>  (and (eq_attr "cpu" "znver1,znver2,znver3")
>   (and (eq_attr "type" "fop,fmul")
>(eq_attr "memory" "load")))
> -"znver1-direct,znver1-load,znver1-fp0*5")
> +"znver1-direct,znver1-load,znver1-fp0")
>
>  (define_insn_reservation "znver1_fp_op_imul_load" 16
>  (and (eq_attr "cpu" "znver1,znver2,znver3")
> @@ -684,13 +684,13 @@ (define_insn_reservation "znver1_mmx_mul" 3
>  (and (eq_attr "cpu" "znver1,znver2,znver3")
>   (and (eq_attr "type" "mmxmul")
>(eq_attr "memory" "none")))
> - "znver1-direct,znver1-fp0*3")
> + "znver1-direct,znver1-fp0")
>
>  (define_insn_reservation "znver1_mmx_load" 10
>  (and (eq_attr "cpu" "znver1,znver2,znver3")
>   (and (eq_attr "type" "mmxmul")
>(eq_attr "memory" "load")))
> -"znver1-direct,znver1-load,znver1-fp0*3")
> +"znver1-direct,znver1-load,znver1-fp0")
>
>  ;; TODO
>  (define_insn_reservation "znver1_avx256_log" 1
> @@ -1161,7 +1161,7 @@ (define_insn_reservation "znver1_ssemul_ss_ps" 3
>   (eq_attr "mode" "V8SF,V4SF,SF,V4DF,V2DF,DF")))
>   (and (eq_attr "type" "ssemul")
>(eq_attr "memory" "none")))
> -"znver1-direct,(znver1-fp0|znver1-fp1)*3")
> +"znver1-direct,znver1-fp0|znver1-fp1")
>
>  (define_insn_reservation "znver1_ssemul_ss_ps_load" 10
>  (and (ior (and (eq_attr "cpu" "znver1")
> @@ -1172,47 +1172,47 @@ (define_insn_reservation "znver1_ssemul_ss_ps_load" 10
>   (eq_attr "mode" "V8SF,V4SF,SF")))
>   (and (eq_attr "type" "ssemul")
>    (eq_attr "memory" "load")))
> - "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*3")
> + "znver1-direct,znver1-load,znver1-fp0|znver1-fp1")
>
>  (define_insn_reservation "znver1_ssemul_avx256_ps" 3
>

RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-26 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Official Use Only - General]

Hi Alexander,

Thank you for looking into this issue.

> -----Original Message-----
> From: Alexander Monakov 
> Sent: Tuesday, October 25, 2022 12:18 AM
> To: Jan Hubička 
> Cc: Kumar, Venkataramanan ; Jakub
> Jelinek ; Richard Biener
> ; Joshi, Tejas Sanjay
> ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD
> Zen4 CPU
>
> On Mon, 24 Oct 2022, Jan Hubička wrote:
>
> > > By the way, it appears pre-existing znver[123] models are also
> > > causing some kind of combinatorial blow-up, but before znver4 it was
> > > not a blocking issue:
> > >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
> >
> >
> > It is really easy to make the DFA size grow if there are possibly many
> > instructions in the pipeline (as every possible state of a modelled
> > pipeline needs to be a new state of the automaton). This is
> > essentially depth_of_pipeline * number_of_units with additional states
> > to represent special instructions and this naturally keeps growing.
> >
> > We could try to break the FP automata into multiple ones, but there
> > are instructions that can go down any pipe, which makes this hard, or we
> > can try to reduce the number of different reservation types (possibly by
> > breaking the automaton into znver1-3 and 4 or so).
> > With the znver2 model I experimented with a broken-up version and a common
> > one and ended up with a smaller binary for the combined one.
>
> Looking at znver1.md again, I think the problem is caused by incorrect
> modeling of division instructions: they have descriptions like
>
> (define_insn_reservation "znver1_idiv_DI" 41
>                          (and (eq_attr "cpu" "znver1,znver2")
>                               (and (eq_attr "type" "idiv")
>                                    (and (eq_attr "mode" "DI")
>                                         (eq_attr "memory" "none"))))
>                          "znver1-double,znver1-ieu2*41")
>
> which says that DImode idiv has latency 41 (which is correct) and that it
> occupies 2nd integer execution unit for 41 consecutive cycles, but that is
> not correct:

Yes, you are correct. It does not block the 2nd integer execution pipe
for 41 consecutive cycles.

>
> 1) the division instruction is partially pipelined, and has throughput 1/14

The "Div" unit takes one instruction, and in the worst case the latency will be 41
cycles on znver1/2.
But I agree that we can use the best-case latency of 14 cycles for the scheduler
model on znver1/2.

>
> 2) for the most part it occupies a separate division unit, not the general
> arithmetic unit.

Agreed.
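
The effect of the two reservations can be sketched with a toy timing model.  This is an illustration only: last_div_done is a hypothetical helper, it ignores everything except the unit doing the division, and the numbers 41 and 14 are the DImode latency and reciprocal throughput quoted above.

```c
#include <assert.h>

/* Cycle at which the last of n_divs independent divisions completes,
   when each one keeps its unit busy for busy_cycles and delivers its
   result latency cycles after it was issued.  */
unsigned
last_div_done (unsigned n_divs, unsigned busy_cycles, unsigned latency)
{
  assert (n_divs > 0 && busy_cycles > 0 && latency >= busy_cycles);
  return (n_divs - 1) * busy_cycles + latency;
}
```

With the 41-cycle reservation, four independent divisions finish at cycle 4 * 41 = 164 in this model; with a 14-cycle reservation they finish at cycle 3 * 14 + 41 = 83.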

>
> (incidentally, I think the blowup is caused by interaction of such super-long
> 41-cycle paths with the rest of reservations)
>
> I think we should fix this by modeling the separate division unit properly,
> and fixing reservations to use the measured reciprocal throughput of those
> instructions (available from uops.info). The following patch does that for
> integer divisions and completely eliminates the integer part of the problem;
> the issue with floating-point divisions remains.
>
> Top 5 znver table sizes, before:
>
> 68692 r znver1_ieu_check
> 68692 r znver1_ieu_transitions
> 99792 r znver1_ieu_min_issue_delay
> 428108 r znver1_fp_min_issue_delay
> 856216 r znver1_fp_transitions
>
> After:
>
> 1454 r znver1_ieu_translate
> 1454 r znver1_translate
> 2304 r znver1_ieu_transitions
> 428108 r znver1_fp_min_issue_delay
> 856216 r znver1_fp_transitions
>
> Will you help getting this reviewed for trunk?
>
>
>
> diff --git a/gcc/config/i386/znver1.md b/gcc/config/i386/znver1.md
> index 9c25b4e27..39b59343d 100644
> --- a/gcc/config/i386/znver1.md
> +++ b/gcc/config/i386/znver1.md
> @@ -24,7 +24,7 @@
>  ;; AMD znver1, znver2 and znver3 Scheduling
>  ;; Modeling automatons for zen decoders, integer execution pipes,
>  ;; AGU pipes and floating point execution units.
> -(define_automaton "znver1, znver1_ieu, znver1_fp, znver1_agu")
> +(define_automaton "znver1, znver1_ieu, znver1_fp, znver1_agu, znver1_idiv")
>
>  ;; Decoders unit has 4 decoders and all of them can decode fast path
>  ;; and vector type instructions.
> @@ -50,6 +50,7 @@
>  (define_cpu_unit "znver1-ieu1" "znver1_ieu")
>  (define_cpu_unit "znver1-ieu2" "znver1_ieu")
>  (define_cpu_unit "znver1-ieu3" "znver1_ieu")
> +(define_cpu_unit "znver1-idiv" "znver1_idiv")
>  (define_reservation 

RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-23 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Official Use Only - General]

Hi Richi and Jakub

> -----Original Message-----
> From: Jakub Jelinek 
> Sent: Saturday, October 22, 2022 10:41 PM
> To: Richard Biener 
> Cc: Kumar, Venkataramanan ; Joshi,
> Tejas Sanjay ; gcc-patches@gcc.gnu.org;
> honza.hubi...@gmail.com
> Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD
> Zen4 CPU
>
> On Fri, Oct 21, 2022 at 01:51:55PM +0200, Richard Biener via Gcc-patches
> wrote:
> > > > > BTW: Perhaps znver1.md is not the right filename anymore, since
> > > > > it hosts all four Zen schedulers.
> > > >
> > > > I have renamed the file to znver.md in this revision, PFA.
> > > > Thank you for the review, we will push it for trunk if we don't
> > > > get any further comments.
> > >
> > > I have pushed the patch on behalf of Tejas.
> >
> > This grew insn-automata.cc from 201502 lines to 639968 lines and the
> > build of the automata (genautomata) to several minutes in my dev tree.
>
> Yeah, in my unoptimized non-bootstrapped development tree genautomata
> now takes over 12 minutes on a fast box, that is simply not acceptable.

Thank you for notifying us.

tejassanjay.jo...@amd.com has posted a patch for review to fix this (as per 
Honza's comments).
Ref: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604144.html

Sorry for the inconvenience caused.

Regards,
Venkat.

>
> Jakub



RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-21 Thread Kumar, Venkataramanan via Gcc-patches
Hi all, 

> -----Original Message-----
> From: Joshi, Tejas Sanjay 
> Sent: Monday, October 17, 2022 8:09 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kumar, Venkataramanan ;
> honza.hubi...@gmail.com; Uros Bizjak 
> Subject: RE: [PATCH] [X86_64]: Enable support for next generation AMD
> Zen4 CPU
> 
> [Public]
> 
> Hi,
> 
> > BTW: Perhaps znver1.md is not the right filename anymore, since it hosts
> > all four Zen schedulers.
> 
> I have renamed the file to znver.md in this revision, PFA.
> Thank you for the review, we will push it for trunk if we don't get any
> further comments.

I have pushed the patch on behalf of Tejas. 

Regards,
Venkat.



RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2021-03-31 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Public Use]

Hi Honza, 

> -----Original Message-----
> From: Jan Hubicka 
> Sent: Wednesday, March 31, 2021 1:27 PM
> To: Kumar, Venkataramanan 
> Cc: Uros Bizjak ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3
> CPU
> 
> [CAUTION: External Email]
> 
> > [AMD Public Use]
> >
> > Hi Honza,
> >
> > > -----Original Message-----
> > > From: Jan Hubicka 
> > > Sent: Wednesday, March 31, 2021 1:15 AM
> > > To: Kumar, Venkataramanan 
> > > Cc: Uros Bizjak ; gcc-patches@gcc.gnu.org
> > > Subject: Re: [PATCH] [X86_64]: Enable support for next generation
> > > AMD Zen3 CPU
> > >
> > > [CAUTION: External Email]
> > >
> > > Hi,
> > > this patch backports the initial support to the gcc10 branch.  Since the
> > > trunk and branch diverged, there is a non-trivial change to cpuinfo
> > > discovery.  I do:
> > >
> > > --- a/libgcc/config/i386/cpuinfo.c
> > > +++ b/libgcc/config/i386/cpuinfo.c
> > > @@ -111,6 +111,12 @@ get_amd_cpu (unsigned int family, unsigned int model)
> > >if (model >= 0x30)
> > >  __cpu_model.__cpu_subtype = AMDFAM17H_ZNVER2;
> > >break;
> > > +case 0x19:
> > > +  __cpu_model.__cpu_type = AMDFAM19H;
> > > +  /* AMD family 19h version 1.  */
> > > +  if (model <= 0x0f)
> > > +   __cpu_model.__cpu_subtype = AMDFAM19H_ZNVER3;
> > > +  break;
> > >  default:
> > >break;
> > >  }
> > >
> > > While your patch also sets ZNVER3 for the case where VAES is supported,
> > > that would require backporting more of the logic detecting VAES.  Is
> > > that necessary?
> >
> > I think you are referring to the below change.
> >
> > diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
> > index ecdad5765d5..2bfa037dd8b 100644
> > --- a/gcc/config/i386/driver-i386.c
> > +++ b/gcc/config/i386/driver-i386.c
> > @@ -455,6 +455,8 @@ const char *host_detect_local_cpu (int argc, const char **argv)
> > processor = PROCESSOR_GEODE;
> >else if (has_feature (FEATURE_MOVBE) && family == 22)
> > processor = PROCESSOR_BTVER2;
> > +  else if (has_feature (FEATURE_VAES))
> > +   processor = PROCESSOR_ZNVER3;
> >else if (has_feature (FEATURE_CLWB))
> > processor = PROCESSOR_ZNVER2;
> >
> > My understanding is that when we use -march=native on a znver3 machine it
> > would check for "vaes" to detect the machine.
> > Otherwise it would detect it as a znver2 machine.  So we need that detection
> > logic.
> 
> I was wondering about
> 
> +case 0x19:
> +  cpu_model->__cpu_type = AMDFAM19H;
> +  /* AMD family 19h version 1.  */
> +  if (model <= 0x0f)
> +   {
> + cpu = "znver3";
> + CHECK___builtin_cpu_is ("znver3");
> + cpu_model->__cpu_subtype = AMDFAM19H_ZNVER3;
> +   }
> +  else if (has_cpu_feature (cpu_model, cpu_features2,
> +   FEATURE_VAES))
> +   {
> + cpu = "znver3";
> + CHECK___builtin_cpu_is ("znver3");
> + cpu_model->__cpu_subtype = AMDFAM19H_ZNVER3;
> +   }
> +  break;
> 
> For znver3 we detect the model number and I wonder why we also test the VAES
> feature when we don't do that for other families.

Ah, I see.  I will double check the model numbers again.
Yes, we can remove the code which checks for VAES here.
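
With the VAES branch dropped, the family 19h case would reduce to a plain model-range check.  A minimal sketch, assuming only the model <= 0x0f test from the patch stays; the enum and function names are stand-ins, not the libgcc identifiers:

```c
/* Hypothetical stand-ins for the subtype enumerators used in the patch.  */
enum amd_subtype { SUBTYPE_UNKNOWN, SUBTYPE_ZNVER3 };

/* Family 19h detection by model number only, with no VAES fallback.  */
enum amd_subtype
fam19h_subtype (unsigned family, unsigned model)
{
  if (family != 0x19)
    return SUBTYPE_UNKNOWN;
  return model <= 0x0f ? SUBTYPE_ZNVER3 : SUBTYPE_UNKNOWN;
}
```

Any family 19h model above 0x0f is then left without a subtype in this sketch, which matches how the other families in the quoted code are handled.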


> 
> Honza

Regards,
Venkat.


RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2021-03-31 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Public Use]

Hi Honza,

> -----Original Message-----
> From: Jan Hubicka 
> Sent: Wednesday, March 31, 2021 1:15 AM
> To: Kumar, Venkataramanan 
> Cc: Uros Bizjak ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3
> CPU
> 
> [CAUTION: External Email]
> 
> Hi,
> this patch backports the initial support to the gcc10 branch.  Since the
> trunk and branch diverged, there is a non-trivial change to cpuinfo
> discovery.  I do:
> 
> --- a/libgcc/config/i386/cpuinfo.c
> +++ b/libgcc/config/i386/cpuinfo.c
> @@ -111,6 +111,12 @@ get_amd_cpu (unsigned int family, unsigned int model)
>if (model >= 0x30)
>  __cpu_model.__cpu_subtype = AMDFAM17H_ZNVER2;
>break;
> +case 0x19:
> +  __cpu_model.__cpu_type = AMDFAM19H;
> +  /* AMD family 19h version 1.  */
> +  if (model <= 0x0f)
> +   __cpu_model.__cpu_subtype = AMDFAM19H_ZNVER3;
> +  break;
>  default:
>break;
>  }
> 
> While your patch also sets ZNVER3 for the case where VAES is supported, that
> would require backporting more of the logic detecting VAES.  Is that
> necessary?

I think you are referring to the below change. 

diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index ecdad5765d5..2bfa037dd8b 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -455,6 +455,8 @@ const char *host_detect_local_cpu (int argc, const char **argv)
         processor = PROCESSOR_GEODE;
       else if (has_feature (FEATURE_MOVBE) && family == 22)
         processor = PROCESSOR_BTVER2;
+      else if (has_feature (FEATURE_VAES))
+        processor = PROCESSOR_ZNVER3;
       else if (has_feature (FEATURE_CLWB))
         processor = PROCESSOR_ZNVER2;

My understanding is that when we use -march=native on a znver3 machine, it would
check for "vaes" to detect the machine.
Otherwise it would detect it as a znver2 machine.  So we need that detection
logic.
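
The ordering matters because the checks form an if/else chain.  A small sketch of the idea, using made-up feature bits and processor ids rather than the driver's real has_feature () and PROCESSOR_* names:

```c
/* Hypothetical feature bits and processor ids, for illustration only.  */
enum { FEAT_CLWB = 1u << 0, FEAT_VAES = 1u << 1 };
enum zen_proc { PROC_OTHER, PROC_ZNVER2, PROC_ZNVER3 };

/* Test the most specific feature first: a znver3 host reports both VAES
   and CLWB, so testing CLWB first would classify it as znver2.  */
enum zen_proc
detect_zen (unsigned features)
{
  if (features & FEAT_VAES)
    return PROC_ZNVER3;
  if (features & FEAT_CLWB)
    return PROC_ZNVER2;
  return PROC_OTHER;
}
```

Here detect_zen (FEAT_CLWB | FEAT_VAES) yields PROC_ZNVER3, while detect_zen (FEAT_CLWB) alone still yields PROC_ZNVER2.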


> I see it may make znver3 the default on a future znver4 if
> it stays with amdfam19, but we did not do this before.

Usually we have at least one differentiating ISA or a change in family name or
model name going from one processor to another.

> 
> Bootstrapped/regtested x86_64-linux.  With -march=native on znver3
> machine we get right flags, but trunk in addition passes:
> 
> -mno-amx-bf16
> -mno-amx-int8
> -mno-amx-tile
> -mno-avxvnni
> -mno-hreset
> -mno-kl
> -mno-serialize
> -mno-tsxldtrk
> -mno-uintr
> -mno-widekl
> 
> Which are options we did not backport.
> On top of that I plan to backport the tuning patches, with the exception of
> gather, which seems a bit controversial and can wait for gcc11.

Ok that seems fine.

Regards,
Venkat.

> 
> Honza
> 
> 2021-03-30  Jan Hubicka  
> 
> Backport
> 
> Venkataramanan Kumar  
> Sharavan Kumar  
> * common/config/i386/cpuinfo.h (get_amd_cpu) recognize znver3.
> * common/config/i386/i386-common.c (processor_names): Add
> znver3.
> (processor_alias_table): Add znver3 and AMDFAM19H entry.
> * common/config/i386/i386-cpuinfo.h (processor_types): Add
> AMDFAM19H.
> (processor_subtypes): AMDFAM19H_ZNVER3.
> * config.gcc (i[34567]86-*-linux* | ...): Likewise.
> * config/i386/driver-i386.c: (host_detect_local_cpu): Let
> -march=native recognize znver3 processors.
> * config/i386/i386-c.c (ix86_target_macros_internal): Add
> znver3.
> * config/i386/i386-options.c (m_znver3): New definition.
> (m_ZNVER): Include m_znver3.
> (processor_cost_table): Add znver3.
> * config/i386/i386.c (ix86_reassociation_width): Likewise.
> * config/i386/i386.h (TARGET_znver3): New definition.
> (enum processor_type): Add PROCESSOR_ZNVER3.
> * config/i386/i386.md (define_attr "cpu"): Add znver3.
> * config/i386/x86-tune-sched.c: (ix86_issue_rate): Likewise.
> (ix86_adjust_cost): Likewise.
> * config/i386/x86-tune.def (X86_TUNE_AVOID_256FMA_CHAINS):
> Likewise.
> * config/i386/znver1.md: Add new reservations for znver3.
> * doc/extend.texi: Add details about znver3.
> * doc/invoke.texi: Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2021-03-30  Jan Hubicka  
> 
> * gcc.target/i386/funcspec-56.inc: Handle new march.
> 
> libgcc/ChangeLog:
> 
> 2021-03-30  Jan Hubicka  
> 
> * config/i386/cpuinfo.c (get_amd_cpu): Support amdfam19.
> * config/i386/cpuinfo.h (enum processor_types): Add AMDFAM19H.
> (enum processor_subtypes): Add AMDFAM19H_ZNVER3.
> 
> diff --git a/gcc/common/config/i386/i386-common.c
> b/gcc/common/config/i386/i386-common.c
> index 1e4d25f052a..97335d42af1 100644
> --- a/gcc/common/config/i386/i386-common.c
> +++ b/gcc/common/config/i386/i386-common.c
> @@ -1582,7 +1582,8 @@ const char *const processor_names[] =
>"btver1",
>"btver2",
>"znver1",
> -  "znver2"
> +  "znver2",
> +  "znver3"
>  

RE: znver3 tuning part 1

2021-03-23 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Public Use]

Hi Honza,


> -----Original Message-----
> From: Jan Hubicka 
> Sent: Monday, March 22, 2021 4:31 PM
> To: Kumar, Venkataramanan 
> Cc: gcc-patches@gcc.gnu.org; mjam...@suse.cz
> Subject: Re: znver3 tuning part 1
> 
> [CAUTION: External Email]
> 
> > > Hi,
> > > I plan to commit some retuning of znver3 codegen that is based on
> > > real hardware benchmarks.  It turns out that there are not too many
> > > changes necessary since Zen3 is quite a smooth upgrade to Zen2.  In summary:
> > >
> > >  - some instructions (like idiv) have shorter latencies.  Adjusting
> > >    costs reduces code size a bit but seems within noise in benchmarks
> > >    (since our cost calculation is quite off anyway because it does not
> > >    account for register pressure and parallelism, which make a huge
> > >    difference here)
> > >  - gather instructions are still microcoded but a lot faster than in
> > >    znver1/znver2 and it turns out they are now beneficial for a few TSVC
> > >    benchmarks, so I plan to enable them.
> >
> > Can we get a copy of this benchmark to try?
> > We need to check on bigger benchmarks like SPEC also.
> 
> Yes, I am also running SPEC.  However, for basic instruction selection tuning
> smaller benchmarks are doing quite well.  In general, if there are relatively
> natural loops where gather helps, I think we should enable it and try to fix
> possible regressions (I did not see one in SPEC runs, but I plan to do more
> benchmarking this week).

Okay, thank you.

> 
> I did some work on TSVC mostly because zen3 seems a very smooth update to
> zen2 for instruction selection (which is already happy with almost everything,
> especially for scalar code) and vectorizer costs seem to be the place where we
> have the most room for improvement.
>
> I briefly analyzed all TSVC kernels where we regress compared to clang, aocc
> and icc.  You can search for tsvc in bugzilla.  Richard also wrote some
> observations there.
> These are related to missing features rather than the cost model, however.
>
> One problem of TSVC is that it is FP only.  I hacked it for integer, but it
> would be nice to have something else as well.
> >
> > >
> > >    It seems we missed revisiting this for znver2 tuning.
> > >    I think even for znver2 it may make sense to re-enable them, so I
> > >    will benchmark this as well.
> > >  - memcpy/memset expansion seems to work the same way as for znver2,
> > >    so I am keeping the same changes.
> > >  - instruction scheduler is already modified in trunk to some degree,
> > >    reflecting new units.  The problem with instruction scheduling is that
> > >    it treats zen as an in-order CPU and is unlikely to fill all
> > >    execution resources this way.
> > >    We may want to try to model the out-of-order nature in a similar way as
> > >    LLVM does, but on the other hand the current scheduling logic seems
> > >    to do mostly fine (i.e. not worse than llvm's).  What matters is
> > >    to schedule for long latencies and just after branch boundaries,
> > >    where the simplified model seems to do just fine.
> >
> > So we can keep the existing model for znver3 for GCC 11?
> 
> I think so - I experimented with making the model a bit more precise and it does
> not seem to add any performance improvements and makes the automaton a
> lot bigger.  The existing model already handles the updated
> zen3 latencies...
>
> I think the only possible improvement here would be to start modelling
> explicitly the out-of-order nature, but even then I am not sure how much
> benefit that can bring (given that we are limited to relatively small basic
> blocks and do not have a lot of the information needed to model the execution
> precisely). Do you have some opinions on this?

Given that basic blocks are small and hardware itself reorders the 
instructions, I don't think precisely modelling the scheduler will give much 
benefit.

> 
> Honza

Regards,
Venkat.


RE: znver3 tuning part 1

2021-03-22 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Official Use Only - Internal Distribution Only]

Hi Honza,

Thank you for working on this.  

> -----Original Message-----
> From: Gcc-patches  On Behalf Of Jan
> Hubicka
> Sent: Monday, March 15, 2021 3:33 PM
> To: gcc-patches@gcc.gnu.org; mjam...@suse.cz
> Subject: znver3 tuning part 1
> 
> [CAUTION: External Email]
> 
> Hi,
> I plan to commit some retuning of znver3 codegen that is based on real
> hardware benchmarks.  It turns out that there are not too many changes
> necessary since Zen3 is quite a smooth upgrade to Zen2.  In summary:
>
>  - some instructions (like idiv) have shorter latencies.  Adjusting
>    costs reduces code size a bit but seems within noise in benchmarks
>    (since our cost calculation is quite off anyway because it does not
>    account for register pressure and parallelism, which make a huge
>    difference here)
>  - gather instructions are still microcoded but a lot faster than in
>    znver1/znver2 and it turns out they are now beneficial for a few TSVC
>    benchmarks, so I plan to enable them.

Can we get a copy of this benchmark to try?
We need to check on bigger benchmarks like SPEC also.

> 
>    It seems we missed revisiting this for znver2 tuning.
>    I think even for znver2 it may make sense to re-enable them, so I
>    will benchmark this as well.
>  - memcpy/memset expansion seems to work the same way as for znver2,
>    so I am keeping the same changes.
>  - instruction scheduler is already modified in trunk to some degree,
>    reflecting new units.  The problem with instruction scheduling is that
>    it treats zen as an in-order CPU and is unlikely to fill all
>    execution resources this way.
>    We may want to try to model the out-of-order nature in a similar way as
>    LLVM does, but on the other hand the current scheduling logic seems
>    to do mostly fine (i.e. not worse than llvm's).  What matters is
>    to schedule for long latencies and just after branch boundaries,
>    where the simplified model seems to do just fine.

So we can keep the existing model for znver3 for GCC 11?

>  - some move instruction latencies do not reflect reality
>    (at least the latencies published by Agner Fog or in the AMD optimization
>    manual, which themselves do not agree with each other).
>    Adjusting the tables, however, triggers regressions in ImageMagick and
>    parest, so I am still looking at whether there is an easy fix for this, and if
>    not, I will wait for the next stage1 with these.
>    An interesting property is that reg-reg moves have zero latency.
>    Since costs are officially relative to reg-reg move, it makes it a bit
>    hard to define here :)
>  - fmadd was optimized and it is now 4 cycles (was 5 and 6 cycles on
>    znver2 and znver1 respectively) like on Intel. However, there is still
>    a problem with extending the critical chain in the matrix multiplication
>    loop.  The difference seems to be that the Intel implementation needs the
>    accumulator value to be ready only 1 cycle after the execution
>    started processing the multiplication.
>
>    So there is still a performance regression on matmul and thus I am
>    keeping the logic to break critical chains.

My observation is the same here.
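
The critical-chain effect can be sketched by counting cycles on the dependency path of acc += a[i] * b[i].  This is a simplified model with hypothetical function names; the 4-cycle fmadd latency is from the mail above, while the 3-cycle multiply and add latencies used in the note below are assumed purely for illustration.

```c
/* Cycles on the critical path when every step is
   acc = fma (a[i], b[i], acc): each fma has to wait for the previous
   accumulator value.  */
unsigned
chain_cycles_fma (unsigned n, unsigned fma_latency)
{
  return n * fma_latency;
}

/* Cycles when the multiply is kept separate: the multiplies do not
   depend on acc and can overlap, so only the first multiply and the
   chain of adds remain on the critical path.  */
unsigned
chain_cycles_mul_add (unsigned n, unsigned mul_latency, unsigned add_latency)
{
  return n == 0 ? 0 : mul_latency + n * add_latency;
}
```

For n = 100, chain_cycles_fma (100, 4) gives 400 cycles, while chain_cycles_mul_add (100, 3, 3) gives 303, which is one way to see why breaking the chain can still pay off in this model.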

> 
> This first patch is a no-op and it only copies the cost tables.  I will adjust
> them one-by-one for easier hunting of possible regressions.
> 
> Honza
> 
> 2021-03-15  Jan Hubicka  
> 
> * config/i386/i386-options.c (processor_cost_table): Add znver3_cost.
> * config/i386/x86-tune-costs.h (znver3_cost): New global variable; copy
> of znver2_cost.
> 
> diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> index e93935f6f2c..7865bc110a3 100644
> --- a/gcc/config/i386/i386-options.c
> +++ b/gcc/config/i386/i386-options.c
> @@ -743,7 +743,7 @@ static const struct processor_costs *processor_cost_table[] =
>_cost,
>_cost,
>_cost,
> -  _cost
> +  _cost
>  };
> 
>  /* Guarantee that the array is aligned with enum processor_type.  */
> diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
> index cc27c7911e3..e655e668c7a 100644
> --- a/gcc/config/i386/x86-tune-costs.h
> +++ b/gcc/config/i386/x86-tune-costs.h
> @@ -1688,6 +1688,140 @@ struct processor_costs znver2_cost = {
>"16",/* Func alignment.  */
>  };
> 
> +struct processor_costs znver3_cost = {
> +  {
> +  /* Start of register allocator costs.  integer->integer move cost is 2. */
> +
> +  /* reg-reg moves are done by renaming and thus they are even cheaper than
> + 1 cycle.  Because reg-reg move cost is 2 and following tables correspond
> + to doubles of latencies, we do not model this correctly.  It does not
> + seem to make practical difference to bump prices up even more.  */
> +  6,   /* cost for loading QImode using
> +  movzbl.  */
> +  {6, 6, 6},   /* cost of loading integer 

RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-05 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Public Use]


As per https://gcc.gnu.org/codingconventions.html#ChangeLogs

--Snip--
ChangeLogs
ChangeLog entries are part of git commit messages and are automatically put 
into a corresponding ChangeLog file.
--Snip--

Does this mean the ChangeLog files will be updated automatically?  I did not
do anything to the ChangeLog files while pushing.
The ChangeLog contents are part of my commit message.

Regards,
Venkat.



RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-04 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Public Use]

Hi Honza,

> -Original Message-
> From: Jan Hubicka 
> Sent: Saturday, December 5, 2020 1:06 AM
> To: Uros Bizjak 
> Cc: Kumar, Venkataramanan ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD
> Zen3 CPU
> 
> [CAUTION: External Email]
> 
> > On Fri, Dec 4, 2020 at 6:50 PM Kumar, Venkataramanan
> >  wrote:
> > >
> > > [AMD Public Use]
> > >
> > > Hi Uros
> > >
> > > > -Original Message-
> > > > From: Uros Bizjak 
> > > > Sent: Friday, December 4, 2020 2:30 PM
> > > > To: Kumar, Venkataramanan 
> > > > Cc: gcc-patches@gcc.gnu.org; Jan Hubicka (hubi...@ucw.cz)
> > > > 
> > > > Subject: Re: [PATCH] [X86_64]: Enable support for next generation
> > > > AMD
> > > > Zen3 CPU
> > > >
> > > > [CAUTION: External Email]
> > > >
> > > > On Thu, Dec 3, 2020 at 4:29 PM Kumar, Venkataramanan
> > > >  wrote:
> > > > >
> > > > > [AMD Public Use]
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Hi Maintainers,
> > > > >
> > > > >
> > > > >
> > > > > PFA, the patch that enables support for the next generation AMD
> > > > > Zen3
> > > > CPU via -march=znver3.
> > > > >
> > > > > This is a very basic enablement patch. As of now the cost,
> > > > > tuning and
> > > > scheduler changes are kept same as znver2.
> > > > >
> > > > > Further changes to the cost and tunings will be done later.
> > > > >
> > > > >
> > > > >
> > > > > Ok for trunk ?
> > > >
> > > > Please also add a new target to multiversioning and corresponding
> > > > testcases. As an example, how this is done nowadays, please see a
> > > > submission for a different target at [1].
> > > >
> > > > > BTW: It looks that multiversioning testcases lack AMD targets. Can
> > > > > you please add a testcase similar to
> > > > > testsuite/g++.target/i386/mv16.C and also add AMD targets to
> > > > > testsuite/gcc.target/i386/funcspec-56.inc.
> > > > > (this can be done in a follow-up patch).
> > > > >
> > > > > [1]
> > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549699.html
> > > >
> > >
> > > Please find attached the version 2 patch.
> > >
> > > > I have made additional changes as suggested by you.
> > > > 1.  Added the AMD Zen targets to funcspec-56.inc file in the tests.
> > > > 2.  To cover multiversioning, added a new test with a set of AMD
> > > >     targets detected by builtin_cpus, similar to mv16.C.
> > > >
> > > > Is it ok for trunk ?
> >
> > LGTM (I didn't review scheduling changes in detail).
> 
> I checked the scheduling changes and they are OK. So the patch is OK
> overall.
>
> Even with respect to Jason's point on possibly regressing primary target
> (breaking -march=native on zen3 machine counts as a regression), the risks
> here are low. There is nothing really controversial in the patch.
>
> It would be nice to set up regular benchmarking on a zen3 machine, like
> we do for zen1/2.
> Honza

Thank you for reviewing the patch.  I pushed the patch to the gcc trunk.

Ref: 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=3e2ae3ee285a57455d5a23bd352a68c289130186

> >
> > Uros.

Regards,
Venkat.


RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-04 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Public Use]

Hi Uros,

> -Original Message-
> From: Uros Bizjak 
> Sent: Friday, December 4, 2020 11:31 PM
> To: Kumar, Venkataramanan 
> Cc: gcc-patches@gcc.gnu.org; Jan Hubicka (hubi...@ucw.cz)
> 
> Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD
> Zen3 CPU
> 
> [CAUTION: External Email]
> 
> On Fri, Dec 4, 2020 at 6:50 PM Kumar, Venkataramanan
>  wrote:
> >
> > [AMD Public Use]
> >
> > Hi Uros
> >
> > > -Original Message-
> > > From: Uros Bizjak 
> > > Sent: Friday, December 4, 2020 2:30 PM
> > > To: Kumar, Venkataramanan 
> > > Cc: gcc-patches@gcc.gnu.org; Jan Hubicka (hubi...@ucw.cz)
> > > 
> > > Subject: Re: [PATCH] [X86_64]: Enable support for next generation
> > > AMD
> > > Zen3 CPU
> > >
> > > [CAUTION: External Email]
> > >
> > > On Thu, Dec 3, 2020 at 4:29 PM Kumar, Venkataramanan
> > >  wrote:
> > > >
> > > > [AMD Public Use]
> > > >
> > > >
> > > >
> > > >
> > > > Hi Maintainers,
> > > >
> > > >
> > > >
> > > > PFA, the patch that enables support for the next generation AMD
> > > > Zen3
> > > CPU via -march=znver3.
> > > >
> > > > This is a very basic enablement patch. As of now the cost, tuning
> > > > and
> > > scheduler changes are kept same as znver2.
> > > >
> > > > Further changes to the cost and tunings will be done later.
> > > >
> > > >
> > > >
> > > > Ok for trunk ?
> > >
> > > Please also add a new target to multiversioning and corresponding
> > > testcases. As an example, how this is done nowadays, please see a
> > > submission for a different target at [1].
> > >
> > > > BTW: It looks that multiversioning testcases lack AMD targets. Can
> > > > you please add a testcase similar to
> > > > testsuite/g++.target/i386/mv16.C and also add AMD targets to
> > > > testsuite/gcc.target/i386/funcspec-56.inc.
> > > > (this can be done in a follow-up patch).
> > > >
> > > > [1]
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549699.html
> > >
> >
> > Please find attached the version 2 patch.
> >
> > I have made additional changes as suggested by you.
> > 1.  Added the AMD Zen targets to funcspec-56.inc file in the tests.
> > 2.  To cover multiversioning, added a new test with a set of AMD
> >     targets detected by builtin_cpus, similar to mv16.C.
> >
> > Is it ok for trunk ?
> 
> LGTM (I didn't review scheduling changes in detail).

Thank you for reviewing the patch.
I will wait for a day or two; if I don't get further comments I will commit
the patch.

Regards,
Venkat.

> 
> Uros.


RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-04 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Public Use]

Hi Uros

> -Original Message-
> From: Uros Bizjak 
> Sent: Friday, December 4, 2020 2:30 PM
> To: Kumar, Venkataramanan 
> Cc: gcc-patches@gcc.gnu.org; Jan Hubicka (hubi...@ucw.cz)
> 
> Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD
> Zen3 CPU
> 
> [CAUTION: External Email]
> 
> On Thu, Dec 3, 2020 at 4:29 PM Kumar, Venkataramanan
>  wrote:
> >
> > [AMD Public Use]
> >
> > Hi Maintainers,
> >
> > PFA, the patch that enables support for the next generation AMD Zen3
> > CPU via -march=znver3.
> >
> > This is a very basic enablement patch. As of now the cost, tuning and
> > scheduler changes are kept same as znver2.
> >
> > Further changes to the cost and tunings will be done later.
> >
> > Ok for trunk ?
> 
> Please also add a new target to multiversioning and corresponding
> testcases. As an example, how this is done nowadays, please see a
> submission for a different target at [1].
> 
> BTW: It looks that multiversioning testcases lack AMD targets. Can you
> please add a testcase similar to testsuite/g++.target/i386/mv16.C and also
> add AMD targets to testsuite/gcc.target/i386/funcspec-56.inc.
> (this can be done in a follow-up patch).
> 
> [1]
> https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549699.html
> 

Please find attached the version 2 patch.

I have made additional changes as suggested by you.
1.  Added the AMD Zen targets to funcspec-56.inc file in the tests.
2.  To cover multiversioning, added a new test with a set of AMD targets
    detected by builtin_cpus, similar to mv16.C.
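
For readers who have not seen this kind of CPU-based selection, below is a
minimal C sketch of manual runtime dispatch. It is not the attached test:
mv16.C is a C++ test, while this sketch swaps in a hand-written check with
the GCC built-ins __builtin_cpu_init and __builtin_cpu_is. The function names
are hypothetical, and the assumption that "znver3" is accepted by
__builtin_cpu_is only holds for a GCC that already contains the Zen3
enablement, so the check is guarded by a version test.

```c
#include <stddef.h>

typedef long (*sum_fn)(const int *, size_t);

/* Portable fallback version.  */
long sum_generic(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Stand-in for a version tuned for Zen 3; it reuses the same algorithm here
   because only the dispatch is being illustrated.  */
long sum_znver3(const int *v, size_t n)
{
    return sum_generic(v, n);
}

/* Stores the selected version in *out and returns its name.  */
const char *pick_sum(sum_fn *out)
{
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 11 \
    && (defined(__x86_64__) || defined(__i386__))
    /* Assumption: this compiler knows the "znver3" CPU name.  */
    __builtin_cpu_init();
    if (__builtin_cpu_is("znver3")) {
        *out = sum_znver3;
        return "znver3";
    }
#endif
    *out = sum_generic;
    return "generic";
}
```

On any machine other than Zen 3, or with an older compiler, pick_sum falls
back to the generic version.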

Is it ok for trunk ?

Regards,
Venkat.

> Uros.


X86_64-Enable-support-for-next-generation-AMD-Znver3-V2.patch
Description: X86_64-Enable-support-for-next-generation-AMD-Znver3-V2.patch


RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-04 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Public Use]

Hi Honza,

> -Original Message-
> From: Jan Hubicka 
> Sent: Friday, December 4, 2020 5:25 PM
> To: Kumar, Venkataramanan 
> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak 
> Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD
> Zen3 CPU
> 
> [CAUTION: External Email]
> 
> > [AMD Official Use Only - Internal Distribution Only]
> >
> > Hi Maintainers,
> >
> > PFA, the patch that enables support for the next generation AMD Zen3
> > CPU via -march=znver3.
> > This is a very basic enablement patch. As of now the cost, tuning and
> > scheduler changes are kept same as znver2.
> > Further changes to the cost and tunings will be done later.
> 
> Hello,
> the changes to x86-tune.def and x86-tune-sched.c seem fine to me.
> There is one patch on
> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545415.html
> I did not see significant difference on specs, but do we want to change the
> default?

You mean the tune "X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL" has no impact on
SPEC and other benchmarks?
I have not done performance experiments with this tune on Zen machines.

Let me check on this and get back to you.

> 
> Honza
> >
> > Ok for trunk ?
> >
> > Regards,
> > Venkat.
> 

Regards,
Venkat.


[PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-03 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Public Use]


Hi Maintainers,

PFA, the patch that enables support for the next generation AMD Zen3 CPU via 
-march=znver3.
This is a very basic enablement patch. As of now the cost, tuning and scheduler 
changes are kept same as znver2.
Further changes to the cost and tunings will be done later.
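
As a quick way to confirm which target a file was built for, here is a small
C sketch. It assumes the compiler predefines __znver3__ when -march=znver3 is
in effect, in the same way __znver2__ is predefined for -march=znver2; the
function name is hypothetical and this is not part of the attached patch.

```c
/* Reports which -march target this translation unit was compiled for,
   based on the target macros the compiler is assumed to predefine.  */
const char *compiled_for(void)
{
#if defined(__znver3__)
    return "znver3";
#elif defined(__znver2__)
    return "znver2";
#else
    return "other";
#endif
}
```

Built with -march=znver3 the function is expected to report "znver3"; without
the option it reports whatever the default target is.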

Ok for trunk ?

Regards,
Venkat.



X86_64-Enable-support-for-next-generation-AMD-Znver3.patch
Description: X86_64-Enable-support-for-next-generation-AMD-Znver3.patch


RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-03 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Public Use]

Thanks Uros, I forgot to change it.

Please ignore this thread. I will send a fresh one.

Regards,
Venkat.

-Original Message-
From: Uros Bizjak  
Sent: Thursday, December 3, 2020 8:44 PM
To: Kumar, Venkataramanan 
Cc: gcc-patches@gcc.gnu.org; Jan Hubicka 
Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

[CAUTION: External Email]

The message says that it is for internal distribution. Please repost.

Thanks,
Uros.

On Thu, Dec 3, 2020 at 4:11 PM Kumar, Venkataramanan 
 wrote:
>
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi Maintainers,
>
> PFA, the patch that enables support for the next generation AMD Zen3 CPU via
> -march=znver3.
>
> This is a very basic enablement patch. As of now the cost, tuning and
> scheduler changes are kept same as znver2.
>
> Further changes to the cost and tunings will be done later.
>
> Ok for trunk ?
>
> Regards,
> Venkat.


[PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-03 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Official Use Only - Internal Distribution Only]

Hi Maintainers,

PFA, the patch that enables support for the next generation AMD Zen3 CPU via 
-march=znver3.
This is a very basic enablement patch. As of now the cost, tuning and scheduler 
changes are kept same as znver2.
Further changes to the cost and tunings will be done later.

Ok for trunk ?

Regards,
Venkat.


X86_64-Enable-support-for-next-generation-AMD-Znver3.patch
Description: X86_64-Enable-support-for-next-generation-AMD-Znver3.patch