Re: [PATCH] AArch64: Tune case-values-threshold
Wilco Dijkstra writes: > Hi Richard, > >> The problem is that you're effectively asking for these values to be >> taken on faith without providing any analysis and without describing >> how you arrived at the new numbers. Did you try other values too? >> If so, how did they compare with the numbers that you finally chose? >> At least that would give an indication of where the boundaries are. > > Yes, I obviously tried other values, pretty much all in range 1-20. There is > generally a range of 4-5 values that are very similar in size, and then you > choose one in the middle which also looks good for performance. > >> For example, it's easier to believe that 8 is the right value for -Os if >> you say that you tried 9 and 7 as well, and they were worse than 8 by X% >> and Y%. This would also help anyone who wants to tweak the numbers >> again in future. > > For -Os, the size range for values 6-10 is within 0.01% so they are virtually > identical and I picked the median. Whether this will remain best in the future > is unclear since it depends on so many things, so at some point it needs > to be looked at again, just like most other tunings. Thanks. These details are useful. For example, if someone finds a compelling reason to bump the new values by +/-2 (to help with a particular test case) then it sounds we should accept that, since it wouldn't conflict with your work. So the patch is OK, thanks. (FWIW, I tried building a linux kernel I had lying around at -Os, which also showed an improvement of ~0.07%.) Richard
Re: [PATCH] AArch64: Tune case-values-threshold
Hi Richard, > The problem is that you're effectively asking for these values to be > taken on faith without providing any analysis and without describing > how you arrived at the new numbers. Did you try other values too? > If so, how did they compare with the numbers that you finally chose? > At least that would give an indication of where the boundaries are. Yes, I obviously tried other values, pretty much all in range 1-20. There is generally a range of 4-5 values that are very similar in size, and then you choose one in the middle which also looks good for performance. > For example, it's easier to believe that 8 is the right value for -Os if > you say that you tried 9 and 7 as well, and they were worse than 8 by X% > and Y%. This would also help anyone who wants to tweak the numbers > again in future. For -Os, the size range for values 6-10 is within 0.01% so they are virtually identical and I picked the median. Whether this will remain best in the future is unclear since it depends on so many things, so at some point it needs to be looked at again, just like most other tunings. > BTW, which CPU did you use to do the experiments? Are the tuning > parameters for that CPU already consistent with the new generic values? This was done on Neoverse N1. Almost no CPUs use per-CPU tuning for this. Cheers, Wilco
Re: [PATCH] AArch64: Tune case-values-threshold
On 19 October 2021 15:23:58 CEST, Richard Sandiford via Gcc-patches wrote: >Wilco Dijkstra writes: >> Hi Richard, >> >>> I'm just concerned that here we're using the same explanation but with >>> different numbers. Why are the new numbers more right than the old ones >>> (especially when it comes to code size, where the trade-off hasn't >>> really changed)? >> >> Like all tuning/costing parameters, these values are never fixed but change >> over time due to optimizations, micro architectures and workloads. >> The previous values were out of date so that's why I retuned them by >> benchmarking different values and choosing the best combinations. >> >>> It would be good to have more discussion of why certain numbers are >>> too small or too high, and why 8 is the right pivot point for -Os. >> >> You mean add more discussion in the comment? That comment is already overly >> large and too specific - it would be better to reduce it. The -Os value was >> never >> tuned, and 8 turns out to be faster and smaller than GCC's default. > >The problem is that you're effectively asking for these values to be >taken on faith without providing any analysis and without describing >how you arrived at the new numbers. Did you try other values too? >If so, how did they compare with the numbers that you finally chose? >At least that would give an indication of where the boundaries are. Maybe you can show csibe benchmark numbers to show the effects: http://szeged.github.io/csibe/ thanks, > >For example, it's easier to believe that 8 is the right value for -Os if >you say that you tried 9 and 7 as well, and they were worse than 8 by X% >and Y%. This would also help anyone who wants to tweak the numbers >again in future. > >BTW, which CPU did you use to do the experiments? Are the tuning >parameters for that CPU already consistent with the new generic values? > >Thanks, >Richard
Re: [PATCH] AArch64: Tune case-values-threshold
Wilco Dijkstra writes: > Hi Richard, > >> I'm just concerned that here we're using the same explanation but with >> different numbers. Why are the new numbers more right than the old ones >> (especially when it comes to code size, where the trade-off hasn't >> really changed)? > > Like all tuning/costing parameters, these values are never fixed but change > over time due to optimizations, micro architectures and workloads. > The previous values were out of date so that's why I retuned them by > benchmarking different values and choosing the best combinations. > >> It would be good to have more discussion of why certain numbers are >> too small or too high, and why 8 is the right pivot point for -Os. > > You mean add more discussion in the comment? That comment is already overly > large and too specific - it would be better to reduce it. The -Os value was > never > tuned, and 8 turns out to be faster and smaller than GCC's default. The problem is that you're effectively asking for these values to be taken on faith without providing any analysis and without describing how you arrived at the new numbers. Did you try other values too? If so, how did they compare with the numbers that you finally chose? At least that would give an indication of where the boundaries are. For example, it's easier to believe that 8 is the right value for -Os if you say that you tried 9 and 7 as well, and they were worse than 8 by X% and Y%. This would also help anyone who wants to tweak the numbers again in future. BTW, which CPU did you use to do the experiments? Are the tuning parameters for that CPU already consistent with the new generic values? Thanks, Richard
Re: [PATCH] AArch64: Tune case-values-threshold
Hi Richard, > I'm just concerned that here we're using the same explanation but with > different numbers. Why are the new numbers more right than the old ones > (especially when it comes to code size, where the trade-off hasn't > really changed)? Like all tuning/costing parameters, these values are never fixed but change over time due to optimizations, micro architectures and workloads. The previous values were out of date so that's why I retuned them by benchmarking different values and choosing the best combinations. > It would be good to have more discussion of why certain numbers are > too small or too high, and why 8 is the right pivot point for -Os. You mean add more discussion in the comment? That comment is already overly large and too specific - it would be better to reduce it. The -Os value was never tuned, and 8 turns out to be faster and smaller than GCC's default. Cheers, Wilco
Re: [PATCH] AArch64: Tune case-values-threshold
Wilco Dijkstra writes: > Tune the case-values-threshold setting for modern cores. A value of 11 > improves > SPECINT2017 by 0.2% and reduces codesize by 0.04%. With -Os use value 8 which > reduces codesize by 0.07%. > > Passes regress, OK for commit? > > ChangeLog: > > 2021-10-18 Wilco Dijkstra > > * config/aarch64/aarch64.c (aarch64_case_values_threshold): > Change to 8 with -Os, 11 otherwise. > > --- > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index > f5b25a7f7041645921e6ad85714efda73b993492..adc5256c5ccc1182710d87cc6a1091083d888663 > 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -9360,8 +9360,8 @@ aarch64_cannot_force_const_mem (machine_mode mode > ATTRIBUTE_UNUSED, rtx x) > The expansion for a table switch is quite expensive due to the number > of instructions, the table lookup and hard to predict indirect jump. > When optimizing for speed, and -O3 enabled, use the per-core tuning if > - set, otherwise use tables for > 16 cases as a tradeoff between size and > - performance. When optimizing for size, use the default setting. */ > + set, otherwise use tables for >= 11 cases as a tradeoff between size and > + performance. When optimizing for size, use 8 for smallest codesize. */ I'm just concerned that here we're using the same explanation but with different numbers. Why are the new numbers more right than the old ones (especially when it comes to code size, where the trade-off hasn't really changed)? It would be good to have more discussion of why certain numbers are too small or too high, and why 8 is the right pivot point for -Os. Thanks, Richard > > static unsigned int > aarch64_case_values_threshold (void) > @@ -9372,7 +9372,7 @@ aarch64_case_values_threshold (void) >&& selected_cpu->tune->max_case_values != 0) > return selected_cpu->tune->max_case_values; >else > -return optimize_size ? default_case_values_threshold () : 17; > +return optimize_size ? 8 : 11; > } > > /* Return true if register REGNO is a valid index register.
[PATCH] AArch64: Tune case-values-threshold
Tune the case-values-threshold setting for modern cores. A value of 11 improves SPECINT2017 by 0.2% and reduces codesize by 0.04%. With -Os use value 8 which reduces codesize by 0.07%. Passes regress, OK for commit? ChangeLog: 2021-10-18 Wilco Dijkstra * config/aarch64/aarch64.c (aarch64_case_values_threshold): Change to 8 with -Os, 11 otherwise. --- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index f5b25a7f7041645921e6ad85714efda73b993492..adc5256c5ccc1182710d87cc6a1091083d888663 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -9360,8 +9360,8 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x) The expansion for a table switch is quite expensive due to the number of instructions, the table lookup and hard to predict indirect jump. When optimizing for speed, and -O3 enabled, use the per-core tuning if - set, otherwise use tables for > 16 cases as a tradeoff between size and - performance. When optimizing for size, use the default setting. */ + set, otherwise use tables for >= 11 cases as a tradeoff between size and + performance. When optimizing for size, use 8 for smallest codesize. */ static unsigned int aarch64_case_values_threshold (void) @@ -9372,7 +9372,7 @@ aarch64_case_values_threshold (void) && selected_cpu->tune->max_case_values != 0) return selected_cpu->tune->max_case_values; else -return optimize_size ? default_case_values_threshold () : 17; +return optimize_size ? 8 : 11; } /* Return true if register REGNO is a valid index register.