Re: [PATCH][ARM] Improve max_cond_insns setting for Cortex cores
Hi Kyrill,

> Hmm, I'm not too confident on that. I'd support such a change for the
> generic arm_cortex_tune, definitely, and the Armv8-a based ones, but I
> don't think the argument is as strong for Cortex-A7, Cortex-A8, Cortex-A9.
>
> So let's make the change for the Armv8-A-based cores now. If you get
> benchmarking data for the older ones (such systems may or may not be
> easy to get a hold of) we can update those separately.

I ran some experiments on Cortex-A53, and they show that the difference
between 2, 3 and 4 is smaller than on out-of-order cores (which clearly
prefer 2). So it seems alright to set it to 4 for the older in-order
cores; see the updated patch below.

>> Set max_cond_insns
>> to 4 on Thumb-2 architectures given it's already limited to that by
>> MAX_INSN_PER_IT_BLOCK. Also use the CPU tuning setting when a CPU/tune
>> is selected if -mrestrict-it is not explicitly set.
>
> This can go in as a separate patch from the rest, thanks.

Sure, I'll split that part off into a separate patch.

Cheers,
Wilco


[PATCH v2][ARM] Improve max_cond_insns setting for Cortex cores

Various CPUs have max_cond_insns set to 5 due to historical reasons.
Benchmarking shows that max_cond_insns=2 is fastest on modern Cortex-A
cores, so change it to 2. Set it to 4 on older in-order cores, as that
is the MAX_INSN_PER_IT_BLOCK limit for Thumb-2.

Bootstrapped on armhf. OK for commit?

ChangeLog:
2019-12-03  Wilco Dijkstra

	* config/arm/arm.c (arm_v6t2_tune): Set max_cond_insns to 4.
	(arm_cortex_tune): Set max_cond_insns to 2.
	(arm_cortex_a8_tune): Set max_cond_insns to 4.
	(arm_cortex_a7_tune): Likewise.
	(arm_cortex_a35_tune): Set max_cond_insns to 2.
	(arm_cortex_a53_tune): Likewise.
	(arm_cortex_a5_tune): Set max_cond_insns to 4.
	(arm_cortex_a9_tune): Likewise.
	(arm_v6m_tune): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a6b401b7f2e3738ff68316bd83d6e5a2bcf0e7d7..daebe76352d62ad94556762b4e3bc3d0532ad411 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1947,7 +1947,7 @@ const struct tune_params arm_v6t2_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  4,                    /* Max cond insns. */
   8,                    /* Memset max inline. */
   1,                    /* Issue rate. */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1971,7 +1971,7 @@ const struct tune_params arm_cortex_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  2,                    /* Max cond insns. */
   8,                    /* Memset max inline. */
   2,                    /* Issue rate. */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1993,7 +1993,7 @@ const struct tune_params arm_cortex_a8_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  4,                    /* Max cond insns. */
   8,                    /* Memset max inline. */
   2,                    /* Issue rate. */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2015,7 +2015,7 @@ const struct tune_params arm_cortex_a7_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  4,                    /* Max cond insns. */
   8,                    /* Memset max inline. */
   2,                    /* Issue rate. */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2059,7 +2059,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  2,                    /* Max cond insns. */
   8,                    /* Memset max inline. */
   1,                    /* Issue rate. */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2081,7 +2081,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  2,
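To make the knob concrete, here is a small illustration (hypothetical code, not part of the patch). max_cond_insns bounds how many instructions the Arm backend may execute conditionally instead of branching around them. In `adjust` below, the body of the `if` is two simple operations, so a tuning with max_cond_insns=2 still allows it to be emitted as conditionally executed instructions (a single IT block on Thumb-2), while a longer body would more likely keep the branch on those tunings. Whether GCC actually does this depends on the compiler version, optimization level and other passes, so the comments describe expectations, not guarantees.

```c
#include <assert.h>

/* Hypothetical example: the "then" arm is two simple ALU operations,
   small enough to fit a max_cond_insns limit of 2.  On Arm this may be
   compiled to a compare followed by two conditionally executed
   instructions instead of a conditional branch (not guaranteed). */
int adjust(int x, int limit)
{
    if (x > limit) {
        x -= limit;  /* conditional candidate 1 */
        x *= 2;      /* conditional candidate 2 */
    }
    return x;
}

/* Behavioural check: the results are identical whichever code shape
   (branch or conditional execution) the compiler chooses. */
void adjust_self_check(void)
{
    assert(adjust(10, 3) == 14);  /* 10 > 3: (10 - 3) * 2 */
    assert(adjust(2, 3) == 2);    /* condition false: unchanged */
    assert(adjust(3, 3) == 3);    /* equal values: condition false */
}
```

Inspecting the assembly for such a function under different -mtune settings (for example with -S) is one way to see the effect of the tuning value, keeping in mind that other heuristics also influence the result.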
Re: [PATCH][ARM] Improve max_cond_insns setting for Cortex cores
Hi Wilco,

On 11/19/19 3:11 PM, Wilco Dijkstra wrote:
> ping
>
> Various CPUs have max_cond_insns set to 5 due to historical reasons.
> Benchmarking shows that max_cond_insns=2 is fastest on modern Cortex-A
> cores, so change it to 2 for all Cortex-A cores.

Hmm, I'm not too confident on that. I'd support such a change for the
generic arm_cortex_tune, definitely, and the Armv8-a based ones, but I
don't think the argument is as strong for Cortex-A7, Cortex-A8, Cortex-A9.

So let's make the change for the Armv8-A-based cores now. If you get
benchmarking data for the older ones (such systems may or may not be
easy to get a hold of) we can update those separately.

> Set max_cond_insns
> to 4 on Thumb-2 architectures given it's already limited to that by
> MAX_INSN_PER_IT_BLOCK. Also use the CPU tuning setting when a CPU/tune
> is selected if -mrestrict-it is not explicitly set.

This can go in as a separate patch from the rest, thanks.

> On Cortex-A57 this gives 1.1% performance gain on SPECINT2006 as well
> as a 0.4% codesize reduction.
>
> Bootstrapped on armhf. OK for commit?
>
> ChangeLog:
> 2019-08-19  Wilco Dijkstra
>
> 	* gcc/config/arm/arm.c (arm_option_override_internal): Use max_cond_insns
> 	from CPU tuning unless -mrestrict-it is used.
> 	(arm_v6t2_tune): set max_cond_insns to 4.
> 	(arm_cortex_tune): set max_cond_insns to 2.
> 	(arm_cortex_a8_tune): Likewise.
> 	(arm_cortex_a7_tune): Likewise.
> 	(arm_cortex_a35_tune): Likewise.
> 	(arm_cortex_a53_tune): Likewise.
> 	(arm_cortex_a5_tune): Likewise.
> 	(arm_cortex_a9_tune): Likewise.
> 	(arm_v6m_tune): set max_cond_insns to 4.

No "gcc/" in the ChangeLog path.

Thanks,
Kyrill

> ---
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 628cf02f23fb29392a63d87f561c3ee2fb73a515..38ac16ad1def91ca78ccfa98fd1679b2b5114851 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -1943,7 +1943,7 @@ const struct tune_params arm_v6t2_tune =
>    arm_default_branch_cost,
>    _default_vec_cost,
>    1,                    /* Constant limit. */
> -  5,                    /* Max cond insns. */
> +  4,                    /* Max cond insns. */
>    8,                    /* Memset max inline. */
>    1,                    /* Issue rate. */
>    ARM_PREFETCH_NOT_BENEFICIAL,
> @@ -1968,7 +1968,7 @@ const struct tune_params arm_cortex_tune =
>    arm_default_branch_cost,
>    _default_vec_cost,
>    1,                    /* Constant limit. */
> -  5,                    /* Max cond insns. */
> +  2,                    /* Max cond insns. */
>    8,                    /* Memset max inline. */
>    2,                    /* Issue rate. */
>    ARM_PREFETCH_NOT_BENEFICIAL,
> @@ -1991,7 +1991,7 @@ const struct tune_params arm_cortex_a8_tune =
>    arm_default_branch_cost,
>    _default_vec_cost,
>    1,                    /* Constant limit. */
> -  5,                    /* Max cond insns. */
> +  2,                    /* Max cond insns. */
>    8,                    /* Memset max inline. */
>    2,                    /* Issue rate. */
>    ARM_PREFETCH_NOT_BENEFICIAL,
> @@ -2014,7 +2014,7 @@ const struct tune_params arm_cortex_a7_tune =
>    arm_default_branch_cost,
>    _default_vec_cost,
>    1,                    /* Constant limit. */
> -  5,                    /* Max cond insns. */
> +  2,                    /* Max cond insns. */
>    8,                    /* Memset max inline. */
>    2,                    /* Issue rate. */
>    ARM_PREFETCH_NOT_BENEFICIAL,
> @@ -2060,7 +2060,7 @@ const struct tune_params arm_cortex_a35_tune =
>    arm_default_branch_cost,
>    _default_vec_cost,
>    1,                    /* Constant limit. */
> -  5,                    /* Max cond insns. */
> +  2,                    /* Max cond insns. */
>    8,                    /* Memset max inline. */
>    1,                    /* Issue rate. */
>    ARM_PREFETCH_NOT_BENEFICIAL,
> @@ -2083,7 +2083,7 @@ const struct tune_params arm_cortex_a53_tune =
>    arm_default_branch_cost,
>    _default_vec_cost,
>    1,                    /* Constant limit. */
> -  5,                    /* Max cond insns. */
> +  2,                    /* Max cond insns. */
>    8,                    /* Memset max inline. */
>    2,                    /* Issue rate. */
>    ARM_PREFETCH_NOT_BENEFICIAL,
> @@ -2167,9 +2167,6 @@ const struct tune_params
Re: [PATCH][ARM] Improve max_cond_insns setting for Cortex cores
ping


Various CPUs have max_cond_insns set to 5 due to historical reasons.
Benchmarking shows that max_cond_insns=2 is fastest on modern Cortex-A
cores, so change it to 2 for all Cortex-A cores. Set max_cond_insns
to 4 on Thumb-2 architectures given it's already limited to that by
MAX_INSN_PER_IT_BLOCK. Also use the CPU tuning setting when a CPU/tune
is selected if -mrestrict-it is not explicitly set.

On Cortex-A57 this gives 1.1% performance gain on SPECINT2006 as well
as a 0.4% codesize reduction.

Bootstrapped on armhf. OK for commit?

ChangeLog:
2019-08-19  Wilco Dijkstra

	* gcc/config/arm/arm.c (arm_option_override_internal): Use max_cond_insns
	from CPU tuning unless -mrestrict-it is used.
	(arm_v6t2_tune): set max_cond_insns to 4.
	(arm_cortex_tune): set max_cond_insns to 2.
	(arm_cortex_a8_tune): Likewise.
	(arm_cortex_a7_tune): Likewise.
	(arm_cortex_a35_tune): Likewise.
	(arm_cortex_a53_tune): Likewise.
	(arm_cortex_a5_tune): Likewise.
	(arm_cortex_a9_tune): Likewise.
	(arm_v6m_tune): set max_cond_insns to 4.

---
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 628cf02f23fb29392a63d87f561c3ee2fb73a515..38ac16ad1def91ca78ccfa98fd1679b2b5114851 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1943,7 +1943,7 @@ const struct tune_params arm_v6t2_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  4,                    /* Max cond insns. */
   8,                    /* Memset max inline. */
   1,                    /* Issue rate. */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1968,7 +1968,7 @@ const struct tune_params arm_cortex_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  2,                    /* Max cond insns. */
   8,                    /* Memset max inline. */
   2,                    /* Issue rate. */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1991,7 +1991,7 @@ const struct tune_params arm_cortex_a8_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  2,                    /* Max cond insns. */
   8,                    /* Memset max inline. */
   2,                    /* Issue rate. */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2014,7 +2014,7 @@ const struct tune_params arm_cortex_a7_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  2,                    /* Max cond insns. */
   8,                    /* Memset max inline. */
   2,                    /* Issue rate. */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2060,7 +2060,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  2,                    /* Max cond insns. */
   8,                    /* Memset max inline. */
   1,                    /* Issue rate. */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2083,7 +2083,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  2,                    /* Max cond insns. */
   8,                    /* Memset max inline. */
   2,                    /* Issue rate. */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2167,9 +2167,6 @@ const struct tune_params arm_xgene1_tune =
   tune_params::SCHED_AUTOPREF_OFF
 };
 
-/* Branches can be dual-issued on Cortex-A5, so conditional execution is
-   less appealing. Set max_insns_skipped to a low value. */
-
 const struct tune_params arm_cortex_a5_tune =
 {
   _extra_costs,
@@ -2178,7 +2175,7 @@ const struct tune_params arm_cortex_a5_tune =
   arm_cortex_a5_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  1,                    /* Max cond insns. */
+  2,                    /* Max cond insns. */
[PATCH][ARM] Improve max_cond_insns setting for Cortex cores
Various CPUs have max_cond_insns set to 5 due to historical reasons.
Benchmarking shows that max_cond_insns=2 is fastest on modern Cortex-A
cores, so change it to 2 for all Cortex-A cores. Set max_cond_insns
to 4 on Thumb-2 architectures given it's already limited to that by
MAX_INSN_PER_IT_BLOCK. Also use the CPU tuning setting when a CPU/tune
is selected if -mrestrict-it is not explicitly set.

On Cortex-A57 this gives 1.1% performance gain on SPECINT2006 as well
as a 0.4% codesize reduction.

Bootstrapped on armhf. OK for commit?

ChangeLog:
2019-08-19  Wilco Dijkstra

	* gcc/config/arm/arm.c (arm_option_override_internal): Use max_cond_insns
	from CPU tuning unless -mrestrict-it is used.
	(arm_v6t2_tune): set max_cond_insns to 4.
	(arm_cortex_tune): set max_cond_insns to 2.
	(arm_cortex_a8_tune): Likewise.
	(arm_cortex_a7_tune): Likewise.
	(arm_cortex_a35_tune): Likewise.
	(arm_cortex_a53_tune): Likewise.
	(arm_cortex_a5_tune): Likewise.
	(arm_cortex_a9_tune): Likewise.
	(arm_v6m_tune): set max_cond_insns to 4.

---
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 628cf02f23fb29392a63d87f561c3ee2fb73a515..38ac16ad1def91ca78ccfa98fd1679b2b5114851 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1943,7 +1943,7 @@ const struct tune_params arm_v6t2_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  4,                    /* Max cond insns. */
   8,                    /* Memset max inline. */
   1,                    /* Issue rate. */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1968,7 +1968,7 @@ const struct tune_params arm_cortex_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  2,                    /* Max cond insns. */
   8,                    /* Memset max inline. */
   2,                    /* Issue rate. */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1991,7 +1991,7 @@ const struct tune_params arm_cortex_a8_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  2,                    /* Max cond insns. */
   8,                    /* Memset max inline. */
   2,                    /* Issue rate. */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2014,7 +2014,7 @@ const struct tune_params arm_cortex_a7_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  2,                    /* Max cond insns. */
   8,                    /* Memset max inline. */
   2,                    /* Issue rate. */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2060,7 +2060,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  2,                    /* Max cond insns. */
   8,                    /* Memset max inline. */
   1,                    /* Issue rate. */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2083,7 +2083,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  5,                    /* Max cond insns. */
+  2,                    /* Max cond insns. */
   8,                    /* Memset max inline. */
   2,                    /* Issue rate. */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2167,9 +2167,6 @@ const struct tune_params arm_xgene1_tune =
   tune_params::SCHED_AUTOPREF_OFF
 };
 
-/* Branches can be dual-issued on Cortex-A5, so conditional execution is
-   less appealing. Set max_insns_skipped to a low value. */
-
 const struct tune_params arm_cortex_a5_tune =
 {
   _extra_costs,
@@ -2178,7 +2175,7 @@ const struct tune_params arm_cortex_a5_tune =
   arm_cortex_a5_branch_cost,
   _default_vec_cost,
   1,                    /* Constant limit. */
-  1,                    /* Max cond insns. */
+  2,                    /* Max cond insns. */
   8,
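The Thumb-2 argument in this patch ("already limited to that by MAX_INSN_PER_IT_BLOCK") can be modelled in a few lines. This is a hypothetical sketch, not the logic in arm.c: the function names are invented, and it assumes the IT-block limit is 4 instructions, dropping to 1 when -mrestrict-it is in effect. It shows why a tuning value of 5 behaves like 4 in Thumb-2 code, while in ARM state, which has no IT blocks, the tuning value applies unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not GCC source.  Assumption: an IT block covers
   at most 4 instructions, or only 1 when -mrestrict-it is in effect. */
static int it_block_limit(bool restrict_it)
{
    return restrict_it ? 1 : 4;
}

/* Effective number of instructions that may be conditionally executed
   in place of a branch.  ARM state has no IT blocks, so the tuning
   value is used directly; Thumb-2 is capped by the IT block limit. */
int effective_max_cond_insns(int tune_value, bool thumb2, bool restrict_it)
{
    if (!thumb2)
        return tune_value;
    int cap = it_block_limit(restrict_it);
    return tune_value < cap ? tune_value : cap;
}

void effective_limit_self_check(void)
{
    /* A tuning value of 5 was already clamped to 4 in Thumb-2 code. */
    assert(effective_max_cond_insns(5, true, false) == 4);
    /* A value of 2 is below the cap, so it takes effect unchanged. */
    assert(effective_max_cond_insns(2, true, false) == 2);
    /* With -mrestrict-it, at most one instruction per IT block. */
    assert(effective_max_cond_insns(4, true, true) == 1);
    /* In ARM state the tuning value is used as is. */
    assert(effective_max_cond_insns(5, false, false) == 5);
}
```

Under this model, lowering a tuning from 5 to 4 changes nothing in Thumb-2 code but can still change code generated in ARM state, whereas lowering it to 2 affects both.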