Re: [PATCH 2/2] Introduce prefetch-dynamic-strides option.
On 05/01/2018 03:30 PM, Jeff Law wrote: On 01/22/2018 06:46 AM, Luis Machado wrote: The following patch adds an option to control software prefetching of memory references with non-constant/unknown strides. Currently we prefetch these references if the pass thinks there is benefit to doing so. But, since this is all based on heuristics, it's not always the case that we end up with better performance. For Falkor there is also the problem of conflicts with the hardware prefetcher, so we need to be more conservative in terms of what we issue software prefetch hints for. This also aligns GCC with what LLVM does for Falkor. Similarly to the previous patch, the defaults guarantee no change in behavior for other targets and architectures. I've regression-tested and bootstrapped it on aarch64-linux. No problems found. Ok? 2018-01-22 Luis MachadoIntroduce option to control whether the software prefetch pass should issue prefetch hints for non-constant strides. gcc/ * config/aarch64/aarch64-protos.h (cpu_prefetch_tune) : New const unsigned int field. * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include prefetch_dynamic_strides. (exynosm1_prefetch_tune): Likewise. (thunderxt88_prefetch_tune): Likewise. (thunderx_prefetch_tune): Likewise. (thunderx2t99_prefetch_tune): Likewise. (qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0. (aarch64_override_options_internal): Update to set PARAM_PREFETCH_DYNAMIC_STRIDES. * doc/invoke.texi (prefetch-dynamic-strides): Document new option. * params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New. * params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define. * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for prefetch-dynamic-strides setting. OK for the trunk. jeff Thanks. Committed as r259996.
Re: [PATCH 2/2] Introduce prefetch-dynamic-strides option.
On 01/22/2018 06:46 AM, Luis Machado wrote: > The following patch adds an option to control software prefetching of memory > references with non-constant/unknown strides. > > Currently we prefetch these references if the pass thinks there is benefit to > doing so. But, since this is all based on heuristics, it's not always the case > that we end up with better performance. > > For Falkor there is also the problem of conflicts with the hardware > prefetcher, > so we need to be more conservative in terms of what we issue software prefetch > hints for. > > This also aligns GCC with what LLVM does for Falkor. > > Similarly to the previous patch, the defaults guarantee no change in behavior > for other targets and architectures. > > I've regression-tested and bootstrapped it on aarch64-linux. No problems > found. > > Ok? > > 2018-01-22 Luis Machado> > Introduce option to control whether the software prefetch pass should > issue prefetch hints for non-constant strides. > > gcc/ > * config/aarch64/aarch64-protos.h (cpu_prefetch_tune) > : New const unsigned int field. > * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include > prefetch_dynamic_strides. > (exynosm1_prefetch_tune): Likewise. > (thunderxt88_prefetch_tune): Likewise. > (thunderx_prefetch_tune): Likewise. > (thunderx2t99_prefetch_tune): Likewise. > (qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0. > (aarch64_override_options_internal): Update to set > PARAM_PREFETCH_DYNAMIC_STRIDES. > * doc/invoke.texi (prefetch-dynamic-strides): Document new option. > * params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New. > * params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define. > * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for > prefetch-dynamic-strides setting. OK for the trunk. jeff
Re: [PATCH 2/2] Introduce prefetch-dynamic-strides option.
Hi, On 01/23/2018 07:40 AM, Kyrill Tkachov wrote: Hi Luis, On 22/01/18 13:46, Luis Machado wrote: The following patch adds an option to control software prefetching of memory references with non-constant/unknown strides. Currently we prefetch these references if the pass thinks there is benefit to doing so. But, since this is all based on heuristics, it's not always the case that we end up with better performance. For Falkor there is also the problem of conflicts with the hardware prefetcher, so we need to be more conservative in terms of what we issue software prefetch hints for. This also aligns GCC with what LLVM does for Falkor. Similarly to the previous patch, the defaults guarantee no change in behavior for other targets and architectures. I've regression-tested and bootstrapped it on aarch64-linux. No problems found. Ok? This also looks like a sensible approach to me with a caveat inline. The same general comments as for patch [1/2] apply. diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 8736bd9..22bd9ae 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -230,6 +230,9 @@ struct cpu_prefetch_tune const int l1_cache_size; const int l1_cache_line_size; const int l2_cache_size; + /* Whether software prefetch hints should be issued for non-constant + strides. */ + const unsigned int prefetch_dynamic_strides; I understand that the midend PARAMs are defined as integers, but I think the backend tuning option here is better represented as a boolean as it really is just a yes/no decision. I started off with a boolean to be honest. Then i noticed the midend only used integers, which i restricted to the range of 0..1. I'll change this locally to use booleans again. Thanks! Luis
Re: [PATCH 2/2] Introduce prefetch-dynamic-strides option.
Hi Luis, On 22/01/18 13:46, Luis Machado wrote: The following patch adds an option to control software prefetching of memory references with non-constant/unknown strides. Currently we prefetch these references if the pass thinks there is benefit to doing so. But, since this is all based on heuristics, it's not always the case that we end up with better performance. For Falkor there is also the problem of conflicts with the hardware prefetcher, so we need to be more conservative in terms of what we issue software prefetch hints for. This also aligns GCC with what LLVM does for Falkor. Similarly to the previous patch, the defaults guarantee no change in behavior for other targets and architectures. I've regression-tested and bootstrapped it on aarch64-linux. No problems found. Ok? This also looks like a sensible approach to me with a caveat inline. The same general comments as for patch [1/2] apply. Thanks, Kyrill 2018-01-22 Luis MachadoIntroduce option to control whether the software prefetch pass should issue prefetch hints for non-constant strides. gcc/ * config/aarch64/aarch64-protos.h (cpu_prefetch_tune) : New const unsigned int field. * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include prefetch_dynamic_strides. (exynosm1_prefetch_tune): Likewise. (thunderxt88_prefetch_tune): Likewise. (thunderx_prefetch_tune): Likewise. (thunderx2t99_prefetch_tune): Likewise. (qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0. (aarch64_override_options_internal): Update to set PARAM_PREFETCH_DYNAMIC_STRIDES. * doc/invoke.texi (prefetch-dynamic-strides): Document new option. * params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New. * params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define. * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for prefetch-dynamic-strides setting. --- gcc/config/aarch64/aarch64-protos.h | 3 +++ gcc/config/aarch64/aarch64.c| 11 +++ gcc/doc/invoke.texi | 10 ++ gcc/params.def | 9 + gcc/params.h| 2 ++ gcc/tree-ssa-loop-prefetch.c| 10 ++ 6 files changed, 45 insertions(+) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 8736bd9..22bd9ae 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -230,6 +230,9 @@ struct cpu_prefetch_tune const int l1_cache_size; const int l1_cache_line_size; const int l2_cache_size; + /* Whether software prefetch hints should be issued for non-constant + strides. */ + const unsigned int prefetch_dynamic_strides; I understand that the midend PARAMs are defined as integers, but I think the backend tuning option here is better represented as a boolean as it really is just a yes/no decision. /* The minimum constant stride beyond which we should use prefetch hints for. */ const int minimum_stride; diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 0ed9f14..713b230 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune = -1, /* l1_cache_size */ -1, /* l1_cache_line_size */ -1, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -557,6 +558,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune = -1, /* l1_cache_size */ 64, /* l1_cache_line_size */ -1, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -567,6 +569,7 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune = 32, /* l1_cache_size */ 64, /* l1_cache_line_size */ 1024,/* l2_cache_size */ + 0, /* prefetch_dynamic_strides */ 2048,/* minimum_stride */ 3/* default_opt_level */ }; @@ -577,6 +580,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune = 32, /* l1_cache_size */ 128, /* l1_cache_line_size */ 16*1024, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ 3/* default_opt_level */ }; @@ -587,6 +591,7 @@ static const cpu_prefetch_tune thunderx_prefetch_tune = 32, /* l1_cache_size */ 128, /* l1_cache_line_size */ -1,
[PATCH 2/2] Introduce prefetch-dynamic-strides option.
The following patch adds an option to control software prefetching of memory references with non-constant/unknown strides. Currently we prefetch these references if the pass thinks there is benefit to doing so. But, since this is all based on heuristics, it's not always the case that we end up with better performance. For Falkor there is also the problem of conflicts with the hardware prefetcher, so we need to be more conservative in terms of what we issue software prefetch hints for. This also aligns GCC with what LLVM does for Falkor. Similarly to the previous patch, the defaults guarantee no change in behavior for other targets and architectures. I've regression-tested and bootstrapped it on aarch64-linux. No problems found. Ok? 2018-01-22 Luis MachadoIntroduce option to control whether the software prefetch pass should issue prefetch hints for non-constant strides. gcc/ * config/aarch64/aarch64-protos.h (cpu_prefetch_tune) : New const unsigned int field. * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include prefetch_dynamic_strides. (exynosm1_prefetch_tune): Likewise. (thunderxt88_prefetch_tune): Likewise. (thunderx_prefetch_tune): Likewise. (thunderx2t99_prefetch_tune): Likewise. (qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0. (aarch64_override_options_internal): Update to set PARAM_PREFETCH_DYNAMIC_STRIDES. * doc/invoke.texi (prefetch-dynamic-strides): Document new option. * params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New. * params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define. * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for prefetch-dynamic-strides setting. --- gcc/config/aarch64/aarch64-protos.h | 3 +++ gcc/config/aarch64/aarch64.c| 11 +++ gcc/doc/invoke.texi | 10 ++ gcc/params.def | 9 + gcc/params.h| 2 ++ gcc/tree-ssa-loop-prefetch.c| 10 ++ 6 files changed, 45 insertions(+) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 8736bd9..22bd9ae 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -230,6 +230,9 @@ struct cpu_prefetch_tune const int l1_cache_size; const int l1_cache_line_size; const int l2_cache_size; + /* Whether software prefetch hints should be issued for non-constant + strides. */ + const unsigned int prefetch_dynamic_strides; /* The minimum constant stride beyond which we should use prefetch hints for. */ const int minimum_stride; diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 0ed9f14..713b230 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune = -1, /* l1_cache_size */ -1, /* l1_cache_line_size */ -1, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -557,6 +558,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune = -1, /* l1_cache_size */ 64, /* l1_cache_line_size */ -1, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -567,6 +569,7 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune = 32, /* l1_cache_size */ 64, /* l1_cache_line_size */ 1024,/* l2_cache_size */ + 0, /* prefetch_dynamic_strides */ 2048,/* minimum_stride */ 3/* default_opt_level */ }; @@ -577,6 +580,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune = 32, /* l1_cache_size */ 128, /* l1_cache_line_size */ 16*1024, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ 3/* default_opt_level */ }; @@ -587,6 +591,7 @@ static const cpu_prefetch_tune thunderx_prefetch_tune = 32, /* l1_cache_size */ 128, /* l1_cache_line_size */ -1, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -597,6 +602,7 @@ static const cpu_prefetch_tune thunderx2t99_prefetch_tune = 32, /* l1_cache_size */ 64, /* l1_cache_line_size */ 256,