[PATCH] PR rtl-optimization/106594: Preserve zero_extend when cheap.
This patch addresses PR rtl-optimization/106594, a significant performance
regression affecting aarch64 recently introduced (exposed) by one of my
recent RTL simplification improvements.  Firstly, many thanks to
Tamar Christina for confirming that the core of this patch provides ~5%
performance improvement on some of his benchmarks.

GCC's combine pass uses the function expand_compound_operation to
conceptually simplify/canonicalize SIGN_EXTEND and ZERO_EXTEND as a pair
of shift operations, as not all targets support extension instructions
[technically ZERO_EXTEND may potentially be simplified/canonicalized to
an AND operation, but the theory remains the same].

In that function, around line 7226 of combine.cc, there's an optimization
that's remarkably similar to part of my recent simplify-rtx patch posted
at https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599196.html
The comment above this code reads:

  /* Convert sign extension to zero extension, if we know that the high
     bit is not set, as this is easier to optimize.  It will be converted
     back to cheaper alternative in make_extraction.  */

which is exactly the SIGN_EXTEND to ZERO_EXTEND canonicalization that we
now perform in simplify-rtx.  The significant difference is that this code
checks the backend's RTL costs, via set_src_cost, and selects either the
SIGN_EXTEND, the ZERO_EXTEND or the pair of SHIFTs depending on which is
cheaper.

The problem is that now that SIGN_EXTEND is converted to ZERO_EXTEND
earlier, this transform/check is no longer triggered, and as a result the
incoming ZERO_EXTEND is always converted to a pair of shifts, irrespective
of the backend's costs.  The latent bug, revealed by benchmarking, is that
we should avoid converting SIGN_EXTEND or ZERO_EXTEND into (slower) shifts
on targets where extensions are cheap (i.e. a single instruction that's
cheaper than two shift instructions, or as cheap as an AND).

This core fix (and performance improvement) is the first chunk of the
attached patch.
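As a rough illustration of the equivalences described above (this is not
code from the patch), the sketch below models an 8-bit to 32-bit extension
as a single operation, as a pair of shifts, and, for the unsigned case, as
an AND.  All function names are made up for this example.  The shift-pair
form of sign extension assumes two's complement conversion and an
arithmetic right shift of negative values, which C++20 guarantees and
which GCC provides in earlier modes as well.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: each models extending the low byte of X to 32 bits.

// ZERO_EXTEND as a single operation.
constexpr std::uint32_t zext8 (std::uint32_t x)
{
  return static_cast<std::uint8_t> (x);
}

// ZERO_EXTEND as a shift pair: shift left, then logical shift right.
constexpr std::uint32_t zext8_shifts (std::uint32_t x)
{
  return (x << 24) >> 24;
}

// ZERO_EXTEND as an AND with a mask.
constexpr std::uint32_t zext8_and (std::uint32_t x)
{
  return x & 0xffu;
}

// SIGN_EXTEND as a single operation.
constexpr std::int32_t sext8 (std::uint32_t x)
{
  return static_cast<std::int8_t> (static_cast<std::uint8_t> (x));
}

// SIGN_EXTEND as a shift pair: shift left, then arithmetic shift right.
constexpr std::int32_t sext8_shifts (std::uint32_t x)
{
  return static_cast<std::int32_t> (x << 24) >> 24;
}

// Check the equivalences for every possible low byte.
inline void check_extensions ()
{
  for (std::uint32_t b = 0; b < 256; ++b)
    {
      std::uint32_t x = 0xabcdef00u | b;  // junk in the upper bits
      assert (zext8 (x) == zext8_shifts (x));
      assert (zext8 (x) == zext8_and (x));
      assert (sext8 (x) == sext8_shifts (x));
      // With the high bit of the byte clear, sign and zero extension agree.
      if ((b & 0x80u) == 0)
        assert (static_cast<std::uint32_t> (sext8 (x)) == zext8 (x));
    }
}
```

All forms compute the same value, so which one combine should keep is
purely a question of what the backend's costs say is cheapest.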
Unfortunately (as is often the case), as soon as you tweak the RTL
stream/canonical forms of instructions, you expose a number of missed
optimization issues, in both the middle-end and backends, that were
expecting one pattern but now see a (cheaper) equivalent pattern...

The remaining chunks affecting expand_compound_operation prevent combine
from generating SUBREGs of RTX other than REG or MEM (considered invalid
RTL) where previously gen_lowpart would have generated a CLOBBER.

In simplify_unary_operation_1, the middle-end can create rtx for FFS,
PARITY and POPCOUNT where the operand mode didn't match the result mode
[which is no longer supported according to the RTL documentation].

In i386.md, I needed to add variations of the define_insn_and_split
patterns for *clzsi2_lzcnt_zext, *clzsi2_lzcnt_zext_falsedep,
*bmi2_bzhi_zero_extendsidi_4, *popcountsi2_zext,
*popcountsi2_zext_falsedep, *popcounthi2 to recognize ZERO_EXTEND in
addition to the previous AND forms, and I added a variation of a
popcount-related peephole2 now that we generate one less instruction in
the input sequence that it's expecting to match.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
make -k check, both with and without --target_board=unix{-m32}, with no
new failures.  I'm happy to help fix any turbulence encountered by targets
with cheap sign/zero extension operations, but most should see a
performance improvement, hopefully to better than before the identified
performance regression.  Ok for mainline?

Fingers-crossed, Uros can review the x86 backend changes, which are
potentially independent (fixing regressions caused by the middle-end
changes), but included in this post to provide better context.  TIA.


2022-09-12  Roger Sayle

gcc/ChangeLog
        PR rtl-optimization/106594
        * gcc/combine.cc (expand_compound_operation): Don't expand/transform
        ZERO_EXTEND or SIGN_EXTEND on targets where rtx_cost claims they are
        cheap.  If gen_lowpart returns a SUBREG of something other than a
        REG or a MEM, i.e. invalid RTL, return the original expression.
        * gcc/simplify-rtx.cc (simplify_unary_operation_1): Avoid
        generating FFS with mismatched operand and result modes, by using
        an explicit SIGN_EXTEND/ZERO_EXTEND instead.
        Likewise, for POPCOUNT of ZERO_EXTEND.
        Likewise, for PARITY of {ZERO,SIGN}_EXTEND.

        * config/i386/i386.md (*clzsi2_lzcnt_zext_2): define_insn_and_split
        to match ZERO_EXTEND form of *clzsi2_lzcnt_zext.
        (*clzsi2_lzcnt_zext_2_falsedep): Likewise, new define_insn to match
        ZERO_EXTEND form of *clzsi2_lzcnt_zext_falsedep.
        (*bmi2_bzhi_zero_extendsidi_5): Likewise, new define_insn to match
        ZERO_EXTEND form of *bmi2_bzhi_zero_extendsidi.
        (*popcountsi2_zext_2): Likewise, new define_insn_and_split to match
        ZERO_EXTEND form of *popcountsi2_zext.
        (*popcountsi2_zext_2_falsedep):
[PATCH 2/2] xtensa: Implement new target hook: TARGET_CONSTANT_OK_FOR_CPROP_P
This patch implements the new target hook TARGET_CONSTANT_OK_FOR_CPROP_P
in order to exclude CONST_INTs that cannot fit into a MOVI machine
instruction from cprop.

gcc/ChangeLog:

        * config/xtensa/xtensa.cc (TARGET_CONSTANT_OK_FOR_CPROP_P):
        New macro definition.
        (xtensa_constant_ok_for_cprop_p): Implement the hook as mentioned
        above.
---
 gcc/config/xtensa/xtensa.cc | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index ac52c015a94..5c432cc65aa 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -191,6 +191,7 @@ static bool xtensa_can_eliminate (const int from ATTRIBUTE_UNUSED,
 static HOST_WIDE_INT xtensa_starting_frame_offset (void);
 static unsigned HOST_WIDE_INT xtensa_asan_shadow_offset (void);
 static bool xtensa_function_ok_for_sibcall (tree, tree);
+static bool xtensa_constant_ok_for_cprop_p (const_rtx);
 static rtx xtensa_delegitimize_address (rtx);
@@ -345,12 +346,15 @@ static rtx xtensa_delegitimize_address (rtx);
 #undef TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
-#undef TARGET_DELEGITIMIZE_ADDRESS
-#define TARGET_DELEGITIMIZE_ADDRESS xtensa_delegitimize_address
-
 #undef TARGET_FUNCTION_OK_FOR_SIBCALL
 #define TARGET_FUNCTION_OK_FOR_SIBCALL xtensa_function_ok_for_sibcall
+#undef TARGET_CONSTANT_OK_FOR_CPROP_P
+#define TARGET_CONSTANT_OK_FOR_CPROP_P xtensa_constant_ok_for_cprop_p
+
+#undef TARGET_DELEGITIMIZE_ADDRESS
+#define TARGET_DELEGITIMIZE_ADDRESS xtensa_delegitimize_address
+
 struct gcc_target targetm = TARGET_INITIALIZER;
@@ -4983,6 +4987,16 @@ xtensa_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED, tree exp ATTRIBUTE_U
   return true;
 }
+/* Implement TARGET_CONSTANT_OK_FOR_CPROP_P.  */
+static bool
+xtensa_constant_ok_for_cprop_p (const_rtx x)
+{
+  if (CONST_INT_P (x) && ! xtensa_simm12b (INTVAL (x)))
+    return false;
+
+  return true;
+}
+
 static rtx
 xtensa_delegitimize_address (rtx op)
 {
--
2.20.1
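To make the range check concrete, here is a small standalone sketch of
the decision the hook above takes for integer constants.  It assumes that
xtensa_simm12b accepts exactly the signed 12-bit range (-2048 to 2047),
as its name suggests; fits_simm12 and constant_ok_for_cprop are
hypothetical stand-ins, not functions from the Xtensa backend.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for xtensa_simm12b: true if V fits in a signed
// 12-bit immediate, i.e. -2048 <= V <= 2047 (assumed range).
constexpr bool fits_simm12 (std::int64_t v)
{
  return v >= -2048 && v <= 2047;
}

// Toy model of the hook's answer for an integer constant: refuse to
// propagate values that a single MOVI could not load.
constexpr bool constant_ok_for_cprop (std::int64_t v)
{
  return fits_simm12 (v);
}

// Boundary values on both sides of the range.
inline void check_simm12 ()
{
  assert (constant_ok_for_cprop (0));
  assert (constant_ok_for_cprop (2047));
  assert (constant_ok_for_cprop (-2048));
  assert (!constant_ok_for_cprop (2048));
  assert (!constant_ok_for_cprop (-2049));
}
```

Anything outside that range would need more than a plain MOVI to
materialize, which is why the hook keeps cprop from spreading such
constants further.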
[PATCH 1/2] Add new target hook: constant_ok_for_cprop_p
Hi,

Many RISC machines, as we know, have some restrictions on placing
register-width constants in the source of load-immediate machine
instructions, so the target must provide a solution for that in the
machine description.

A naive way would be to solve it early, i.e. to replace such constants
with reads of constants pooled in memory when expanding to RTL.

Alternatively, a fancier approach would be to forgo placement in the
constant pool until somewhere before reload/LRA, e.g. the "split1" pass,
to give the optimization passes that involve immediates a chance to work.

If we choose the latter, we can expect better results with RTL
if-conversion, constant folding, etc., but cprop often propagates
constants that are too large to resolve to a simple load-immediate
instruction.

This is because constant propagation has no way of knowing about the
restriction, so this patch provides one.

===

This new target hook can be used to tell cprop whether or not to propagate
a constant depending on its contents.

For backwards compatibility, the default setting for this hook retains the
old behavior.

gcc/ChangeLog:

        * hooks.h (hook_bool_const_rtx_true): New prototype.
        * hooks.cc (hook_bool_const_rtx_true): New default hook.
        * target.def (constant_ok_for_cprop_p): New target hook.
        * cprop.cc (cprop_constant_p): Change to use the hook.
        * doc/tm.texi.in (TARGET_CONSTANT_OK_FOR_CPROP_P): New @hook.
        * doc/tm.texi (TARGET_CONSTANT_OK_FOR_CPROP_P): New document.
---
 gcc/cprop.cc       | 4 +++-
 gcc/doc/tm.texi    | 12
 gcc/doc/tm.texi.in | 2 ++
 gcc/hooks.cc       | 7 +++
 gcc/hooks.h        | 1 +
 gcc/target.def     | 14 ++
 6 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/gcc/cprop.cc b/gcc/cprop.cc
index 580f811545d..dfb1e88e9b4 100644
--- a/gcc/cprop.cc
+++ b/gcc/cprop.cc
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3. If not see
 #include "dbgcnt.h"
 #include "cfgloop.h"
 #include "gcse.h"
+#include "target.h"
 /* An obstack for our working variables.  */
@@ -249,7 +250,8 @@ insert_set_in_table (rtx dest, rtx src, rtx_insn *insn,
 static bool
 cprop_constant_p (const_rtx x)
 {
-  return CONSTANT_P (x) && (GET_CODE (x) != CONST || shared_const_p (x));
+  return CONSTANT_P (x) && targetm.constant_ok_for_cprop_p (x)
+	 && (GET_CODE (x) != CONST || shared_const_p (x));
 }
 /* Determine whether the rtx X should be treated as a register that can
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 858bfb80cec..83151626a71 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12187,6 +12187,18 @@ MIPS, where add-immediate takes a 16-bit signed value,
 is zero, which disables this optimization.
 @end deftypevr
+@deftypefn {Target Hook} bool TARGET_CONSTANT_OK_FOR_CPROP_P (const_rtx @var{cst})
+On some target machines, such as RISC ones, load-immediate instructions
+often have a limited range (for example, within signed 12 bits or less).
+Because they will be typically placed into the constant pool,
+unconditionally propagating constants that exceed such limit can lead to
+increased number of instruction and/or memory read access.
+This target hook should return @code{false} if @var{cst}, a candidate for
+constant propagation, is undesirable as a source for load-immediate
+instructions.
+The default version of this hook always returns @code{true}.
+@end deftypefn
+
 @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_ASAN_SHADOW_OFFSET (void)
 Return the offset bitwise ored into shifted address to get corresponding
 Address Sanitizer shadow memory address. NULL if Address Sanitizer is not
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 21b849ea32a..147331b0f53 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7887,6 +7887,8 @@ and the associated definitions of those functions.
 @hook TARGET_CONST_ANCHOR
+@hook TARGET_CONSTANT_OK_FOR_CPROP_P
+
 @hook TARGET_ASAN_SHADOW_OFFSET
 @hook TARGET_MEMMODEL_CHECK
diff --git a/gcc/hooks.cc b/gcc/hooks.cc
index b29233f4f85..67bf3553d26 100644
--- a/gcc/hooks.cc
+++ b/gcc/hooks.cc
@@ -82,6 +82,13 @@ hook_bool_mode_true (machine_mode)
   return true;
 }
+/* Generic hook that takes (const_rtx) and returns true.  */
+bool
+hook_bool_const_rtx_true (const_rtx)
+{
+  return true;
+}
+
 /* Generic hook that takes (machine_mode, machine_mode) and returns true.  */
 bool
 hook_bool_mode_mode_true (machine_mode, machine_mode)
diff --git a/gcc/hooks.h b/gcc/hooks.h
index 1056e1e9e4d..d001f8fb9dc 100644
--- a/gcc/hooks.h
+++ b/gcc/hooks.h
@@ -30,6 +30,7 @@ extern bool hook_bool_bool_gcc_optionsp_false (bool, struct gcc_options *);
 extern bool hook_bool_const_int_const_int_true (const int, const int);
 extern bool hook_bool_mode_false (machine_mode);
 extern bool hook_bool_mode_true (machine_mode);
+extern bool hook_bool_const_rtx_true (const_rtx);
 extern bool hook_bool_mode_mode_true (machine_mode, machine_mode);
 extern bool
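The hook mechanism described in this message can be pictured with a toy
model: a target structure carries a function pointer that defaults to an
"always true" hook, and the propagation predicate consults it.  Everything
below (toy_rtx, toy_target, the hook names) is invented for illustration
and only mirrors the shape of the change to cprop_constant_p; it is not
GCC's real data model.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins; none of these are GCC's real types.
struct toy_rtx
{
  bool is_constant;
  std::int64_t value;
};

using cprop_hook = bool (*) (const toy_rtx &);

// Default hook: accept every constant, which preserves the old behaviour.
inline bool hook_always_true (const toy_rtx &)
{
  return true;
}

// Example override for a target whose load-immediate is limited to a
// signed 12-bit value.
inline bool hook_simm12_only (const toy_rtx &x)
{
  return x.value >= -2048 && x.value <= 2047;
}

struct toy_target
{
  cprop_hook constant_ok_for_cprop_p = hook_always_true;
};

// Model of the patched predicate: a constant is a propagation candidate
// only if the target hook also agrees.
inline bool toy_cprop_constant_p (const toy_target &t, const toy_rtx &x)
{
  return x.is_constant && t.constant_ok_for_cprop_p (x);
}

inline void check_hooks ()
{
  toy_target generic;  // uses the default hook
  toy_target risc;
  risc.constant_ok_for_cprop_p = hook_simm12_only;

  toy_rtx big {true, 100000};
  assert (toy_cprop_constant_p (generic, big));  // old behaviour kept
  assert (!toy_cprop_constant_p (risc, big));    // target opts out
}
```

A target that does not define the hook sees no change, while one that
does can veto individual constants.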
Re: [PATCH v2] gcov: Respect triplet when looking for gcov
On 11/09/2022 at 18:04, Torbjorn SVENSSON wrote:
> Can you fix it for me and submit it or do you want me to send a v3?

For trivial things like this, there is no need for a v3 (nor was there
for a v2).
Do you lack git write access and need someone to push for you?
Re: [PATCH v2] gcov: Respect triplet when looking for gcov
Hi,

On 2022-09-11 16:34, Mikael Morin wrote:
> Hello,
>
>> diff --git a/gcc/testsuite/gcc.misc-tests/gcov.exp b/gcc/testsuite/gcc.misc-tests/gcov.exp
>> index 82376d90ac2..a55ce234f6e 100644
>> --- a/gcc/testsuite/gcc.misc-tests/gcov.exp
>> +++ b/gcc/testsuite/gcc.misc-tests/gcov.exp
>> @@ -24,9 +24,9 @@ global GCC_UNDER_TEST
> (...)
>>  } else {
>> -    set GCOV gcov
>> +    set GCOV {transform gcov]
>
> Typo: I guess the opening curly bracket '{' should be a square one '['?

Yes. Apparently I was too stressed when preparing this patch.
Can you fix it for me and submit it or do you want me to send a v3?
Re: [PATCH v2] gcov: Respect triplet when looking for gcov
Hello,

> diff --git a/gcc/testsuite/gcc.misc-tests/gcov.exp b/gcc/testsuite/gcc.misc-tests/gcov.exp
> index 82376d90ac2..a55ce234f6e 100644
> --- a/gcc/testsuite/gcc.misc-tests/gcov.exp
> +++ b/gcc/testsuite/gcc.misc-tests/gcov.exp
> @@ -24,9 +24,9 @@ global GCC_UNDER_TEST
(...)
>  } else {
> -    set GCOV gcov
> +    set GCOV {transform gcov]

Typo: I guess the opening curly bracket '{' should be a square one '['?
Re: [PATCH] analyzer: consider empty ranges and zero byte accesses [PR106845]
> ...it took me a moment to realize that the analyzer "sees" that this is
> "main", and thus buf_size is 0.
>
> Interestingly, if I rename it to not be "main" (and thus buf_size could
> be non-zero), we still don't complain:
>   https://godbolt.org/z/PezfTo9Mz
> Presumably this is a known limitation of the symbolic bounds checking?

Yeah. I only try structural equality for binaryop_svalues.

The example results in a call to eval_condition_without_cm with two
unaryop_svalue(NOP_EXPR, initial_svalue ('buf_size')) that have different
types ('unsigned int' and 'sizetype'). Thus, lhs == rhs is false and
eval_condition_without_cm returns UNKNOWN.

Changing the type of buf_size to size_t removes the UNARYOP wrapping and
thus emits a warning:
  https://godbolt.org/z/4sh7TM4v1 [0]

Otherwise, we could also call structural_equality for unaryop_svalue
inside eval_condition_without_cm and ignore a type mismatch for
unaryop_svalues. That way, the analyzer would complain about your example.
I'm not 100% sure, but I think it is okay to ignore the type here for
unaryop_svalues as long as the leaves match up.

If you agree, I can prepare a patch [1].

[0] I've seen you pushed a patch that displays the capacity as a new event
    at region_creation. My patches did that by overwriting what's printed
    using describe_region_creation_event. Should I remove all those now
    unnecessary describe_region_creation_event overloads?

[1] Below is what that would probably look like.
---
 gcc/analyzer/region-model.cc | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 82006405373..4a9f0ff1e86 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -4190,6 +4190,24 @@ region_model::eval_condition_without_cm (const svalue *lhs,
 	}
     }
+  if (lhs->get_kind () == SK_UNARYOP)
+    {
+      switch (op)
+	{
+	default:
+	  break;
+	case EQ_EXPR:
+	case LE_EXPR:
+	case GE_EXPR:
+	  {
+	    tristate res = structural_equality (lhs, rhs);
+	    if (res.is_true ())
+	      return res;
+	  }
+	  break;
+	}
+    }
+
   return tristate::TS_UNKNOWN;
 }
@@ -4307,9 +4325,7 @@ region_model::structural_equality (const svalue *a,
 					 const svalue *b) const
       {
 	const unaryop_svalue *un_a = as_a <const unaryop_svalue *> (a);
 	if (const unaryop_svalue *un_b = dyn_cast <const unaryop_svalue *> (b))
-	  return tristate (pending_diagnostic::same_tree_p (un_a->get_type (),
-							    un_b->get_type ())
-			   && un_a->get_op () == un_b->get_op ()
+	  return tristate (un_a->get_op () == un_b->get_op ()
 			   && structural_equality (un_a->get_arg (),
 						   un_b->get_arg ()));
       }
--
2.37.3
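For readers less familiar with the analyzer, the effect of dropping the
type comparison can be shown with a toy model.  The sketch below is not
the analyzer's svalue hierarchy; toy_svalue and structurally_equal are
invented for this example, with types and operators reduced to strings.

```cpp
#include <cassert>
#include <string>

// Toy symbolic values; these are not the analyzer's real classes.
enum class kind { initial, unaryop };

struct toy_svalue
{
  kind k;
  std::string type;       // e.g. "unsigned int", "sizetype"
  std::string name;       // for kind::initial: the variable name
  std::string op;         // for kind::unaryop: e.g. "NOP_EXPR"
  const toy_svalue *arg;  // for kind::unaryop: the operand
};

// Structural equality.  When IGNORE_UNARYOP_TYPE is set, the result type
// of a unary op is not compared, only the op and the operand.
inline bool structurally_equal (const toy_svalue &a, const toy_svalue &b,
                                bool ignore_unaryop_type)
{
  if (a.k != b.k)
    return false;
  if (a.k == kind::initial)
    return a.name == b.name;
  if (!ignore_unaryop_type && a.type != b.type)
    return false;
  return a.op == b.op
         && structurally_equal (*a.arg, *b.arg, ignore_unaryop_type);
}

inline void check_structural_equality ()
{
  toy_svalue leaf {kind::initial, "int", "buf_size", "", nullptr};
  toy_svalue as_uint {kind::unaryop, "unsigned int", "", "NOP_EXPR", &leaf};
  toy_svalue as_size {kind::unaryop, "sizetype", "", "NOP_EXPR", &leaf};

  // Comparing types: the two casts of buf_size look different.
  assert (!structurally_equal (as_uint, as_size, false));
  // Ignoring types: same op, same leaf, so they match.
  assert (structurally_equal (as_uint, as_size, true));
}
```

Whether ignoring the type is always safe is exactly the open question
raised above; the model only shows why the two casts of buf_size stop
comparing unequal.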
Re: [PATCH] Fortran: Add IEEE_SIGNBIT and IEEE_FMA functions
On 11/09/2022 at 11:57, FX wrote:
>> As a first step, one could check the use rename lists; what's done for
>> iso_fortran_env can be used as an example.
>
> Yes, but iso_fortran_env is handled entirely in front-end, not through
> external files.

That's true, but the standard check doesn't really depend on that.
It only needs the u->use_name for each use rename u.
Re: [PATCH] Fortran: Add IEEE_SIGNBIT and IEEE_FMA functions
Hi Mikael,

> As a first step, one could check the use rename lists; what's done for
> iso_fortran_env can be used as an example.

Yes, but iso_fortran_env is handled entirely in front-end, not through
external files. This is what I plan to do when migrating the IEEE modules
to front-end, but it is a big task.

> Another possibility is mimicking or modifying gfc_resolve_intrinsic, which
> already does a similar job for intrinsic procedures.

That’s probably the best place to put it for now, indeed.
Thanks for the advice.

FX
Re: [PATCH] Fortran: Add IEEE_SIGNBIT and IEEE_FMA functions
On 10/09/2022 at 12:14, FX via Fortran wrote:
> If you have a solution for the standards checking, I’ll add it.

As a first step, one could check the use rename lists; what's done for
iso_fortran_env can be used as an example.
To diagnose the other usages, the check could be put in resolve_symbol,
but it would diagnose the symbol even if it is not used, so one can add a
check on attr.referenced (I hope it can be relied upon).
Another possibility is mimicking or modifying gfc_resolve_intrinsic, which
already does a similar job for intrinsic procedures.

I hope this helps.
Re: [PATCH] analyzer: consider empty ranges and zero byte accesses [PR106845]
On Sun, 2022-09-11 at 10:21 +0200, Bernhard Reutner-Fischer wrote:
> On 11 September 2022 10:04:51 CEST, David Malcolm via Gcc-patches
> wrote:
>
> > > +++ b/gcc/testsuite/gcc.dg/analyzer/pr106845.c
> > > @@ -0,0 +1,11 @@
> > > +int buf_size;
> > > +
> > > +int
> > > +main (void)
> > > +{
> > > +  char buf[buf_size];
> > > +
> > > +  __builtin_memset (&buf[1], 0, buf_size);
> > > +
> > > +  return 0;
> > > +}
> >
> > ...it took me a moment to realize that the analyzer "sees" that
> > this is "main", and thus buf_size is 0.
>
> Is this a valid assumption?

Not always, but often.  I suppose we could add an option for this.

> What if I have a lib (preloaded maybe) that sets it to 42?

...or, say, a C++ ctor for a global object that runs before main, that
has side-effects (see e.g. PR analyzer/97115).

> BTW, do we handle -Wl,-init,youre_toast
> where main isn't the entry point?

The analyzer currently has no knowledge of this; it blithely assumes
that no code runs before "main".  It also doesn't report about "leaks"
that happen when returning from main, whereas in theory someone could,
say, implement the guts of their program in an atexit handler.

I'm making assumptions in order to try to be more useful for the common
cases, potentially at the expense of the less common ones.

I'm not particularly familiar with the pre-main startup and post-main
shutdown of a process; feel free to file a bug if you want -fanalyzer
to be able to handle this kind of thing (links to pertinent docs would
be helpful!)

Thanks
Dave

> Just curious..
> thanks,
>
> > Interestingly, if I rename it to not be "main" (and thus buf_size
> > could be non-zero), we still don't complain:
> >   https://godbolt.org/z/PezfTo9Mz
> > Presumably this is a known limitation of the symbolic bounds
> > checking?
> >
> > Thanks
> > Dave
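For the C++ constructor case mentioned above, a minimal sketch of how a
global can already be non-zero when main starts could look like this.
The type set_before_main and the value 42 are just for illustration; the
example relies on the usual behaviour that dynamic initialization of
namespace-scope objects in the program runs before main is entered.

```cpp
#include <cassert>

int buf_size;  // zero-initialized before any dynamic initialization

// Hypothetical helper type: its constructor runs during dynamic
// initialization, i.e. before main is entered.
struct set_before_main
{
  set_before_main () { buf_size = 42; }
};

static set_before_main early;  // constructed before main

// What code running at the start of main would observe.
inline void check_on_entry ()
{
  assert (buf_size == 42);
}
```

An analysis that treats main as the first code to run would conclude
buf_size is 0 here, which is the trade-off being discussed.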
Re: [PATCH] analyzer: consider empty ranges and zero byte accesses [PR106845]
On 11 September 2022 10:04:51 CEST, David Malcolm via Gcc-patches wrote:

>> +++ b/gcc/testsuite/gcc.dg/analyzer/pr106845.c
>> @@ -0,0 +1,11 @@
>> +int buf_size;
>> +
>> +int
>> +main (void)
>> +{
>> +  char buf[buf_size];
>> +
>> +  __builtin_memset (&buf[1], 0, buf_size);
>> +
>> +  return 0;
>> +}
>
>...it took me a moment to realize that the analyzer "sees" that this is
>"main", and thus buf_size is 0.

Is this a valid assumption? What if I have a lib (preloaded maybe) that
sets it to 42?

BTW, do we handle -Wl,-init,youre_toast
where main isn't the entry point?

Just curious..
thanks,

>
>Interestingly, if I rename it to not be "main" (and thus buf_size could
>be non-zero), we still don't complain:
>  https://godbolt.org/z/PezfTo9Mz
>Presumably this is a known limitation of the symbolic bounds checking?
>
>Thanks
>Dave
>
Re: [PATCH] analyzer: consider empty ranges and zero byte accesses [PR106845]
On Sun, 2022-09-11 at 00:19 +0200, Tim Lange wrote:
> Hi,
>
> see my patch below for a fix of pr106845.  I decided to allow bit_ranges
> and byte_ranges to have a size of zero and rather only add an assertion
> to the functions that assume a non-zero size.  That way is more elegant in
> the caller than restricting byte_range to only represent non-empty
> ranges.

Agreed.

> - Tim
>
> This patch adds handling of empty ranges in bit_range and byte_range and
> adds an assertion to member functions that assume a positive size.
> Further, the patch fixes an ICE caused by an empty byte_range passed to
> byte_range::exceeds_p.
>
> Regression-tested on Linux x86_64.

Thanks - the patch is OK for trunk, though...

[...snip...]

> +++ b/gcc/testsuite/gcc.dg/analyzer/pr106845.c
> @@ -0,0 +1,11 @@
> +int buf_size;
> +
> +int
> +main (void)
> +{
> +  char buf[buf_size];
> +
> +  __builtin_memset (&buf[1], 0, buf_size);
> +
> +  return 0;
> +}

...it took me a moment to realize that the analyzer "sees" that this is
"main", and thus buf_size is 0.

Interestingly, if I rename it to not be "main" (and thus buf_size could
be non-zero), we still don't complain:
  https://godbolt.org/z/PezfTo9Mz
Presumably this is a known limitation of the symbolic bounds checking?

Thanks
Dave