[Bug tree-optimization/90693] Missing popcount simplifications
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |15.0 Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #16 from Andrew Pinski --- `__builtin_popcount (x) <= 1` and `__builtin_popcount (x) > 1` are now handled. ``` x && x == (x & -x) x && (x & x-1) == 0 ``` is PR 94787 . So closing as fixed; note I don't know when I will get around to PR 94787 because I am putting on a pause at fixing popcount issues for now.
[Bug tree-optimization/90693] Missing popcount simplifications
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693 --- Comment #15 from GCC Commits --- The trunk branch has been updated by Andrew Pinski : https://gcc.gnu.org/g:75a4143d69a842d691c14a477b0ba277d8708fc7 commit r15-3550-g75a4143d69a842d691c14a477b0ba277d8708fc7 Author: Andrew Pinski Date: Thu Aug 29 12:10:44 2024 -0700 middle-end: also optimized `popcount(a) <= 1` [PR90693] This expands on optimizing `popcount(a) == 1` to also handle `popcount(a) <= 1`. `<= 1` can be expanded as `(a & -a) == 0` like what is done for `== 1` if we know that a was nonzero. We have to do the optimization in 2 places due to if we have an optab entry for popcount or not. Built and tested for aarch64-linux-gnu. PR middle-end/90693 gcc/ChangeLog: * internal-fn.cc (expand_POPCOUNT): Handle the second argument being `-1` for `<= 1`. * tree-ssa-math-opts.cc (match_single_bit_test): Handle LE/GT cases. (math_opts_dom_walker::after_dom_children): Call match_single_bit_test for LE_EXPR/GT_EXPR also. gcc/testsuite/ChangeLog: * gcc.target/aarch64/popcnt-le-1.c: New test. * gcc.target/aarch64/popcnt-le-2.c: New test. * gcc.target/aarch64/popcnt-le-3.c: New test. Signed-off-by: Andrew Pinski
[Bug tree-optimization/90693] Missing popcount simplifications
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693 Andrew Pinski changed: What|Removed |Added Assignee|wilco at gcc dot gnu.org |pinskia at gcc dot gnu.org --- Comment #14 from Andrew Pinski --- > __builtin_popcount (x) == 1 into x == (x & -x) Actually that should be `__builtin_popcount (x) <= 1` Anyways I am going to implement the rest here due to PR 94787 .
[Bug tree-optimization/90693] Missing popcount simplifications
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693 --- Comment #13 from Andrew Pinski --- (In reply to Andrew Pinski from comment #12) > (In reply to Piotr Siupa from comment #11) > > However, I've noticed that: > > bool foo(unsigned x) > > { > > if (x == 0) > > return true; > > else > > return std::has_single_bit(x); > > } > > > Oh that is because expand does not use flow sensitive ranges/non-zero bits > there. There is talk about adding the ability for that but nothing has been > done yet. Well that also should be transformed into `__builtin_popcount(a) <= 1` which then gets expanded into `(v & (v - 1)) == 0`. I will be handling both of those via PR 94787 .
[Bug tree-optimization/90693] Missing popcount simplifications
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693 --- Comment #12 from Andrew Pinski --- (In reply to Piotr Siupa from comment #11) > However, I've noticed that: > bool foo(unsigned x) > { > if (x == 0) > return true; > else > return std::has_single_bit(x); > } Oh that is because expand does not use flow sensitive ranges/non-zero bits there. There is talk about adding the ability for that but nothing has been done yet.
[Bug tree-optimization/90693] Missing popcount simplifications
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693 --- Comment #11 from Piotr Siupa --- Thanks! Now the generated assembly is one instruction shorter. It works for: bool foo(unsigned x) { [[assume(x != 0)]]; return std::has_single_bit(x); } and for: bool foo(unsigned x) { if (x == 0) std::unreachable(); else return std::has_single_bit(x); } However, I've noticed that: bool foo(unsigned x) { if (x == 0) return true; else return std::has_single_bit(x); } still uses (X ^ (X - 1)) > (X - 1).
[Bug tree-optimization/90693] Missing popcount simplifications
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693 --- Comment #10 from GCC Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:0152637c74c9eab7658483b1cca4e3d584dd4262 commit r14-6940-g0152637c74c9eab7658483b1cca4e3d584dd4262 Author: Jakub Jelinek Date: Fri Jan 5 11:16:58 2024 +0100 Improve __builtin_popcount* (x) == 1 generation if x is known != 0 [PR90693] We expand __builtin_popcount* (x) == 1 as x ^ (x - 1) > x - 1, either unconditionally in tree-ssa-math-opts.cc if we don't have a direct optab support for popcount, or during expansion where we compare the costs of comparison of the popcount against one vs. the above expression. As mentioned in the PR, if we know from ranger that the argument is not zero, we can emit x & (x - 1) == 0 test which is same number of GIMPLE statements, but on many targets cheaper (e.g. whenever an AND instruction can also set flags on whether result was zero or not). The following patch does that. 2024-01-05 Jakub Jelinek PR tree-optimization/90693 * tree-ssa-math-opts.cc (match_single_bit_test): If tree_expr_nonzero_p (arg), remember it in the second argument to IFN_POPCOUNT or lower it as arg & (arg - 1) == 0 rather than arg ^ (arg - 1) > arg - 1. * internal-fn.cc (expand_POPCOUNT): If second argument to IFN_POPCOUNT suggests arg is non-zero, try to expand it as arg & (arg - 1) == 0 rather than arg ^ (arg - 1) > arg - 1. * gcc.target/i386/pr90693-2.c: New test.
[Bug tree-optimization/90693] Missing popcount simplifications
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693 --- Comment #9 from Jakub Jelinek --- Created attachment 56984 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56984&action=edit gcc14-pr90693.patch Untested patch to do that.
[Bug tree-optimization/90693] Missing popcount simplifications
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693 Piotr Siupa changed: What|Removed |Added CC||piotrsiupa at gmail dot com --- Comment #8 from Piotr Siupa --- I did a benchmark and (x & (x-1)) == 0 and it seems to be about 2x as fast as the currently generated code (at least on my AMD64 processor). Maybe it should be used if x is guaranteed to not be zero, e.g. if (x == 0) std::unreachable().
[Bug tree-optimization/90693] Missing popcount simplifications
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #7 from Jakub Jelinek --- I guess now the important question is if what we produce with the patch is faster (or at least not slower) than what we used to generate on all targets; though, if that isn't the case, best fix might be to adjust target costs.
[Bug tree-optimization/90693] Missing popcount simplifications
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693 --- Comment #6 from Wilco --- Thanks Jakub - with the 2nd patch we get the expected sequence on AArch64: sub x1, x0, #1 eor x0, x0, x1 cmp x0, x1 csetx0, hi
[Bug tree-optimization/90693] Missing popcount simplifications
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693 --- Comment #5 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:103a3966bc7b68f91b6cd3c5beb330c4b8570c90 commit r14-5613-g103a3966bc7b68f91b6cd3c5beb330c4b8570c90 Author: Jakub Jelinek Date: Mon Nov 20 10:03:20 2023 +0100 tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693] On Fri, Nov 17, 2023 at 03:01:04PM +0100, Jakub Jelinek wrote: > As a follow-up, I'm considering changing in this routine the popcount > call to IFN_POPCOUNT with 2 arguments and during expansion test costs. Here is the follow-up which does the rtx costs testing. 2023-11-20 Jakub Jelinek PR tree-optimization/90693 * tree-ssa-math-opts.cc (match_single_bit_test): Mark POPCOUNT with result only used in equality comparison against 1 with direct optab support as .POPCOUNT call with 2 arguments. * internal-fn.h (expand_POPCOUNT): Declare. * internal-fn.def (DEF_INTERNAL_INT_EXT_FN): New macro, document it, undefine at the end. (POPCOUNT): Use it instead of DEF_INTERNAL_INT_FN. * internal-fn.cc (DEF_INTERNAL_INT_EXT_FN): Define to nothing before inclusion to define expanders. (expand_POPCOUNT): New function.
[Bug tree-optimization/90693] Missing popcount simplifications
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693 --- Comment #4 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:d0b6b7f8a6b8115b033441590a6304fb088d193c commit r14-5612-gd0b6b7f8a6b8115b033441590a6304fb088d193c Author: Jakub Jelinek Date: Mon Nov 20 10:00:09 2023 +0100 tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] Per the earlier discussions on this PR, the following patch folds popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=) if the corresponding popcount optab isn't implemented (I think any double-word popcount or call will be necessarily slower than the above cheap 3 op check and even for -Os larger or same size). I've noticed e.g. C++ aligned new starts with std::has_single_bit which does popcount (x) == 1. As a follow-up, I'm considering changing in this routine the popcount call to IFN_POPCOUNT with 2 arguments and during expansion test costs. 2023-11-20 Jakub Jelinek PR tree-optimization/90693 * tree-ssa-math-opts.cc (match_single_bit_test): New function. (math_opts_dom_walker::after_dom_children): Call it for EQ_EXPR and NE_EXPR assignments and GIMPLE_CONDs. * gcc.target/i386/pr90693.c: New test.
[Bug tree-optimization/90693] Missing popcount simplifications
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement Keywords||missed-optimization Component|middle-end |tree-optimization