[Bug tree-optimization/90693] Missing popcount simplifications

2024-09-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |15.0
 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #16 from Andrew Pinski  ---
`__builtin_popcount (x) <= 1` and `__builtin_popcount (x) > 1` are now handled.

```
x && x == (x & -x)
x && (x & x-1) == 0
```

is PR 94787 .

So closing as fixed; note I don't know when I will get around to PR 94787
because I am putting on a pause at fixing popcount issues for now.

[Bug tree-optimization/90693] Missing popcount simplifications

2024-09-09 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693

--- Comment #15 from GCC Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:75a4143d69a842d691c14a477b0ba277d8708fc7

commit r15-3550-g75a4143d69a842d691c14a477b0ba277d8708fc7
Author: Andrew Pinski 
Date:   Thu Aug 29 12:10:44 2024 -0700

middle-end: also optimized `popcount(a) <= 1` [PR90693]

This expands on optimizing `popcount(a) == 1` to also handle
`popcount(a) <= 1`. `<= 1` can be expanded as `(a & -a) == 0`
like what is done for `== 1` if we know that a was nonzero.
We have to do the optimization in 2 places due to if we have
an optab entry for popcount or not.

Built and tested for aarch64-linux-gnu.

PR middle-end/90693

gcc/ChangeLog:

* internal-fn.cc (expand_POPCOUNT): Handle the second argument
being `-1` for `<= 1`.
* tree-ssa-math-opts.cc (match_single_bit_test): Handle LE/GT
cases.
(math_opts_dom_walker::after_dom_children): Call
match_single_bit_test
for LE_EXPR/GT_EXPR also.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt-le-1.c: New test.
* gcc.target/aarch64/popcnt-le-2.c: New test.
* gcc.target/aarch64/popcnt-le-3.c: New test.

Signed-off-by: Andrew Pinski 

[Bug tree-optimization/90693] Missing popcount simplifications

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693

Andrew Pinski  changed:

   What|Removed |Added

   Assignee|wilco at gcc dot gnu.org   |pinskia at gcc dot 
gnu.org

--- Comment #14 from Andrew Pinski  ---
> __builtin_popcount (x) == 1 into x == (x & -x)

Actually that should be `__builtin_popcount (x) <= 1`

Anyways I am going to implement the rest here due to PR 94787 .

[Bug tree-optimization/90693] Missing popcount simplifications

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693

--- Comment #13 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #12)
> (In reply to Piotr Siupa from comment #11)
> > However, I've noticed that:
> > bool foo(unsigned x)
> > {
> > if (x == 0)
> > return true;
> > else
> > return std::has_single_bit(x);
> > }
> 
> 
> Oh that is because expand does not use flow sensitive ranges/non-zero bits
> there. There is talk about adding the ability for that but nothing has been
> done yet.

Well that also should be transformed into `__builtin_popcount(a) <= 1` which
then gets expanded into `(v & (v - 1)) == 0`. I will be handling both of those
via PR 94787 .

[Bug tree-optimization/90693] Missing popcount simplifications

2024-03-03 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693

--- Comment #12 from Andrew Pinski  ---
(In reply to Piotr Siupa from comment #11)
> However, I've noticed that:
> bool foo(unsigned x)
> {
> if (x == 0)
> return true;
> else
> return std::has_single_bit(x);
> }


Oh that is because expand does not use flow sensitive ranges/non-zero bits
there. There is talk about adding the ability for that but nothing has been
done yet.

[Bug tree-optimization/90693] Missing popcount simplifications

2024-01-07 Thread piotrsiupa at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693

--- Comment #11 from Piotr Siupa  ---
Thanks! Now the generated assembly is one instruction shorter.

It works for:
bool foo(unsigned x)
{
[[assume(x != 0)]];
return std::has_single_bit(x);
}
and for:
bool foo(unsigned x)
{
if (x == 0)
std::unreachable();
else
return std::has_single_bit(x);
}

However, I've noticed that:
bool foo(unsigned x)
{
if (x == 0)
return true;
else
return std::has_single_bit(x);
}
still uses (X ^ (X - 1)) > (X - 1).

[Bug tree-optimization/90693] Missing popcount simplifications

2024-01-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693

--- Comment #10 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:0152637c74c9eab7658483b1cca4e3d584dd4262

commit r14-6940-g0152637c74c9eab7658483b1cca4e3d584dd4262
Author: Jakub Jelinek 
Date:   Fri Jan 5 11:16:58 2024 +0100

Improve __builtin_popcount* (x) == 1 generation if x is known != 0
[PR90693]

We expand __builtin_popcount* (x) == 1 as
x ^ (x - 1) > x - 1, either unconditionally in tree-ssa-math-opts.cc
if we don't have a direct optab support for popcount, or during
expansion where we compare the costs of comparison of the popcount
against one vs. the above expression.
As mentioned in the PR, if we know from ranger that the argument is
not zero, we can emit x & (x - 1) == 0 test which is same number of
GIMPLE statements, but on many targets cheaper (e.g. whenever an AND
instruction can also set flags on whether result was zero or not).

The following patch does that.

2024-01-05  Jakub Jelinek  

PR tree-optimization/90693
* tree-ssa-math-opts.cc (match_single_bit_test): If
tree_expr_nonzero_p (arg), remember it in the second argument to
IFN_POPCOUNT or lower it as arg & (arg - 1) == 0 rather than
arg ^ (arg - 1) > arg - 1.
* internal-fn.cc (expand_POPCOUNT): If second argument to
IFN_POPCOUNT suggests arg is non-zero, try to expand it as
arg & (arg - 1) == 0 rather than arg ^ (arg - 1) > arg - 1.

* gcc.target/i386/pr90693-2.c: New test.

[Bug tree-optimization/90693] Missing popcount simplifications

2024-01-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693

--- Comment #9 from Jakub Jelinek  ---
Created attachment 56984
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56984&action=edit
gcc14-pr90693.patch

Untested patch to do that.

[Bug tree-optimization/90693] Missing popcount simplifications

2024-01-01 Thread piotrsiupa at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693

Piotr Siupa  changed:

   What|Removed |Added

 CC||piotrsiupa at gmail dot com

--- Comment #8 from Piotr Siupa  ---
I did a benchmark and (x & (x-1)) == 0 and it seems to be about 2x as fast as
the currently generated code (at least on my AMD64 processor).

Maybe it should be used if x is guaranteed to not be zero, e.g. if (x == 0)
std::unreachable().

[Bug tree-optimization/90693] Missing popcount simplifications

2023-11-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek  ---
I guess now the important question is if what we produce with the patch is
faster (or at least not slower) than what we used to generate on all targets;
though, if that isn't the case, best fix might be to adjust target costs.

[Bug tree-optimization/90693] Missing popcount simplifications

2023-11-20 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693

--- Comment #6 from Wilco  ---
Thanks Jakub - with the 2nd patch we get the expected sequence on AArch64:

sub x1, x0, #1
eor x0, x0, x1
cmp x0, x1
csetx0, hi

[Bug tree-optimization/90693] Missing popcount simplifications

2023-11-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:103a3966bc7b68f91b6cd3c5beb330c4b8570c90

commit r14-5613-g103a3966bc7b68f91b6cd3c5beb330c4b8570c90
Author: Jakub Jelinek 
Date:   Mon Nov 20 10:03:20 2023 +0100

tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1)
optimization for direct optab [PR90693]

On Fri, Nov 17, 2023 at 03:01:04PM +0100, Jakub Jelinek wrote:
> As a follow-up, I'm considering changing in this routine the popcount
> call to IFN_POPCOUNT with 2 arguments and during expansion test costs.

Here is the follow-up which does the rtx costs testing.

2023-11-20  Jakub Jelinek  

PR tree-optimization/90693
* tree-ssa-math-opts.cc (match_single_bit_test): Mark POPCOUNT with
result only used in equality comparison against 1 with direct optab
support as .POPCOUNT call with 2 arguments.
* internal-fn.h (expand_POPCOUNT): Declare.
* internal-fn.def (DEF_INTERNAL_INT_EXT_FN): New macro, document
it,
undefine at the end.
(POPCOUNT): Use it instead of DEF_INTERNAL_INT_FN.
* internal-fn.cc (DEF_INTERNAL_INT_EXT_FN): Define to nothing
before
inclusion to define expanders.
(expand_POPCOUNT): New function.

[Bug tree-optimization/90693] Missing popcount simplifications

2023-11-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:d0b6b7f8a6b8115b033441590a6304fb088d193c

commit r14-5612-gd0b6b7f8a6b8115b033441590a6304fb088d193c
Author: Jakub Jelinek 
Date:   Mon Nov 20 10:00:09 2023 +0100

tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1)
optimization [PR90693]

Per the earlier discussions on this PR, the following patch folds
popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=)
if the corresponding popcount optab isn't implemented (I think any
double-word popcount or call will be necessarily slower than the
above cheap 3 op check and even for -Os larger or same size).

I've noticed e.g. C++ aligned new starts with std::has_single_bit
which does popcount (x) == 1.

As a follow-up, I'm considering changing in this routine the popcount
call to IFN_POPCOUNT with 2 arguments and during expansion test costs.

2023-11-20  Jakub Jelinek  

PR tree-optimization/90693
* tree-ssa-math-opts.cc (match_single_bit_test): New function.
(math_opts_dom_walker::after_dom_children): Call it for EQ_EXPR
and NE_EXPR assignments and GIMPLE_CONDs.

* gcc.target/i386/pr90693.c: New test.

[Bug tree-optimization/90693] Missing popcount simplifications

2021-08-10 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement
   Keywords||missed-optimization
  Component|middle-end  |tree-optimization