[Bug tree-optimization/94793] Failure to optimize clz idiom

2023-01-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94793

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Andrew Carlotti :

https://gcc.gnu.org/g:d347fbf774dc50bf7511f4dc6bc74547ed364995

commit r13-5193-gd347fbf774dc50bf7511f4dc6bc74547ed364995
Author: Andrew Carlotti 
Date:   Thu Nov 10 15:56:51 2022 +

Add cltz_complement idiom recognition

This recognises patterns of the form:
while (n) { n >>= 1 }

This patch results in improved (but still suboptimal) codegen:

int foo (unsigned int b) {
    int c = 0;

    while (b) {
        b >>= 1;
        c++;
    }

    return c;
}

foo:
.LFB11:
        .cfi_startproc
        cbz     w0, .L3
        clz     w1, w0
        tst     x0, 1
        mov     w0, 32
        sub     w0, w0, w1
        csel    w0, w0, wzr, ne
        ret

The conditional is unnecessary. phiopt could recognise a redundant csel
(using cond_removal_in_builtin_zero_pattern) when one of the inputs is a
clz call, but it cannot recognise the redundancy when the input is (e.g.)
(32 - clz).

I could perhaps extend this function to recognise this pattern in a later
patch, if this is a good place to recognise more patterns.
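
As a rough illustration of the equivalence being exploited, the loop above
and a guarded (precision - clz) expression can be written side by side. This
is only a sketch: the function names are made up for this example, and it
assumes a compiler that provides __builtin_clz (GCC or Clang). The explicit
zero check is the source-level counterpart of the conditional discussed above.

```c
#include <limits.h>

/* Loop form from the commit message: counts how many bits are needed
   to represent b (0 for b == 0).  Name is illustrative only.  */
int bit_width_loop (unsigned int b)
{
  int c = 0;

  while (b)
    {
      b >>= 1;
      c++;
    }

  return c;
}

/* Closed form: precision minus the number of leading zeros.  The zero
   check keeps __builtin_clz away from its undefined input.  */
int bit_width_clz (unsigned int b)
{
  const int prec = (int) (sizeof (unsigned int) * CHAR_BIT);

  return b ? prec - __builtin_clz (b) : 0;
}
```

Both functions should agree for every input, including zero.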

gcc/ChangeLog:

PR tree-optimization/94793
* tree-scalar-evolution.cc (expression_expensive_p): Add checks
for c[lt]z optabs.
* tree-ssa-loop-niter.cc (build_cltz_expr): New.
(number_of_iterations_cltz_complement): New.
(number_of_iterations_bitcount): Add call to the above.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_clz)
(check_effective_target_clzl, check_effective_target_clzll)
(check_effective_target_ctz, check_effective_target_ctzl)
(check_effective_target_ctzll): New.
* gcc.dg/tree-ssa/cltz-complement-max.c: New test.
* gcc.dg/tree-ssa/clz-complement-char.c: New test.
* gcc.dg/tree-ssa/clz-complement-int.c: New test.
* gcc.dg/tree-ssa/clz-complement-long-long.c: New test.
* gcc.dg/tree-ssa/clz-complement-long.c: New test.
* gcc.dg/tree-ssa/ctz-complement-char.c: New test.
* gcc.dg/tree-ssa/ctz-complement-int.c: New test.
* gcc.dg/tree-ssa/ctz-complement-long-long.c: New test.
* gcc.dg/tree-ssa/ctz-complement-long.c: New test.

[Bug tree-optimization/94793] Failure to optimize clz idiom

2022-05-25 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94793

Tamar Christina  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |andrew.carlotti at arm dot com

--- Comment #4 from Tamar Christina  ---
Assigning it to Andrew, who'll take a crack at it in GCC 13.

[Bug tree-optimization/94793] Failure to optimize clz idiom

2021-08-03 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94793

--- Comment #3 from Andrew Pinski  ---
*** Bug 99887 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/94793] Failure to optimize clz idiom

2020-05-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94793

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
You mean like PR82479?  Though, __builtin_clz is UB at zero, so it would need
to be v >>= 1; r = v ? 32 - __builtin_clz (v) : 0; v = 0; or so, except on
aarch64 (or wherever CLZ_DEFINED_VALUE_AT_ZERO is 2, i.e. where we guarantee
that the CLZ ifn or builtins have the behavior defined at GIMPLE rather than,
say, just at RTL or never), where the conditional should not be needed.  And on
other targets that have CLZ_DEFINED_VALUE_AT_ZERO 1 with a value equal to the
precision, perhaps convert that conditional into just CLZ at RTL time.
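
The replacement described in this comment can be sketched in C as follows.
This is an assumption-laden illustration, not code from the bug report: it
assumes the loop in question has the shape "while (v >>= 1) r++;", the
function names are hypothetical, and __builtin_clz is the GCC/Clang builtin.

```c
#include <limits.h>

/* Assumed shape of the idiom: count how many times v can be shifted
   right before it becomes zero.  */
unsigned int shift_count_loop (unsigned int v)
{
  unsigned int r = 0;

  while (v >>= 1)
    r++;

  return r;
}

/* Guarded replacement: shift once, then use precision - clz, with an
   explicit check because __builtin_clz (0) is undefined.  */
unsigned int shift_count_clz (unsigned int v)
{
  const unsigned int prec = sizeof (unsigned int) * CHAR_BIT;

  v >>= 1;
  return v ? prec - (unsigned int) __builtin_clz (v) : 0;
}
```

On a target where the compiler guarantees clz of zero to equal the precision,
the ternary would be redundant, which is the case the comment singles out.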

[Bug tree-optimization/94793] Failure to optimize clz idiom

2020-04-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94793

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2020-04-27

--- Comment #1 from Richard Biener  ---
Confirmed.  Could be handled similar to how we handle popcount.
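
For comparison, the popcount idiom referred to here usually takes the form
of a loop that clears the lowest set bit on each iteration. The sketch below
shows that loop next to the builtin it corresponds to; the function names are
illustrative, and __builtin_popcount is assumed to be available (GCC or
Clang).

```c
/* Popcount loop idiom: each iteration clears the lowest set bit, so
   the iteration count equals the number of set bits.  */
int popcount_loop (unsigned int n)
{
  int c = 0;

  while (n)
    {
      n &= n - 1;
      c++;
    }

  return c;
}

/* The builtin that such a loop is equivalent to.  */
int popcount_builtin (unsigned int n)
{
  return __builtin_popcount (n);
}
```

Unlike clz, popcount is well defined for a zero input, so no guard is needed
in this case.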