https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97759
Andrew Pinski changed:
What|Removed |Added
Ever confirmed|0 |1
Assignee|unassigned at gcc
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #14
--- Comment #13 from Andrew Pinski ---
For aarch64, we get:
is_power2_popcnt(int):
        fmov    s0, w0
        cnt     v0.8b, v0.8b
        addv    b0, v0.8b
        fmov    w0, s0
        cmp     w0, 1
        cset    w0, eq
        ret
--- Comment #12 from Jonathan Wakely ---
r11-4843 should have removed the small difference between the std and popcount
benchmarks.
I still see a small advantage for the arithmetic version on a Skylake i7-6700
CPU @ 3.40GHz and an i7-8650U CPU @ 1.90GHz, though.
--- Comment #11 from Hongtao.liu ---
(In reply to gcc-bugs from comment #10)
> And maybe a related question:
>
> I know that an arithmetic implementation might auto-vectorize, but would a
> popcount implementation do that too?
>
> Since
--- Comment #10 from gcc-bugs at marehr dot dialup.fu-berlin.de ---
And maybe a related question:
I know that an arithmetic implementation might auto-vectorize, but would a
popcount implementation do that too?
Since AVX512_BITALG
--- Comment #9 from gcc-bugs at marehr dot dialup.fu-berlin.de ---
Thank you for so many responses
(In reply to Thomas Koenig from comment #1)
> Could you post the benchmark and the exact architecture where the arithmetic
> version is faster?
--- Comment #7 from gcc-bugs at marehr dot dialup.fu-berlin.de ---
Created attachment 49530
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49530&action=edit
CMakeLists.txt
--- Comment #8 from gcc-bugs at marehr dot dialup.fu-berlin.de ---
Created attachment 49531
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49531&action=edit
has_single_bit_benchmark.cpp
--- Comment #6 from Jonathan Wakely ---
As an aside, libstdc++ does already use the ((x-1) & x) == 0 idiom in
where we are happy for zero to be treated as a power
of two (because we call _Power_of_2(n+1) and we want the result to be true for
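
To illustrate the point about zero, here is a minimal sketch in C. The helper names are hypothetical and this is not libstdc++'s actual _Power_of_2; it only assumes the ((x-1) & x) == 0 idiom quoted above, applied to an unsigned value. One way n+1 can be zero is unsigned wraparound, when n is the largest value of its type.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the idiom; accepts zero as well as powers of two. */
static bool pow2_or_zero(unsigned x)
{
    /* For x == 0, x - 1 wraps to UINT_MAX and UINT_MAX & 0 is 0. */
    return ((x - 1) & x) == 0;
}

/* Strict check for comparison: exactly one bit set, so zero is rejected. */
static bool single_bit_popcnt(unsigned x)
{
    return __builtin_popcount(x) == 1;
}

/* Hypothetical helper: applies the idiom to n + 1, where n + 1 may wrap to 0. */
static bool next_is_pow2_or_zero(unsigned n)
{
    return pow2_or_zero(n + 1u);
}
```

With these definitions, pow2_or_zero(0) is true while single_bit_popcnt(0) is false, and next_is_pow2_or_zero(UINT_MAX) is true because UINT_MAX + 1 wraps to zero.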
--- Comment #5 from Jakub Jelinek ---
Unfortunately we don't TER calls, so the expander doesn't see POPCOUNT (x) == 1
or POPCOUNT (x) <= 1
and the isel pass seems to be (at least so far) extremely vector specific;
another option is do it in
Jakub Jelinek changed:
What|Removed |Added
CC||jakub at gcc dot gnu.org
--- Comment #4
--- Comment #3 from Hongtao.liu ---
for testcase:
---
#include <stdbool.h>
bool
is_power2_popcnt (int a)
{
return __builtin_popcount (a) == 1;
}
bool
is_power2_arithmetic (int a)
{
return !(a & (a - 1)) && a;
}
---
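
As a sanity check that the two testcase functions compute the same predicate, a small comparison helper can be sketched as follows. The helper count_mismatches and its range limits are assumptions for illustration, not part of the original comment. INT_MIN is excluded on purpose, because a - 1 would overflow a signed int there.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* The two functions from the testcase, unchanged in behavior. */
static bool is_power2_popcnt(int a)
{
    return __builtin_popcount(a) == 1;
}

static bool is_power2_arithmetic(int a)
{
    return !(a & (a - 1)) && a;
}

/* Hypothetical helper: counts inputs in [lo, hi] on which the two disagree.
   lo must be greater than INT_MIN to avoid signed overflow in a - 1. */
static int count_mismatches(int lo, int hi)
{
    assert(lo > INT_MIN && lo <= hi);
    int mismatches = 0;
    for (int a = lo;; ++a) {
        if (is_power2_popcnt(a) != is_power2_arithmetic(a))
            ++mismatches;
        if (a == hi)          /* stop before ++a could overflow at INT_MAX */
            break;
    }
    return mismatches;
}
```

Calling count_mismatches over a range that includes negative values, zero, and several powers of two should report no disagreements, which is what makes the popcount form a candidate for being rewritten into the arithmetic form.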
gcc -O2 -mavx2 -S got
---
Richard Biener changed:
What|Removed |Added
CC||crazylht at gmail dot com
Thomas Koenig changed:
What|Removed |Added
Keywords||missed-optimization