https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81602
Hongtao Liu changed:
What    |Removed |Added
CC      |        |liuhongt at gcc dot gnu.org
--- Comment #5
--- Comment #4 from Andrew Pinski ---
Interestingly, clang does:
```
movzx ecx, word ptr [rdi + 2*rax]
popcnt ecx, ecx
lea rsi, [rsi + 2*rcx]
```
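For readers mapping the three instructions above back to source, here is a minimal sketch of a loop whose body could plausibly compile to that load/popcnt/lea sequence. The test case from comment #0 is not fully visible here, so the function name `advance_by_popcount`, its signature, and the use of the GCC/Clang builtin `__builtin_popcount` are assumptions for illustration, not the reporter's code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the bug report): walks `n` 16-bit masks and
// advances `data` by the number of set bits in each one.
std::uint16_t* advance_by_popcount(const std::uint16_t* mask,
                                   std::uint16_t* data,
                                   std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // Zero-extending 16-bit load followed by a population count
        // (the movzx + popcnt pair in the clang output).
        unsigned bits = static_cast<unsigned>(__builtin_popcount(mask[i]));
        // Pointer step of `bits` elements, 2 bytes each
        // (the lea with a 2*rcx scaled index).
        data += bits;
    }
    return data;
}
```

Whether a compiler actually emits `popcnt` for this depends on the target flags (for example `-mpopcnt` or a `-march` that implies it); without them a library call or bit-twiddling sequence is used instead.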
While GCC 14+ does:
```
xor eax, eax
add rdi,
```
Andrew Pinski changed:
What     |Removed |Added
Severity |normal  |enhancement
--- Comment #3 from Peter Cordes ---
Forgot to mention: memory-source popcnt with an indexed addressing mode would
also be worse on SnB/IvB: it can't stay micro-fused, so the front-end
un-laminates it in the issue stage.
Haswell and later can ke
Peter Cordes changed:
What    |Removed |Added
CC      |        |peter at cordes dot ca
--- Comment #2 fro
--- Comment #1 from Uroš Bizjak ---
(In reply to Christoph Diegelmann from comment #0)
> GCC misses an optimization on this:
>
> #include <cstdint>
> #include "immintrin.h"
>
> void test(std::uint16_t* mask, std::uint16_t* data) {
> for (int i = 0;