[Bug target/113859] popcount HI can be vectorized for non-SVE

2024-06-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113859

--- Comment #4 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #3)
> Patch was posted:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650311.html

Latest patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653405.html

2024-05-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113859

Andrew Pinski changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Keywords|                              |patch
                URL|                              |https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650311.html

--- Comment #3 from Andrew Pinski ---
Patch was posted:
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650311.html

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113859

Andrew Pinski changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
     Ever confirmed|0                             |1
           Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org
   Last reconfirmed|                              |2024-03-05

--- Comment #2 from Andrew Pinski ---
Mine.

2024-02-09 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113859

--- Comment #1 from Andrew Pinski ---
SI (and DI) can be optimized too.

LLVM produces for int:
```
ldr d0, [x0]
cnt v0.8b, v0.8b
uaddlp  v0.4h, v0.8b
uaddlp  v0.2s, v0.4h
str d0, [x1]
ret
```

And for long:
```
ldr q0, [x0]
cnt v0.16b, v0.16b
uaddlp  v0.8h, v0.16b
uaddlp  v0.4s, v0.8h
uaddlp  v0.2d, v0.4s
str q0, [x1]
ret
```

That is for the SLP version:
```
void f(unsigned long *__restrict b, unsigned long *__restrict d)
{
  d[0] = __builtin_popcountll(b[0]);
  d[1] = __builtin_popcountll(b[1]);
}
```
s/long/int/ in the first case.

Note that using SVE, when available, is better than the above; that is
part of PR 113860 though.
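
For readers unfamiliar with the cnt/uaddlp idiom, here is a rough scalar model of what the two sequences above compute. It is only an illustrative sketch: the helper names popcount_2x32 and popcount_2x64 are made up for this note, and the loops stand in for the vector instructions named in the comments. It is not what GCC or LLVM emit internally.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: models cnt v0.8b + two uaddlp steps on 2 x 32-bit lanes. */
static void popcount_2x32(const uint32_t in[2], uint32_t out[2])
{
    uint8_t bytes[8], b[8];
    uint16_t h[4];

    memcpy(bytes, in, sizeof bytes);
    for (int i = 0; i < 8; i++)            /* cnt v0.8b, v0.8b: per-byte popcount */
        b[i] = (uint8_t)__builtin_popcount(bytes[i]);
    for (int i = 0; i < 4; i++)            /* uaddlp v0.4h, v0.8b: add adjacent bytes */
        h[i] = (uint16_t)(b[2 * i] + b[2 * i + 1]);
    for (int i = 0; i < 2; i++)            /* uaddlp v0.2s, v0.4h: add adjacent halfwords */
        out[i] = (uint32_t)h[2 * i] + h[2 * i + 1];
}

/* Hypothetical helper: models cnt v0.16b + three uaddlp steps on 2 x 64-bit lanes. */
static void popcount_2x64(const uint64_t in[2], uint64_t out[2])
{
    uint8_t bytes[16], b[16];
    uint16_t h[8];
    uint32_t s[4];

    memcpy(bytes, in, sizeof bytes);
    for (int i = 0; i < 16; i++)           /* cnt v0.16b, v0.16b */
        b[i] = (uint8_t)__builtin_popcount(bytes[i]);
    for (int i = 0; i < 8; i++)            /* uaddlp v0.8h, v0.16b */
        h[i] = (uint16_t)(b[2 * i] + b[2 * i + 1]);
    for (int i = 0; i < 4; i++)            /* uaddlp v0.4s, v0.8h */
        s[i] = (uint32_t)h[2 * i] + h[2 * i + 1];
    for (int i = 0; i < 2; i++)            /* uaddlp v0.2d, v0.4s */
        out[i] = (uint64_t)s[2 * i] + s[2 * i + 1];
}
```

Each uaddlp step doubles the lane width, so the int case needs two steps after cnt and the long case needs three. By the same reasoning the HI case in the bug title should need only one.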