https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82459
--- Comment #4 from Peter Cordes ---
The VPAND instructions in the 256-bit version are a missed optimization.
I had another look at this with current trunk. Code-gen is similar to before
with -march=skylake-avx512 -mprefer-vector-width=512.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82459
--- Comment #3 from Peter Cordes ---
I had another look at this with current trunk. Code-gen is similar to before
with -march=skylake-avx512 -mprefer-vector-width=512. (If we improve code-gen
for that choice, it will make it a win in more
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82459
Andrew Senkevich changed:
What|Removed |Added
CC||andrew.n.senkevich at gmail dot co
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82459
--- Comment #1 from Peter Cordes ---
BTW, if we *are* using vpmovwb, it supports a memory operand. It doesn't save
any front-end uops on Skylake-avx512, just code-size. Unless it means less
efficient packing in the uop cache (since all uops