[Bug target/80355] Improve __builtin_shuffle on AVX512F

2021-08-10 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80355

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #6 from Jakub Jelinek  ---
Fixed.

[Bug target/80355] Improve __builtin_shuffle on AVX512F

2021-08-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80355

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:50b5877925ef5ae8e9f913d6d2b5ce0204ebc588

commit r12-2837-g50b5877925ef5ae8e9f913d6d2b5ce0204ebc588
Author: Jakub Jelinek 
Date:   Tue Aug 10 12:38:00 2021 +0200

i386: Allow some V32HImode and V64QImode permutations even without AVX512BW [PR80355]

While working on the PR, I noticed that we generate terrible code for
V32HImode or V64QImode permutations with -mavx512f -mno-avx512bw.
Generally we can't do much with such permutations, but since PR68655
we can handle at least some of them, namely those expressible as V16SImode
or V8DImode permutations.  That code wasn't reachable, though, because
ix86_vectorize_vec_perm_const didn't even try: without TARGET_AVX512BW it
said it can't do anything, and with it that it can do everything, so no
d.testing_p attempts were made.

This patch makes it try it for TARGET_AVX512F && !TARGET_AVX512BW.
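
To illustrate what "expressible using V16SImode or V8DImode permutations"
means, here is a minimal sketch in C.  The helper name widen_perm and its
interface are hypothetical and are not taken from i386-expand.c; it only
models the idea that a permutation of narrow elements can be done on
elements twice as wide when every aligned pair of elements moves together.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not GCC's code: try to rewrite a one-operand
   permutation of NELT narrow elements as a permutation of NELT / 2
   elements that are twice as wide.  This works only when every aligned
   pair of result elements copies an aligned pair of source elements.
   On success, WIDE receives the NELT / 2 selector indices.  */
static bool
widen_perm (const unsigned char *perm, size_t nelt, unsigned char *wide)
{
  for (size_t i = 0; i < nelt; i += 2)
    {
      /* The pair must start at an even source index and be consecutive.  */
      if ((perm[i] & 1) != 0 || perm[i + 1] != perm[i] + 1)
        return false;
      wide[i / 2] = perm[i] / 2;
    }
  return true;
}
```

Under this model, a 32-element selector that swaps the two halves of a
V32HImode vector widens to a 16-element selector, while a selector that
reverses the elements does not.  A V64QImode selector would need the step
applied twice to reach 16 elements.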

The first hunk avoids an ICE: expand_vec_perm_even_odd_1 asserts that
d->vmode isn't V32HImode, because with AVX512BW expand_vec_perm_1 already
handles all permutations, but when we let V32HImode through with
!TARGET_AVX512BW, expand_vec_perm_1 doesn't handle it.

If we want, that hunk can be dropped if we implement in
expand_vec_perm_even_odd_1 and its helper the even permutation as
vpmovdw + vpmovdw + vinserti64x4 and odd permutation as
vpsrld $16 + vpsrld $16 + vpmovdw + vpmovdw + vinserti64x4.
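
The instruction sequences suggested above can be modelled in scalar C.
This is only a sketch: the model_* function names are made up for the
illustration, a little-endian host is assumed (as on x86), and nothing here
is GCC code.  vpmovdw truncates each 32-bit element to its low 16 bits,
which on little-endian is the even-indexed 16-bit element; shifting right
by 16 first selects the odd-indexed one instead.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of vpmovdw on one 512-bit input: truncate each of the
   16 dwords to its low 16 bits.  */
static void
model_vpmovdw (const uint32_t in[16], uint16_t out[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = (uint16_t) in[i];
}

/* Scalar model of vpsrld $16: logical right shift of each dword.  */
static void
model_vpsrld16 (const uint32_t in[16], uint32_t out[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = in[i] >> 16;
}

/* Extract the even (ODD == 0) or odd (ODD != 0) 16-bit elements of the
   64-element concatenation of A and B.  Assumes a little-endian host.  */
static void
model_even_odd (const uint16_t a[32], const uint16_t b[32], int odd,
                uint16_t out[32])
{
  uint32_t da[16], db[16];

  /* Reinterpret each pair of 16-bit elements as one 32-bit element.  */
  memcpy (da, a, sizeof da);
  memcpy (db, b, sizeof db);
  if (odd)
    {
      model_vpsrld16 (da, da);
      model_vpsrld16 (db, db);
    }
  model_vpmovdw (da, out);       /* Low 256 bits of the result.  */
  model_vpmovdw (db, out + 16);  /* vinserti64x4 fills the high 256 bits.  */
}
```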

2021-08-10  Jakub Jelinek  

PR target/80355
* config/i386/i386-expand.c (expand_vec_perm_even_odd): Return false
for V32HImode if !TARGET_AVX512BW.
(ix86_vectorize_vec_perm_const) :
If !TARGET_AVX512BW and TARGET_AVX512F and d.testing_p, don't fail
early, but actually check the permutation.

* gcc.target/i386/avx512f-pr80355-2.c: New test.

[Bug target/80355] Improve __builtin_shuffle on AVX512F

2021-08-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80355

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:7665af0b1a964b1baae3a59b22fcc420369c63cf

commit r12-2835-g7665af0b1a964b1baae3a59b22fcc420369c63cf
Author: Jakub Jelinek 
Date:   Tue Aug 10 11:34:53 2021 +0200

i386: Improve single operand AVX512F permutations [PR80355]

On the following testcase we emit
        vmovdqa32   .LC0(%rip), %zmm1
        vpermd  %zmm0, %zmm1, %zmm0
and
        vmovdqa64   .LC1(%rip), %zmm1
        vpermq  %zmm0, %zmm1, %zmm0
instead of the
        vshufi32x4  $78, %zmm0, %zmm0, %zmm0
and
        vshufi64x2  $78, %zmm0, %zmm0, %zmm0
that we can emit with the patch.  We have patterns that match two-argument
permutations for vshuf[if]*, but they don't trigger for the one-argument
case.  Either we can add two patterns for that, or we would need to add
another routine to i386-expand.c that would transform these cases to the
two-argument vshuf* under certain conditions; doing it in sse.md looked
simpler.  We don't need this for 32-byte vectors, since there we already
emit a single-insn permutation that doesn't need a memory operand.
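
For illustration, a one-operand permutation of this kind can be written
with GCC's generic vector extension as below.  This is a sketch and not a
copy of avx512f-pr80355-1.c; the type and function names are assumptions.
The selectors swap the two 256-bit halves of a 64-byte vector, which is the
lane order that immediate 78 (binary 01 00 11 10) encodes when both source
operands are the same register.  The helper shuf_imm_lane is hypothetical
and only decodes that immediate.

```c
typedef int v16si __attribute__ ((vector_size (64)));
typedef long long v8di __attribute__ ((vector_size (64)));

/* Swap the two 256-bit halves of a vector of sixteen ints.  */
v16si
swap_halves_si (v16si x)
{
  const v16si sel = { 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7 };
  return __builtin_shuffle (x, sel);
}

/* Swap the two 256-bit halves of a vector of eight long longs.  */
v8di
swap_halves_di (v8di x)
{
  const v8di sel = { 4, 5, 6, 7, 0, 1, 2, 3 };
  return __builtin_shuffle (x, sel);
}

/* Hypothetical helper: which 128-bit source lane a vshufi32x4 or
   vshufi64x2 immediate selects for result lane N (0..3).  Result lanes
   0 and 1 read the first source, lanes 2 and 3 the second; with both
   sources equal, only the lane number matters.  */
static int
shuf_imm_lane (unsigned imm, int n)
{
  return (imm >> (2 * n)) & 3;
}
```

Compiled with -O2 -mavx512f, functions like these are the kind of input
for which the commit message describes the vpermd/vpermq sequences before
the patch and the single vshufi32x4/vshufi64x2 afterwards; the exact output
depends on the compiler version.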

2021-08-10  Jakub Jelinek  

PR target/80355
* config/i386/sse.md (*avx512f_shuf_64x2_1_1,
*avx512f_shuf_32x4_1_1): New define_insn patterns.

* gcc.target/i386/avx512f-pr80355-1.c: New test.

[Bug target/80355] Improve __builtin_shuffle on AVX512F

2021-08-09 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80355

--- Comment #3 from Jakub Jelinek  ---
Created attachment 51278
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51278&action=edit
gcc12-pr80355-2.patch

And this incremental patch makes it also handle similar V32HImode/V64QImode
permutations with -mavx512f -mno-avx512bw.

[Bug target/80355] Improve __builtin_shuffle on AVX512F

2021-08-09 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80355

--- Comment #2 from Jakub Jelinek  ---
Created attachment 51277
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51277&action=edit
gcc12-pr80355-1.patch

Untested fix.  For 32-byte vectors/AVX512VL we don't need this, we already emit
vperm2i128 or vpermq.

[Bug target/80355] Improve __builtin_shuffle on AVX512F

2021-08-07 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80355

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

[Bug target/80355] Improve __builtin_shuffle on AVX512F

2021-08-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80355

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-08-07
             Status|UNCONFIRMED                 |NEW
           Severity|normal                      |enhancement
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski  ---
Confirmed.