[Bug target/100637] [i386] Vectorize 4-byte vectors

2022-01-12 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

Uroš Bizjak  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #14 from Uroš Bizjak  ---
Let's say this is done now.

[Bug target/100637] [i386] Vectorize 4-byte vectors

2022-01-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #13 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:b5193e352981fab8441c600b0a50efe1f30c1d30

commit r12-6533-gb5193e352981fab8441c600b0a50efe1f30c1d30
Author: Uros Bizjak 
Date:   Wed Jan 12 19:59:57 2022 +0100

i386: Add CC clobber and splits for 32-bit vector mode logic insns
[PR100637, PR103861]

Add CC clobber to 32-bit vector mode logic insns to allow variants with
general-purpose registers.  Also improve ix86_expand_sse_movcc to emit insns
with CC clobber for narrow vector modes in order to re-enable conditional
moves for 16-bit and 32-bit narrow vector modes with -msse2.

2022-01-12  Uroš Bizjak  

gcc/ChangeLog:

PR target/100637
PR target/103861
* config/i386/i386-expand.c (ix86_emit_vec_binop): New static
function.
(ix86_expand_sse_movcc): Use ix86_emit_vec_binop instead of gen_rtx_X
when constructing vector logic RTXes.
(expand_vec_perm_pshufb2): Ditto.
* config/i386/mmx.md (negv2qi): Disparage GPR alternative a bit.
(v2qi3): Ditto.
(vcond): Re-enable for TARGET_SSE2.
(vcondu): Ditto.
(vcond_mask_): Ditto.
(one_cmpl2): Remove expander.
(one_cmpl2): Rename from one_cmplv2qi.
Use VI_16_32 mode iterator.
(one_cmpl2 splitters): Use VI_16_32 mode iterator.
Use lowpart_subreg instead of gen_lowpart to create subreg.
(*andnot3): Merge from "*andnot" and
"*andnotv2qi3" insn patterns using VI_16_32 mode iterator.
Disparage GPR alternative a bit.  Add CC clobber.
(*andnot3 splitters): Use VI_16_32 mode iterator.
Use lowpart_subreg instead of gen_lowpart to create subreg.
(*3): Merge from "*" and "*v2qi3" insn patterns
using VI_16_32 mode iterator.  Disparage GPR alternative a bit.
Add CC clobber.
(*3 splitters): Use VI_16_32 mode iterator.
Use lowpart_subreg instead of gen_lowpart to create subreg.

gcc/testsuite/ChangeLog:

PR target/100637
PR target/103861
* g++.target/i386/pr100637-1b.C (dg-options):
Use -msse2 instead of -msse4.1.
* g++.target/i386/pr100637-1w.C (dg-options): Ditto.
* g++.target/i386/pr103861-1.C (dg-options): Ditto.
* gcc.target/i386/pr100637-4b.c (dg-options): Ditto.
* gcc.target/i386/pr103861-4.c (dg-options): Ditto.
* gcc.target/i386/pr100637-1b.c: Remove scan-assembler
directives for logic instructions.
* gcc.target/i386/pr100637-1w.c: Ditto.
* gcc.target/i386/warn-vect-op-2.c:
Update dg-warning for vector logic operation.
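
As a rough illustration of why general-purpose-register alternatives are possible for these logic insns: bitwise logic has no carries between lanes, so a 4-byte vector AND/OR/XOR yields the same bits whether it is done lane-wise or as one 32-bit integer operation. The integer instructions modify EFLAGS, which is presumably why the patterns need the CC clobber. The type and function names below are illustrative assumptions and are not taken from the GCC sources or testsuite.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte vector type using GCC's vector extension. */
typedef unsigned char v4qu __attribute__((vector_size(4)));

/* Lane-wise XOR on the vector type. */
v4qu xor_vec(v4qu a, v4qu b)
{
    return a ^ b;
}

/* The same operation on the raw 32 bits, as one integer XOR. */
v4qu xor_gpr(v4qu a, v4qu b)
{
    uint32_t x, y;
    v4qu r;

    memcpy(&x, &a, sizeof x);
    memcpy(&y, &b, sizeof y);
    x ^= y;
    memcpy(&r, &x, sizeof r);
    return r;
}
```

Both functions return identical bytes for any input, so the compiler is free to pick whichever register class is cheaper at that point.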

[Bug target/100637] [i386] Vectorize 4-byte vectors

2021-07-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #12 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:663a014e77709bfbd4145c605b178169eaf334fc

commit r12-2136-g663a014e77709bfbd4145c605b178169eaf334fc
Author: Uros Bizjak 
Date:   Thu Jul 8 12:19:54 2021 +0200

i386: Add pack/unpack patterns for 32bit vectors [PR100637]

V1SI mode shift is needed to shift 32bit operands and consequently we
need to implement V1SI moves and pushes.

2021-07-08  Uroš Bizjak  

gcc/
PR target/100637
* config/i386/i386-expand.c (ix86_expand_sse_unpack):
Handle V4QI mode.
* config/i386/mmx.md (V_32): New mode iterator.
(mov): Use V_32 mode iterator.
(*mov_internal): Ditto.
(*push2_rex64): Ditto.
(*push2): Ditto.
(movmisalign): Ditto.
(mmx_v1si3): New insn pattern.
(sse4_1_v2qiv2hi2): Ditto.
(vec_unpacks_lo_v4qi): New expander.
(vec_unpacks_hi_v4qi): Ditto.
(vec_unpacku_lo_v4qi): Ditto.
(vec_unpacku_hi_v4qi): Ditto.
* config/i386/i386.h (VALID_SSE2_REG_MODE): Add V1SImode.
(VALID_INT_MODE_P): Ditto.
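
For context, the vec_unpacks_* and vec_unpacku_* expanders widen the low or high half of a vector with sign or zero extension. A minimal sketch of scalar source that performs such widening on four bytes follows; the function names are hypothetical, and whether a given GCC version actually vectorizes these loops with 4-byte vectors depends on the options used and is not verified here.

```c
/* Sign-extend four bytes to four 16-bit values
   (the kind of operation vec_unpacks_lo/hi implement per half). */
void widen_signed(short *restrict dst, const signed char *restrict src)
{
    for (int i = 0; i < 4; i++)
        dst[i] = src[i];
}

/* Zero-extend four bytes to four 16-bit values
   (the kind of operation vec_unpacku_lo/hi implement per half). */
void widen_unsigned(unsigned short *restrict dst,
                    const unsigned char *restrict src)
{
    for (int i = 0; i < 4; i++)
        dst[i] = src[i];
}
```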

[Bug target/100637] [i386] Vectorize 4-byte vectors

2021-07-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #11 from Uroš Bizjak  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:be8749f939a933bca6de19d9cf1a510d5954c2fa

commit r12-2036-gbe8749f939a933bca6de19d9cf1a510d5954c2fa
Author: Uros Bizjak 
Date:   Mon Jul 5 21:05:10 2021 +0200

i386: Implement 4-byte vector (V4QI/V2HI) constant permutations

2021-07-05  Uroš Bizjak  

gcc/
* config/i386/i386-expand.c (ix86_split_mmx_punpck):
Handle V4QI and V2HI modes.
(expand_vec_perm_blend): Allow 4-byte vector modes with
TARGET_SSE4_1.
Handle V4QI mode. Emit mmx_pblendvb32 for 4-byte modes.
(expand_vec_perm_pshufb): Rewrite to use switch statements.
Handle 4-byte dual operands with TARGET_XOP and single operands
with TARGET_SSSE3.  Emit mmx_ppermv32 for TARGET_XOP and
mmx_pshufbv4qi3 for TARGET_SSSE3.
(expand_vec_perm_pblendv): Allow 4-byte vector modes with
TARGET_SSE4_1.
(expand_vec_perm_interleave2): Allow 4-byte vector modes.
(expand_vec_perm_pshufb2): Allow 4-byte vector modes with
TARGET_SSSE3.
(expand_vec_perm_even_odd_1): Handle V4QI mode.
(expand_vec_perm_broadcast_1): Handle V4QI mode.
(ix86_vectorize_vec_perm_const): Handle V4QI mode.
* config/i386/mmx.md (mmx_ppermv32): New insn pattern.
(mmx_pshufbv4qi3): Ditto.
(*mmx_pblendw32): Ditto.
(*mmx_pblendw64): Rename from *mmx_pblendw.
(mmx_punpckhbw_low): New insn_and_split pattern.
(mmx_punpcklbw_low): Ditto.
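
A small sketch of constant permutations on a 4-byte vector, written with GCC's __builtin_shuffle. The type and function names are assumptions for illustration; which instruction (if any single one) GCC selects for each permutation depends on the enabled ISA and is not checked here.

```c
/* Hypothetical 4-byte vector type using GCC's vector extension. */
typedef unsigned char v4qu __attribute__((vector_size(4)));

/* One-operand constant permutation: reverse the four bytes. */
v4qu reverse4(v4qu a)
{
    const v4qu sel = {3, 2, 1, 0};
    return __builtin_shuffle(a, sel);
}

/* Two-operand constant permutation: interleave the low halves.
   Indices 0..3 select from a, indices 4..7 select from b. */
v4qu interleave_lo(v4qu a, v4qu b)
{
    const v4qu sel = {0, 4, 1, 5};
    return __builtin_shuffle(a, b, sel);
}
```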

[Bug target/100637] [i386] Vectorize 4-byte vectors

2021-06-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #10 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:64735dc923e0a1a2e04c5313471d91ca8b954e9a

commit r12-1266-g64735dc923e0a1a2e04c5313471d91ca8b954e9a
Author: Uros Bizjak 
Date:   Mon Jun 7 22:58:15 2021 +0200

i386: Add init pattern for V4QI vectors [PR100637]

2021-06-07  Uroš Bizjak  

gcc/
PR target/100637
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
Handle V4QI mode.
(ix86_expand_vector_init_one_nonzero): Ditto.
(ix86_expand_vector_init_one_var): Ditto.
(ix86_expand_vector_init_general): Ditto.
* config/i386/mmx.md (vec_initv4qiqi): New expander.

gcc/testsuite/

PR target/100637
* gcc.target/i386/pr100637-5b.c: New test.
* gcc.target/i386/pr100637-5w.c: Ditto.
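
To make the init cases concrete, here is a hedged sketch of source that builds a V4QI-sized vector from scalars. The first function is a broadcast (roughly the "duplicate" case), the second a fully general initialization. The names are hypothetical and not taken from the new tests.

```c
/* Hypothetical 4-byte vector type using GCC's vector extension. */
typedef signed char v4qi __attribute__((vector_size(4)));

/* Broadcast one scalar to all four lanes. */
v4qi splat4(signed char x)
{
    return (v4qi){x, x, x, x};
}

/* Build a vector from four independent scalars. */
v4qi init4(signed char a, signed char b, signed char c, signed char d)
{
    return (v4qi){a, b, c, d};
}
```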

[Bug target/100637] [i386] Vectorize 4-byte vectors

2021-06-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #9 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:8d7dae0eb366a88a1baba1857ecc54c09e4a520e

commit r12-1215-g8d7dae0eb366a88a1baba1857ecc54c09e4a520e
Author: Uros Bizjak 
Date:   Fri Jun 4 17:37:15 2021 +0200

i386: Add init pattern for V2HI vectors [PR100637]

2021-06-03  Uroš Bizjak  

gcc/
PR target/100637
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
Handle V2HI mode.
(ix86_expand_vector_init_general): Ditto.
Use SImode instead of word_mode for logic operations
when GET_MODE_SIZE (mode) < UNITS_PER_WORD.
(expand_vec_perm_even_odd_1): Assert that V2HI mode should be
implemented by expand_vec_perm_1.
(expand_vec_perm_broadcast_1): Assert that V2HI and V4HI modes
should be implemented using standard shuffle patterns.
(ix86_vectorize_vec_perm_const): Handle V2HImode.  Add V4HI and
V2HI modes to modes, implementable with shuffle for one operand.
* config/i386/mmx.md (*punpckwd): New insn_and_split pattern.
(*pshufw_1): New insn pattern.
(*vec_dupv2hi): Ditto.
(vec_initv2hihi): New expander.

gcc/testsuite/

PR target/100637
* gcc.dg/vect/slp-perm-9.c (dg-final): Adjust dumps for vect32
targets.

[Bug target/100637] [i386] Vectorize 4-byte vectors

2021-06-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:5883e567564c5b3caecba0c13e8a360a14cdc846

commit r12-1197-g5883e567564c5b3caecba0c13e8a360a14cdc846
Author: Uros Bizjak 
Date:   Thu Jun 3 20:05:31 2021 +0200

i386: Add insert and extract patterns for 4-byte vectors [PR100637]

The patch introduces insert and extract patterns for 4-byte vectors.
It effectively only emits PINSR and PEXTR instructions when available,
otherwise falls back to generic code that emulates these instructions
via inserts, extracts, logic operations and shifts in integer registers.

Please note that the generic fallback produces better code than the current
approach of constructing a new vector in memory (due to store forwarding
stall), so also enable QImode 8-byte vector inserts only with TARGET_SSE4_1.

2021-06-03  Uroš Bizjak  

gcc/
PR target/100637
* config/i386/i386-expand.c (ix86_expand_vector_set):
Handle V2HI and V4QI modes.
(ix86_expand_vector_extract): Ditto.
* config/i386/mmx.md (*pinsrw): New insn pattern.
(*pinsrb): Ditto.
(*pextrw): Ditto.
(*pextrw_zext): Ditto.
(*pextrb): Ditto.
(*pextrb_zext): Ditto.
(vec_setv2hi): New expander.
(vec_extractv2hihi): Ditto.
(vec_setv4qi): Ditto.
(vec_extractv4qiqi): Ditto.

(vec_setv8qi): Enable only for TARGET_SSE4_1.
(vec_extractv8qiqi): Ditto.

gcc/testsuite/

PR target/100637
* gcc.target/i386/vperm-v2hi.c: New test.
* gcc.target/i386/vperm-v4qi.c: Ditto.
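
The fallback described in this commit message (emulating insert/extract with logic operations and shifts in integer registers) can be sketched as follows. This is an illustration under stated assumptions, not the code GCC generates: the function names are hypothetical, and the integer helpers assume the x86 little-endian layout where element 0 occupies the lowest byte.

```c
#include <stdint.h>

/* Hypothetical 4-byte vector type using GCC's vector extension. */
typedef unsigned char v4qu __attribute__((vector_size(4)));

/* Element insert and extract written on the vector type. */
v4qu set_elt2(v4qu v, unsigned char x)
{
    v[2] = x;
    return v;
}

unsigned char get_elt3(v4qu v)
{
    return v[3];
}

/* The same operations on a 32-bit integer holding the vector:
   clear the target byte with a mask, then OR in the new value. */
uint32_t insert_byte(uint32_t word, unsigned idx, uint8_t x)
{
    unsigned sh = idx * 8;               /* idx must be 0..3 */
    return (word & ~(UINT32_C(0xFF) << sh)) | ((uint32_t)x << sh);
}

/* Shift the wanted byte down to bit 0 and mask it off. */
uint8_t extract_byte(uint32_t word, unsigned idx)
{
    return (uint8_t)((word >> (idx * 8)) & 0xFF);
}
```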

[Bug target/100637] [i386] Vectorize 4-byte vectors

2021-05-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:6c67afaf524a5e0e9220f78271a0f5764ca27bd0

commit r12-1092-g6c67afaf524a5e0e9220f78271a0f5764ca27bd0
Author: Uros Bizjak 
Date:   Thu May 27 14:46:45 2021 +0200

i386: Add XOP comparisons for 4- and 8-byte vectors [PR100637]

2021-05-27  Uroš Bizjak  

gcc/
PR target/100637
* config/i386/i386-expand.c (ix86_expand_int_sse_cmp):
For TARGET_XOP bypass SSE comparisons for all supported vector
modes.
* config/i386/mmx.md (*xop_maskcmp3): New insn
pattern.
(*xop_maskcmp3): Ditto.
(*xop_maskcmp_uns3): Ditto.
(*xop_maskcmp_uns3): Ditto.

[Bug target/100637] [i386] Vectorize 4-byte vectors

2021-05-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:04ba00d4ed735242c5284d2c623a3a9d42d94742

commit r12-1085-g04ba00d4ed735242c5284d2c623a3a9d42d94742
Author: Uros Bizjak 
Date:   Thu May 27 09:22:01 2021 +0200

i386: Add uavg_ceil patterns for 4-byte vectors [PR100637]

2021-05-27  Uroš Bizjak  

gcc/
PR target/100637
* config/i386/mmx.md (uavgv4qi3_ceil): New insn pattern.
(uavgv2hi3_ceil): Ditto.

gcc/testsuite/

PR target/100637
* gcc.target/i386/pr100637-3b.c (avgu): New test.
* gcc.target/i386/pr100637-3w.c (avgu): Ditto.
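
For reference, uavg_ceil is the unsigned rounding-up average, (a + b + 1) >> 1, computed without overflow in the element type. A minimal scalar sketch of a loop with that shape is below; the function name is hypothetical and the actual avgu tests may be written differently.

```c
/* Rounding-up unsigned average of four byte pairs.  The operands are
   promoted to int, so a[i] + b[i] + 1 cannot overflow. */
void avgu4(unsigned char *restrict r,
           const unsigned char *restrict a,
           const unsigned char *restrict b)
{
    for (int i = 0; i < 4; i++)
        r[i] = (unsigned char)((a[i] + b[i] + 1) >> 1);
}
```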

[Bug target/100637] [i386] Vectorize 4-byte vectors

2021-05-21 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:2df9d3c52e6758f6640e7c0ae0b7502c7cc1d430

commit r12-973-g2df9d3c52e6758f6640e7c0ae0b7502c7cc1d430
Author: Uros Bizjak 
Date:   Fri May 21 13:03:04 2021 +0200

i386: Add comparisons for 4-byte vectors [PR100637]

2021-05-21  Uroš Bizjak  

gcc/
PR target/100637
* config/i386/i386-expand.c (ix86_expand_sse_movcc):
Handle V4QI and V2HI modes.
(ix86_expand_sse_movcc): Ditto.
* config/i386/mmx.md (*3):
New instruction pattern.
(*eq3): Ditto.
(*gt3): Ditto.
(*xop_pcmov_): Ditto.
(mmx_pblendvb32): Ditto.
(mmx_pblendvb64): Rename from mmx_pblendvb.
(vec_cmp): New expander.
(vec_cmpu): Ditto.
(vcond): Ditto.
(vcondu): Ditto.
(vcond_mask_): Ditto.

gcc/testsuite/

PR target/100637
* g++.target/i386/pr100637-1b.C: New test.
* g++.target/i386/pr100637-1w.C: Ditto.
* gcc.target/i386/pr100637-2b.c: Ditto.
* gcc.target/i386/pr100637-2w.c: Ditto.
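
A hedged sketch of what a 4-byte vector comparison followed by a conditional move looks like at the source level. GCC documents the vector form of the ?: operator for C++, so the blend is written out by hand here to keep the example valid C. The names are illustrative assumptions, not the contents of the new tests.

```c
/* Hypothetical 4-byte vector type using GCC's vector extension. */
typedef signed char v4qi __attribute__((vector_size(4)));

/* Per-lane "a > b ? x : y".  The comparison yields -1 (all bits set)
   in lanes where it holds and 0 elsewhere, so it can be used as a mask. */
v4qi select_gt(v4qi a, v4qi b, v4qi x, v4qi y)
{
    v4qi m = a > b;
    return (m & x) | (~m & y);
}
```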

[Bug target/100637] [i386] Vectorize 4-byte vectors

2021-05-21 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:dcde81134cb24da8e261a4346c806c676297922b

commit r12-960-gdcde81134cb24da8e261a4346c806c676297922b
Author: Uros Bizjak 
Date:   Fri May 21 08:01:34 2021 +0200

i386: Add minmax and abs patterns for 4-byte vectors [PR100637]

2021-05-21  Uroš Bizjak  

gcc/
PR target/100637
* config/i386/mmx.md (SMAXMIN_MMXMODEI): New mode iterator.
(3): Macroize expander
from v4hi3> and 3
using SMAXMIN_MMXMODEI mode iterator.
(*v4qi3): New insn pattern.
(*v2hi3): Ditto.
(SMAXMIN_VI_32): New mode iterator.
(mode3): New expander.

(UMAXMIN_MMXMODEI): New mode iterator.
(3): Macroize expander
from v8qi3> and 3
using UMAXMIN_MMXMODEI mode iterator.
(*v4qi3): New insn pattern.
(*v2hi3): Ditto.
(UMAXMIN_VI_32): New mode iterator.
(mode3): New expander.

(abs2): New insn pattern.
(ssse3_abs2, abs2): Move from ...
* config/i386/sse.md: ... here.
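
As an illustration of the operations these patterns cover, here are scalar loops for signed max, unsigned min and absolute value over 4 bytes of data. The function names are hypothetical; whether GCC turns each loop into a single 4-byte vector instruction depends on the enabled ISA and is not verified here.

```c
/* Signed maximum of four byte pairs. */
void smax4(signed char *restrict r,
           const signed char *restrict a,
           const signed char *restrict b)
{
    for (int i = 0; i < 4; i++)
        r[i] = a[i] > b[i] ? a[i] : b[i];
}

/* Unsigned minimum of four byte pairs. */
void umin4(unsigned char *restrict r,
           const unsigned char *restrict a,
           const unsigned char *restrict b)
{
    for (int i = 0; i < 4; i++)
        r[i] = a[i] < b[i] ? a[i] : b[i];
}

/* Absolute value of two 16-bit elements. */
void abs2(short *restrict r, const short *restrict a)
{
    for (int i = 0; i < 2; i++)
        r[i] = (short)(a[i] < 0 ? -a[i] : a[i]);
}
```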

[Bug target/100637] [i386] Vectorize 4-byte vectors

2021-05-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:507359e1d4d18614eb9679043995edf0675b6ff5

commit r12-940-g507359e1d4d18614eb9679043995edf0675b6ff5
Author: Uros Bizjak 
Date:   Thu May 20 11:11:21 2021 +0200

i386: Add mult-high and shift patterns for 4-byte vectors [PR100637]

2021-05-20  Uroš Bizjak  

gcc/
PR target/100637
* config/i386/mmx.md (Yv_Yw): Revert adding V4QI and V2HI modes.
(*3): Use Yw instead of 
constraint.
(mulv4hi3_highpart): New expander.
(*mulv2hi3_highpart): New insn pattern.
(mulv2hi3_highpart): New expander.
(*v2hi3): New insn pattern.
(v2hi3): New expander.
* config/i386/sse.md (smulhrsv2hi3): New expander.
(*smulhrsv2hi3): New insn pattern.

gcc/testsuite/

PR target/100637
* gcc.target/i386/pr100637-1w.c (shl, ashr, lshr): New tests.
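
A minimal sketch of the operations named in this commit, for two 16-bit elements: left shift, arithmetic and logical right shift, and the high half of a 16x16-bit multiply. The names are illustrative assumptions and need not match the shl, ashr and lshr tests. The right shifts of negative values rely on GCC's arithmetic-shift behaviour for signed types.

```c
/* Hypothetical 4-byte vector types using GCC's vector extension. */
typedef short v2hi __attribute__((vector_size(4)));
typedef unsigned short v2hu __attribute__((vector_size(4)));

/* Shift both lanes left by the same count. */
v2hi shl2(v2hi a, short n)
{
    v2hi cnt = {n, n};
    return a << cnt;
}

/* Arithmetic right shift (signed elements). */
v2hi ashr2(v2hi a, short n)
{
    v2hi cnt = {n, n};
    return a >> cnt;
}

/* Logical right shift (unsigned elements). */
v2hu lshr2(v2hu a, unsigned short n)
{
    v2hu cnt = {n, n};
    return a >> cnt;
}

/* High 16 bits of the signed 32-bit product, per element. */
void mulhi2(short *restrict r, const short *restrict a,
            const short *restrict b)
{
    for (int i = 0; i < 2; i++)
        r[i] = (short)(((int)a[i] * b[i]) >> 16);
}
```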

[Bug target/100637] [i386] Vectorize 4-byte vectors

2021-05-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:46ca31d65092e5afcef292f807fcf14c5363280d

commit r12-883-g46ca31d65092e5afcef292f807fcf14c5363280d
Author: Uros Bizjak 
Date:   Tue May 18 17:25:54 2021 +0200

i386: Implement 4-byte vector support [PR100637]

Add infrastructure, logic and arithmetic support for 4-byte vectors.
These can be used with SSE2 targets, where movd instructions from/to
XMM registers are available.  x86_64 ABI passes 4-byte vectors in
integer registers, so also add logic operations with integer registers.

2021-05-18  Uroš Bizjak  

gcc/
PR target/100637
* config/i386/i386.h (VALID_SSE2_REG_MODE):
Add V4QI and V2HI modes.
(VALID_INT_MODE_P): Ditto.
* config/i386/mmx.md (VI_32): New mode iterator.
(mmxvecsize): Handle V4QI and V2HI.
(Yv_Yw): Ditto.
(mov): New expander.
(*mov_internal): New insn pattern.
(movmisalign): New expander.
(neg): New expander.
(3): New expander.
(*3): New insn pattern.
(mulv2hi3): New expander.
(*mulv2hi3): New insn pattern.
(one_cmpl2): New expander.
(*andnot3): New insn pattern.
(3): New expander.
(*3): New insn pattern.

gcc/testsuite/

PR target/100637
* gcc.target/i386/pr100637-1b.c: New test.
* gcc.target/i386/pr100637-1w.c: Ditto.

* gcc.target/i386/pr92658-avx2-2.c: Do not XFAIL scan for pmovsxbq.
* gcc.target/i386/pr92658-avx2.c: Do not XFAIL scan for pmovzxbq.
* gcc.target/i386/pr92658-avx512vl.c: Do not XFAIL scan for
vpmovdb.
* gcc.target/i386/pr92658-sse4-2.c: Do not XFAIL scan for
pmovsxbd and pmovsxwq.
* gcc.target/i386/pr92658-sse4.c: Do not XFAIL scan for
pmovzxbd and pmovzxwq.
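
For readers unfamiliar with GCC's vector extension, the arithmetic covered by this first commit (negate, add/subtract, 16-bit multiply, one's complement) can be written directly on 4-byte vector types. The sketch below uses hypothetical type and function names; it shows the source-level operations only and makes no claim about the exact instructions emitted.

```c
/* Hypothetical 4-byte vector types using GCC's vector extension. */
typedef signed char v4qi __attribute__((vector_size(4)));
typedef short v2hi __attribute__((vector_size(4)));

/* Per-lane negation of four bytes. */
v4qi neg4(v4qi a)
{
    return -a;
}

/* Per-lane subtraction of four bytes. */
v4qi sub4(v4qi a, v4qi b)
{
    return a - b;
}

/* Per-lane one's complement of four bytes. */
v4qi not4(v4qi a)
{
    return ~a;
}

/* Per-lane multiplication of two 16-bit elements. */
v2hi mul2(v2hi a, v2hi b)
{
    return a * b;
}
```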

[Bug target/100637] [i386] Vectorize 4-byte vectors

2021-05-17 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

Uroš Bizjak  changed:

           What    |Removed                      |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                  |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org|ubizjak at gmail dot com
   Last reconfirmed|                             |2021-05-17
     Ever confirmed|0                            |1

--- Comment #1 from Uroš Bizjak  ---
Created attachment 50822
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50822&action=edit
Patch that enables vectorization of 4-byte vectors

The patch introduces infrastructure to vectorize 4-byte vectors on SSE2
targets. The vectorization of logic and plus/minus instructions is
demonstrated; with -O3 -msse2, the above testcases compile to:

foo:
        movd    %esi, %xmm0
        movd    %edi, %xmm2
        movd    %edx, %xmm1
        pandn   %xmm2, %xmm0
        paddb   %xmm0, %xmm1
        movd    %xmm1, %eax
        ret

bar_b:
        movd    tb(%rip), %xmm0
        movd    sb(%rip), %xmm1
        paddb   %xmm1, %xmm0
        movd    %xmm0, rb(%rip)
        ret

bar_w:
        movd    tw(%rip), %xmm0
        movd    sw(%rip), %xmm1
        paddw   %xmm1, %xmm0
        movd    %xmm0, rw(%rip)
        ret