[Bug target/100637] [i386] Vectorize 4-byte vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #14 from Uroš Bizjak ---
Let's say this is done now.
[Bug target/100637] [i386] Vectorize 4-byte vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #13 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:b5193e352981fab8441c600b0a50efe1f30c1d30

commit r12-6533-gb5193e352981fab8441c600b0a50efe1f30c1d30
Author: Uros Bizjak
Date:   Wed Jan 12 19:59:57 2022 +0100

    i386: Add CC clobber and splits for 32-bit vector mode logic insns [PR100637, PR103861]

    Add CC clobber to 32-bit vector mode logic insns to allow variants
    with general-purpose registers.  Also improve ix86_sse_movcc to emit
    insn with CC clobber for narrow vector modes in order to re-enable
    conditional moves for 16-bit and 32-bit narrow vector modes with -msse2.

    2022-01-12  Uroš Bizjak

    gcc/ChangeLog:

        PR target/100637
        PR target/103861
        * config/i386/i386-expand.c (ix86_emit_vec_binop): New static function.
        (ix86_expand_sse_movcc): Use ix86_emit_vec_binop instead of gen_rtx_X
        when constructing vector logic RTXes.
        (expand_vec_perm_pshufb2): Ditto.
        * config/i386/mmx.md (negv2qi): Disparage GPR alternative a bit.
        (v2qi3): Ditto.
        (vcond): Re-enable for TARGET_SSE2.
        (vcondu): Ditto.
        (vcond_mask_): Ditto.
        (one_cmpl2): Remove expander.
        (one_cmpl2): Rename from one_cmplv2qi.  Use VI_16_32 mode iterator.
        (one_cmpl2 splitters): Use VI_16_32 mode iterator.  Use lowpart_subreg
        instead of gen_lowpart to create subreg.
        (*andnot3): Merge from "*andnot" and "*andnotv2qi3" insn patterns
        using VI_16_32 mode iterator.  Disparage GPR alternative a bit.
        Add CC clobber.
        (*andnot3 splitters): Use VI_16_32 mode iterator.  Use lowpart_subreg
        instead of gen_lowpart to create subreg.
        (*3): Merge from "*" and "*v2qi3" insn patterns using VI_16_32 mode
        iterator.  Disparage GPR alternative a bit.  Add CC clobber.
        (*3 splitters): Use VI_16_32 mode iterator.  Use lowpart_subreg
        instead of gen_lowpart to create subreg.

    gcc/testsuite/ChangeLog:

        PR target/100637
        PR target/103861
        * g++.target/i386/pr100637-1b.C (dg-options): Use -msse2 instead
        of -msse4.1.
        * g++.target/i386/pr100637-1w.C (dg-options): Ditto.
        * g++.target/i386/pr103861-1.C (dg-options): Ditto.
        * gcc.target/i386/pr100637-4b.c (dg-options): Ditto.
        * gcc.target/i386/pr103861-4.c (dg-options): Ditto.
        * gcc.target/i386/pr100637-1b.c: Remove scan-assembler directives
        for logic instructions.
        * gcc.target/i386/pr100637-1w.c: Ditto.
        * gcc.target/i386/warn-vect-op-2.c: Update dg-warning for vector
        logic operation.
[Bug target/100637] [i386] Vectorize 4-byte vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #12 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:663a014e77709bfbd4145c605b178169eaf334fc

commit r12-2136-g663a014e77709bfbd4145c605b178169eaf334fc
Author: Uros Bizjak
Date:   Thu Jul 8 12:19:54 2021 +0200

    i386: Add pack/unpack patterns for 32bit vectors [PR100637]

    V1SI mode shift is needed to shift 32bit operands and consequently we
    need to implement V1SI moves and pushes.

    2021-07-08  Uroš Bizjak

    gcc/
        PR target/100637
        * config/i386/i386-expand.c (ix86_expand_sse_unpack): Handle V4QI mode.
        * config/i386/mmx.md (V_32): New mode iterator.
        (mov): Use V_32 mode iterator.
        (*mov_internal): Ditto.
        (*push2_rex64): Ditto.
        (*push2): Ditto.
        (movmisalign): Ditto.
        (mmx_v1si3): New insn pattern.
        (sse4_1_v2qiv2hi2): Ditto.
        (vec_unpacks_lo_v4qi): New expander.
        (vec_unpacks_hi_v4qi): Ditto.
        (vec_unpacku_lo_v4qi): Ditto.
        (vec_unpacku_hi_v4qi): Ditto.
        * config/i386/i386.h (VALID_SSE2_REG_MODE): Add V1SImode.
        (VALID_INT_MODE_P): Ditto.
[Bug target/100637] [i386] Vectorize 4-byte vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #11 from Uroš Bizjak ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:be8749f939a933bca6de19d9cf1a510d5954c2fa

commit r12-2036-gbe8749f939a933bca6de19d9cf1a510d5954c2fa
Author: Uros Bizjak
Date:   Mon Jul 5 21:05:10 2021 +0200

    i386: Implement 4-byte vector (V4QI/V2HI) constant permutations

    2021-07-05  Uroš Bizjak

    gcc/
        * config/i386/i386-expand.c (ix86_split_mmx_punpck): Handle V4QI
        and V2HI modes.
        (expand_vec_perm_blend): Allow 4-byte vector modes with
        TARGET_SSE4_1.  Handle V4QI mode.  Emit mmx_pblendvb32 for
        4-byte modes.
        (expand_vec_perm_pshufb): Rewrite to use switch statements.
        Handle 4-byte dual operands with TARGET_XOP and single operands
        with TARGET_SSSE3.  Emit mmx_ppermv32 for TARGET_XOP and
        mmx_pshufbv4qi3 for TARGET_SSSE3.
        (expand_vec_perm_pblendv): Allow 4-byte vector modes with
        TARGET_SSE4_1.
        (expand_vec_perm_interleave2): Allow 4-byte vector modes.
        (expand_vec_perm_pshufb2): Allow 4-byte vector modes with
        TARGET_SSSE3.
        (expand_vec_perm_even_odd_1): Handle V4QI mode.
        (expand_vec_perm_broadcast_1): Handle V4QI mode.
        (ix86_vectorize_vec_perm_const): Handle V4QI mode.
        * config/i386/mmx.md (mmx_ppermv32): New insn pattern.
        (mmx_pshufbv4qi3): Ditto.
        (*mmx_pblendw32): Ditto.
        (*mmx_pblendw64): Rename from *mmx_pblendw.
        (mmx_punpckhbw_low): New insn_and_split pattern.
        (mmx_punpcklbw_low): Ditto.
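
For illustration only, here is a minimal sketch of the kind of source-level constant permutation on a 4-byte vector that the commit above is concerned with. It is not taken from the patch or its testsuite. The function names are hypothetical, the element type is an assumption, and it relies on the GCC-specific __builtin_shuffle built-in. Which instruction (if any) is selected depends on the target flags.

```c
/* Hypothetical example: constant permutations of a 4-byte vector,
   written with GCC's generic vector extensions.  */
typedef unsigned char v4qi __attribute__ ((vector_size (4)));

/* Single-operand permutation: reverse the four bytes.  */
v4qi
reverse_v4qi (v4qi a)
{
  const v4qi sel = { 3, 2, 1, 0 };
  return __builtin_shuffle (a, sel);
}

/* Two-operand permutation: interleave the low halves of a and b.
   Selector values 0..3 pick from a, 4..7 pick from b.  */
v4qi
interleave_lo_v4qi (v4qi a, v4qi b)
{
  const v4qi sel = { 0, 4, 1, 5 };
  return __builtin_shuffle (a, b, sel);
}
```

Since the selectors are compile-time constants, a permutation like this is the sort of input a constant-permutation hook sees; whether it ends up as a shuffle, a blend or a generic sequence is up to the backend.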
[Bug target/100637] [i386] Vectorize 4-byte vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #10 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:64735dc923e0a1a2e04c5313471d91ca8b954e9a

commit r12-1266-g64735dc923e0a1a2e04c5313471d91ca8b954e9a
Author: Uros Bizjak
Date:   Mon Jun 7 22:58:15 2021 +0200

    i386: Add init pattern for V4QI vectors [PR100637]

    2021-06-07  Uroš Bizjak

    gcc/
        PR target/100637
        * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
        Handle V4QI mode.
        (ix86_expand_vector_init_one_nonzero): Ditto.
        (ix86_expand_vector_init_one_var): Ditto.
        (ix86_expand_vector_init_general): Ditto.
        * config/i386/mmx.md (vec_initv4qiqi): New expander.

    gcc/testsuite/
        PR target/100637
        * gcc.target/i386/pr100637-5b.c: New test.
        * gcc.target/i386/pr100637-5w.c: Ditto.
[Bug target/100637] [i386] Vectorize 4-byte vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #9 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:8d7dae0eb366a88a1baba1857ecc54c09e4a520e

commit r12-1215-g8d7dae0eb366a88a1baba1857ecc54c09e4a520e
Author: Uros Bizjak
Date:   Fri Jun 4 17:37:15 2021 +0200

    i386: Add init pattern for V2HI vectors [PR100637]

    2021-06-03  Uroš Bizjak

    gcc/
        PR target/100637
        * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
        Handle V2HI mode.
        (ix86_expand_vector_init_general): Ditto.
        Use SImode instead of word_mode for logic operations
        when GET_MODE_SIZE (mode) < UNITS_PER_WORD.
        (expand_vec_perm_even_odd_1): Assert that V2HI mode should be
        implemented by expand_vec_perm_1.
        (expand_vec_perm_broadcast_1): Assert that V2HI and V4HI modes
        should be implemented using standard shuffle patterns.
        (ix86_vectorize_vec_perm_const): Handle V2HImode.  Add V4HI and
        V2HI modes to modes, implementable with shuffle for one operand.
        * config/i386/mmx.md (*punpckwd): New insn_and_split pattern.
        (*pshufw_1): New insn pattern.
        (*vec_dupv2hi): Ditto.
        (vec_initv2hihi): New expander.

    gcc/testsuite/
        PR target/100637
        * gcc.dg/vect/slp-perm-9.c (dg-final): Adjust dumps for vect32 targets.
[Bug target/100637] [i386] Vectorize 4-byte vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #8 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:5883e567564c5b3caecba0c13e8a360a14cdc846

commit r12-1197-g5883e567564c5b3caecba0c13e8a360a14cdc846
Author: Uros Bizjak
Date:   Thu Jun 3 20:05:31 2021 +0200

    i386: Add insert and extract patterns for 4-byte vectors [PR100637]

    The patch introduces insert and extract patterns for 4-byte vectors.
    It effectively only emits PINSR and PEXTR instructions when available,
    otherwise falls back to generic code that emulates these instructions
    via inserts, extracts, logic operations and shifts in integer registers.

    Please note that generic fallback produces better code than the current
    approach of constructing new vector in memory (due to store forwarding
    stall) so also enable QImode 8-byte vector inserts only with
    TARGET_SSE4_1.

    2021-06-03  Uroš Bizjak

    gcc/
        PR target/100637
        * config/i386/i386-expand.c (ix86_expand_vector_set):
        Handle V2HI and V4QI modes.
        (ix86_expand_vector_extract): Ditto.
        * config/i386/mmx.md (*pinsrw): New insn pattern.
        (*pinsrb): Ditto.
        (*pextrw): Ditto.
        (*pextrw_zext): Ditto.
        (*pextrb): Ditto.
        (*pextrb_zext): Ditto.
        (vec_setv2hi): New expander.
        (vec_extractv2hihi): Ditto.
        (vec_setv4qi): Ditto.
        (vec_extractv4qiqi): Ditto.
        (vec_setv8qi): Enable only for TARGET_SSE4_1.
        (vec_extractv8qiqi): Ditto.

    gcc/testsuite/
        PR target/100637
        * gcc.target/i386/vperm-v2hi.c: New test.
        * gcc.target/i386/vperm-v4qi.c: Ditto.
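
To make the "generic fallback" mentioned above a bit more concrete, here is a rough C sketch of what emulating a byte insert or extract with shifts and logic operations on a 32-bit integer looks like. This is only an illustration of the idea, not the code the compiler actually generates. The helper names are hypothetical, and element 0 is assumed to live in the least significant byte (little-endian layout, as on x86).

```c
#include <stdint.h>

/* Hypothetical helper: extract byte IDX (0..3) from a 4-byte vector
   held in a 32-bit integer register.  */
uint8_t
extract_byte (uint32_t v, unsigned idx)
{
  return (uint8_t) (v >> (8 * idx));
}

/* Hypothetical helper: replace byte IDX (0..3) of V with X.  The old
   byte is cleared with a mask and the new one is shifted into place.  */
uint32_t
insert_byte (uint32_t v, unsigned idx, uint8_t x)
{
  uint32_t mask = (uint32_t) 0xff << (8 * idx);
  return (v & ~mask) | ((uint32_t) x << (8 * idx));
}
```

Everything stays in an integer register here, which is the point of the comparison with building the new vector in memory.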
[Bug target/100637] [i386] Vectorize 4-byte vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #7 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:6c67afaf524a5e0e9220f78271a0f5764ca27bd0

commit r12-1092-g6c67afaf524a5e0e9220f78271a0f5764ca27bd0
Author: Uros Bizjak
Date:   Thu May 27 14:46:45 2021 +0200

    i386: Add XOP comparisons for 4- and 8-byte vectors [PR100637]

    2021-05-27  Uroš Bizjak

    gcc/
        PR target/100637
        * config/i386/i386-expand.c (ix86_expand_int_sse_cmp):
        For TARGET_XOP bypass SSE comparisons for all supported vector modes.
        * config/i386/mmx.md (*xop_maskcmp3): New insn pattern.
        (*xop_maskcmp3): Ditto.
        (*xop_maskcmp_uns3): Ditto.
        (*xop_maskcmp_uns3): Ditto.
[Bug target/100637] [i386] Vectorize 4-byte vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #6 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:04ba00d4ed735242c5284d2c623a3a9d42d94742

commit r12-1085-g04ba00d4ed735242c5284d2c623a3a9d42d94742
Author: Uros Bizjak
Date:   Thu May 27 09:22:01 2021 +0200

    i386: Add uavg_ceil patterns for 4-byte vectors [PR100637]

    2021-05-27  Uroš Bizjak

    gcc/
        PR target/100637
        * config/i386/mmx.md (uavgv4qi3_ceil): New insn pattern.
        (uavgv2hi3_ceil): Ditto.

    gcc/testsuite/
        PR target/100637
        * gcc.target/i386/pr100637-3b.c (avgu): New test.
        * gcc.target/i386/pr100637-3w.c (avgu): Ditto.
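
As a hedged illustration of what an unsigned "average, rounding up" operation computes, the loop below shows the usual source idiom: add the two elements plus one in a wider type, then shift right by one. It is a sketch written for this note, not the avgu test from the testsuite, and the function name is hypothetical.

```c
/* Hypothetical example: rounding-up average of four unsigned bytes.
   The sum is formed in int, so a[i] + b[i] + 1 cannot overflow.  */
void
avg_ceil_4 (unsigned char *r, const unsigned char *a,
            const unsigned char *b)
{
  for (int i = 0; i < 4; i++)
    r[i] = (unsigned char) ((a[i] + b[i] + 1) >> 1);
}
```

With only four byte elements the whole loop covers exactly one 4-byte vector.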
[Bug target/100637] [i386] Vectorize 4-byte vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #5 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:2df9d3c52e6758f6640e7c0ae0b7502c7cc1d430

commit r12-973-g2df9d3c52e6758f6640e7c0ae0b7502c7cc1d430
Author: Uros Bizjak
Date:   Fri May 21 13:03:04 2021 +0200

    i386: Add comparisons for 4-byte vectors [PR100637]

    2021-05-21  Uroš Bizjak

    gcc/
        PR target/100637
        * config/i386/i386-expand.c (ix86_expand_sse_movcc):
        Handle V4QI and V2HI modes.
        (ix86_expand_sse_movcc): Ditto.
        * config/i386/mmx.md (*3): New instruction pattern.
        (*eq3): Ditto.
        (*gt3): Ditto.
        (*xop_pcmov_): Ditto.
        (mmx_pblendvb32): Ditto.
        (mmx_pblendvb64): Rename from mmx_pblendvb.
        (vec_cmp): New expander.
        (vec_cmpu): Ditto.
        (vcond): Ditto.
        (vcondu): Ditto.
        (vcond_mask_): Ditto.

    gcc/testsuite/
        PR target/100637
        * g++.target/i386/pr100637-1b.C: New test.
        * g++.target/i386/pr100637-1w.C: Ditto.
        * gcc.target/i386/pr100637-2b.c: Ditto.
        * gcc.target/i386/pr100637-2w.c: Ditto.
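
For readers unfamiliar with vector comparisons, here is a small sketch using GCC's generic vector extensions: a comparison yields a mask with all bits set in the lanes where it holds, and a conditional move can then be expressed with and/andnot/or on that mask. The code is illustrative only and is not one of the tests listed above. The function names are hypothetical and the signed element type is an assumption.

```c
/* Hypothetical example: element-wise compare and select on a
   4-byte vector of signed chars.  */
typedef signed char v4qi __attribute__ ((vector_size (4)));

/* Each lane of the result is -1 (all ones) where a > b, else 0.  */
v4qi
cmp_gt_v4qi (v4qi a, v4qi b)
{
  return a > b;
}

/* Lane-wise "mask ? a : b", spelled out with logic operations.  */
v4qi
select_v4qi (v4qi mask, v4qi a, v4qi b)
{
  return (a & mask) | (b & ~mask);
}
```

Combining the two, select_v4qi (cmp_gt_v4qi (a, b), a, b) gives the lane-wise signed maximum.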
[Bug target/100637] [i386] Vectorize 4-byte vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #4 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:dcde81134cb24da8e261a4346c806c676297922b

commit r12-960-gdcde81134cb24da8e261a4346c806c676297922b
Author: Uros Bizjak
Date:   Fri May 21 08:01:34 2021 +0200

    i386: Add minmax and abs patterns for 4-byte vectors [PR100637]

    2021-05-21  Uroš Bizjak

    gcc/
        PR target/100637
        * config/i386/mmx.md (SMAXMIN_MMXMODEI): New mode iterator.
        (3): Macroize expander from v4hi3> and 3 using SMAXMIN_MMXMODEI
        mode iterator.
        (*v4qi3): New insn pattern.
        (*v2hi3): Ditto.
        (SMAXMIN_VI_32): New mode iterator.
        (mode3): New expander.
        (UMAXMIN_MMXMODEI): New mode iterator.
        (3): Macroize expander from v8qi3> and 3 using UMAXMIN_MMXMODEI
        mode iterator.
        (*v4qi3): New insn pattern.
        (*v2hi3): Ditto.
        (UMAXMIN_VI_32): New mode iterator.
        (mode3): New expander.
        (abs2): New insn pattern.
        (ssse3_abs2, abs2): Move from ...
        * config/i386/sse.md: ... here.
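
The following loops are a sketch of source code whose bodies correspond to signed max, unsigned min and abs on 4-byte groups of elements. They were written for this note and are not from the GCC testsuite. The function names are hypothetical, and whether a given loop is vectorized depends on the optimization level and target flags.

```c
/* Hypothetical example: signed maximum of four signed chars.  */
void
smax_4 (signed char *r, const signed char *a, const signed char *b)
{
  for (int i = 0; i < 4; i++)
    r[i] = a[i] > b[i] ? a[i] : b[i];
}

/* Hypothetical example: unsigned minimum of two unsigned shorts.  */
void
umin_2 (unsigned short *r, const unsigned short *a,
        const unsigned short *b)
{
  for (int i = 0; i < 2; i++)
    r[i] = a[i] < b[i] ? a[i] : b[i];
}

/* Hypothetical example: absolute value of four signed chars.  */
void
abs_4 (signed char *r, const signed char *a)
{
  for (int i = 0; i < 4; i++)
    r[i] = (signed char) (a[i] < 0 ? -a[i] : a[i]);
}
```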
[Bug target/100637] [i386] Vectorize 4-byte vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #3 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:507359e1d4d18614eb9679043995edf0675b6ff5

commit r12-940-g507359e1d4d18614eb9679043995edf0675b6ff5
Author: Uros Bizjak
Date:   Thu May 20 11:11:21 2021 +0200

    i386: Add mult-high and shift patterns for 4-byte vectors [PR100637]

    2021-05-20  Uroš Bizjak

    gcc/
        PR target/100637
        * config/i386/mmx.md (Yv_Yw): Revert adding V4QI and V2HI modes.
        (*3): Use Yw instead of constraint.
        (mulv4hi3_highpart): New expander.
        (*mulv2hi3_highpart): New insn pattern.
        (mulv2hi3_highpart): New expander.
        (*v2hi3): New insn pattern.
        (v2hi3): New expander.
        * config/i386/sse.md (smulhrsv2hi3): New expander.
        (*smulhrsv2hi3): New insn pattern.

    gcc/testsuite/
        PR target/100637
        * gcc.target/i386/pr100637-1w.c (shl, ashr, lshr): New tests.
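
As a rough illustration of the operations named in this commit, the sketch below shows a high-part multiply written as a scalar loop over two shorts, plus left, logical right and arithmetic right shifts on 2 x 16-bit vectors using GCC's vector extensions. It is not the shl/ashr/lshr test code from pr100637-1w.c. All names are hypothetical, and the arithmetic right shift of negative values relies on GCC's documented behaviour.

```c
typedef unsigned short v2hu __attribute__ ((vector_size (4)));
typedef short v2hs __attribute__ ((vector_size (4)));

/* Hypothetical example: upper 16 bits of each 16x16->32 signed product.  */
void
mulhi_2 (short *r, const short *a, const short *b)
{
  for (int i = 0; i < 2; i++)
    r[i] = (short) (((int) a[i] * b[i]) >> 16);
}

/* Shift every element by the same count N (0..15).  */
v2hu
shl_v2hi (v2hu a, unsigned short n)
{
  return a << (v2hu) { n, n };
}

v2hu
lshr_v2hi (v2hu a, unsigned short n)
{
  return a >> (v2hu) { n, n };	/* logical: element type is unsigned */
}

v2hs
ashr_v2hi (v2hs a, short n)
{
  return a >> (v2hs) { n, n };	/* arithmetic: element type is signed */
}
```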
[Bug target/100637] [i386] Vectorize 4-byte vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

--- Comment #2 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:46ca31d65092e5afcef292f807fcf14c5363280d

commit r12-883-g46ca31d65092e5afcef292f807fcf14c5363280d
Author: Uros Bizjak
Date:   Tue May 18 17:25:54 2021 +0200

    i386: Implement 4-byte vector support [PR100637]

    Add infrastructure, logic and arithmetic support for 4-byte vectors.
    These can be used with SSE2 targets, where movd instructions from/to
    XMM registers are available.  x86_64 ABI passes 4-byte vectors in
    integer registers, so also add logic operations with integer registers.

    2021-05-18  Uroš Bizjak

    gcc/
        PR target/100637
        * config/i386/i386.h (VALID_SSE2_REG_MODE):
        Add V4QI and V2HI modes.
        (VALID_INT_MODE_P): Ditto.
        * config/i386/mmx.md (VI_32): New mode iterator.
        (mmxvecsize): Handle V4QI and V2HI.
        (Yv_Yw): Ditto.
        (mov): New expander.
        (*mov_internal): New insn pattern.
        (movmisalign): New expander.
        (neg): New expander.
        (3): New expander.
        (*3): New insn pattern.
        (mulv2hi3): New expander.
        (*mulv2hi3): New insn pattern.
        (one_cmpl2): New expander.
        (*andnot3): New insn pattern.
        (3): New expander.
        (*3): New insn pattern.

    gcc/testsuite/
        PR target/100637
        * gcc.target/i386/pr100637-1b.c: New test.
        * gcc.target/i386/pr100637-1w.c: Ditto.
        * gcc.target/i386/pr92658-avx2-2.c: Do not XFAIL scan for pmovsxbq.
        * gcc.target/i386/pr92658-avx2.c: Do not XFAIL scan for pmovzxbq.
        * gcc.target/i386/pr92658-avx512vl.c: Do not XFAIL scan for vpmovdb.
        * gcc.target/i386/pr92658-sse4-2.c: Do not XFAIL scan for
        pmovsxbd and pmovsxwq.
        * gcc.target/i386/pr92658-sse4.c: Do not XFAIL scan for
        pmovzxbd and pmovzxwq.
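
To give a feel for what "logic and arithmetic support for 4-byte vectors" means at the source level, here is a minimal sketch using GCC's generic vector extensions on a 2 x 16-bit vector. It is not one of the new tests. The function names are hypothetical and the unsigned element type is chosen here so that wrap-around is well defined.

```c
/* Hypothetical example: a 4-byte vector of two unsigned shorts.  */
typedef unsigned short v2hi __attribute__ ((vector_size (4)));

v2hi
add_v2hi (v2hi a, v2hi b)
{
  return a + b;		/* element-wise, wraps modulo 65536 */
}

v2hi
mul_v2hi (v2hi a, v2hi b)
{
  return a * b;		/* low 16 bits of each product */
}

v2hi
xor_v2hi (v2hi a, v2hi b)
{
  return a ^ b;		/* logic operation, lane layout is irrelevant */
}
```

Because such a vector fits in one 32-bit register, the logic operations can in principle be done in either an XMM or a general-purpose register, which is what the commit message alludes to.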
[Bug target/100637] [i386] Vectorize 4-byte vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100637

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com
   Last reconfirmed|                            |2021-05-17
     Ever confirmed|0                           |1

--- Comment #1 from Uroš Bizjak ---
Created attachment 50822
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50822&action=edit
Patch that enables vectorization of 4-byte vectors

The patch introduces infrastructure to vectorize 4-byte vectors on SSE2
targets. The vectorization of logic and plus/minus instructions is
demonstrated; compiling the above testcases with -O3 -msse2 produces:

foo:
        movd    %esi, %xmm0
        movd    %edi, %xmm2
        movd    %edx, %xmm1
        pandn   %xmm2, %xmm0
        paddb   %xmm0, %xmm1
        movd    %xmm1, %eax
        ret

bar_b:
        movd    tb(%rip), %xmm0
        movd    sb(%rip), %xmm1
        paddb   %xmm1, %xmm0
        movd    %xmm0, rb(%rip)
        ret

bar_w:
        movd    tw(%rip), %xmm0
        movd    sw(%rip), %xmm1
        paddw   %xmm1, %xmm0
        movd    %xmm0, rw(%rip)
        ret
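
The testcases referred to above are not reproduced in this part of the thread. Purely as a guess, source along the following lines would have the same semantics as the assembly shown: foo computes (a & ~b) + c on four bytes, while bar_b and bar_w add two global arrays element-wise. The symbol names follow the assembly, but the element types and the exact shape of the code are assumptions.

```c
/* Hypothetical reconstruction, not the original testcase.  */
typedef unsigned char v4qi __attribute__ ((vector_size (4)));

/* pandn + paddb in the listing: and-not, then byte-wise add.  */
v4qi
foo (v4qi a, v4qi b, v4qi c)
{
  return (a & ~b) + c;
}

unsigned char rb[4], sb[4], tb[4];

/* paddb on a 4-byte vector loaded from memory.  */
void
bar_b (void)
{
  for (int i = 0; i < 4; i++)
    rb[i] = sb[i] + tb[i];
}

unsigned short rw[2], sw[2], tw[2];

/* paddw on a 4-byte vector loaded from memory.  */
void
bar_w (void)
{
  for (int i = 0; i < 2; i++)
    rw[i] = sw[i] + tw[i];
}
```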