[Bug target/103861] [i386] vectorize v2qi vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #15 from CVS Commits ---
The master branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:7eed861e8ca3f533e56dea6348573caa09f16f5e

commit r14-4964-g7eed861e8ca3f533e56dea6348573caa09f16f5e
Author: liuhongt
Date:   Mon Oct 23 13:40:10 2023 +0800

    Support vec_cmpmn/vcondmn for v2hf/v4hf.

    gcc/ChangeLog:

        PR target/103861
        * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle
        V2HF/V2BF/V4HF/V4BFmode.
        * config/i386/i386.cc (ix86_get_mask_mode): Return QImode when
        data_mode is V4HF/V2HFmode.
        * config/i386/mmx.md (vec_cmpv4hfqi): New expander.
        (vcond_mask_v4hi): Ditto.
        (vcond_mask_qi): Ditto.
        (vec_cmpv2hfqi): Ditto.
        (vcond_mask_v2hi): Ditto.
        (mmx_plendvb_): Add 2 combine splitters after the patterns.
        (mmx_pblendvb_v8qi): Ditto.
        (v2hi3): Add a combine splitter after the pattern.
        (3): Ditto.
        (v8qi3): Ditto.
        (3): Ditto.
        * config/i386/sse.md (vcond): Merge this with ..
        (vcond): .. this into ..
        (vcond): .. this, and extend to V8BF/V16BF/V32BFmode.

    gcc/testsuite/ChangeLog:

        * g++.target/i386/part-vect-vcondhf.C: New test.
        * gcc.target/i386/part-vect-vec_cmphf.c: New test.
[Bug target/103861] [i386] vectorize v2qi vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #14 from Uroš Bizjak ---
Let's say this is done now.
[Bug target/103861] [i386] vectorize v2qi vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #13 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:7a7d8c3f6167fd45658ddbfa32adcfd2acc98eb4

commit r12-6562-g7a7d8c3f6167fd45658ddbfa32adcfd2acc98eb4
Author: Uros Bizjak
Date:   Thu Jan 13 20:48:18 2022 +0100

    i386: Introduce V2QImode vectorized shifts [PR103861]

    Add V2QImode shift operations and split them to synthesized
    double HI/LO QImode operations with integer registers.

    Also robustify arithmetic split patterns.

    2022-01-13  Uroš Bizjak

    gcc/ChangeLog:

        PR target/103861
        * config/i386/i386.md (*ashlqi_ext_2): New insn pattern.
        (*qi_ext_2): Ditto.
        * config/i386/mmx.md (v2qi): New insn_and_split pattern.

    gcc/testsuite/ChangeLog:

        PR target/103861
        * gcc.target/i386/pr103861.c (shl,ashr,lshr): New tests.
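For readers who want to try the shift operations this commit describes, here is a minimal sketch using the GCC vector extension. The type names `v2qi`/`v2qu`, the helper `check_v2qi_shifts`, and the use of a two-element shift-count vector are assumptions made for illustration; this is not the content of gcc.target/i386/pr103861.c.

```c
#include <assert.h>

/* Two-element byte vectors (GCC vector extension).  The names are
   illustrative, not taken from the testsuite.  */
typedef signed char v2qi __attribute__ ((__vector_size__ (2)));
typedef unsigned char v2qu __attribute__ ((__vector_size__ (2)));

/* Shift both elements by the same count, passed as a splat vector.  */
v2qi shl (v2qi a, int n)  { v2qi cnt = { n, n }; return a << cnt; }
v2qi ashr (v2qi a, int n) { v2qi cnt = { n, n }; return a >> cnt; }
v2qu lshr (v2qu a, int n) { v2qu cnt = { n, n }; return a >> cnt; }

/* Hypothetical self-check; call it from main to run the assertions.  */
void check_v2qi_shifts (void)
{
  v2qi a = { 3, 5 };
  v2qi b = { -8, 64 };
  v2qu c = { 0xF8, 64 };

  v2qi l = shl (a, 2);    /* { 12, 20 } */
  v2qi s = ashr (b, 2);   /* { -2, 16 }, sign bit is replicated */
  v2qu u = lshr (c, 2);   /* { 62, 16 }, zero is shifted in */

  assert (l[0] == 12 && l[1] == 20);
  assert (s[0] == -2 && s[1] == 16);
  assert (u[0] == 62 && u[1] == 16);
}
```

Whether a given compiler version lowers these functions to the new V2QImode patterns depends on the target flags and optimization level, so inspect the `-O2 -S` output rather than assuming it.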
[Bug target/103861] [i386] vectorize v2qi vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #12 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:b5193e352981fab8441c600b0a50efe1f30c1d30

commit r12-6533-gb5193e352981fab8441c600b0a50efe1f30c1d30
Author: Uros Bizjak
Date:   Wed Jan 12 19:59:57 2022 +0100

    i386: Add CC clobber and splits for 32-bit vector mode logic insns [PR100673, PR103861]

    Add CC clobber to 32-bit vector mode logic insns to allow variants
    with general-purpose registers.  Also improve ix86_sse_movcc to emit
    insn with CC clobber for narrow vector modes in order to re-enable
    conditional moves for 16-bit and 32-bit narrow vector modes with -msse2.

    2022-01-12  Uroš Bizjak

    gcc/ChangeLog:

        PR target/100637
        PR target/103861
        * config/i386/i386-expand.c (ix86_emit_vec_binop): New static function.
        (ix86_expand_sse_movcc): Use ix86_emit_vec_binop instead of gen_rtx_X
        when constructing vector logic RTXes.
        (expand_vec_perm_pshufb2): Ditto.
        * config/i386/mmx.md (negv2qi): Disparage GPR alternative a bit.
        (v2qi3): Ditto.
        (vcond): Re-enable for TARGET_SSE2.
        (vcondu): Ditto.
        (vcond_mask_): Ditto.
        (one_cmpl2): Remove expander.
        (one_cmpl2): Rename from one_cmplv2qi.
        Use VI_16_32 mode iterator.
        (one_cmpl2 splitters): Use VI_16_32 mode iterator.
        Use lowpart_subreg instead of gen_lowpart to create subreg.
        (*andnot3): Merge from "*andnot" and "*andnotv2qi3" insn patterns
        using VI_16_32 mode iterator.  Disparage GPR alternative a bit.
        Add CC clobber.
        (*andnot3 splitters): Use VI_16_32 mode iterator.
        Use lowpart_subreg instead of gen_lowpart to create subreg.
        (*3): Merge from "*" and "*v2qi3" insn patterns using VI_16_32
        mode iterator.  Disparage GPR alternative a bit.  Add CC clobber.
        (*3 splitters): Use VI_16_32 mode iterator.
        Use lowpart_subreg instead of gen_lowpart to create subreg.

    gcc/testsuite/ChangeLog:

        PR target/100637
        PR target/103861
        * g++.target/i386/pr100637-1b.C (dg-options): Use -msse2
        instead of -msse4.1.
        * g++.target/i386/pr100637-1w.C (dg-options): Ditto.
        * g++.target/i386/pr103861-1.C (dg-options): Ditto.
        * gcc.target/i386/pr100637-4b.c (dg-options): Ditto.
        * gcc.target/i386/pr103861-4.c (dg-options): Ditto.
        * gcc.target/i386/pr100637-1b.c: Remove scan-assembler
        directives for logic instructions.
        * gcc.target/i386/pr100637-1w.c: Ditto.
        * gcc.target/i386/warn-vect-op-2.c: Update dg-warning for
        vector logic operation.
[Bug target/103861] [i386] vectorize v2qi vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #11 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:820ac79e8448ad6c631e1387ba51a93dcf2b4e89

commit r12-6488-g820ac79e8448ad6c631e1387ba51a93dcf2b4e89
Author: Uros Bizjak
Date:   Tue Jan 11 19:23:15 2022 +0100

    i386: Introduce V2QImode vector cmove for -msse4.1 [PR103861]

    This patch also moves V2HI and V4QImode vector conditional moves
    to SSE4.1 targets.  Vector cmoves are implemented with SSE logic functions
    without -msse4.1, and they are hardly worthwhile for narrow vector modes.
    More important, we would like to keep vector logic functions for GPR
    registers, and the current RTX description of 32-bit vector modes logic
    insns does not include the necessary CC reg clobber.

    Solve these issues by restricting vector cmove insns for these modes
    to -msse4.1, where logic instructions are avoided, and pblend insn is
    used instead.

    A follow-up patch will add clobbers and necessary splits to 32-bit
    vector mode logic insns, and in a future patch, ix86_sse_movcc will
    be improved to use expand_simple_{unop,binop} to emit logic insns,
    allowing us to re-enable 16-bit and 32-bit narrow vector cmoves for
    -msse2.

    2022-01-11  Uroš Bizjak

    gcc/ChangeLog:

        PR target/103861
        * config/i386/mmx.md (vcond): Use VI_16_32 mode iterator.
        Enable for TARGET_SSE4_1.
        (vcondu): Ditto.
        (vcond_mask_): Ditto.
        (mmx_pblendvb_v8qi): Rename from mmx_pblendvb64.
        (mmx_pblendvb_): Rename from mmx_pblendvb32.
        Use VI_16_32 mode iterator.
        * config/i386/i386-expand.c (ix86_expand_sse_movcc): Update
        for rename.  Handle V2QImode.
        (expand_vec_perm_blend): Update for rename.

    gcc/testsuite/ChangeLog:

        PR target/103861
        * g++.target/i386/pr100637-1b.C (dg-options): Use -msse4
        instead of -msse2.
        * g++.target/i386/pr100637-1w.C (dg-options): Ditto.
        * g++.target/i386/pr103861-1.C: New test.
        * gcc.target/i386/pr100637-4b.c (dg-options): Use -msse4
        instead of -msse2.
        * gcc.target/i386/pr103861-4.c: New test.
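As a rough illustration of the kind of source a two-element byte conditional move can come from, here is a sketch in the loop style used elsewhere in this thread. The array names, the function `vcond`, and the helper `check_vcond` are hypothetical, and this is not the content of gcc.target/i386/pr103861-4.c.

```c
#include <assert.h>

/* Hypothetical two-element byte arrays, in the style of the loop
   testcases quoted in this bug.  */
signed char r[2], a[2], b[2], c[2], d[2];

/* Per-element select: pick c[i] where a[i] < b[i], else d[i].
   A vectorizer may turn this into one compare plus one blend.  */
void vcond (void)
{
  int i;

  for (i = 0; i < 2; i++)
    r[i] = a[i] < b[i] ? c[i] : d[i];
}

/* Hypothetical self-check; call it from main to run the assertions.  */
void check_vcond (void)
{
  a[0] = 1; b[0] = 2; c[0] = 10; d[0] = 20;   /* 1 < 2, take c */
  a[1] = 5; b[1] = 3; c[1] = 30; d[1] = 40;   /* 5 >= 3, take d */
  vcond ();
  assert (r[0] == 10);
  assert (r[1] == 40);
}
```

Per the commit message, the blend form is only used with -msse4.1 at this revision, so compare the `-O2 -msse2` and `-O2 -msse4.1` assembly to see the difference.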
[Bug target/103861] [i386] vectorize v2qi vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #10 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:04a745556021b7a1c6e81a41d0a12b60a4d9475d

commit r12-6426-g04a745556021b7a1c6e81a41d0a12b60a4d9475d
Author: Uros Bizjak
Date:   Mon Jan 10 20:59:02 2022 +0100

    i386: Introduce V2QImode vector compares [PR103861]

    Add V2QImode vector compares with SSE registers.

    2022-01-10  Uroš Bizjak

    gcc/ChangeLog:

        PR target/103861
        * config/i386/i386-expand.c (ix86_expand_int_sse_cmp):
        Handle V2QImode.
        * config/i386/mmx.md (3): Use VI1_16_32 mode iterator.
        (*eq3): Ditto.
        (*gt3): Ditto.
        (*xop_maskcmp3): Ditto.
        (*xop_maskcmp_uns3): Ditto.
        (vec_cmp): Ditto.
        (vec_cmpu): Ditto.

    gcc/testsuite/ChangeLog:

        PR target/103861
        * gcc.target/i386/pr103861-2.c: New test.
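To show what a vector compare produces at the source level, here is a minimal sketch using the GCC vector extension, where each result element is -1 (all bits set) for true and 0 for false. The type name `v2qi` and the functions `eq`, `gt` and `check_v2qi_compares` are assumptions for illustration, not the content of gcc.target/i386/pr103861-2.c.

```c
#include <assert.h>

/* Two-element signed byte vector (GCC vector extension); the name
   is illustrative.  */
typedef signed char v2qi __attribute__ ((__vector_size__ (2)));

/* Element-wise compares: each lane becomes -1 if true, 0 if false.  */
v2qi eq (v2qi a, v2qi b) { return a == b; }
v2qi gt (v2qi a, v2qi b) { return a > b; }

/* Hypothetical self-check; call it from main to run the assertions.  */
void check_v2qi_compares (void)
{
  v2qi a = { 1, 5 };
  v2qi b = { 1, 3 };

  v2qi e = eq (a, b);   /* { -1, 0 } */
  v2qi g = gt (a, b);   /* { 0, -1 } */

  assert (e[0] == -1 && e[1] == 0);
  assert (g[0] == 0 && g[1] == -1);
}
```

The all-ones/all-zeros mask is what makes the result directly usable as the selector of a blend or of an and/andnot/or sequence.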
[Bug target/103861] [i386] vectorize v2qi vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #9 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:c166632bd22d7da66354121502019fc9c92ef07f

commit r12-6273-gc166632bd22d7da66354121502019fc9c92ef07f
Author: Uros Bizjak
Date:   Wed Jan 5 23:16:34 2022 +0100

    i386: Introduce V2QImode minmax, abs and uavgv2hi3_ceil [PR103861]

    Add V2QImode minmax, abs and uavgv2qi3_ceil operations with SSE registers.

    2022-01-05  Uroš Bizjak

    gcc/ChangeLog:

        PR target/103861
        * config/i386/mmx.md (VI_16_32): New mode iterator.
        (VI1_16_32): Ditto.
        (mmxvecsize): Handle V2QI mode.
        (3): Rename from v4qi3.
        Use VI1_16_32 mode iterator.
        (3): Rename from v4qi3.
        Use VI1_16_32 mode iterator.
        (abs2): Use VI_16_32 mode iterator.
        (uavgv2qi3_ceil): New insn pattern.

    gcc/testsuite/ChangeLog:

        PR target/103861
        * gcc.target/i386/pr103861-3.c: New test.
        * g++.dg/vect/slp-pr98855.cc (dg-final): Check that
        no vectorization using SLP was performed.
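The operations named in this commit (min/max, abs and rounding-up unsigned average) can be written as two-element loops like the `maxs` example in Comment #3. The sketch below is an assumption about what such loops look like; the array and function names are hypothetical and it is not the content of gcc.target/i386/pr103861-3.c.

```c
#include <assert.h>

/* Hypothetical two-element arrays, signed and unsigned.  */
signed char rs[2], as[2], bs[2];
unsigned char ru[2], au[2], bu[2];

void maxs (void)
{
  int i;
  for (i = 0; i < 2; i++)
    rs[i] = as[i] > bs[i] ? as[i] : bs[i];
}

void mins (void)
{
  int i;
  for (i = 0; i < 2; i++)
    rs[i] = as[i] < bs[i] ? as[i] : bs[i];
}

void abss (void)
{
  int i;
  for (i = 0; i < 2; i++)
    rs[i] = as[i] < 0 ? -as[i] : as[i];
}

/* Unsigned average rounded up; the sum is computed in int, so it
   cannot overflow before the shift.  */
void uavg_ceil (void)
{
  int i;
  for (i = 0; i < 2; i++)
    ru[i] = (au[i] + bu[i] + 1) >> 1;
}

/* Hypothetical self-check; call it from main to run the assertions.  */
void check_minmax_abs_avg (void)
{
  as[0] = -5; as[1] = 7; bs[0] = 3; bs[1] = -9;
  maxs (); assert (rs[0] == 3 && rs[1] == 7);
  mins (); assert (rs[0] == -5 && rs[1] == -9);
  abss (); assert (rs[0] == 5 && rs[1] == 7);

  au[0] = 200; au[1] = 3; bu[0] = 100; bu[1] = 4;
  uavg_ceil (); assert (ru[0] == 150 && ru[1] == 4);
}
```

Whether each loop is actually vectorized depends on the compiler version and flags such as -msse4, so the functions are best treated as inputs for inspecting the generated assembly.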
[Bug target/103861] [i386] vectorize v2qi vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #8 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:708b87dcb6e48cb48d170a4b3625088995377a5c

commit r12-6215-g708b87dcb6e48cb48d170a4b3625088995377a5c
Author: Uros Bizjak
Date:   Tue Jan 4 19:41:47 2022 +0100

    i386: Introduce V2QImode vectorized logic [PR103861]

    Add V2QImode logic operations with SSE and GP registers and
    split them to V4QImode SSE instructions or SImode GP
    instructions.

    The patch also fixes PR target/103900.

    2022-01-04  Uroš Bizjak

    gcc/ChangeLog:

        PR target/103861
        * config/i386/mmx.md (one_cmplv2qi3): New insn pattern.
        (one_cmplv2qi3 splitters): New post-reload splitters.
        (*andnotv2qi3): New insn pattern.
        (andnotv2qi3 splitters): New post-reload splitters.
        (v2qi3): New insn pattern.
        (v2qi3 splitters): New post-reload splitters.

    gcc/testsuite/ChangeLog:

        PR target/103861
        * gcc.target/i386/warn-vect-op-2.c: Adjust warnings.
        * gcc.target/i386/pr103900.c: New test.
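The logic operations covered by this commit (and, or, xor, one's complement, and-not) map onto ordinary C operators applied to a two-byte vector type. The following is a minimal sketch under that assumption; the type name `v2qu` and all function names are illustrative rather than taken from the patch or its tests.

```c
#include <assert.h>

/* Two-element unsigned byte vector (GCC vector extension); the name
   is illustrative.  */
typedef unsigned char v2qu __attribute__ ((__vector_size__ (2)));

v2qu and2 (v2qu a, v2qu b)    { return a & b; }
v2qu or2 (v2qu a, v2qu b)     { return a | b; }
v2qu xor2 (v2qu a, v2qu b)    { return a ^ b; }
v2qu not2 (v2qu a)            { return ~a; }
/* And-not in the x86 sense: complement the first operand.  */
v2qu andnot2 (v2qu a, v2qu b) { return ~a & b; }

/* Hypothetical self-check; call it from main to run the assertions.  */
void check_v2qi_logic (void)
{
  v2qu a = { 0xF0, 0x0F };
  v2qu b = { 0x3C, 0x3C };

  v2qu r = and2 (a, b);
  assert (r[0] == 0x30 && r[1] == 0x0C);
  r = or2 (a, b);
  assert (r[0] == 0xFC && r[1] == 0x3F);
  r = xor2 (a, b);
  assert (r[0] == 0xCC && r[1] == 0x33);
  r = not2 (a);
  assert (r[0] == 0x0F && r[1] == 0xF0);
  r = andnot2 (a, b);
  assert (r[0] == 0x0C && r[1] == 0x30);
}
```

Because a bitwise operation on two bytes is the same as the operation on the containing 32-bit register, the commit can split these either to V4QImode SSE instructions or to SImode GP instructions, as its description says.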
[Bug target/103861] [i386] vectorize v2qi vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #7 from Uroš Bizjak ---
(In reply to Richard Biener from comment #6)
> Not fully fixed I guess?

Not yet. I have a bunch of follow-up patches for various operations.
[Bug target/103861] [i386] vectorize v2qi vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #6 from Richard Biener ---
Not fully fixed I guess?
[Bug target/103861] [i386] vectorize v2qi vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #5 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:9ff206d3865df5cb8407490aa9481029beac087f

commit r12-6173-g9ff206d3865df5cb8407490aa9481029beac087f
Author: Uros Bizjak
Date:   Sun Jan 2 21:12:10 2022 +0100

    i386: Introduce V2QImode vectorized arithmetic [PR103861]

    This patch adds basic V2QImode infrastructure and V2QImode arithmetic
    operations (plus, minus and neg).  The patched compiler can emit SSE
    vectorized QImode operations (e.g. PADDB) with partial QImode vector,
    and also synthesized double HI/LO QImode operations with integer registers.

    The testcase:

    typedef char __v2qi __attribute__ ((__vector_size__ (2)));

    __v2qi plus  (__v2qi a, __v2qi b) { return a + b; };

    compiles with -O2 to:

            movl    %edi, %edx
            movl    %esi, %eax
            addb    %sil, %dl
            addb    %ah, %dh
            movl    %edx, %eax
            ret

    which is much better than what the unpatched compiler produces:

            movl    %edi, %eax
            movl    %esi, %edx
            xorl    %ecx, %ecx
            movb    %dil, %cl
            movsbl  %dh, %edx
            movsbl  %ah, %eax
            addl    %edx, %eax
            addb    %sil, %cl
            movb    %al, %ch
            movl    %ecx, %eax
            ret

    The V2QImode vectorization does not require vector registers, so it can
    be enabled by default also for 32-bit targets without SSE.

    The patch also enables vectorized V2QImode sign/zero extends.

    2021-12-30  Uroš Bizjak

    gcc/ChangeLog:

        PR target/103861
        * config/i386/i386.h (VALID_SSE2_REG_MODE): Add V2QImode.
        (VALID_INT_MODE_P): Ditto.
        * config/i386/i386.c (ix86_secondary_reload): Handle
        V2QImode reloads from SSE register to memory.
        (vector_mode_supported_p): Always return true for V2QImode.
        * config/i386/i386.md (*subqi_ext_2): New insn pattern.
        (*negqi_ext_2): Ditto.
        * config/i386/mmx.md (movv2qi): New expander.
        (movmisalignv2qi): Ditto.
        (*movv2qi_internal): New insn pattern.
        (*pushv2qi2): Ditto.
        (negv2qi2 and splitters): Ditto.
        (v2qi3 and splitters): Ditto.

    gcc/testsuite/ChangeLog:

        PR target/103861
        * gcc.dg/store_merging_18.c (dg-options): Add -fno-tree-vectorize.
        * gcc.dg/store_merging_29.c (dg-options): Ditto.
        * gcc.target/i386/pr103861.c: New test.
        * gcc.target/i386/pr92658-avx512vl.c (dg-final): Remove
        vpmovqb scan-assembler xfail.
        * gcc.target/i386/pr92658-sse4.c (dg-final): Remove
        pmovzxbq scan-assembler xfail.
        * gcc.target/i386/pr92658-sse4-2.c (dg-final): Remove
        pmovsxbq scan-assembler xfail.
        * gcc.target/i386/warn-vect-op-2.c (dg-warning): Adjust warnings.
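The commit message names three arithmetic operations (plus, minus and neg) but shows only `plus`. A minimal sketch covering all three follows. It uses `signed char` and the type name `v2qi` instead of the commit's `char` and `__v2qi` so that element signedness is explicit and no reserved identifier is declared; `minus`, `neg` and `check_v2qi_arith` are illustrative additions, not part of the commit's testcase.

```c
#include <assert.h>

/* Two-element signed byte vector (GCC vector extension).  */
typedef signed char v2qi __attribute__ ((__vector_size__ (2)));

v2qi plus (v2qi a, v2qi b)  { return a + b; }
v2qi minus (v2qi a, v2qi b) { return a - b; }
v2qi neg (v2qi a)           { return -a; }

/* Hypothetical self-check; call it from main to run the assertions.  */
void check_v2qi_arith (void)
{
  v2qi a = { 10, -3 };
  v2qi b = { 5, 7 };

  v2qi s = plus (a, b);    /* { 15, 4 } */
  v2qi d = minus (a, b);   /* { 5, -10 } */
  v2qi n = neg (a);        /* { -10, 3 } */

  assert (s[0] == 15 && s[1] == 4);
  assert (d[0] == 5 && d[1] == -10);
  assert (n[0] == -10 && n[1] == 3);
}
```

Compiling with `-O2 -S` on a patched and an unpatched compiler should show the difference the commit message describes for `plus`; the output for `minus` and `neg` is not quoted in this thread, so check it directly.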
[Bug target/103861] [i386] vectorize v2qi vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2021-12-29
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #4 from Andrew Pinski ---
Confirmed.
[Bug target/103861] [i386] vectorize v2qi vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #3 from Uroš Bizjak ---
The patched compiler compiles the testcase from Comment #0 on x86_64 with -O2
to:

plus:
        movl    %edi, %edx
        movl    %esi, %eax
        addb    %sil, %dl
        addb    %ah, %dh
        movl    %edx, %eax
        ret

and the testcase from Comment #1 to:

foo:
        movzwl  a(%rip), %edx
        movzwl  b(%rip), %eax
        addb    %dl, %al
        addb    %dh, %ah
        movw    %ax, r(%rip)
        ret

Some additional examples:

char r[2], a[2], b[2];

void maxs (void)
{
  int i;

  for (i = 0; i < 2; i++)
    r[i] = a[i] > b[i] ? a[i] : b[i];
}

compiles with -O2 -msse4 to:

maxs:
        pinsrw  $0, b(%rip), %xmm0
        pinsrw  $0, a(%rip), %xmm1
        pmaxsb  %xmm1, %xmm0
        pextrw  $0, %xmm0, r(%rip)
        ret
[Bug target/103861] [i386] vectorize v2qi vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #2 from Uroš Bizjak ---
Created attachment 52087
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52087&action=edit
Prototype patch to vectorize with v2qi vectors

Patch that implements V2QI moves, logic and basic arithmetic operations.
[Bug target/103861] [i386] vectorize v2qi vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #1 from Uroš Bizjak ---
Also:

char r[2], a[2], b[2];

void foo (void)
{
  int i;

  for (i = 0; i < 2; i++)
    r[i] = a[i] + b[i];
}