[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Uroš Bizjak ---
Implemented for gcc-14.
[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #9 from CVS Commits ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:6dd73f0f00f454a05552b008a1d56560bd3f1d4a

commit r14-3471-g6dd73f0f00f454a05552b008a1d56560bd3f1d4a
Author: Uros Bizjak
Date:   Thu Aug 24 22:23:52 2023 +0200

    i386: Optimize pinsrq of 0 with index 1 into movq [PR94866]

    Add new pattern involving vec_merge RTX that is produced by combine
    from the combination of sse4_1_pinsrq and *movdi_internal:

        7: r86:DI=0
        8: r85:V2DI=vec_merge(vec_duplicate(r86:DI),r87:V2DI,0x2)
          REG_DEAD r87:V2DI
          REG_DEAD r86:DI
    Successfully matched this instruction:
    (set (reg:V2DI 85 [ a ])
        (vec_merge:V2DI (reg:V2DI 87)
            (const_vector:V2DI [
                    (const_int 0 [0]) repeated x2
                ])
            (const_int 1 [0x1])))

            PR target/94866

    gcc/ChangeLog:

            * config/i386/sse.md (*sse2_movq128__1): New insn pattern.

    gcc/testsuite/ChangeLog:

            * g++.target/i386/pr94866.C: New test.
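Editorial note: the two RTL forms in the commit message above (vec_merge of a duplicated zero into r87 with mask 0x2, and vec_merge of r87 with a zero vector with mask 0x1) describe the same value. The following is a minimal sketch in plain C that models this, assuming the documented vec_merge rule that lane i comes from the first operand when bit i of the mask is set. The type v2di_model and all function names are hypothetical and are not part of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a V2DI register: two 64-bit lanes, lane 0 is low. */
typedef struct { int64_t lane[2]; } v2di_model;

/* Model of (vec_merge a b mask): lane i is taken from a when bit i of
   mask is set, otherwise from b. */
static v2di_model vec_merge_model(v2di_model a, v2di_model b, unsigned mask)
{
    v2di_model r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = ((mask >> i) & 1u) ? a.lane[i] : b.lane[i];
    return r;
}

/* Model of (vec_duplicate x): the scalar broadcast to both lanes. */
static v2di_model vec_duplicate_model(int64_t x)
{
    v2di_model r = { { x, x } };
    return r;
}

/* Form seen before combine: insert 0 into lane 1 (pinsrq-style). */
static v2di_model pinsrq_zero_form(v2di_model v)
{
    return vec_merge_model(vec_duplicate_model(0), v, 0x2);
}

/* Form matched after combine: keep lane 0, zero lane 1 (movq-style). */
static v2di_model movq_form(v2di_model v)
{
    return vec_merge_model(v, vec_duplicate_model(0), 0x1);
}

/* Returns 1 when both forms give {lo, 0} for the input {lo, hi}. */
static int merge_forms_agree(int64_t lo, int64_t hi)
{
    v2di_model v = { { lo, hi } };
    v2di_model p = pinsrq_zero_form(v);
    v2di_model q = movq_form(v);
    return p.lane[0] == q.lane[0] && p.lane[1] == q.lane[1]
           && q.lane[0] == lo && q.lane[1] == 0;
}
```

Calling merge_forms_agree(42, -7) returns 1 in this model. The model only covers the lane selection; it says nothing about which instruction the backend emits.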
[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #8 from Hongtao.liu ---
(In reply to Uroš Bizjak from comment #7)
> (In reply to Hongtao.liu from comment #6)
> > > So, the compiler still expects vec_concat/vec_select patterns to be
> > > present.
> >
> > v2df foo_v2df (v2df x)
> > {
> >   return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 });
> > }
> >
> > The testcase is not a typical vec_merge case: for vec_merge, the shuffle
> > index should be {0, 3}. Here it happens to be a vec_merge because the
> > second vector is all zero. And yes, for this case we still need the
> > vec_concat:vec_select pattern.
>
> I guess the original patch is the way to go then.

Yes.
[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #7 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #6)
> > So, the compiler still expects vec_concat/vec_select patterns to be present.
>
> v2df foo_v2df (v2df x)
> {
>   return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 });
> }
>
> The testcase is not a typical vec_merge case: for vec_merge, the shuffle
> index should be {0, 3}. Here it happens to be a vec_merge because the
> second vector is all zero. And yes, for this case we still need the
> vec_concat:vec_select pattern.

I guess the original patch is the way to go then.
[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #6 from Hongtao.liu ---
(In reply to Uroš Bizjak from comment #4)
> (In reply to Hongtao.liu from comment #3)
> > In the x86 backend's expand_vec_perm_1, we always try vec_merge first for
> > !one_operand_p; expand_vselect_vconcat is only tried when vec_merge fails,
> > which means we had better use vec_merge instead of vec_select:vec_concat
> > when available in our backend pattern match.
>
> In fact, I tried to convert existing sse2_movq128 patterns to vec_merge, but
> the patch regressed:
>
> -FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler movq
> -FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler-not pxor
> -FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-not pxor
> -FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-times
> (?n)(?:mov|psrldq).*%xmm[0-9] 12
>
> So, the compiler still expects vec_concat/vec_select patterns to be present.

v2df foo_v2df (v2df x)
{
  return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 });
}

The testcase is not a typical vec_merge case: for vec_merge, the shuffle
index should be {0, 3}. Here it happens to be a vec_merge because the
second vector is all zero. And yes, for this case we still need the
vec_concat:vec_select pattern.
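Editorial note: the point about shuffle indices {0, 2} versus {0, 3} can be checked with a small model. The sketch below is plain C and does not use __builtin_shuffle itself; it assumes the usual two-operand rule that indices 0..1 select from the first vector and 2..3 from the second. The type v2df_model and the function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of a two-lane double vector. */
typedef struct { double lane[2]; } v2df_model;

/* Two-operand shuffle: index k picks lane k of x for k < 2,
   and lane k - 2 of y otherwise (indices are taken modulo 4). */
static v2df_model shuffle2_model(v2df_model x, v2df_model y, int i0, int i1)
{
    int idx[2] = { i0 & 3, i1 & 3 };
    v2df_model r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = (idx[i] < 2) ? x.lane[idx[i]] : y.lane[idx[i] - 2];
    return r;
}

/* A shuffle is lane-preserving (the typical vec_merge shape) when result
   lane i comes from lane i of x or lane i of y. */
static int is_lane_preserving(int i0, int i1)
{
    int idx[2] = { i0 & 3, i1 & 3 };
    for (int i = 0; i < 2; i++)
        if (idx[i] != i && idx[i] != i + 2)
            return 0;
    return 1;
}

/* With an all-zero second operand, {0, 2} and {0, 3} give the same result. */
static int zero_operand_agrees(double x0, double x1)
{
    v2df_model x = { { x0, x1 } };
    v2df_model zero = { { 0.0, 0.0 } };
    v2df_model a = shuffle2_model(x, zero, 0, 2);
    v2df_model b = shuffle2_model(x, zero, 0, 3);
    return a.lane[0] == b.lane[0] && a.lane[1] == b.lane[1];
}
```

In this model is_lane_preserving(0, 3) is 1 and is_lane_preserving(0, 2) is 0, while zero_operand_agrees returns 1 for any finite input, which matches the remark that the testcase only happens to be a vec_merge because the second vector is all zero.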
[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #5 from Uroš Bizjak ---
Created attachment 55778
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55778&action=edit
Failing patch, for reference

Patch that converts vec_concat/vec_select sse2_movq128 patterns to vec_merge.
[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #4 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #3)
> In the x86 backend's expand_vec_perm_1, we always try vec_merge first for
> !one_operand_p; expand_vselect_vconcat is only tried when vec_merge fails,
> which means we had better use vec_merge instead of vec_select:vec_concat
> when available in our backend pattern match.

In fact, I tried to convert existing sse2_movq128 patterns to vec_merge, but
the patch regressed:

-FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler movq
-FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler-not pxor
-FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-not pxor
-FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-times
(?n)(?:mov|psrldq).*%xmm[0-9] 12

So, the compiler still expects vec_concat/vec_select patterns to be present.
[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

Hongtao.liu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #3 from Hongtao.liu ---
In the x86 backend's expand_vec_perm_1, we always try vec_merge first for
!one_operand_p; expand_vselect_vconcat is only tried when vec_merge fails,
which means we had better use vec_merge instead of vec_select:vec_concat
when available in our backend pattern match.

Also, from the point of view of avx512 kmask instructions, using vec_merge
will help constant propagation.

  /* Try the SSE4.1 blend variable merge instructions.  */
  if (expand_vec_perm_blend (d))
    return true;

  /* Try movss/movsd instructions.  */
  if (expand_vec_perm_movs (d))
    return true;
[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com

--- Comment #2 from Uroš Bizjak ---
Created attachment 55776
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55776&action=edit
Proposed patch

Patch that introduces alternative MOVQ RTX definition.
[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
Bug 94866 depends on bug 94864, which changed state.

Bug 94864 Summary: Failure to combine vunpckhpd+movsd into single vunpckhpd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-04-30
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
         Depends on|                            |94864

--- Comment #1 from Richard Biener ---
We're expanding from

  _3 = BIT_INSERT_EXPR ;
  return _3;

which ends up using

(insn 8 7 9 (set (reg:V2DI 85)
        (vec_merge:V2DI (vec_duplicate:V2DI (reg:DI 86))
            (reg:V2DI 85)
            (const_int 2 [0x2]))) "y.c":6:28 -1

so likely the vec_merge "issue" again.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864
[Bug 94864] Failure to combine vunpckhpd+movsd into single vunpckhpd
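Editorial note: for readers without the original report at hand, the following is a hypothetical source-level illustration of the operation the bug title describes, written with the GCC/Clang vector extension. It is not the testcase from the report or from g++.target/i386/pr94866.C, and whether a given compiler lowers the first function through BIT_INSERT_EXPR is an assumption. The function names are made up for this sketch.

```c
#include <assert.h>

/* Two 64-bit lanes, using the GCC/Clang vector_size extension. */
typedef long long v2di __attribute__((vector_size(16)));

/* Insert 0 at index 1: the "pinsrq of 0 with index 1" shape. */
v2di insert_zero_hi(v2di a)
{
    a[1] = 0;
    return a;
}

/* Keep the low lane and zero the high lane: what movq does on xmm registers. */
v2di keep_low_only(v2di a)
{
    v2di r = { a[0], 0 };
    return r;
}

/* Returns 1 when both functions give the same lanes for {lo, hi}. */
int source_forms_agree(long long lo, long long hi)
{
    v2di a = { lo, hi };
    v2di p = insert_zero_hi(a);
    v2di q = keep_low_only(a);
    return p[0] == q[0] && p[1] == q[1];
}
```

Both functions compute the same value, which is why a single movq is a valid replacement for the insert. Which instructions are actually emitted depends on the compiler version and target flags and is not checked here.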