[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--- Comment #11 from howarth at nitro dot med dot uc dot edu 2008-11-15 23:59 --- This test case fails at -m64 on i686-apple-darwin9 in current gcc trunk. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--- Comment #12 from howarth at nitro dot med dot uc dot edu 2008-11-16 00:01 ---
Created an attachment (id=16690)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16690&action=view)
assembly file generated for gcc.target/i386/pr32661-1.c at -m64 on i686-apple-darwin9
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--- Comment #13 from howarth at nitro dot med dot uc dot edu 2008-11-16 00:01 ---
Test fails as...

Executing on host: /sw/src/fink.build/gcc44-4.3.999-20081115/darwin_objdir/gcc/xgcc -B/sw/src/fink.build/gcc44-4.3.999-20081115/darwin_objdir/gcc/ /sw/src/fink.build/gcc44-4.3.999-20081115/gcc-4.4-20081115/gcc/testsuite/gcc.target/i386/pr32661-1.c -O2 -S -m64 -o pr32661-1.s (timeout = 300)
PASS: gcc.target/i386/pr32661-1.c (test for excess errors)
FAIL: gcc.target/i386/pr32661-1.c scan-assembler-times mov 2
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--- Comment #10 from ubizjak at gmail dot com 2007-08-28 09:57 ---
Fixed.
--
ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--
ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.3.0

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--- Comment #9 from uros at gcc dot gnu dot org 2007-08-28 09:52 ---
Subject: Bug 32661

Author: uros
Date: Tue Aug 28 09:52:06 2007
New Revision: 127857

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=127857
Log:
        PR target/32661
        * simplify-rtx.c (simplify_binary_operation_1) [VEC_SELECT]:
        Simplify nested VEC_SELECT (with optional VEC_CONCAT operator as
        operand) when top VEC_SELECT extracts scalar element.
        * config/i386/sse.md (*vec_extract_v4si_mem): New.
        (*vec_extract_v4sf_mem): Ditto.

testsuite/ChangeLog:

        PR target/32661
        * gcc.target/i386/pr32661.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr32661.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/simplify-rtx.c
    trunk/gcc/testsuite/ChangeLog
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--- Comment #7 from ubizjak at gmail dot com 2007-07-13 06:08 ---
I have the following patch that solves the nested VEC_SELECT case. However, I would like to enhance it to also handle nested VEC_SELECT (VEC_SELECT (VEC_DUPLICATE (...))), which is generated e.g. for __builtin_ia32_vec_ext_v4si(*val, 2);

Index: simplify-rtx.c
===================================================================
--- simplify-rtx.c      (revision 126587)
+++ simplify-rtx.c      (working copy)
@@ -2669,6 +2669,31 @@ simplify_binary_operation_1 (enum rtx_co
          if (GET_CODE (trueop0) == CONST_VECTOR)
            return CONST_VECTOR_ELT (trueop0, INTVAL (XVECEXP
                                                      (trueop1, 0, 0)));
+         if (GET_CODE (trueop0) == VEC_SELECT
+             && (GET_MODE (XEXP (trueop0, 0)) == GET_MODE (trueop0)))
+           {
+             rtx op = XEXP (trueop0, 0);
+             rtx sel = XEXP (trueop0, 1);
+             enum machine_mode opmode = GET_MODE (op);
+             rtvec vec;
+             rtx tmp;
+
+             int elt_size = GET_MODE_SIZE (GET_MODE_INNER (opmode));
+             int n_elts = GET_MODE_SIZE (opmode) / elt_size;
+
+             int i = INTVAL (XVECEXP (trueop1, 0, 0));
+
+             gcc_assert (GET_CODE (sel) == PARALLEL);
+             gcc_assert (i < n_elts);
+
+             /* Select value, pointed by nested selector.  */
+             vec = rtvec_alloc (1);
+             RTVEC_ELT (vec, 0) = CONST_VECTOR_ELT (sel, i);
+             tmp = gen_rtx_PARALLEL (VOIDmode, vec);
+
+             tmp = gen_rtx_fmt_ee (code, mode, op, tmp);
+             return tmp;
+           }
        }
       else
        {
Index: config/i386/sse.md
===================================================================
--- config/i386/sse.md  (revision 126587)
+++ config/i386/sse.md  (working copy)
@@ -4578,6 +4578,22 @@
   operands[1] = gen_rtx_REG (SImode, REGNO (operands[1]));
 })
 
+(define_insn_and_split "*sse2_stored_1"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+       (vec_select:SI
+         (match_operand:V4SI 1 "memory_operand" "o")
+         (parallel [(match_operand 2 "const_0_to_3_operand" "")])))]
+  "TARGET_SSE"
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+{
+  int i = INTVAL (operands[2]);
+
+  emit_move_insn (operands[0], adjust_address (operands[1], SImode, i*4));
+  DONE;
+})
+
 (define_expand "sse_storeq"
   [(set (match_operand:DI 0 "nonimmediate_operand" "")
        (vec_select:DI
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--- Comment #8 from ubizjak at gmail dot com 2007-07-13 13:25 ---
Patch for SImode and SFmode vec_select at http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01263.html
--
ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01077.html|http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01263.html
           Keywords|                            |patch

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--- Comment #2 from scovich at gmail dot com 2007-07-11 15:03 ---
(In reply to comment #1)
> Confirmed, not a regression.

Also affects 4.3. Changing target
--
scovich at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|4.1.2                       |4.3.0

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--- Comment #3 from scovich at gmail dot com 2007-07-11 15:10 ---
This bug also causes _mm_cvtsi128_si64x() (which calls __builtin_ia32_vec_ext_v2di) to emit suboptimal code.

// g++-4.3-070710 -mtune=core2 -O3 -S -dp
#include <emmintrin.h>
long vector2long(__m128i* src) {
    return _mm_cvtsi128_si64x(*src);
}

Becomes

_Z11vector2longPU8__vectorx:
.LFB529:
        movdqa  (%rdi), %xmm0   # 6     *movv2di_internal/2     [length = 3]
        movd    %xmm0, %rax     # 25    *movdi_1_rex64/14       [length = 4]
        ret     # 28    return_internal [length = 1]

This might be related to bug 32708 (and therefore have a similar fix?)
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--- Comment #4 from uros at gcc dot gnu dot org 2007-07-11 18:43 ---
Subject: Bug 32661

Author: uros
Date: Wed Jul 11 18:42:44 2007
New Revision: 126557

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126557
Log:
        PR target/32661
        * config/i386/sse.md (*sse2_storeq_rex64): Handle 64bit mem->reg moves.
        (*vec_extractv2di_1_sse2): Disable for TARGET_64BIT.
        (*vec_extractv2di_1_rex64): New insn pattern.

testsuite/ChangeLog:

        PR target/32661
        * gcc.target/i386/pr32661-1.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr32661-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--- Comment #5 from ubizjak at gmail dot com 2007-07-11 18:47 ---
(In reply to comment #3)
> This might be related to bug 32708 (and therefore have a similar fix?)

Yes, DImode moves are implemented/fixed by the patch above. Your example now compiles to:

        movq    (%rdi), %rax
        ret

Other examples are shown in http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01077.html.

SImode moves will be a bit harder, because the shufps insn pattern is involved in the vector expansion.
--
ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |ubizjak at gmail dot com
                   |dot org                     |
                URL|                            |http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01077.html
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2007-07-07 09:25:01         |2007-07-11 18:47:20
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--- Comment #6 from scovich at gmail dot com 2007-07-11 20:27 ---
(In reply to comment #5)
> SImode moves will be a bit harder, because shufps insn pattern is involved in
> the vector expansion.

IIRC, shufps takes 3 cycles on Core2 (http://www.agner.org/optimize/instruction_tables.pdf), even without the operand type mismatch (does that still exist?). That's >=4 cycles. Storing the vector to the stack and loading the desired entry would take <=4 cycles, even without Intel's store-load optimizations, and I imagine the optimizer would be able to deal with it better.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--- Comment #1 from ubizjak at gmail dot com 2007-07-07 09:25 ---
Confirmed, not a regression.
--
ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|-00-00 00:00:00             |2007-07-07 09:25:01
               date|                            |
   Target Milestone|---                         |4.3.0

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--
rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org
   Target Milestone|4.3.0                       |---

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661
[Bug target/32661] __builtin_ia32_vec_ext suboptimal for pointer/ref args
--
pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu dot
                   |                            |org
           Severity|normal                      |enhancement
          Component|middle-end                  |target
           Keywords|                            |missed-optimization

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32661