[Bug tree-optimization/88828] Inefficient update of the first element of vector registers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

H.J. Lu changed:

           What    |Removed           |Added
----------------------------------------------------------------
             Status|NEW               |RESOLVED
         Resolution|---               |FIXED
   Target Milestone|---               |10.0

--- Comment #9 from H.J. Lu ---
Fixed for GCC 10.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

Bug 88828 depends on bug 54855, which changed state.

Bug 54855 Summary: Unnecessary duplication when performing scalar operation on vector element
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855

           What    |Removed           |Added
----------------------------------------------------------------
             Status|NEW               |RESOLVED
         Resolution|---               |FIXED

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

--- Comment #8 from Richard Biener ---
Author: rguenth
Date: Wed May 15 09:59:37 2019
New Revision: 271204

URL: https://gcc.gnu.org/viewcvs?rev=271204=gcc=rev
Log:
2019-05-15  Richard Biener

        PR tree-optimization/88828
        * tree-ssa-forwprop.c (simplify_vector_constructor): Fix bogus check.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-forwprop.c

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

--- Comment #7 from Andreas Schwab ---
../../gcc/tree-ssa-forwprop.c: In function 'bool simplify_vector_constructor(gimple_stmt_iterator*)':
../../gcc/tree-ssa-forwprop.c:2107:14: error: array subscript 2 is above array bounds of 'tree_node* [2]' [-Werror=array-bounds]
 2107 |orig[j] = ref;
      |~~^
../../gcc/tree-ssa-forwprop.c:2044:17: note: while referencing 'orig'
 2044 | tree op, op2, orig[2], type, elem_type;
      | ^~~~
../../gcc/tree-ssa-forwprop.c:2107:14: error: array subscript 2 is above array bounds of 'tree_node* [2]' [-Werror=array-bounds]
 2107 |orig[j] = ref;
      |~~^
../../gcc/tree-ssa-forwprop.c:2044:17: note: while referencing 'orig'
 2044 | tree op, op2, orig[2], type, elem_type;
      | ^~~~
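
The diagnostic says that on some path GCC can see, `j` reaches 2 while `orig` only has room for two source vectors. Comment #8 only says a "bogus check" was fixed, so the sketch below is a hypothetical illustration of the general pattern, not the actual simplify_vector_constructor code. It assumes the elements' source vectors are collected into a fixed two-slot array, because a two-input permute can draw from at most two vectors. The name `find_or_add_source` is invented here. The point it shows is that the bound must be tested before the store, and running out of slots means "give up", not "write past the end".

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SRCS 2  /* a two-input permute reads at most two vectors */

/* Hypothetical helper: return the slot of SRC in ORIG, appending it if
   it has not been seen yet.  Returns -1 when a third distinct source
   would be needed, i.e. the elements cannot come from one two-input
   permute.  */
static int
find_or_add_source (const void *orig[MAX_SRCS], size_t *n_srcs,
                    const void *src)
{
  assert (*n_srcs <= MAX_SRCS);
  size_t j;
  for (j = 0; j < *n_srcs; j++)
    if (orig[j] == src)
      return (int) j;
  /* Test the bound before storing into ORIG, never after.  */
  if (j >= MAX_SRCS)
    return -1;
  orig[j] = src;
  *n_srcs = j + 1;
  return (int) j;
}
```

With two sources already recorded, a third distinct source returns -1 and leaves both `orig` and the count untouched.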

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

--- Comment #6 from Richard Biener ---
Author: rguenth
Date: Tue May 14 09:11:15 2019
New Revision: 271153

URL: https://gcc.gnu.org/viewcvs?rev=271153=gcc=rev
Log:
2019-05-14  Richard Biener
            H.J. Lu

        PR tree-optimization/88828
        * tree-ssa-forwprop.c (simplify_vector_constructor): Handle
        permuting in a single non-constant element not extracted
        from a vector.

        * gcc.target/i386/pr88828-1.c: New test.
        * gcc.target/i386/pr88828-1a.c: Likewise.
        * gcc.target/i386/pr88828-1b.c: Likewise.
        * gcc.target/i386/pr88828-1c.c: Likewise.
        * gcc.target/i386/pr88828-4a.c: Likewise.
        * gcc.target/i386/pr88828-4b.c: Likewise.
        * gcc.target/i386/pr88828-5a.c: Likewise.
        * gcc.target/i386/pr88828-5b.c: Likewise.
        * gcc.target/i386/pr88828-7.c: Likewise.
        * gcc.target/i386/pr88828-7a.c: Likewise.
        * gcc.target/i386/pr88828-7b.c: Likewise.
        * gcc.target/i386/pr88828-8.c: Likewise.
        * gcc.target/i386/pr88828-8a.c: Likewise.
        * gcc.target/i386/pr88828-8b.c: Likewise.
        * gcc.target/i386/pr88828-9.c: Likewise.
        * gcc.target/i386/pr88828-9a.c: Likewise.
        * gcc.target/i386/pr88828-9b.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr88828-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr88828-1a.c
    trunk/gcc/testsuite/gcc.target/i386/pr88828-1b.c
    trunk/gcc/testsuite/gcc.target/i386/pr88828-1c.c
    trunk/gcc/testsuite/gcc.target/i386/pr88828-4a.c
    trunk/gcc/testsuite/gcc.target/i386/pr88828-4b.c
    trunk/gcc/testsuite/gcc.target/i386/pr88828-5a.c
    trunk/gcc/testsuite/gcc.target/i386/pr88828-5b.c
    trunk/gcc/testsuite/gcc.target/i386/pr88828-7.c
    trunk/gcc/testsuite/gcc.target/i386/pr88828-7a.c
    trunk/gcc/testsuite/gcc.target/i386/pr88828-7b.c
    trunk/gcc/testsuite/gcc.target/i386/pr88828-8.c
    trunk/gcc/testsuite/gcc.target/i386/pr88828-8a.c
    trunk/gcc/testsuite/gcc.target/i386/pr88828-8b.c
    trunk/gcc/testsuite/gcc.target/i386/pr88828-9.c
    trunk/gcc/testsuite/gcc.target/i386/pr88828-9a.c
    trunk/gcc/testsuite/gcc.target/i386/pr88828-9b.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-forwprop.c

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

--- Comment #5 from Richard Biener ---
Author: rguenth
Date: Mon May 6 12:43:30 2019
New Revision: 270908

URL: https://gcc.gnu.org/viewcvs?rev=270908=gcc=rev
Log:
2019-05-06  Richard Biener

        PR tree-optimization/88828
        * tree-ssa-forwprop.c (get_bit_field_ref_def): Split out from...
        (simplify_vector_constructor): ...here.  Handle constants in
        the constructor.

        * gcc.target/i386/pr88828-0.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr88828-0.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-forwprop.c
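
Comment #5's change lets the vector constructor contain constants. One way to picture this, sketched here under the assumption that every element is either a lane of a single input vector x or a constant, is to gather the constants into a second vector and express the whole constructor as a two-input permute. The sketch follows the VEC_PERM_EXPR convention that, for N lanes, mask indices 0..N-1 select from the first operand and N..2N-1 from the second. The type and function names (`const_ctor_elt`, `ctor_to_two_input_perm`, `vec_perm2`) are hypothetical; this is not the forwprop implementation.

```c
#include <assert.h>

#define NLANES 4

/* Hypothetical model: an element is either lane LANE of the input
   vector x, or the constant VALUE.  */
enum celt_kind { CELT_LANE, CELT_CONST };

struct const_ctor_elt
{
  enum celt_kind kind;
  int lane;     /* used when kind == CELT_LANE */
  float value;  /* used when kind == CELT_CONST */
};

/* Rewrite the constructor as VEC_PERM <x, cvec, mask>.  Each constant
   is placed in CVEC at its own position i and selected with mask
   index NLANES + i.  */
static void
ctor_to_two_input_perm (const struct const_ctor_elt elts[NLANES],
                        float cvec[NLANES], int mask[NLANES])
{
  for (int i = 0; i < NLANES; i++)
    {
      cvec[i] = 0.0f;  /* filler for slots the mask never selects */
      if (elts[i].kind == CELT_CONST)
        {
          cvec[i] = elts[i].value;
          mask[i] = NLANES + i;
        }
      else
        {
          assert (elts[i].lane >= 0 && elts[i].lane < NLANES);
          mask[i] = elts[i].lane;
        }
    }
}

/* Portable evaluation of a two-input permute with the same indexing. */
static void
vec_perm2 (const float a[NLANES], const float b[NLANES],
           const int mask[NLANES], float out[NLANES])
{
  for (int i = 0; i < NLANES; i++)
    out[i] = mask[i] < NLANES ? a[mask[i]] : b[mask[i] - NLANES];
}
```

For { x[0], 0.0f, x[2], x[3] } this produces the mask { 0, 5, 2, 3 } against an all-zero constant vector.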

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

Marc Glisse changed:

           What    |Removed           |Added
----------------------------------------------------------------
           See Also|                  |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855

--- Comment #4 from Marc Glisse ---
Comment #3 is similar to PR 54855.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

H.J. Lu changed:

           What    |Removed           |Added
----------------------------------------------------------------
                 CC|                  |crazylht at gmail dot com,
                   |                  |xuepeng.guo at intel dot com

--- Comment #3 from H.J. Lu ---
Another testcase:

[hjl@gnu-cfl-1 pr88828]$ cat y.i
typedef double __v2df __attribute__ ((__vector_size__ (16)));
typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__));

__m128d
_mm_add_sd (__m128d x, __m128d y)
{
  __m128d z = __extension__ (__m128d)(__v2df)
    { (((__v2df) x)[0] + ((__v2df) y)[0]), ((__v2df) x)[1] };
  return z;
}
[hjl@gnu-cfl-1 pr88828]$ gcc -S -O2 y.i
[hjl@gnu-cfl-1 pr88828]$ cat y.s
        .file   "y.i"
        .text
        .p2align 4,,15
        .globl  _mm_add_sd
        .type   _mm_add_sd, @function
_mm_add_sd:
.LFB0:
        .cfi_startproc
        movapd  %xmm0, %xmm2
        addsd   %xmm1, %xmm2
        movsd   %xmm2, %xmm0
        ret
        .cfi_endproc
.LFE0:
        .size   _mm_add_sd, .-_mm_add_sd
        .ident  "GCC: (GNU) 8.2.1 20190109 (Red Hat 8.2.1-7)"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-1 pr88828]$

I am expecting:

        addsd   %xmm1, %xmm0
        retq
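
The constructor in y.i is, lane for lane, the same as "copy x, then add y[0] into lane 0". That in-place form is what a single `addsd %xmm1, %xmm0` computes: it adds the low doubles and leaves the upper lane of the destination unchanged. The sketch below checks the equivalence at runtime. It assumes GCC or Clang vector extensions, uses `v2df` in place of `__v2df`, and the function names `add_sd_ctor`, `add_sd_insert`, `same_v2df` and `check_add_sd` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Two doubles in 16 bytes, like __v2df in y.i (GCC/Clang extension). */
typedef double v2df __attribute__ ((__vector_size__ (16)));

/* The form from y.i: build a fresh vector from lane-wise pieces. */
static v2df add_sd_ctor (v2df x, v2df y)
{
  return (v2df) { x[0] + y[0], x[1] };
}

/* The "update lane 0 in place" form, which maps onto one addsd. */
static v2df add_sd_insert (v2df x, v2df y)
{
  v2df z = x;
  z[0] += y[0];
  return z;
}

/* Lane-wise equality of two v2df values. */
static bool same_v2df (v2df a, v2df b)
{
  return a[0] == b[0] && a[1] == b[1];
}

/* Self-check for one pair of inputs. */
static void check_add_sd (v2df x, v2df y)
{
  assert (same_v2df (add_sd_ctor (x, y), add_sd_insert (x, y)));
}
```

The runtime check only shows the two forms agree on values; comparing `gcc -O2 -S` output for each is what shows whether the compiler recognizes them as the same operation.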

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

Richard Biener changed:

           What    |Removed           |Added
----------------------------------------------------------------
             Status|UNCONFIRMED       |NEW
   Last reconfirmed|                  |2019-01-14
                 CC|                  |rguenth at gcc dot gnu.org
     Ever confirmed|0                 |1

--- Comment #2 from Richard Biener ---
I think there are related bugs.

foo1 is optimized OK:

  y_4 = BIT_INSERT_EXPR ;
  return y_4;

while foo is expanded from

   [local count: 1073741824]:
  _1 = BIT_FIELD_REF ;
  _2 = BIT_FIELD_REF ;
  _3 = BIT_FIELD_REF ;
  y_6 = {f_5(D), _1, _2, _3};
  return y_6;

tree forwprop contains code pattern-matching on vector CONSTRUCTORs; it could be extended to handle this case, I think. IIRC it can already detect arbitrary two-vector permutes. For the above we could go through an intermediate

  _1 = {f_5(D), f_5(D), ... };
  y_6 = VEC_PERM <_1, x_7(D), { }>;

and recognize permutes that only replace a single vector element. So I think we should optimize

__v4sf
foo (__v4sf x, float f)
{
  __v4sf y = __extension__ (__v4sf) { f, x[2], x[1], x[3] };
  return y;
}

as well, by first permuting x and then inserting f (at any position).
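
Richard's proposal, recognizing a CONSTRUCTOR that is a permutation of one vector plus a single inserted scalar, can be modeled in a few lines. This is a hypothetical sketch, not the tree-ssa-forwprop.c code: it assumes the elements have already been classified as either a lane of the single input vector x or an outside scalar, and the names `ctor_elt`, `match_perm_plus_insert` and `apply_perm_plus_insert` are invented here.

```c
#include <assert.h>
#include <stdbool.h>

#define NLANES 4

/* Hypothetical model of one CONSTRUCTOR element. */
enum elt_kind { ELT_LANE, ELT_SCALAR };

struct ctor_elt
{
  enum elt_kind kind;
  int lane;  /* used when kind == ELT_LANE: index into x */
};

/* Succeed when exactly one element is an outside scalar and every
   other element is a lane of x.  On success MASK[i] is the lane of x
   feeding position i (the scalar's slot holds i as a filler that the
   insert overwrites) and *INSERT_POS is where the scalar goes.  */
static bool
match_perm_plus_insert (const struct ctor_elt elts[NLANES],
                        int mask[NLANES], int *insert_pos)
{
  int scalars = 0;
  for (int i = 0; i < NLANES; i++)
    {
      if (elts[i].kind == ELT_SCALAR)
        {
          scalars++;
          *insert_pos = i;
          mask[i] = i;
        }
      else
        {
          assert (elts[i].lane >= 0 && elts[i].lane < NLANES);
          mask[i] = elts[i].lane;
        }
    }
  return scalars == 1;
}

/* Evaluate the rewritten form: permute x by MASK, then insert F. */
static void
apply_perm_plus_insert (const float x[NLANES], float f,
                        const int mask[NLANES], int pos,
                        float out[NLANES])
{
  for (int i = 0; i < NLANES; i++)
    out[i] = x[mask[i]];
  out[pos] = f;
}
```

For { f, x[2], x[1], x[3] } the match succeeds with insert position 0 and mask { 0, 2, 1, 3 }. A constructor with no outside scalar is a plain permute, and one with two scalars is rejected; a real implementation would also have to verify that all lanes come from the same vector and that the target can do the permute.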

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

Andrew Pinski changed:

           What    |Removed           |Added
----------------------------------------------------------------
           Keywords|                  |missed-optimization
             Target|                  |x86_64
          Component|target            |tree-optimization

--- Comment #1 from Andrew Pinski ---
I think there are two issues here (maybe only one, since I have not tested one of them).

The first is not recognizing that:

typedef float __v4sf __attribute__ ((__vector_size__ (16)));

__v4sf
foo (__v4sf x, float f)
{
  __v4sf y = __extension__ (__v4sf) { f, x[1], x[2], x[3] };
  return y;
}

is the same as:

__v4sf
foo1 (__v4sf x, float f)
{
  __v4sf y = x;
  y[0] = f;
  return y;
}

This is a generic tree optimization issue.

The second is whether foo1 is optimized to what you want it to be. If not, that would be a target issue.
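
Andrew's first point, that foo and foo1 compute the same vector, can be checked at runtime. The sketch below is an illustration only: it assumes GCC or Clang vector extensions (the vector_size attribute, element subscripts, vector compound literals), uses `v4sf` in place of `__v4sf`, and adds made-up helpers `same_v4sf` and `check_foo_equiv`.

```c
#include <assert.h>
#include <stdbool.h>

/* Four floats in 16 bytes, like __v4sf in the report. */
typedef float v4sf __attribute__ ((__vector_size__ (16)));

/* foo: rebuild the vector from a scalar plus lanes 1..3 of x. */
static v4sf foo (v4sf x, float f)
{
  v4sf y = (v4sf) { f, x[1], x[2], x[3] };
  return y;
}

/* foo1: copy x and overwrite lane 0. */
static v4sf foo1 (v4sf x, float f)
{
  v4sf y = x;
  y[0] = f;
  return y;
}

/* Lane-by-lane comparison; true when all four lanes are equal. */
static bool same_v4sf (v4sf a, v4sf b)
{
  for (int i = 0; i < 4; i++)
    if (a[i] != b[i])
      return false;
  return true;
}

/* Self-check that foo and foo1 agree for one input. */
static void check_foo_equiv (v4sf x, float f)
{
  assert (same_v4sf (foo (x, f), foo1 (x, f)));
}
```

Agreement on values is the easy half; the bug is that GCC did not treat foo as the insert that foo1 already is, which only shows up in the generated code.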