[Bug tree-optimization/51074] No constant folding performed for VEC_PERM_EXPR, VEC_INTERLEAVE*EXPR, VEC_EXTRACT*EXPR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Severity|normal                      |enhancement
         Resolution|---                         |FIXED
   Target Milestone|---                         |4.7.0
             Status|NEW                         |RESOLVED

--- Comment #11 from Andrew Pinski ---
Fixed almost 10 years ago but was not closed.
[Bug tree-optimization/51074] No constant folding performed for VEC_PERM_EXPR, VEC_INTERLEAVE*EXPR, VEC_EXTRACT*EXPR
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

--- Comment #7 from Jakub Jelinek jakub at gcc dot gnu.org 2011-11-22 09:38:53 UTC ---
Created attachment 25878
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25878
gcc47-pr51074-be.patch

Big endian fix, untested.
[Bug tree-optimization/51074] No constant folding performed for VEC_PERM_EXPR, VEC_INTERLEAVE*EXPR, VEC_EXTRACT*EXPR
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

--- Comment #8 from Richard Henderson rth at gcc dot gnu.org 2011-11-22 16:08:17 UTC ---
No, Jakub, vector elements are in memory order.  There is no adjustment to be
made here.

Unfortunately ppc represents its interleave patterns non-standard, but one
can interpret.  E.g. the ultimate implementation of vec_interleave_low_v4si:

(define_insn "altivec_vmrglw"
  [(set (match_operand:V4SI 0 "register_operand" "=v")
        (vec_merge:V4SI
          (vec_select:V4SI (match_operand:V4SI 1 "register_operand" "v")
                           (parallel [(const_int 2)
                                      (const_int 0)
                                      (const_int 3)
                                      (const_int 1)]))
          (vec_select:V4SI (match_operand:V4SI 2 "register_operand" "v")
                           (parallel [(const_int 0)
                                      (const_int 2)
                                      (const_int 1)
                                      (const_int 3)]))
          (const_int 5)))]

By my reading that's { 4+0, 0, 4+1, 1 } if you consider op2 to be +4.  Which
is ... argument reversed from the normal { 0, 4, 1, 5 } that we expected, but
certainly not the { 2, 6, 3, 7 } that you were going to generate with that
patch.

As for the swapped operands... that does seem to correlate with the actual
output quoted in comment #6.  It seems like we need to dig and figure out if
the rtl is wrong, or if arguments got swapped along the N stage path between
vector.md and the ultimate altivec.md pattern.
[Bug tree-optimization/51074] No constant folding performed for VEC_PERM_EXPR, VEC_INTERLEAVE*EXPR, VEC_EXTRACT*EXPR
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

--- Comment #9 from Pat Haugen pthaugen at gcc dot gnu.org 2011-11-22 16:15:09 UTC ---
(In reply to comment #7)
> Created attachment 25878 [details]
> gcc47-pr51074-be.patch
>
> Big endian fix, untested.

This patch fixes the issue on both my testcase and the cpu2000 benchmark.
[Bug tree-optimization/51074] No constant folding performed for VEC_PERM_EXPR, VEC_INTERLEAVE*EXPR, VEC_EXTRACT*EXPR
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

--- Comment #10 from Jakub Jelinek jakub at gcc dot gnu.org 2011-11-22 16:57:39 UTC ---
Author: jakub
Date: Tue Nov 22 16:57:33 2011
New Revision: 181627

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181627
Log:
    PR tree-optimization/51074
    * fold-const.c (fold_binary_loc): Fix up VEC_INTERLEAVE_*_EXPR
    handling for BYTES_BIG_ENDIAN.
    * optabs.c (can_vec_perm_for_code_p): Likewise.

    * gcc.dg/vect/pr51074.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr51074.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
    trunk/gcc/optabs.c
    trunk/gcc/testsuite/ChangeLog
[Bug tree-optimization/51074] No constant folding performed for VEC_PERM_EXPR, VEC_INTERLEAVE*EXPR, VEC_EXTRACT*EXPR
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

Pat Haugen pthaugen at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pthaugen at gcc dot gnu.org

--- Comment #6 from Pat Haugen pthaugen at gcc dot gnu.org 2011-11-21 23:57:29 UTC ---
cpu2000 benchmark 176.gcc started failing on PPC with this patch (miscompare of
results). Appears to be due to result from VEC_INTERLEAVE_[LOW|HIGH]_EXPR when
folding. Following is simple testcase to demonstrate results.

temp/gcc cat junk.c
#include <stdio.h>
#define NUM 8

struct hard_reg_n_uses { int regno; int uses; };
struct hard_reg_n_uses hard_reg_n_uses[NUM];

void main()
{
  int i;

  for (i = 0; i < NUM; i++)
    {
      hard_reg_n_uses[i].uses = 0;
      hard_reg_n_uses[i].regno = i;
    }

  for (i = 0; i < NUM; i++)
    printf("i = %d regno = %d\n",i,hard_reg_n_uses[i].regno);
}

When compiled with revisions prior to r181297 with -O3 -mcpu=power7 I get the
following results:

temp/gcc a.out
i = 0 regno = 0
i = 1 regno = 1
i = 2 regno = 2
i = 3 regno = 3
i = 4 regno = 4
i = 5 regno = 5
i = 6 regno = 6
i = 7 regno = 7

revision 181297 (and later) give:

temp/gcc a.out
i = 0 regno = 2
i = 1 regno = 3
i = 2 regno = 0
i = 3 regno = 1
i = 4 regno = 6
i = 5 regno = 7
i = 6 regno = 4
i = 7 regno = 5

Comparing the tree dumps of r181296 and r181297, diff comes in at 127t.dom2
dump.

36,38c36,38
<   vect_inter_high.15_17 = VEC_INTERLEAVE_HIGH_EXPR <{ 0, 1, 2, 3 }, { 0, 0, 0, 0 }>;
<   vect_inter_low.16_37 = VEC_INTERLEAVE_LOW_EXPR <{ 0, 1, 2, 3 }, { 0, 0, 0, 0 }>;
<   MEM[(struct hard_reg_n_uses[8] *)&hard_reg_n_uses] = vect_inter_high.15_17;
---
>   vect_inter_high.15_17 = { 2, 0, 3, 0 };
>   vect_inter_low.16_37 = { 0, 0, 1, 0 };
>   MEM[(struct hard_reg_n_uses[8] *)&hard_reg_n_uses] = { 2, 0, 3, 0 };
40c40
<   MEM[(struct hard_reg_n_uses[8] *)&hard_reg_n_uses + 16B] = vect_inter_low.16_37;
---
>   MEM[(struct hard_reg_n_uses[8] *)&hard_reg_n_uses + 16B] = { 0, 0, 1, 0 };
47,49c47,49
<   vect_inter_high.15_28 = VEC_INTERLEAVE_HIGH_EXPR <{ 4, 5, 6, 7 }, { 0, 0, 0, 0 }>;
<   vect_inter_low.16_29 = VEC_INTERLEAVE_LOW_EXPR <{ 4, 5, 6, 7 }, { 0, 0, 0, 0 }>;
<   MEM[(struct hard_reg_n_uses[8] *)&hard_reg_n_uses + 32B] = vect_inter_high.15_28;
---
>   vect_inter_high.15_28 = { 6, 0, 7, 0 };
>   vect_inter_low.16_29 = { 4, 0, 5, 0 };
>   MEM[(struct hard_reg_n_uses[8] *)&hard_reg_n_uses + 32B] = { 6, 0, 7, 0 };
51c51
<   MEM[(struct hard_reg_n_uses[8] *)&hard_reg_n_uses + 48B] = vect_inter_low.16_29;
---
>   MEM[(struct hard_reg_n_uses[8] *)&hard_reg_n_uses + 48B] = { 4, 0, 5, 0 };
[Bug tree-optimization/51074] No constant folding performed for VEC_PERM_EXPR, VEC_INTERLEAVE*EXPR, VEC_EXTRACT*EXPR
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

--- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org 2011-11-11 19:55:26 UTC ---
Author: jakub
Date: Fri Nov 11 19:55:23 2011
New Revision: 181297

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181297
Log:
    PR tree-optimization/51074
    * fold-const.c (vec_cst_ctor_to_array, fold_vec_perm): New functions.
    (fold_binary_loc): Handle VEC_EXTRACT_EVEN_EXPR,
    VEC_EXTRACT_ODD_EXPR, VEC_INTERLEAVE_HIGH_EXPR and
    VEC_INTERLEAVE_LOW_EXPR with VECTOR_CST or CONSTRUCTOR operands.
    (fold_ternary_loc): Handle VEC_PERM_EXPR with VECTOR_CST or
    CONSTRUCTOR operands.
    * tree-ssa-propagate.c (valid_gimple_rhs_p): Handle ternary
    expressions.
    * tree-vect-generic.c (lower_vec_perm): Mask sel_int elements
    to 0 .. 2 * elements - 1.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
    trunk/gcc/tree-ssa-propagate.c
    trunk/gcc/tree-vect-generic.c
[Bug tree-optimization/51074] No constant folding performed for VEC_PERM_EXPR, VEC_INTERLEAVE*EXPR, VEC_EXTRACT*EXPR
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

Richard Guenther rguenth at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011-11-10
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Guenther rguenth at gcc dot gnu.org 2011-11-10 10:40:20 UTC ---
Well, why not handle the tree codes in fold-const.c?  That way all propagators
would handle it via gimple_fold_stmt_to_constant_1.
[Bug tree-optimization/51074] No constant folding performed for VEC_PERM_EXPR, VEC_INTERLEAVE*EXPR, VEC_EXTRACT*EXPR
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

--- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org 2011-11-10 10:50:54 UTC ---
The case I was worried about is when we have a single VECTOR_CST before the
loop and then create 16 different vectors out of it using different
permutations.  Then the permutations of the same VECTOR_CST might be cheaper
than having to load 10 constants out of memory because the register pressure
was too high.  But perhaps that is unlikely and we should just fold; if it
works in fold-const.c, sure.

Doing something about interleaved const stores in the vectorizer is desirable
anyway.  Even if we leave the folding to following passes, the fact that we
don't need any interleaves means we might handle more cases, and the cost
model wouldn't reject it so often.
[Bug tree-optimization/51074] No constant folding performed for VEC_PERM_EXPR, VEC_INTERLEAVE*EXPR, VEC_EXTRACT*EXPR
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

--- Comment #3 from Jakub Jelinek jakub at gcc dot gnu.org 2011-11-10 13:47:47 UTC ---
Created attachment 25784
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25784
gcc47-pr51074.patch

Folding patch.  For __builtin_shuffle it works well.
For the interleaved stores of addresses like:

char *a[1024];
extern char b[];

void
foo ()
{
  int i;
  for (i = 0; i < 1024; i += 16)
    {
      a[i] = b + 1;
      a[i + 15] = b + 2;
      a[i + 1] = b + 3;
      a[i + 14] = b + 4;
      a[i + 2] = b + 5;
      a[i + 13] = b + 6;
      a[i + 3] = b + 7;
      a[i + 12] = b + 8;
      a[i + 4] = b + 9;
      a[i + 11] = b + 10;
      a[i + 5] = b + 11;
      a[i + 10] = b + 12;
      a[i + 6] = b + 13;
      a[i + 9] = b + 14;
      a[i + 7] = b + 15;
      a[i + 8] = b + 16;
    }
}

it doesn't help, I'd need to do something like:

--- tree-ssa-propagate.c.jj 2011-09-29 14:25:46.0 +0200
+++ tree-ssa-propagate.c 2011-11-10 14:33:55.923268422 +0100
@@ -610,6 +610,8 @@ valid_gimple_rhs_p (tree expr)
       return false;
 
     case tcc_exceptional:
+      if (code == CONSTRUCTOR && TREE_CODE (TREE_TYPE (expr)) == VECTOR_TYPE)
+        break;
       if (code != SSA_NAME)
         return false;
       break;

but that isn't helpful either (because it puts the vector CONSTRUCTOR inside
of loop and thus prevents vectorization - is expanded piecewise).
[Bug tree-optimization/51074] No constant folding performed for VEC_PERM_EXPR, VEC_INTERLEAVE*EXPR, VEC_EXTRACT*EXPR
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

--- Comment #4 from Richard Guenther rguenth at gcc dot gnu.org 2011-11-10 13:59:26 UTC ---
(In reply to comment #3)
> Created attachment 25784 [details]
> gcc47-pr51074.patch
>
> Folding patch.  For __builtin_shuffle it works well.

Looks good.

> For the interleaved stores of addresses like:
>
> char *a[1024];
> extern char b[];
>
> void
> foo ()
> {
>   int i;
>   for (i = 0; i < 1024; i += 16)
>     {
>       a[i] = b + 1;
>       a[i + 15] = b + 2;
>       a[i + 1] = b + 3;
>       a[i + 14] = b + 4;
>       a[i + 2] = b + 5;
>       a[i + 13] = b + 6;
>       a[i + 3] = b + 7;
>       a[i + 12] = b + 8;
>       a[i + 4] = b + 9;
>       a[i + 11] = b + 10;
>       a[i + 5] = b + 11;
>       a[i + 10] = b + 12;
>       a[i + 6] = b + 13;
>       a[i + 9] = b + 14;
>       a[i + 7] = b + 15;
>       a[i + 8] = b + 16;
>     }
> }
>
> it doesn't help, I'd need to do something like:
>
> --- tree-ssa-propagate.c.jj 2011-09-29 14:25:46.0 +0200
> +++ tree-ssa-propagate.c 2011-11-10 14:33:55.923268422 +0100
> @@ -610,6 +610,8 @@ valid_gimple_rhs_p (tree expr)
>        return false;
>
>      case tcc_exceptional:
> +      if (code == CONSTRUCTOR && TREE_CODE (TREE_TYPE (expr)) == VECTOR_TYPE)
> +        break;
>        if (code != SSA_NAME)
>          return false;
>        break;
>
> but that isn't helpful either (because it puts the vector CONSTRUCTOR inside
> of loop and thus prevents vectorization - is expanded piecewise).

Still the above is a good thing anyway - it is a valid gimple RHS after all.
Loop IM should be able to hoist the constructor - why doesn't it do that?
(PRE, too)