[Bug tree-optimization/51074] No constant folding performed for VEC_PERM_EXPR, VEC_INTERLEAVE*EXPR, VEC_EXTRACT*EXPR

2021-07-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Severity|normal                      |enhancement
         Resolution|---                         |FIXED
   Target Milestone|---                         |4.7.0
             Status|NEW                         |RESOLVED

--- Comment #11 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Fixed almost 10 years ago but was not closed.

2011-11-22 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-11-22 09:38:53 UTC ---
Created attachment 25878
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25878
gcc47-pr51074-be.patch

Big endian fix, untested.


2011-11-22 Thread rth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

--- Comment #8 from Richard Henderson <rth at gcc dot gnu.org> 2011-11-22 16:08:17 UTC ---
No, Jakub, vector elements are in memory order.  There is no adjustment
to be made here.

Unfortunately ppc represents its interleave patterns in a non-standard
way, but one can still interpret them.  E.g. the ultimate implementation
of vec_interleave_low_v4si:

(define_insn "altivec_vmrglw"
  [(set (match_operand:V4SI 0 "register_operand" "=v")
        (vec_merge:V4SI
         (vec_select:V4SI (match_operand:V4SI 1 "register_operand" "v")
                          (parallel [(const_int 2)
                                     (const_int 0)
                                     (const_int 3)
                                     (const_int 1)]))
         (vec_select:V4SI (match_operand:V4SI 2 "register_operand" "v")
                          (parallel [(const_int 0)
                                     (const_int 2)
                                     (const_int 1)
                                     (const_int 3)]))
         (const_int 5)))]

By my reading that's { 4+0, 0, 4+1, 1 } if you consider op2 to be +4.
Which is ... argument reversed from the normal { 0, 4, 1, 5 } that we
expected, but certainly not the { 2, 6, 3, 7 } that you were going to
generate with that patch.

As for the swapped operands... that does seem to correlate with the
actual output quoted in comment #6.  It seems like we need to dig and
figure out if the rtl is wrong, or if arguments got swapped along the
N stage path between vector.md and the ultimate altivec.md pattern.


2011-11-22 Thread pthaugen at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

--- Comment #9 from Pat Haugen <pthaugen at gcc dot gnu.org> 2011-11-22 16:15:09 UTC ---
(In reply to comment #7)
> Created attachment 25878 [details]
> gcc47-pr51074-be.patch
>
> Big endian fix, untested.

This patch fixes the issue on both my testcase and the cpu2000 benchmark.


2011-11-22 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-11-22 16:57:39 UTC ---
Author: jakub
Date: Tue Nov 22 16:57:33 2011
New Revision: 181627

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181627
Log:
PR tree-optimization/51074
* fold-const.c (fold_binary_loc): Fix up VEC_INTERLEAVE_*_EXPR
handling for BYTES_BIG_ENDIAN.
* optabs.c (can_vec_perm_for_code_p): Likewise.

* gcc.dg/vect/pr51074.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/vect/pr51074.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/fold-const.c
trunk/gcc/optabs.c
trunk/gcc/testsuite/ChangeLog


2011-11-21 Thread pthaugen at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

Pat Haugen <pthaugen at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pthaugen at gcc dot gnu.org

--- Comment #6 from Pat Haugen <pthaugen at gcc dot gnu.org> 2011-11-21 23:57:29 UTC ---
The cpu2000 benchmark 176.gcc started failing on PPC with this patch (miscompare
of results).  It appears to be due to the result of folding
VEC_INTERLEAVE_[LOW|HIGH]_EXPR.  The following simple testcase demonstrates
the problem.

temp/gcc cat junk.c
#include <stdio.h>

#define NUM 8
struct hard_reg_n_uses { int regno; int uses; };
struct hard_reg_n_uses hard_reg_n_uses[NUM];

void main() {
  int i;

  for (i = 0; i < NUM; i++)
    {
      hard_reg_n_uses[i].uses = 0;
      hard_reg_n_uses[i].regno = i;
    }

  for (i = 0; i < NUM; i++)
    printf("i = %d  regno = %d\n", i, hard_reg_n_uses[i].regno);

}


When compiled with revisions prior to r181297 with -O3 -mcpu=power7 I get the
following results:

temp/gcc a.out
i = 0   regno = 0
i = 1   regno = 1
i = 2   regno = 2
i = 3   regno = 3
i = 4   regno = 4
i = 5   regno = 5
i = 6   regno = 6
i = 7   regno = 7


revision 181297 (and later) give:

temp/gcc a.out
i = 0   regno = 2
i = 1   regno = 3
i = 2   regno = 0
i = 3   regno = 1
i = 4   regno = 6
i = 5   regno = 7
i = 6   regno = 4
i = 7   regno = 5


Comparing the tree dumps of r181296 and r181297, diff comes in at 127t.dom2
dump.


36,38c36,38
<   vect_inter_high.15_17 = VEC_INTERLEAVE_HIGH_EXPR <{ 0, 1, 2, 3 }, { 0, 0, 0, 0 }>;
<   vect_inter_low.16_37 = VEC_INTERLEAVE_LOW_EXPR <{ 0, 1, 2, 3 }, { 0, 0, 0, 0 }>;
<   MEM[(struct hard_reg_n_uses[8] *)&hard_reg_n_uses] = vect_inter_high.15_17;
---
>   vect_inter_high.15_17 = { 2, 0, 3, 0 };
>   vect_inter_low.16_37 = { 0, 0, 1, 0 };
>   MEM[(struct hard_reg_n_uses[8] *)&hard_reg_n_uses] = { 2, 0, 3, 0 };
40c40
<   MEM[(struct hard_reg_n_uses[8] *)&hard_reg_n_uses + 16B] = vect_inter_low.16_37;
---
>   MEM[(struct hard_reg_n_uses[8] *)&hard_reg_n_uses + 16B] = { 0, 0, 1, 0 };
47,49c47,49
<   vect_inter_high.15_28 = VEC_INTERLEAVE_HIGH_EXPR <{ 4, 5, 6, 7 }, { 0, 0, 0, 0 }>;
<   vect_inter_low.16_29 = VEC_INTERLEAVE_LOW_EXPR <{ 4, 5, 6, 7 }, { 0, 0, 0, 0 }>;
<   MEM[(struct hard_reg_n_uses[8] *)&hard_reg_n_uses + 32B] = vect_inter_high.15_28;
---
>   vect_inter_high.15_28 = { 6, 0, 7, 0 };
>   vect_inter_low.16_29 = { 4, 0, 5, 0 };
>   MEM[(struct hard_reg_n_uses[8] *)&hard_reg_n_uses + 32B] = { 6, 0, 7, 0 };
51c51
<   MEM[(struct hard_reg_n_uses[8] *)&hard_reg_n_uses + 48B] = vect_inter_low.16_29;
---
>   MEM[(struct hard_reg_n_uses[8] *)&hard_reg_n_uses + 48B] = { 4, 0, 5, 0 };


2011-11-11 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-11-11 19:55:26 UTC ---
Author: jakub
Date: Fri Nov 11 19:55:23 2011
New Revision: 181297

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181297
Log:
PR tree-optimization/51074
* fold-const.c (vec_cst_ctor_to_array, fold_vec_perm): New functions.
(fold_binary_loc): Handle VEC_EXTRACT_EVEN_EXPR,
VEC_EXTRACT_ODD_EXPR, VEC_INTERLEAVE_HIGH_EXPR and
VEC_INTERLEAVE_LOW_EXPR with VECTOR_CST or CONSTRUCTOR operands.
(fold_ternary_loc): Handle VEC_PERM_EXPR with VECTOR_CST or
CONSTRUCTOR operands.
* tree-ssa-propagate.c (valid_gimple_rhs_p): Handle ternary
expressions.
* tree-vect-generic.c (lower_vec_perm): Mask sel_int elements
to 0 .. 2 * elements - 1.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/fold-const.c
trunk/gcc/tree-ssa-propagate.c
trunk/gcc/tree-vect-generic.c


2011-11-10 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011-11-10
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-11-10 10:40:20 UTC ---
Well, why not handle the tree codes in fold-const.c?  That way all propagators
would handle it via gimple_fold_stmt_to_constant_1.


2011-11-10 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-11-10 10:50:54 UTC ---
The case I was worried about is when we have a single VECTOR_CST before the
loop and then create 16 different vectors out of it using different
permutations; then the permutations of the same VECTOR_CST might be cheaper
than having to load 10 constants out of memory because the register pressure
was too high.  But perhaps that is unlikely and we should just fold; if it
works in fold-const.c, sure.

Doing something about interleaved const stores in the vectorizer is desirable
anyway.  Even if we leave the folding to the following passes, the fact that
we don't need any interleaves means we might handle more cases and the cost
model wouldn't reject it so often.


2011-11-10 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-11-10 13:47:47 UTC ---
Created attachment 25784
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25784
gcc47-pr51074.patch

Folding patch.  For __builtin_shuffle it works well.

For the interleaved stores of addresses like:
char *a[1024];
extern char b[];

void
foo ()
{
  int i;
  for (i = 0; i < 1024; i += 16)
    {
      a[i] = b + 1;
      a[i + 15] = b + 2;
      a[i + 1] = b + 3;
      a[i + 14] = b + 4;
      a[i + 2] = b + 5;
      a[i + 13] = b + 6;
      a[i + 3] = b + 7;
      a[i + 12] = b + 8;
      a[i + 4] = b + 9;
      a[i + 11] = b + 10;
      a[i + 5] = b + 11;
      a[i + 10] = b + 12;
      a[i + 6] = b + 13;
      a[i + 9] = b + 14;
      a[i + 7] = b + 15;
      a[i + 8] = b + 16;
    }
}

it doesn't help, I'd need to do something like:
--- tree-ssa-propagate.c.jj 2011-09-29 14:25:46.0 +0200
+++ tree-ssa-propagate.c 2011-11-10 14:33:55.923268422 +0100
@@ -610,6 +610,8 @@ valid_gimple_rhs_p (tree expr)
       return false;
 
     case tcc_exceptional:
+      if (code == CONSTRUCTOR && TREE_CODE (TREE_TYPE (expr)) == VECTOR_TYPE)
+        break;
       if (code != SSA_NAME)
         return false;
       break;

but that isn't helpful either (because it puts the vector CONSTRUCTOR inside
the loop and thus prevents vectorization - it is expanded piecewise).


2011-11-10 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51074

--- Comment #4 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-11-10 13:59:26 UTC ---
(In reply to comment #3)
> Created attachment 25784 [details]
> gcc47-pr51074.patch
>
> Folding patch.  For __builtin_shuffle it works well.

Looks good.

> For the interleaved stores of addresses like:
> char *a[1024];
> extern char b[];
>
> void
> foo ()
> {
>   int i;
>   for (i = 0; i < 1024; i += 16)
>     {
>       a[i] = b + 1;
>       a[i + 15] = b + 2;
>       a[i + 1] = b + 3;
>       a[i + 14] = b + 4;
>       a[i + 2] = b + 5;
>       a[i + 13] = b + 6;
>       a[i + 3] = b + 7;
>       a[i + 12] = b + 8;
>       a[i + 4] = b + 9;
>       a[i + 11] = b + 10;
>       a[i + 5] = b + 11;
>       a[i + 10] = b + 12;
>       a[i + 6] = b + 13;
>       a[i + 9] = b + 14;
>       a[i + 7] = b + 15;
>       a[i + 8] = b + 16;
>     }
> }
>
> it doesn't help, I'd need to do something like:
> --- tree-ssa-propagate.c.jj 2011-09-29 14:25:46.0 +0200
> +++ tree-ssa-propagate.c 2011-11-10 14:33:55.923268422 +0100
> @@ -610,6 +610,8 @@ valid_gimple_rhs_p (tree expr)
>        return false;
>
>      case tcc_exceptional:
> +      if (code == CONSTRUCTOR && TREE_CODE (TREE_TYPE (expr)) == VECTOR_TYPE)
> +        break;
>        if (code != SSA_NAME)
>          return false;
>        break;
>
> but that isn't helpful either (because it puts the vector CONSTRUCTOR inside
> of loop and thus prevents vectorization - is expanded piecewise).

Still the above is a good thing anyway - it is a valid gimple RHS after all.
Loop IM should be able to hoist the constructor - why doesn't it do that?
(PRE, too)