[Bug tree-optimization/88828] Inefficient update of the first element of vector registers

2019-06-24 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |10.0

--- Comment #9 from H.J. Lu  ---
Fixed for GCC 10.

[Bug tree-optimization/88828] Inefficient update of the first element of vector registers

2019-06-20 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828
Bug 88828 depends on bug 54855, which changed state.

Bug 54855 Summary: Unnecessary duplication when performing scalar operation on vector element
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

[Bug tree-optimization/88828] Inefficient update of the first element of vector registers

2019-05-15 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

--- Comment #8 from Richard Biener  ---
Author: rguenth
Date: Wed May 15 09:59:37 2019
New Revision: 271204

URL: https://gcc.gnu.org/viewcvs?rev=271204&root=gcc&view=rev
Log:
2019-05-15  Richard Biener  

PR tree-optimization/88828
* tree-ssa-forwprop.c (simplify_vector_constructor): Fix
bogus check.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-ssa-forwprop.c

[Bug tree-optimization/88828] Inefficient update of the first element of vector registers

2019-05-15 Thread sch...@linux-m68k.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

--- Comment #7 from Andreas Schwab  ---
../../gcc/tree-ssa-forwprop.c: In function 'bool
simplify_vector_constructor(gimple_stmt_iterator*)':
../../gcc/tree-ssa-forwprop.c:2107:14: error: array subscript 2 is above array
bounds of 'tree_node* [2]' [-Werror=array-bounds]
 2107 |orig[j] = ref;
  |~~^
../../gcc/tree-ssa-forwprop.c:2044:17: note: while referencing 'orig'
 2044 |   tree op, op2, orig[2], type, elem_type;
  | ^~~~
../../gcc/tree-ssa-forwprop.c:2107:14: error: array subscript 2 is above array
bounds of 'tree_node* [2]' [-Werror=array-bounds]
 2107 |orig[j] = ref;
  |~~^
../../gcc/tree-ssa-forwprop.c:2044:17: note: while referencing 'orig'
 2044 |   tree op, op2, orig[2], type, elem_type;
  | ^~~~

[Bug tree-optimization/88828] Inefficient update of the first element of vector registers

2019-05-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

--- Comment #6 from Richard Biener  ---
Author: rguenth
Date: Tue May 14 09:11:15 2019
New Revision: 271153

URL: https://gcc.gnu.org/viewcvs?rev=271153&root=gcc&view=rev
Log:
2019-05-14  Richard Biener  
H.J. Lu  

PR tree-optimization/88828
* tree-ssa-forwprop.c (simplify_vector_constructor): Handle
permuting in a single non-constant element not extracted
from a vector.

* gcc.target/i386/pr88828-1.c: New test.
* gcc.target/i386/pr88828-1a.c: Likewise.
* gcc.target/i386/pr88828-1b.c: Likewise.
* gcc.target/i386/pr88828-1c.c: Likewise.
* gcc.target/i386/pr88828-4a.c: Likewise.
* gcc.target/i386/pr88828-4b.c: Likewise.
* gcc.target/i386/pr88828-5a.c: Likewise.
* gcc.target/i386/pr88828-5b.c: Likewise.
* gcc.target/i386/pr88828-7.c: Likewise.
* gcc.target/i386/pr88828-7a.c: Likewise.
* gcc.target/i386/pr88828-7b.c: Likewise.
* gcc.target/i386/pr88828-8.c: Likewise.
* gcc.target/i386/pr88828-8a.c: Likewise.
* gcc.target/i386/pr88828-8b.c: Likewise.
* gcc.target/i386/pr88828-9.c: Likewise.
* gcc.target/i386/pr88828-9a.c: Likewise.
* gcc.target/i386/pr88828-9b.c: Likewise.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr88828-1.c
trunk/gcc/testsuite/gcc.target/i386/pr88828-1a.c
trunk/gcc/testsuite/gcc.target/i386/pr88828-1b.c
trunk/gcc/testsuite/gcc.target/i386/pr88828-1c.c
trunk/gcc/testsuite/gcc.target/i386/pr88828-4a.c
trunk/gcc/testsuite/gcc.target/i386/pr88828-4b.c
trunk/gcc/testsuite/gcc.target/i386/pr88828-5a.c
trunk/gcc/testsuite/gcc.target/i386/pr88828-5b.c
trunk/gcc/testsuite/gcc.target/i386/pr88828-7.c
trunk/gcc/testsuite/gcc.target/i386/pr88828-7a.c
trunk/gcc/testsuite/gcc.target/i386/pr88828-7b.c
trunk/gcc/testsuite/gcc.target/i386/pr88828-8.c
trunk/gcc/testsuite/gcc.target/i386/pr88828-8a.c
trunk/gcc/testsuite/gcc.target/i386/pr88828-8b.c
trunk/gcc/testsuite/gcc.target/i386/pr88828-9.c
trunk/gcc/testsuite/gcc.target/i386/pr88828-9a.c
trunk/gcc/testsuite/gcc.target/i386/pr88828-9b.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-forwprop.c

[Bug tree-optimization/88828] Inefficient update of the first element of vector registers

2019-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

--- Comment #5 from Richard Biener  ---
Author: rguenth
Date: Mon May  6 12:43:30 2019
New Revision: 270908

URL: https://gcc.gnu.org/viewcvs?rev=270908&root=gcc&view=rev
Log:
2019-05-06  Richard Biener  

PR tree-optimization/88828
* tree-ssa-forwprop.c (get_bit_field_ref_def): Split out from...
(simplify_vector_constructor): ...here.  Handle constants in
the constructor.

* gcc.target/i386/pr88828-0.c: New testcase.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr88828-0.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-forwprop.c

[Bug tree-optimization/88828] Inefficient update of the first element of vector registers

2019-01-23 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

Marc Glisse  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855

--- Comment #4 from Marc Glisse  ---
Comment #3 is similar to PR 54855.

[Bug tree-optimization/88828] Inefficient update of the first element of vector registers

2019-01-23 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com,
                   |                            |xuepeng.guo at intel dot com

--- Comment #3 from H.J. Lu  ---
Another testcase:

[hjl@gnu-cfl-1 pr88828]$ cat y.i
typedef double __v2df __attribute__ ((__vector_size__ (16)));
typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__));

__m128d
_mm_add_sd (__m128d x, __m128d y)
{
  __m128d z =  __extension__ (__m128d)(__v2df)
{ (((__v2df) x)[0] + ((__v2df) y)[0]), ((__v2df) x)[1] };
  return z;
}
[hjl@gnu-cfl-1 pr88828]$  gcc -S -O2 y.i 
[hjl@gnu-cfl-1 pr88828]$  cat y.s
.file   "y.i"
.text
.p2align 4,,15
.globl  _mm_add_sd
.type   _mm_add_sd, @function
_mm_add_sd:
.LFB0:
.cfi_startproc
movapd  %xmm0, %xmm2
addsd   %xmm1, %xmm2
movsd   %xmm2, %xmm0
ret
.cfi_endproc
.LFE0:
.size   _mm_add_sd, .-_mm_add_sd
.ident  "GCC: (GNU) 8.2.1 20190109 (Red Hat 8.2.1-7)"
.section        .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-1 pr88828]$ 

I am expecting

addsd   %xmm1, %xmm0
retq

[Bug tree-optimization/88828] Inefficient update of the first element of vector registers

2019-01-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-01-14
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener  ---
I think there are related bugs.  foo1 is optimized OK:

  y_4 = BIT_INSERT_EXPR ;
  return y_4;

while foo is expanded from

   [local count: 1073741824]:
  _1 = BIT_FIELD_REF ;
  _2 = BIT_FIELD_REF ;
  _3 = BIT_FIELD_REF ;
  y_6 = {f_5(D), _1, _2, _3};
  return y_6;

tree forwprop contains code that pattern-matches on vector CONSTRUCTORs;
it could be extended to handle this case, I think.  IIRC it can already
detect arbitrary two-vector permutes, so for the above we could go
through an intermediate

  _1 = {f_5(D), f_5(D), ... };
  y_6 = VEC_PERM <_1, x_7(D), {  }>;

and recognize permutes that only replace a single vector element.

So I think we should optimize

__v4sf
foo (__v4sf x, float f)
{
__v4sf y =  __extension__ (__v4sf)
  { f, x[2], x[1], x[3] };
  return y;
}

as well, first permuting x and then inserting f (at any position).
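
The "permute first, then insert" idea can be sketched at the source level.
This is only an illustration of the intended equivalence, not the forwprop
implementation: the names v4sf, v4si, foo_ctor and foo_perm_insert are
made up for this sketch, and the shuffle builtins are compiler extensions
(__builtin_shuffle on GCC, __builtin_shufflevector on Clang).

```c
/* Vector types via the GCC/Clang vector extension (names are hypothetical). */
typedef float v4sf __attribute__ ((__vector_size__ (16)));
typedef int v4si __attribute__ ((__vector_size__ (16)));

/* Constructor form from the comment: f in lane 0, lanes 1 and 2 of x
   swapped, lane 3 kept.  */
v4sf
foo_ctor (v4sf x, float f)
{
  v4sf y = { f, x[2], x[1], x[3] };
  return y;
}

/* Two-step form: permute x with a single-vector shuffle, then overwrite
   lane 0 with f.  */
v4sf
foo_perm_insert (v4sf x, float f)
{
#ifdef __clang__
  v4sf y = __builtin_shufflevector (x, x, 0, 2, 1, 3);
#else
  v4si mask = { 0, 2, 1, 3 };
  v4sf y = __builtin_shuffle (x, mask);
#endif
  y[0] = f;
  return y;
}
```

Both functions should return the same vector for any input; whether the
compiler emits the same instructions for them depends on the GCC version
and target.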

[Bug tree-optimization/88828] Inefficient update of the first element of vector registers

2019-01-13 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64
          Component|target                      |tree-optimization

--- Comment #1 from Andrew Pinski  ---
I think there are two issues here (maybe only one since I have not tested one
of them).

The first is that GCC does not recognize that:
typedef float __v4sf __attribute__ ((__vector_size__ (16)));

__v4sf 
foo (__v4sf x, float f)
{
  __v4sf y =  __extension__ (__v4sf)
  { f, x[1], x[2], x[3] };
  return y;
}

is the same as:
__v4sf 
foo1 (__v4sf x, float f)
{
  __v4sf y =  x;
  y[0] = f;
  return y;
}

This is a generic tree optimization issue.
The second is whether foo1 itself is optimized to what you want.  If it is
not, that would be a target issue.
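
A small self-check can make the claimed equivalence concrete.  This is a
sketch only: foo and foo1 are the functions from this comment, v4sf stands
in for __v4sf to avoid a reserved identifier, and same_lanes is a
hypothetical helper added for the comparison.

```c
#include <string.h>

/* 4 x float vector via the GCC/Clang vector extension. */
typedef float v4sf __attribute__ ((__vector_size__ (16)));

/* Constructor form: rebuild the vector, replacing only lane 0. */
v4sf
foo (v4sf x, float f)
{
  v4sf y = { f, x[1], x[2], x[3] };
  return y;
}

/* Element-insert form: copy x, then overwrite lane 0. */
v4sf
foo1 (v4sf x, float f)
{
  v4sf y = x;
  y[0] = f;
  return y;
}

/* Hypothetical helper: returns 1 if the two vectors are bitwise equal. */
int
same_lanes (v4sf a, v4sf b)
{
  return memcmp (&a, &b, sizeof a) == 0;
}
```

Calling same_lanes (foo (x, f), foo1 (x, f)) should return 1 for any x
and f.  Comparing the assembly that gcc -S -O2 emits for the two functions
shows whether a given compiler version treats them identically.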