Re: [SVE][match.pd] Fix ICE observed in PR110280

2023-06-23 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 23 Jun 2023 at 14:58, Richard Biener  wrote:
>
> On Fri, Jun 23, 2023 at 11:09 AM Prathamesh Kulkarni
>  wrote:
> >
> > On Thu, 22 Jun 2023 at 18:06, Richard Biener  
> > wrote:
> > >
> > > Can you retain the case of matching type?  Like
> > >
> > >   (if (types_match (type, TREE_TYPE (@0)))
> > >    @0
> > >    (with
> > >     {
> > >       tree elem = uniform_vector_p (@0);
> > >     }

Re: [SVE][match.pd] Fix ICE observed in PR110280

2023-06-23 Thread Richard Biener via Gcc-patches
On Fri, Jun 23, 2023 at 11:09 AM Prathamesh Kulkarni
 wrote:
>
> On Thu, 22 Jun 2023 at 18:06, Richard Biener  
> wrote:
> >
> > Can you retain the case of matching type?  Like
> >
> >   (if (types_match (type, TREE_TYPE (@0)))
> >    @0
> >    (with
> >     {
> >       tree elem = uniform_vector_p (@0);
> >     }
> >     (if (elem)
> >      { build_vector_from_val (type, elem); }))))
> >
> > ?  Because uniform_vector_p is strictly less powerful than (vec_same_elem_p ...)
> >
> > OK with that change.
> Thanks, does the attached patch look OK ?

OK.

> Bootstrapped+tested on aarch64-linux-gnu and x86_64-linux-gnu.
>
> Thanks,
> Prathamesh
> >
> > Richard.

Re: [SVE][match.pd] Fix ICE observed in PR110280

2023-06-23 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 22 Jun 2023 at 18:06, Richard Biener  wrote:
>
> Can you retain the case of matching type?  Like
>
>   (if (types_match (type, TREE_TYPE (@0)))
>    @0
>    (with
>     {
>       tree elem = uniform_vector_p (@0);
>     }
>     (if (elem)
>      { build_vector_from_val (type, elem); }))))
>
> ?  Because uniform_vector_p is strictly less powerful than (vec_same_elem_p ...)
>
> OK with that change.
Thanks, does the attached patch look OK ?
Bootstrapped+tested on aarch64-linux-gnu and x86_64-linux-gnu.

Thanks,
Prathamesh
>
> Richard.
[aarch64/match.pd] Fix ICE observed in PR110280.

gcc/ChangeLog:
PR tree-optimization/110280
 

Re: [SVE][match.pd] Fix ICE observed in PR110280

2023-06-22 Thread Richard Biener via Gcc-patches
On Thu, Jun 22, 2023 at 11:08 AM Prathamesh Kulkarni
 wrote:
>
> Hi Richard,
> Thanks for the suggestions. Using build_vector_from_val simplifies it to:
>[local count: 1073741824]:
>   return { 0, ... };
>
> Patch is bootstrapped+tested on aarch64-linux-gnu, in progress on
> x86_64-linux-gnu.
> OK to commit ?

Can you retain the case of matching type?  Like

  (if (types_match (type, TREE_TYPE (@0)))
   @0
   (with
    {
      tree elem = uniform_vector_p (@0);
    }
    (if (elem)
     { build_vector_from_val (type, elem); }))))

?  Because uniform_vector_p is strictly less powerful than (vec_same_elem_p ...)
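The suggested pattern amounts to a three-way decision. The C sketch below models that decision under simplifying assumptions: the names are hypothetical, vector types are reduced to lane counts, and a flag stands in for uniform_vector_p (@0) returning non-NULL. Testing for matching types first keeps the plain @0 result for every operand vec_same_elem_p accepts, including those for which, per the remark above, uniform_vector_p cannot recover the element.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible outcomes of the revised (vec_perm vec_same_elem_p@0 @0 @1)
   pattern, as modelled here (hypothetical names, not GCC code).  */
enum same_elem_perm_fold
{
  FOLD_TO_OPERAND, /* types match: return @0 unchanged */
  FOLD_TO_SPLAT,   /* types differ, element known: build_vector_from_val */
  NO_FOLD          /* types differ, element unknown: keep the permute */
};

/* Simplified model: lane counts stand in for the vector types (the real
   types_match compares full types), and ELEM_KNOWN stands in for
   uniform_vector_p (@0) returning a non-NULL element.  */
static enum same_elem_perm_fold
classify_same_elem_perm (unsigned operand_lanes, unsigned result_lanes,
                         bool elem_known)
{
  assert (operand_lanes > 0 && result_lanes > 0);
  if (operand_lanes == result_lanes)
    return FOLD_TO_OPERAND;
  if (elem_known)
    return FOLD_TO_SPLAT;
  return NO_FOLD;
}
```

In the PR110280 case the operand has 4 lanes and the result has as many lanes as the SVE vector, so the model selects the splat branch when the element is a known constant such as 0.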

OK with that change.

Richard.


>
> Thanks,
> Prathamesh


Re: [SVE][match.pd] Fix ICE observed in PR110280

2023-06-22 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 20 Jun 2023 at 16:47, Richard Biener  wrote:
>
> On Tue, Jun 20, 2023 at 11:56 AM Prathamesh Kulkarni via Gcc-patches
>  wrote:
> >
> > The issue seems to be with the following pattern:
> > (simplify
> >  (vec_perm vec_same_elem_p@0 @0 @1)
> >  @0)
> >
> > which simplifies above VEC_PERM_EXPR to:
> > _7 = {0, 0, 0, 0}
> > which is incorrect since _9 and mask have different vector lengths.
> >
> > The attached patch amends the pattern to simplify above VEC_PERM_EXPR
> > only if operand and mask have same number of elements, which seems to fix
> > the issue, and we're left with the following in .optimized dump:
> >[local count: 1073741824]:
> >   _2 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 0, 0, 0, 0 }, { 0, 1, 2, 3, ... }>;
>
> it would be nice to have this optimized.
>
> -
>  (simplify
>   (vec_perm vec_same_elem_p@0 @0 @1)
> - @0)
> + (if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)),
> +   TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1))))
> +  @0))
>
> that looks good I think.  Maybe even better use 'type' instead of TREE_TYPE (@1)
> since that's more obviously the return type in which case
>
>   (if (types_match (type, TREE_TYPE (@0)))
>
> would be more to the point.
>
> But can't you, in the !known_eq case, simplify this to a simple
>
>   { build_vector_from_val (type, the-element); }
>
> ?  The 'vec_same_elem_p' predicate doesn't get you at the element,
>
>  (with { tree el = uniform_vector_p (@0); }
>   (if (el)
>    { build_vector_from_val (type, el); })))
>
> would be the cheapest workaround.
Hi Richard,
Thanks for the suggestions. Using build_vector_from_val simplifies it to:
   [local count: 1073741824]:
  return { 0, ... };

Patch is bootstrapped+tested on aarch64-linux-gnu, in progress on
x86_64-linux-gnu.
OK to commit ?

Thanks,
Prathamesh
[aarch64/match.pd] Fix ICE observed in PR110280.

gcc/ChangeLog:
PR tree-optimization/110280
* match.pd (vec_perm_expr(v, v, mask) -> v): Explicitly build vector
using build_vector_from_val with the element of input operand, and
mask's type.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/pr110280.c: New test.

diff --git a/gcc/match.pd b/gcc/match.pd
index 2dd23826034..76a37297d3c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8672,7 +8672,12 @@ and,
 
 (simplify
  (vec_perm vec_same_elem_p@0 @0 @1)
- @0)
+ (with
+  {
+    tree elem = uniform_vector_p (@0);
+  }
+  (if (elem)
+   { build_vector_from_val (type, elem); })))
 
 /* Push VEC_PERM earlier if that may help FMA perception (PR101895).  */
 (simplify
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c b/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c
new file mode 100644
index 000..d3279f38362
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+#include "arm_sve.h"
+
+svuint32_t l()
+{
+  _Alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
+  return svld1rq_u32(svptrue_b8(), lanes);
+}
+
+/* 

Re: [SVE][match.pd] Fix ICE observed in PR110280

2023-06-20 Thread Richard Biener via Gcc-patches
On Tue, Jun 20, 2023 at 11:56 AM Prathamesh Kulkarni via Gcc-patches
 wrote:
>
> Hi Richard,
> For the following reduced test-case taken from PR:
>
> #include "arm_sve.h"
> svuint32_t l() {
>   alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
>   return svld1rq_u32(svptrue_b8(), lanes);
> }
>
> cc1 simplifies:
>   lanes[0] = 0;
>   lanes[1] = 0;
>   lanes[2] = 0;
>   lanes[3] = 0;
>   _1 = { -1, ... };
>   _7 = svld1rq_u32 (_1, );
>
> to:
>   _9 = MEM  [(unsigned int * {ref-all})];
>   _7 = VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }>;
>
> and then fre1 dump shows:
> Applying pattern match.pd:8675, generic-match-5.cc:9025
> Match-and-simplified VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> to {
> 0, 0, 0, 0 }
> RHS VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> simplified to { 0, 0, 0, 0 }
>
> The issue seems to be with the following pattern:
> (simplify
>  (vec_perm vec_same_elem_p@0 @0 @1)
>  @0)
>
> which simplifies above VEC_PERM_EXPR to:
> _7 = {0, 0, 0, 0}
> which is incorrect since _9 and mask have different vector lengths.
>
> The attached patch amends the pattern to simplify above VEC_PERM_EXPR
> only if operand and mask have same number of elements, which seems to fix
> the issue, and we're left with the following in .optimized dump:
>[local count: 1073741824]:
>   _2 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 0, 0, 0, 0 }, { 0, 1, 2, 3, ... }>;

it would be nice to have this optimized.

-
 (simplify
  (vec_perm vec_same_elem_p@0 @0 @1)
- @0)
+ (if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)),
+   TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1))))
+  @0))

that looks good I think.  Maybe even better use 'type' instead of TREE_TYPE (@1)
since that's more obviously the return type in which case

  (if (types_match (type, TREE_TYPE (@0)))

would be more to the point.

But can't you, in the !known_eq case, simplify this to a simple

  { build_vector_from_val (type, the-element); }

?  The 'vec_same_elem_p' predicate doesn't get you at the element,

 (with { tree el = uniform_vector_p (@0); }
  (if (el)
   { build_vector_from_val (type, el); })))

would be the cheapest workaround.
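Why rebuilding from the element is sound can be checked with a small scalar model. The helpers below are hypothetical C stand-ins that only mimic what uniform_vector_p and build_vector_from_val are used for here; they are not the GCC functions. Whatever the selector, permuting a vector whose lanes are all equal yields that value in every result lane, so the element plus the result type's lane count fully determines the folded value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for uniform_vector_p: if all N lanes of V are
   equal, store the common value in *ELEM and return true.  */
static bool
uniform_elem_model (const unsigned *v, size_t n, unsigned *elem)
{
  assert (n > 0);
  for (size_t i = 1; i < n; i++)
    if (v[i] != v[0])
      return false;
  *elem = v[0];
  return true;
}

/* Hypothetical stand-in for build_vector_from_val (type, elem): the lane
   count M comes from the result type, not from the permuted operand.  */
static void
splat_model (unsigned elem, size_t m, unsigned *out)
{
  for (size_t i = 0; i < m; i++)
    out[i] = elem;
}

/* VEC_PERM_EXPR <v, v, sel> in scalar form: with both inputs equal to V,
   a selector index k (taken modulo 2N) always reads V[k % N].  The result
   has M lanes, one per selector element.  */
static void
perm_same_input_model (const unsigned *v, size_t n, const size_t *sel,
                       size_t m, unsigned *out)
{
  assert (n > 0);
  for (size_t i = 0; i < m; i++)
    out[i] = v[sel[i] % n];
}
```

For a 4-lane all-7 operand and an 8-lane selector, perm_same_input_model and splat_model (7, 8, ...) produce identical 8-lane results, while a non-uniform operand makes uniform_elem_model return false and the fold is skipped.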

>   return _2;
>
> code-gen:
> l:
> mov z0.b, #0
> ret
>
> Patch is bootstrapped+tested on aarch64-linux-gnu.
> OK to commit ?
>
> Thanks,
> Prathamesh


[SVE][match.pd] Fix ICE observed in PR110280

2023-06-20 Thread Prathamesh Kulkarni via Gcc-patches
Hi Richard,
For the following reduced test case taken from the PR:

#include "arm_sve.h"
svuint32_t l() {
  alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
  return svld1rq_u32(svptrue_b8(), lanes);
}

compiling with -O3 -mcpu=generic+sve results in the following ICE:
during GIMPLE pass: fre
pr110280.c: In function 'l':
pr110280.c:5:1: internal compiler error: in eliminate_stmt, at tree-ssa-sccvn.cc:6890
5 | }
  | ^
0x865fb1 eliminate_dom_walker::eliminate_stmt(basic_block_def*, gimple_stmt_iterator*)
../../gcc/gcc/tree-ssa-sccvn.cc:6890
0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
../../gcc/gcc/tree-ssa-sccvn.cc:7324
0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
../../gcc/gcc/tree-ssa-sccvn.cc:7257
0x1aeec77 dom_walker::walk(basic_block_def*)
../../gcc/gcc/domwalk.cc:311
0x11fd924 eliminate_with_rpo_vn(bitmap_head*)
../../gcc/gcc/tree-ssa-sccvn.cc:7504
0x1214664 do_rpo_vn_1
../../gcc/gcc/tree-ssa-sccvn.cc:8616
0x1215ba5 execute
../../gcc/gcc/tree-ssa-sccvn.cc:8702

cc1 simplifies:
  lanes[0] = 0;
  lanes[1] = 0;
  lanes[2] = 0;
  lanes[3] = 0;
  _1 = { -1, ... };
  _7 = svld1rq_u32 (_1, );

to:
  _9 = MEM  [(unsigned int * {ref-all})];
  _7 = VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }>;

and then the fre1 dump shows:
Applying pattern match.pd:8675, generic-match-5.cc:9025
Match-and-simplified VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> to { 0, 0, 0, 0 }
RHS VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> simplified to { 0, 0, 0, 0 }

The issue seems to be with the following pattern:
(simplify
 (vec_perm vec_same_elem_p@0 @0 @1)
 @0)

which simplifies the above VEC_PERM_EXPR to:
_7 = {0, 0, 0, 0}
which is incorrect, since _9 and the mask have different numbers of
elements and the result of a VEC_PERM_EXPR has as many elements as its mask.
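To make the length mismatch concrete, here is a small scalar C model (a sketch, not GCC internals; all names are hypothetical). It assumes a fixed 256-bit vector length, so svuint32_t has 8 lanes (real SVE lengths are only known at run time), and that the selector { 0, 1, 2, 3, ... } repeats the pattern 0, 1, 2, 3, which is how svld1rq replicates its 128-bit quadword across the vector.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of VEC_PERM_EXPR <a, b, sel> (hypothetical, not GCC code).
   a and b each have N lanes; sel and the result each have M lanes, so the
   result length follows the selector, not the inputs.  A selector index k
   (taken modulo 2N) picks a[k] when k < N and b[k - N] otherwise.  */
static void
vec_perm_model (const unsigned *a, const unsigned *b, size_t n,
                const size_t *sel, size_t m, unsigned *out)
{
  assert (n > 0);
  for (size_t i = 0; i < m; i++)
    {
      size_t k = sel[i] % (2 * n);
      out[i] = k < n ? a[k] : b[k - n];
    }
}

/* Expand the selector { 0, 1, 2, 3, ... } for a vector of M 32-bit lanes,
   assuming it repeats the 4-lane quadword: sel[i] = i % 4.  */
static void
quadword_repeat_selector (size_t *sel, size_t m)
{
  for (size_t i = 0; i < m; i++)
    sel[i] = i % 4;
}
```

Permuting the 4-lane, all-zero _9 with the 8-element selector gives an 8-lane result, so _7 and _9 cannot share a type even though every lane is 0; returning @0 unchanged therefore produces a value of the wrong type.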

The attached patch amends the pattern to simplify the above VEC_PERM_EXPR
only if the operand and mask have the same number of elements. This seems
to fix the issue, and we're left with the following in the .optimized dump:
   [local count: 1073741824]:
  _2 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 0, 0, 0, 0 }, { 0, 1, 2, 3, ... }>;
  return _2;

code-gen:
l:
mov z0.b, #0
ret

Patch is bootstrapped+tested on aarch64-linux-gnu.
OK to commit ?

Thanks,
Prathamesh
[SVE][match.pd] Fix ICE observed in PR110280.

gcc/ChangeLog:
PR tree-optimization/110280
* match.pd (vec_perm_expr(v, v, mask) -> v): Simplify the pattern
only if operand and mask of VEC_PERM_EXPR have same number of
elements.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/pr110280.c: New test.

diff --git a/gcc/match.pd b/gcc/match.pd
index 2dd23826034..0eb5f8f0af6 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8669,10 +8669,11 @@ and,
  @0
  (if (uniform_vector_p (@0))))
 
-
 (simplify
  (vec_perm vec_same_elem_p@0 @0 @1)
- @0)
+ (if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)),
+   TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1))))
+  @0))
 
 /* Push VEC_PERM earlier if that may help FMA perception (PR101895).  */
 (simplify
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c b/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c
new file mode 100644
index 000..453c9cbcf9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+#include "arm_sve.h"
+
+svuint32_t l()
+{
+  _Alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
+  return svld1rq_u32(svptrue_b8(), lanes);
+}