Re: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182

2020-07-07 Thread Richard Sandiford
"Yangfei (Felix)"  writes:
>> > +nscalars = (STMT_SLP_TYPE (stmt_info)
>> > +  ? vf * DR_GROUP_SIZE (stmt_info) : vf);
>> 
>> …the indentation on this line.  Hope you don't mind, but I also “reflowed”
>> the commit message to make it fit within 72 chars.
>> (The text itself is the same.)
>
> It's OK.  :-)
> BTW: Is this the rule for gcc git commit msg format? 72 chars instead of 80 
> chars?

Well, it was over 80 chars too, which is why I noticed :-)
But I think the idea is that it shouldn't wrap when you use “git log”
on an 80-character terminal, and “git log” indents the messages.
72 is probably overboard though.

Thanks,
Richard



RE: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182

2020-07-02 Thread Yangfei (Felix)
Hi,

> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Thursday, July 2, 2020 5:17 PM
> To: Yangfei (Felix) 
> Cc: Richard Biener ; Richard Biener
> ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182
> 

Cut...

> 
> Thanks, pushed to master with a minor whitespace fix for…

Thanks for finding it.

> > + nscalars = (STMT_SLP_TYPE (stmt_info)
> > +   ? vf * DR_GROUP_SIZE (stmt_info) : vf);
> 
> …the indentation on this line.  Hope you don't mind, but I also “reflowed”
> the commit message to make it fit within 72 chars.
> (The text itself is the same.)

It's OK.  :-)
BTW: Is this the rule for gcc git commit msg format? 72 chars instead of 80 
chars?

Felix


Re: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182

2020-07-02 Thread Richard Sandiford
"Yangfei (Felix)"  writes:
> Hi,
>
>> -Original Message-
>> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> Sent: Wednesday, July 1, 2020 9:03 PM
>> To: Yangfei (Felix) 
>> Cc: Richard Biener ; Richard Biener
>> ; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182
>> 
>> "Yangfei (Felix)"  writes:
>> >> On June 30, 2020 4:23:03 PM GMT+02:00, Richard Sandiford
>> >>  wrote:
>> >> >Richard Biener  writes:
>> >> >> So it seems odd to somehow put in the number of vectors...  so to
>> >> >> me it would have made sense if it did
>> >> >>
>> >> >>   possible_npeel_number = lower_bound (nscalars);
>> >> >>
>> >> >> or whateveris necessary to make the polys happy.  Thus simply
>> >> >> elide the vect_get_num_vectors call?  But it's been very long
>> >> >> since I've dived into the alignment peeling code...
>> >> >
>> >> >Ah, I see what you mean.  So rather than:
>> >> >
>> >> >   /* Save info about DR in the hash table.  Also include peeling
>> >> >  amounts according to the explanation above.  */
>> >> >  for (j = 0; j < possible_npeel_number; j++)
>> >> >{
>> >> >  vect_peeling_hash_insert (_htab, loop_vinfo,
>> >> > dr_info, npeel_tmp);
>> >> >   npeel_tmp += target_align / dr_size;
>> >> >}
>> >> >
>> >> >just have something like:
>> >> >
>> >> >   while (known_le (npeel_tmp, nscalars))
>> >> > {
>> >> >   …
>> >> > }
>> >> >
>> >> >?
>> >>
>> >> Yeah.
>> >
>> > Not sure if I understand correctly.  I am supposing the following check in
>> the original code is not necessary if we go like that.
>> >
>> > 1822   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
>> >
>> > Is that correct?
>> 
>> I think we still need it.  I guess there are two choices:
>> 
>> - make nscalars default to npeel_tmp before the “if” above.
>
> I think this will be simpler.  How about the v2 patch?
> Bootstrapped and tested on aarch64-linux-gnu & x86_64-linux-gnu.
>
> Thanks,
> Felix
>
> From 69efce90fa746a2a86f8d673335b874c7df34d9f Mon Sep 17 00:00:00 2001
> From: Fei Yang 
> Date: Thu, 2 Jul 2020 14:39:23 +0800
> Subject: [PATCH] vect: Fix an ICE in exact_div [PR95961]
>
> In the test case for PR95961, vectorization factor computed by
> vect_determine_vectorization_factor is [8,8].  But this is updated to [1,1]
> later by vect_update_vf_for_slp.  When we call vect_get_num_vectors in
> vect_enhance_data_refs_alignment, the number of scalars which is based on
> the vectorization factor is not a multiple of the the number of elements in
> the vector type.  This leads to the ICE.  This isn't a simple stream of
> contiguous vector accesses.  It's hard to predict from the available 
> information
> how many vector accesses we'll actually need per iteration.  As discussed, 
> here
> we should use the number of scalars instead of the number of vectors as an 
> upper
> bound for the loop saving info about DR in the hash table.
>
> 2020-07-02  Felix Yang  
>
> gcc/
>   PR tree-optimization/95961
>   * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Use the
>   number of scalars instead of the number of vectors as an upper bound
>   for the loop saving info about DR in the hash table.  Remove unused
>   local variables.
>
> gcc/testsuite/
>
>   PR tree-optimization/95961
>   * gcc.target/aarch64/sve/pr95961.c: New test.

Thanks, pushed to master with a minor whitespace fix for…

> +   nscalars = (STMT_SLP_TYPE (stmt_info)
> + ? vf * DR_GROUP_SIZE (stmt_info) : vf);

…the indentation on this line.  Hope you don't mind, but I also
“reflowed” the commit message to make it fit within 72 chars.
(The text itself is the same.)

Richard


RE: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182

2020-07-02 Thread Yangfei (Felix)
Hi,

> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Wednesday, July 1, 2020 9:03 PM
> To: Yangfei (Felix) 
> Cc: Richard Biener ; Richard Biener
> ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182
> 
> "Yangfei (Felix)"  writes:
> >> On June 30, 2020 4:23:03 PM GMT+02:00, Richard Sandiford
> >>  wrote:
> >> >Richard Biener  writes:
> >> >> So it seems odd to somehow put in the number of vectors...  so to
> >> >> me it would have made sense if it did
> >> >>
> >> >>   possible_npeel_number = lower_bound (nscalars);
> >> >>
> >> >> or whateveris necessary to make the polys happy.  Thus simply
> >> >> elide the vect_get_num_vectors call?  But it's been very long
> >> >> since I've dived into the alignment peeling code...
> >> >
> >> >Ah, I see what you mean.  So rather than:
> >> >
> >> >/* Save info about DR in the hash table.  Also include peeling
> >> >   amounts according to the explanation above.  */
> >> >  for (j = 0; j < possible_npeel_number; j++)
> >> >{
> >> >  vect_peeling_hash_insert (_htab, loop_vinfo,
> >> >  dr_info, npeel_tmp);
> >> >npeel_tmp += target_align / dr_size;
> >> >}
> >> >
> >> >just have something like:
> >> >
> >> >while (known_le (npeel_tmp, nscalars))
> >> >  {
> >> >…
> >> >  }
> >> >
> >> >?
> >>
> >> Yeah.
> >
> > Not sure if I understand correctly.  I am supposing the following check in
> the original code is not necessary if we go like that.
> >
> > 1822   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
> >
> > Is that correct?
> 
> I think we still need it.  I guess there are two choices:
> 
> - make nscalars default to npeel_tmp before the “if” above.

I think this will be simpler.  How about the v2 patch?
Bootstrapped and tested on aarch64-linux-gnu & x86_64-linux-gnu.

Thanks,
Felix


pr95961-v2.diff
Description: pr95961-v2.diff


Re: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182

2020-07-01 Thread Richard Sandiford
"Yangfei (Felix)"  writes:
>> On June 30, 2020 4:23:03 PM GMT+02:00, Richard Sandiford
>>  wrote:
>> >Richard Biener  writes:
>> >> So it seems odd to somehow put in the number of vectors...  so to me
>> >> it would have made sense if it did
>> >>
>> >>   possible_npeel_number = lower_bound (nscalars);
>> >>
>> >> or whateveris necessary to make the polys happy.  Thus simply elide
>> >> the vect_get_num_vectors call?  But it's been very long since I've
>> >> dived into the alignment peeling code...
>> >
>> >Ah, I see what you mean.  So rather than:
>> >
>> >  /* Save info about DR in the hash table.  Also include peeling
>> > amounts according to the explanation above.  */
>> >  for (j = 0; j < possible_npeel_number; j++)
>> >{
>> >  vect_peeling_hash_insert (_htab, loop_vinfo,
>> >dr_info, npeel_tmp);
>> >  npeel_tmp += target_align / dr_size;
>> >}
>> >
>> >just have something like:
>> >
>> >  while (known_le (npeel_tmp, nscalars))
>> >{
>> >  …
>> >}
>> >
>> >?
>> 
>> Yeah.
>
> Not sure if I understand correctly.  I am supposing the following check in 
> the original code is not necessary if we go like that.
>
> 1822   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
>
> Is that correct?

I think we still need it.  I guess there are two choices:

- make nscalars default to npeel_tmp before the “if” above.
- put the loop inside the “if” and add a single call to
  vect_peeling_hash_insert as a new “else”.

Thanks,
Richard


RE: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182

2020-07-01 Thread Yangfei (Felix)
Hi,

> -Original Message-
> From: Richard Biener [mailto:rguent...@suse.de]
> Sent: Tuesday, June 30, 2020 10:50 PM
> To: Richard Sandiford 
> Cc: Richard Biener ; Yangfei (Felix)
> ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182
> 
> On June 30, 2020 4:23:03 PM GMT+02:00, Richard Sandiford
>  wrote:
> >Richard Biener  writes:
> >> On Tue, 30 Jun 2020, Richard Sandiford wrote:
> >>
> >>> Richard Biener  writes:
> >>> > On Tue, Jun 30, 2020 at 2:18 PM Richard Sandiford
> >>> >  wrote:
> >>> >>
> >>> >> "Yangfei (Felix)"  writes:
> >>> >> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> >b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> >>> >> > index e050db1a2e4..ea39fcac0e0 100644
> >>> >> > --- a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> >>> >> > +++ b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> >>> >> > @@ -1,6 +1,7 @@
> >>> >> >  /* { dg-do compile } */
> >>> >> >  /* { dg-additional-options "-O3" } */
> >>> >> >  /* { dg-additional-options "-mavx2" { target { i?86-*-*
> >x86_64-*-* } } } */
> >>> >> > +/* { dg-additional-options "-march=armv8.2-a+sve
> >-fno-vect-cost-model" { target aarch64*-*-* } } */
> >>> >> >
> >>> >> >  typedef struct {
> >>> >> >  unsigned short mprr_2[5][16][16];
> >>> >>
> >>> >> This test is useful for Advanced SIMD too, so I think we should
> >continue
> >>> >> to test it with whatever options the person running the testsuite
> >chose.
> >>> >> Instead we could duplicate the test in gcc.target/aarch64/sve
> >with
> >>> >> appropriate options.
> >>> >>
> >>> >> > diff --git a/gcc/tree-vect-data-refs.c
> >b/gcc/tree-vect-data-refs.c
> >>> >> > index eb8288e7a85..b30a7d8a3bb 100644
> >>> >> > --- a/gcc/tree-vect-data-refs.c
> >>> >> > +++ b/gcc/tree-vect-data-refs.c
> >>> >> > @@ -1823,8 +1823,11 @@ vect_enhance_data_refs_alignment
> >(loop_vec_info loop_vinfo)
> >>> >> >   {
> >>> >> > poly_uint64 nscalars = (STMT_SLP_TYPE
> >(stmt_info)
> >>> >> > ? vf * DR_GROUP_SIZE
> >(stmt_info) : vf);
> >>> >> > -   possible_npeel_number
> >>> >> > - = vect_get_num_vectors (nscalars, vectype);
> >>> >> > +   if (maybe_lt (nscalars, TYPE_VECTOR_SUBPARTS
> >(vectype)))
> >>> >> > + possible_npeel_number = 0;
> >>> >> > +   else
> >>> >> > + possible_npeel_number
> >>> >> > +   = vect_get_num_vectors (nscalars, vectype);
> >>> >> >
> >>> >> > /* NPEEL_TMP is 0 when there is no
> >misalignment, but also
> >>> >> >allow peeling NELEMENTS.  */
> >>> >>
> >>> >> OK, so this is coming from:
> >>> >>
> >>> >>   int s[16][2];
> >>> >>   …
> >>> >>   … =s[j][1];
> >>> >>
> >>> >> and an SLP node containing 16 instances of “s[j][1]”.  The
> >DR_GROUP_SIZE
> >>> >> is 2 because that's the inner dimension of “s”.
> >>> >>
> >>> >> I don't think maybe_lt is right here though.  The same problem
> >could in
> >>> >> principle happen for cases in which NSCALARS >
> >TYPE_VECTOR_SUBPARTS,
> >>> >> e.g. for different inner dimensions of “s”.
> >>> >>
> >>> >> I think the testcase shows that using DR_GROUP_SIZE in this
> >calculation
> >>> >> is flawed.  I'm not sure whether we can really do better given
> >the current
> >>> >> representation though.  This is one case where having a separate
> >dr_vec_info
> >>> >> per SLP node would help.
> >>> >
> >>> > I guess what the code likes to know is what we now have in
> >SLP_TREE_LANES
> >>> > (or formerly group_size) but t

Re: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182

2020-06-30 Thread Richard Biener
On June 30, 2020 4:23:03 PM GMT+02:00, Richard Sandiford 
 wrote:
>Richard Biener  writes:
>> On Tue, 30 Jun 2020, Richard Sandiford wrote:
>>
>>> Richard Biener  writes:
>>> > On Tue, Jun 30, 2020 at 2:18 PM Richard Sandiford
>>> >  wrote:
>>> >>
>>> >> "Yangfei (Felix)"  writes:
>>> >> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>>> >> > index e050db1a2e4..ea39fcac0e0 100644
>>> >> > --- a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>>> >> > +++ b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>>> >> > @@ -1,6 +1,7 @@
>>> >> >  /* { dg-do compile } */
>>> >> >  /* { dg-additional-options "-O3" } */
>>> >> >  /* { dg-additional-options "-mavx2" { target { i?86-*-*
>x86_64-*-* } } } */
>>> >> > +/* { dg-additional-options "-march=armv8.2-a+sve
>-fno-vect-cost-model" { target aarch64*-*-* } } */
>>> >> >
>>> >> >  typedef struct {
>>> >> >  unsigned short mprr_2[5][16][16];
>>> >>
>>> >> This test is useful for Advanced SIMD too, so I think we should
>continue
>>> >> to test it with whatever options the person running the testsuite
>chose.
>>> >> Instead we could duplicate the test in gcc.target/aarch64/sve
>with
>>> >> appropriate options.
>>> >>
>>> >> > diff --git a/gcc/tree-vect-data-refs.c
>b/gcc/tree-vect-data-refs.c
>>> >> > index eb8288e7a85..b30a7d8a3bb 100644
>>> >> > --- a/gcc/tree-vect-data-refs.c
>>> >> > +++ b/gcc/tree-vect-data-refs.c
>>> >> > @@ -1823,8 +1823,11 @@ vect_enhance_data_refs_alignment
>(loop_vec_info loop_vinfo)
>>> >> >   {
>>> >> > poly_uint64 nscalars = (STMT_SLP_TYPE
>(stmt_info)
>>> >> > ? vf * DR_GROUP_SIZE
>(stmt_info) : vf);
>>> >> > -   possible_npeel_number
>>> >> > - = vect_get_num_vectors (nscalars, vectype);
>>> >> > +   if (maybe_lt (nscalars, TYPE_VECTOR_SUBPARTS
>(vectype)))
>>> >> > + possible_npeel_number = 0;
>>> >> > +   else
>>> >> > + possible_npeel_number
>>> >> > +   = vect_get_num_vectors (nscalars, vectype);
>>> >> >
>>> >> > /* NPEEL_TMP is 0 when there is no
>misalignment, but also
>>> >> >allow peeling NELEMENTS.  */
>>> >>
>>> >> OK, so this is coming from:
>>> >>
>>> >>   int s[16][2];
>>> >>   …
>>> >>   … =s[j][1];
>>> >>
>>> >> and an SLP node containing 16 instances of “s[j][1]”.  The
>DR_GROUP_SIZE
>>> >> is 2 because that's the inner dimension of “s”.
>>> >>
>>> >> I don't think maybe_lt is right here though.  The same problem
>could in
>>> >> principle happen for cases in which NSCALARS >
>TYPE_VECTOR_SUBPARTS,
>>> >> e.g. for different inner dimensions of “s”.
>>> >>
>>> >> I think the testcase shows that using DR_GROUP_SIZE in this
>calculation
>>> >> is flawed.  I'm not sure whether we can really do better given
>the current
>>> >> representation though.  This is one case where having a separate
>dr_vec_info
>>> >> per SLP node would help.
>>> >
>>> > I guess what the code likes to know is what we now have in
>SLP_TREE_LANES
>>> > (or formerly group_size) but that's not necessarily connected to
>DR_GROUP_SIZE.
>>> > Given we only see a stmt_info here and there's no 1:1 mapping of
>SLP node
>>> > to stmt_info (and the reverse mapping doesn't even exist) I do not
>have
>>> > any good idea either.
>>> >
>>> > Honestly I don't really see what this code tries to do ... doesn't
>it
>>> > simply want to set possible_npeel_number to TYPE_VECTOR_SUBPARTS
>(vectype)?!
>>> 
>>> I think it's trying to set possible_npeel_number to the number of
>>> vector stmts per iteration (i.e. ncopies for non-SLP stuff):
>>> 
>>>   /* For multiple types, it is possible that the bigger
>type access
>>>  will have more than one peeling option.  E.g., a
>loop with two
>>>  types: one of size (vector size / 4), and the other
>one of
>>>  size (vector size / 8).  Vectorization factor will
>8.  If both
>>>  accesses are misaligned by 3, the first one needs
>one scalar
>>>  iteration to be aligned, and the second one needs
>5.  But the
>>>  first one will be aligned also by peeling 5 scalar
>>>  iterations, and in that case both accesses will be
>aligned.
>>>  Hence, except for the immediate peeling amount, we
>also want
>>>  to try to add full vector size, while we don't
>exceed
>>>  vectorization factor.
>>>  We do this automatically for cost model, since we
>calculate
>>>  cost for every peeling option.  */
>>> 
>>> So for this example, possible_npeel_number==2 for the first access
>>> (letting us try two peeling values for it) and
>possible_npeel_number==1
>>> for the second.
>>
>> Yeah, but npeel is _scalar_ lanes, since we always peel scalar
>iterations.
>> So it seems odd to somehow put in the number of vectors...  

Re: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182

2020-06-30 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, 30 Jun 2020, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Tue, Jun 30, 2020 at 2:18 PM Richard Sandiford
>> >  wrote:
>> >>
>> >> "Yangfei (Felix)"  writes:
>> >> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c 
>> >> > b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>> >> > index e050db1a2e4..ea39fcac0e0 100644
>> >> > --- a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>> >> > +++ b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>> >> > @@ -1,6 +1,7 @@
>> >> >  /* { dg-do compile } */
>> >> >  /* { dg-additional-options "-O3" } */
>> >> >  /* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } 
>> >> > } */
>> >> > +/* { dg-additional-options "-march=armv8.2-a+sve -fno-vect-cost-model" 
>> >> > { target aarch64*-*-* } } */
>> >> >
>> >> >  typedef struct {
>> >> >  unsigned short mprr_2[5][16][16];
>> >>
>> >> This test is useful for Advanced SIMD too, so I think we should continue
>> >> to test it with whatever options the person running the testsuite chose.
>> >> Instead we could duplicate the test in gcc.target/aarch64/sve with
>> >> appropriate options.
>> >>
>> >> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> >> > index eb8288e7a85..b30a7d8a3bb 100644
>> >> > --- a/gcc/tree-vect-data-refs.c
>> >> > +++ b/gcc/tree-vect-data-refs.c
>> >> > @@ -1823,8 +1823,11 @@ vect_enhance_data_refs_alignment (loop_vec_info 
>> >> > loop_vinfo)
>> >> >   {
>> >> > poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
>> >> > ? vf * DR_GROUP_SIZE 
>> >> > (stmt_info) : vf);
>> >> > -   possible_npeel_number
>> >> > - = vect_get_num_vectors (nscalars, vectype);
>> >> > +   if (maybe_lt (nscalars, TYPE_VECTOR_SUBPARTS (vectype)))
>> >> > + possible_npeel_number = 0;
>> >> > +   else
>> >> > + possible_npeel_number
>> >> > +   = vect_get_num_vectors (nscalars, vectype);
>> >> >
>> >> > /* NPEEL_TMP is 0 when there is no misalignment, but 
>> >> > also
>> >> >allow peeling NELEMENTS.  */
>> >>
>> >> OK, so this is coming from:
>> >>
>> >>   int s[16][2];
>> >>   …
>> >>   … =s[j][1];
>> >>
>> >> and an SLP node containing 16 instances of “s[j][1]”.  The DR_GROUP_SIZE
>> >> is 2 because that's the inner dimension of “s”.
>> >>
>> >> I don't think maybe_lt is right here though.  The same problem could in
>> >> principle happen for cases in which NSCALARS > TYPE_VECTOR_SUBPARTS,
>> >> e.g. for different inner dimensions of “s”.
>> >>
>> >> I think the testcase shows that using DR_GROUP_SIZE in this calculation
>> >> is flawed.  I'm not sure whether we can really do better given the current
>> >> representation though.  This is one case where having a separate 
>> >> dr_vec_info
>> >> per SLP node would help.
>> >
>> > I guess what the code likes to know is what we now have in SLP_TREE_LANES
>> > (or formerly group_size) but that's not necessarily connected to 
>> > DR_GROUP_SIZE.
>> > Given we only see a stmt_info here and there's no 1:1 mapping of SLP node
>> > to stmt_info (and the reverse mapping doesn't even exist) I do not have
>> > any good idea either.
>> >
>> > Honestly I don't really see what this code tries to do ... doesn't it
>> > simply want to set possible_npeel_number to TYPE_VECTOR_SUBPARTS 
>> > (vectype)?!
>> 
>> I think it's trying to set possible_npeel_number to the number of
>> vector stmts per iteration (i.e. ncopies for non-SLP stuff):
>> 
>>   /* For multiple types, it is possible that the bigger type 
>> access
>>  will have more than one peeling option.  E.g., a loop with 
>> two
>>  types: one of size (vector size / 4), and the other one of
>>  size (vector size / 8).  Vectorization factor will 8.  If 
>> both
>>  accesses are misaligned by 3, the first one needs one scalar
>>  iteration to be aligned, and the second one needs 5.  But 
>> the
>>  first one will be aligned also by peeling 5 scalar
>>  iterations, and in that case both accesses will be aligned.
>>  Hence, except for the immediate peeling amount, we also want
>>  to try to add full vector size, while we don't exceed
>>  vectorization factor.
>>  We do this automatically for cost model, since we calculate
>>  cost for every peeling option.  */
>> 
>> So for this example, possible_npeel_number==2 for the first access
>> (letting us try two peeling values for it) and possible_npeel_number==1
>> for the second.
>
> Yeah, but npeel is _scalar_ lanes, since we always peel scalar iterations.
> So it seems odd to somehow put in the number of vectors...  so to me
> it would have made sense if it did
>
>   possible_npeel_number = lower_bound (nscalars);
>
> or 

Re: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182

2020-06-30 Thread Richard Biener
On Tue, 30 Jun 2020, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Tue, Jun 30, 2020 at 2:18 PM Richard Sandiford
> >  wrote:
> >>
> >> "Yangfei (Felix)"  writes:
> >> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c 
> >> > b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> >> > index e050db1a2e4..ea39fcac0e0 100644
> >> > --- a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> >> > +++ b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> >> > @@ -1,6 +1,7 @@
> >> >  /* { dg-do compile } */
> >> >  /* { dg-additional-options "-O3" } */
> >> >  /* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } 
> >> > } */
> >> > +/* { dg-additional-options "-march=armv8.2-a+sve -fno-vect-cost-model" 
> >> > { target aarch64*-*-* } } */
> >> >
> >> >  typedef struct {
> >> >  unsigned short mprr_2[5][16][16];
> >>
> >> This test is useful for Advanced SIMD too, so I think we should continue
> >> to test it with whatever options the person running the testsuite chose.
> >> Instead we could duplicate the test in gcc.target/aarch64/sve with
> >> appropriate options.
> >>
> >> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> >> > index eb8288e7a85..b30a7d8a3bb 100644
> >> > --- a/gcc/tree-vect-data-refs.c
> >> > +++ b/gcc/tree-vect-data-refs.c
> >> > @@ -1823,8 +1823,11 @@ vect_enhance_data_refs_alignment (loop_vec_info 
> >> > loop_vinfo)
> >> >   {
> >> > poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
> >> > ? vf * DR_GROUP_SIZE (stmt_info) 
> >> > : vf);
> >> > -   possible_npeel_number
> >> > - = vect_get_num_vectors (nscalars, vectype);
> >> > +   if (maybe_lt (nscalars, TYPE_VECTOR_SUBPARTS (vectype)))
> >> > + possible_npeel_number = 0;
> >> > +   else
> >> > + possible_npeel_number
> >> > +   = vect_get_num_vectors (nscalars, vectype);
> >> >
> >> > /* NPEEL_TMP is 0 when there is no misalignment, but also
> >> >allow peeling NELEMENTS.  */
> >>
> >> OK, so this is coming from:
> >>
> >>   int s[16][2];
> >>   …
> >>   … =s[j][1];
> >>
> >> and an SLP node containing 16 instances of “s[j][1]”.  The DR_GROUP_SIZE
> >> is 2 because that's the inner dimension of “s”.
> >>
> >> I don't think maybe_lt is right here though.  The same problem could in
> >> principle happen for cases in which NSCALARS > TYPE_VECTOR_SUBPARTS,
> >> e.g. for different inner dimensions of “s”.
> >>
> >> I think the testcase shows that using DR_GROUP_SIZE in this calculation
> >> is flawed.  I'm not sure whether we can really do better given the current
> >> representation though.  This is one case where having a separate 
> >> dr_vec_info
> >> per SLP node would help.
> >
> > I guess what the code likes to know is what we now have in SLP_TREE_LANES
> > (or formerly group_size) but that's not necessarily connected to 
> > DR_GROUP_SIZE.
> > Given we only see a stmt_info here and there's no 1:1 mapping of SLP node
> > to stmt_info (and the reverse mapping doesn't even exist) I do not have
> > any good idea either.
> >
> > Honestly I don't really see what this code tries to do ... doesn't it
> > simply want to set possible_npeel_number to TYPE_VECTOR_SUBPARTS (vectype)?!
> 
> I think it's trying to set possible_npeel_number to the number of
> vector stmts per iteration (i.e. ncopies for non-SLP stuff):
> 
>   /* For multiple types, it is possible that the bigger type 
> access
>  will have more than one peeling option.  E.g., a loop with 
> two
>  types: one of size (vector size / 4), and the other one of
>  size (vector size / 8).  Vectorization factor will 8.  If 
> both
>  accesses are misaligned by 3, the first one needs one scalar
>  iteration to be aligned, and the second one needs 5.  But the
>  first one will be aligned also by peeling 5 scalar
>  iterations, and in that case both accesses will be aligned.
>  Hence, except for the immediate peeling amount, we also want
>  to try to add full vector size, while we don't exceed
>  vectorization factor.
>  We do this automatically for cost model, since we calculate
>  cost for every peeling option.  */
> 
> So for this example, possible_npeel_number==2 for the first access
> (letting us try two peeling values for it) and possible_npeel_number==1
> for the second.

Yeah, but npeel is _scalar_ lanes, since we always peel scalar iterations.
So it seems odd to somehow put in the number of vectors...  so to me
it would have made sense if it did

  possible_npeel_number = lower_bound (nscalars);

or whateveris necessary to make the polys happy.  Thus simply elide
the vect_get_num_vectors call?  But it's been very long since I've
dived into the alignment 

Re: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182

2020-06-30 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Jun 30, 2020 at 2:18 PM Richard Sandiford
>  wrote:
>>
>> "Yangfei (Felix)"  writes:
>> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c 
>> > b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>> > index e050db1a2e4..ea39fcac0e0 100644
>> > --- a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>> > +++ b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>> > @@ -1,6 +1,7 @@
>> >  /* { dg-do compile } */
>> >  /* { dg-additional-options "-O3" } */
>> >  /* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } 
>> > */
>> > +/* { dg-additional-options "-march=armv8.2-a+sve -fno-vect-cost-model" { 
>> > target aarch64*-*-* } } */
>> >
>> >  typedef struct {
>> >  unsigned short mprr_2[5][16][16];
>>
>> This test is useful for Advanced SIMD too, so I think we should continue
>> to test it with whatever options the person running the testsuite chose.
>> Instead we could duplicate the test in gcc.target/aarch64/sve with
>> appropriate options.
>>
>> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> > index eb8288e7a85..b30a7d8a3bb 100644
>> > --- a/gcc/tree-vect-data-refs.c
>> > +++ b/gcc/tree-vect-data-refs.c
>> > @@ -1823,8 +1823,11 @@ vect_enhance_data_refs_alignment (loop_vec_info 
>> > loop_vinfo)
>> >   {
>> > poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
>> > ? vf * DR_GROUP_SIZE (stmt_info) : 
>> > vf);
>> > -   possible_npeel_number
>> > - = vect_get_num_vectors (nscalars, vectype);
>> > +   if (maybe_lt (nscalars, TYPE_VECTOR_SUBPARTS (vectype)))
>> > + possible_npeel_number = 0;
>> > +   else
>> > + possible_npeel_number
>> > +   = vect_get_num_vectors (nscalars, vectype);
>> >
>> > /* NPEEL_TMP is 0 when there is no misalignment, but also
>> >allow peeling NELEMENTS.  */
>>
>> OK, so this is coming from:
>>
>>   int s[16][2];
>>   …
>>   … =s[j][1];
>>
>> and an SLP node containing 16 instances of “s[j][1]”.  The DR_GROUP_SIZE
>> is 2 because that's the inner dimension of “s”.
>>
>> I don't think maybe_lt is right here though.  The same problem could in
>> principle happen for cases in which NSCALARS > TYPE_VECTOR_SUBPARTS,
>> e.g. for different inner dimensions of “s”.
>>
>> I think the testcase shows that using DR_GROUP_SIZE in this calculation
>> is flawed.  I'm not sure whether we can really do better given the current
>> representation though.  This is one case where having a separate dr_vec_info
>> per SLP node would help.
>
> I guess what the code likes to know is what we now have in SLP_TREE_LANES
> (or formerly group_size) but that's not necessarily connected to 
> DR_GROUP_SIZE.
> Given we only see a stmt_info here and there's no 1:1 mapping of SLP node
> to stmt_info (and the reverse mapping doesn't even exist) I do not have
> any good idea either.
>
> Honestly I don't really see what this code tries to do ... doesn't it
> simply want to set possible_npeel_number to TYPE_VECTOR_SUBPARTS (vectype)?!

I think it's trying to set possible_npeel_number to the number of
vector stmts per iteration (i.e. ncopies for non-SLP stuff):

  /* For multiple types, it is possible that the bigger type access
 will have more than one peeling option.  E.g., a loop with two
 types: one of size (vector size / 4), and the other one of
 size (vector size / 8).  Vectorization factor will 8.  If both
 accesses are misaligned by 3, the first one needs one scalar
 iteration to be aligned, and the second one needs 5.  But the
 first one will be aligned also by peeling 5 scalar
 iterations, and in that case both accesses will be aligned.
 Hence, except for the immediate peeling amount, we also want
 to try to add full vector size, while we don't exceed
 vectorization factor.
 We do this automatically for cost model, since we calculate
 cost for every peeling option.  */

So for this example, possible_npeel_number==2 for the first access
(letting us try two peeling values for it) and possible_npeel_number==1
for the second.

Thanks,
Richard


Re: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182

2020-06-30 Thread Richard Biener via Gcc-patches
On Tue, Jun 30, 2020 at 2:18 PM Richard Sandiford
 wrote:
>
> "Yangfei (Felix)"  writes:
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c 
> > b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> > index e050db1a2e4..ea39fcac0e0 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> > @@ -1,6 +1,7 @@
> >  /* { dg-do compile } */
> >  /* { dg-additional-options "-O3" } */
> >  /* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } */
> > +/* { dg-additional-options "-march=armv8.2-a+sve -fno-vect-cost-model" { 
> > target aarch64*-*-* } } */
> >
> >  typedef struct {
> >  unsigned short mprr_2[5][16][16];
>
> This test is useful for Advanced SIMD too, so I think we should continue
> to test it with whatever options the person running the testsuite chose.
> Instead we could duplicate the test in gcc.target/aarch64/sve with
> appropriate options.
>
> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> > index eb8288e7a85..b30a7d8a3bb 100644
> > --- a/gcc/tree-vect-data-refs.c
> > +++ b/gcc/tree-vect-data-refs.c
> > @@ -1823,8 +1823,11 @@ vect_enhance_data_refs_alignment (loop_vec_info 
> > loop_vinfo)
> >   {
> > poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
> > ? vf * DR_GROUP_SIZE (stmt_info) : 
> > vf);
> > -   possible_npeel_number
> > - = vect_get_num_vectors (nscalars, vectype);
> > +   if (maybe_lt (nscalars, TYPE_VECTOR_SUBPARTS (vectype)))
> > + possible_npeel_number = 0;
> > +   else
> > + possible_npeel_number
> > +   = vect_get_num_vectors (nscalars, vectype);
> >
> > /* NPEEL_TMP is 0 when there is no misalignment, but also
> >allow peeling NELEMENTS.  */
>
> OK, so this is coming from:
>
>   int s[16][2];
>   …
>   … =s[j][1];
>
> and an SLP node containing 16 instances of “s[j][1]”.  The DR_GROUP_SIZE
> is 2 because that's the inner dimension of “s”.
>
> I don't think maybe_lt is right here though.  The same problem could in
> principle happen for cases in which NSCALARS > TYPE_VECTOR_SUBPARTS,
> e.g. for different inner dimensions of “s”.
>
> I think the testcase shows that using DR_GROUP_SIZE in this calculation
> is flawed.  I'm not sure whether we can really do better given the current
> representation though.  This is one case where having a separate dr_vec_info
> per SLP node would help.

I guess what the code likes to know is what we now have in SLP_TREE_LANES
(or formerly group_size) but that's not necessarily connected to DR_GROUP_SIZE.
Given we only see a stmt_info here and there's no 1:1 mapping of SLP node
to stmt_info (and the reverse mapping doesn't even exist) I do not have
any good idea either.

Honestly I don't really see what this code tries to do ... doesn't it
simply want to set possible_npeel_number to TYPE_VECTOR_SUBPARTS (vectype)?!

>
> Maybe one option (for now) would be to use:
>
>   if (multiple_p (nscalars, TREE_VECTOR_SUBPARTS (vectype)))
> possible_npeel_number = vect_get_num_vectors (nscalars, vectype);
>   else
> /* This isn't a simple stream of contiguous vector accesses.  It's hard
>to predict from the available information how many vector accesses
>we'll actually need per iteration, so be conservative and assume
>one.  */
> possible_npeel_number = 1;
>
> BTW, I'm not sure whether the current choice of STMT_SLP_TYPE (stmt_info)
> instead of PURE_SLP_STMT (stmt_info) is optimal or not.  It means that for
> hybrid SLP we base the peeling on the SLP stmt rather than the non-SLP stmt.
> I guess hybrid SLP is going away soon though, so let's not worry about
> that. :-)
>
> Maybe Richard has a better suggestion.
>
> Thanks,
> Richard


Re: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182

2020-06-30 Thread Richard Sandiford
"Yangfei (Felix)"  writes:
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c 
> b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> index e050db1a2e4..ea39fcac0e0 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> @@ -1,6 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-additional-options "-O3" } */
>  /* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } */
> +/* { dg-additional-options "-march=armv8.2-a+sve -fno-vect-cost-model" { 
> target aarch64*-*-* } } */
>  
>  typedef struct {
>  unsigned short mprr_2[5][16][16];

This test is useful for Advanced SIMD too, so I think we should continue
to test it with whatever options the person running the testsuite chose.
Instead we could duplicate the test in gcc.target/aarch64/sve with
appropriate options.

> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index eb8288e7a85..b30a7d8a3bb 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -1823,8 +1823,11 @@ vect_enhance_data_refs_alignment (loop_vec_info 
> loop_vinfo)
>   {
> poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
> ? vf * DR_GROUP_SIZE (stmt_info) : 
> vf);
> -   possible_npeel_number
> - = vect_get_num_vectors (nscalars, vectype);
> +   if (maybe_lt (nscalars, TYPE_VECTOR_SUBPARTS (vectype)))
> + possible_npeel_number = 0;
> +   else
> + possible_npeel_number
> +   = vect_get_num_vectors (nscalars, vectype);
>  
> /* NPEEL_TMP is 0 when there is no misalignment, but also
>allow peeling NELEMENTS.  */

OK, so this is coming from:

  int s[16][2];
  …
  … =s[j][1];

and an SLP node containing 16 instances of “s[j][1]”.  The DR_GROUP_SIZE
is 2 because that's the inner dimension of “s”.

I don't think maybe_lt is right here though.  The same problem could in
principle happen for cases in which NSCALARS > TYPE_VECTOR_SUBPARTS,
e.g. for different inner dimensions of “s”.

I think the testcase shows that using DR_GROUP_SIZE in this calculation
is flawed.  I'm not sure whether we can really do better given the current
representation though.  This is one case where having a separate dr_vec_info
per SLP node would help.

Maybe one option (for now) would be to use:

  if (multiple_p (nscalars, TREE_VECTOR_SUBPARTS (vectype)))
possible_npeel_number = vect_get_num_vectors (nscalars, vectype);
  else
/* This isn't a simple stream of contiguous vector accesses.  It's hard
   to predict from the available information how many vector accesses
   we'll actually need per iteration, so be conservative and assume
   one.  */
possible_npeel_number = 1;

BTW, I'm not sure whether the current choice of STMT_SLP_TYPE (stmt_info)
instead of PURE_SLP_STMT (stmt_info) is optimal or not.  It means that for
hybrid SLP we base the peeling on the SLP stmt rather than the non-SLP stmt.
I guess hybrid SLP is going away soon though, so let's not worry about
that. :-)

Maybe Richard has a better suggestion.

Thanks,
Richard