RE: Prefer open-coding vector integer division

2018-06-08 Thread Matthew Fortune
Richard Sandiford  writes:
> vect_recog_divmod_pattern currently bails out if the target has
> native support for integer division, but I think in practice
> it's always going to be better to open-code it anyway, just as
> we usually open-code scalar divisions by constants.
> 
> I think the only currently affected target is MIPS MSA, where for:
> 
>   void
>   foo (int *x)
>   {
> for (int i = 0; i < 100; ++i)
>   x[i] /= 2;
>   }
> 
> we previously preferred to use division for powers of 2:
> 
> .setnoreorder
> bnz.w   $w1,1f
> div_s.w $w0,$w0,$w1
> break   7
> .setreorder
> 1:
> 
> (or just the div_s.w for -mno-check-zero-division), but after the patch
> we open-code them using shifts:
> 
> clt_s.w $w1,$w0,$w2
> subv.w  $w0,$w0,$w1
> srai.w  $w0,$w0,1
> 
> I assume that's better.  Matthew, is that right?

Sorry for extreme tardiness. Yes, the alternate sequence has a max latency
of 6. Although I don't have the range of latencies to hand for the FDIV, as
far as I remember 6 cycles is better than the fastest FDIV case at least for
i6400/i6500.

Matthew



Re: Prefer open-coding vector integer division

2018-05-25 Thread Richard Biener
On Thu, May 24, 2018 at 2:52 PM Richard Sandiford <
richard.sandif...@linaro.org> wrote:

> Richard Biener  writes:
> > On Thu, May 24, 2018 at 10:31 AM Jakub Jelinek  wrote:
> >> On Thu, May 24, 2018 at 09:21:35AM +0100, Richard Sandiford wrote:
> >> > vect_recog_divmod_pattern currently bails out if the target has
> >> > native support for integer division, but I think in practice
> >> > it's always going to be better to open-code it anyway, just as
> >> > we usually open-code scalar divisions by constants.
> >> >
> >> > I think the only currently affected target is MIPS MSA, where for:
> >
> >> Isn't powerpcspe affected too?  It has a divv2si3 pattern.

> Ah, yeah.  I forgot to update the description after you mentioned
> that on IRC, sorry.

> >> > --- gcc/tree-vect-patterns.c  2018-05-16 12:48:59.115202362 +0100
> >> > +++ gcc/tree-vect-patterns.c  2018-05-24 09:18:10.445466941 +0100
> >> > @@ -2639,7 +2639,6 @@ vect_recog_divmod_pattern (vec
> >> >enum tree_code rhs_code;
> >> >stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
> >> >vec_info *vinfo = stmt_vinfo->vinfo;
> >> > -  optab optab;
> >> >tree q;
> >> >int dummy_int, prec;
> >> >stmt_vec_info def_stmt_vinfo;
> >> > @@ -2674,17 +2673,6 @@ vect_recog_divmod_pattern (vec
> >> >if (vectype == NULL_TREE)
> >> >  return NULL;
> >> >
> >> > -  /* If the target can handle vectorized division or modulo
natively,
> >> > - don't attempt to optimize this.  */
> >> > -  optab = optab_for_tree_code (rhs_code, vectype, optab_default);
> >> > -  if (optab != unknown_optab)
> >> > -{
> >> > -  machine_mode vec_mode = TYPE_MODE (vectype);
> >> > -  int icode = (int) optab_handler (optab, vec_mode);
> >> > -  if (icode != CODE_FOR_nothing)
> >> > - return NULL;
> >> > -}
> >> > -
> >
> >> Shouldn't we instead keep it, but only do it if
> >> optimize_bb_for_size_p (gimple_bb (last_stmt)) ?
> >> I mean, a hw division is very likely shorter than what we replace it
> > with...
> >
> > Good idea.  While loops are checked for optimize_loop_nest_for_speed if
> > not forced vect BB vectorization has no such predicate.

> Like this, if it passes testing?

Looks good.

Thanks,
Richard.

> Thanks,
> Richard


> 2018-05-24  Richard Sandiford  

> gcc/
>  * tree-vect-patterns.c: Include predict.h.
>  (vect_recog_divmod_pattern): Restrict check for division support
>  to when optimizing for size.

> gcc/testsuite/
>  * gcc.dg/vect/bb-slp-div-1.c: New XFAILed test.

> Index: gcc/tree-vect-patterns.c
> ===
> *** gcc/tree-vect-patterns.c2018-05-24 13:50:21.656138942 +0100
> --- gcc/tree-vect-patterns.c2018-05-24 13:50:21.819134113 +0100
> *** vect_recog_divmod_pattern (vec
> *** 2674,2688 
>  if (vectype == NULL_TREE)
>return NULL;

> !   /* If the target can handle vectorized division or modulo natively,
> !  don't attempt to optimize this.  */
> !   optab = optab_for_tree_code (rhs_code, vectype, optab_default);
> !   if (optab != unknown_optab)
>{
> !   machine_mode vec_mode = TYPE_MODE (vectype);
> !   int icode = (int) optab_handler (optab, vec_mode);
> !   if (icode != CODE_FOR_nothing)
> !   return NULL;
>}

>  prec = TYPE_PRECISION (itype);
> --- 2674,2692 
>  if (vectype == NULL_TREE)
>return NULL;

> !   if (optimize_bb_for_size_p (gimple_bb (last_stmt)))
>{
> !   /* If the target can handle vectorized division or modulo
natively,
> !don't attempt to optimize this, since native division is likely
> !to give smaller code.  */
> !   optab = optab_for_tree_code (rhs_code, vectype, optab_default);
> !   if (optab != unknown_optab)
> !   {
> ! machine_mode vec_mode = TYPE_MODE (vectype);
> ! int icode = (int) optab_handler (optab, vec_mode);
> ! if (icode != CODE_FOR_nothing)
> !   return NULL;
> !   }
>}

>  prec = TYPE_PRECISION (itype);
> Index: gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c
> ===
> *** /dev/null   2018-04-20 16:19:46.369131350 +0100
> --- gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c2018-05-24
13:50:21.818134143 +0100
> ***
> *** 0 
> --- 1,19 
> + /* { dg-do compile } */
> + /* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve
} } */
> +
> + int x[8];
> +
> + void
> + f (void)
> + {
> +   x[0] /= 2;
> +   x[1] /= 3;
> +   x[2] /= 4;
> +   x[3] /= 5;
> +   x[4] /= 6;
> +   x[5] /= 7;
> +   x[6] /= 8;
> +   x[7] /= 9;
> + }
> +
> + /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { xfail
*-*-* } } } */


Re: Prefer open-coding vector integer division

2018-05-24 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, May 24, 2018 at 10:31 AM Jakub Jelinek  wrote:
>> On Thu, May 24, 2018 at 09:21:35AM +0100, Richard Sandiford wrote:
>> > vect_recog_divmod_pattern currently bails out if the target has
>> > native support for integer division, but I think in practice
>> > it's always going to be better to open-code it anyway, just as
>> > we usually open-code scalar divisions by constants.
>> >
>> > I think the only currently affected target is MIPS MSA, where for:
>
>> Isn't powerpcspe affected too?  It has a divv2si3 pattern.

Ah, yeah.  I forgot to update the description after you mentioned
that on IRC, sorry.

>> > --- gcc/tree-vect-patterns.c  2018-05-16 12:48:59.115202362 +0100
>> > +++ gcc/tree-vect-patterns.c  2018-05-24 09:18:10.445466941 +0100
>> > @@ -2639,7 +2639,6 @@ vect_recog_divmod_pattern (vec
>> >enum tree_code rhs_code;
>> >stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
>> >vec_info *vinfo = stmt_vinfo->vinfo;
>> > -  optab optab;
>> >tree q;
>> >int dummy_int, prec;
>> >stmt_vec_info def_stmt_vinfo;
>> > @@ -2674,17 +2673,6 @@ vect_recog_divmod_pattern (vec
>> >if (vectype == NULL_TREE)
>> >  return NULL;
>> >
>> > -  /* If the target can handle vectorized division or modulo natively,
>> > - don't attempt to optimize this.  */
>> > -  optab = optab_for_tree_code (rhs_code, vectype, optab_default);
>> > -  if (optab != unknown_optab)
>> > -{
>> > -  machine_mode vec_mode = TYPE_MODE (vectype);
>> > -  int icode = (int) optab_handler (optab, vec_mode);
>> > -  if (icode != CODE_FOR_nothing)
>> > - return NULL;
>> > -}
>> > -
>
>> Shouldn't we instead keep it, but only do it if
>> optimize_bb_for_size_p (gimple_bb (last_stmt)) ?
>> I mean, a hw division is very likely shorter than what we replace it
> with...
>
> Good idea.  While loops are checked for optimize_loop_nest_for_speed if
> not forced vect BB vectorization has no such predicate.

Like this, if it passes testing?

Thanks,
Richard


2018-05-24  Richard Sandiford  

gcc/
* tree-vect-patterns.c: Include predict.h.
(vect_recog_divmod_pattern): Restrict check for division support
to when optimizing for size.

gcc/testsuite/
* gcc.dg/vect/bb-slp-div-1.c: New XFAILed test.

Index: gcc/tree-vect-patterns.c
===
*** gcc/tree-vect-patterns.c2018-05-24 13:50:21.656138942 +0100
--- gcc/tree-vect-patterns.c2018-05-24 13:50:21.819134113 +0100
*** vect_recog_divmod_pattern (vec
*** 2674,2688 
if (vectype == NULL_TREE)
  return NULL;
  
!   /* If the target can handle vectorized division or modulo natively,
!  don't attempt to optimize this.  */
!   optab = optab_for_tree_code (rhs_code, vectype, optab_default);
!   if (optab != unknown_optab)
  {
!   machine_mode vec_mode = TYPE_MODE (vectype);
!   int icode = (int) optab_handler (optab, vec_mode);
!   if (icode != CODE_FOR_nothing)
!   return NULL;
  }
  
prec = TYPE_PRECISION (itype);
--- 2674,2692 
if (vectype == NULL_TREE)
  return NULL;
  
!   if (optimize_bb_for_size_p (gimple_bb (last_stmt)))
  {
!   /* If the target can handle vectorized division or modulo natively,
!don't attempt to optimize this, since native division is likely
!to give smaller code.  */
!   optab = optab_for_tree_code (rhs_code, vectype, optab_default);
!   if (optab != unknown_optab)
!   {
! machine_mode vec_mode = TYPE_MODE (vectype);
! int icode = (int) optab_handler (optab, vec_mode);
! if (icode != CODE_FOR_nothing)
!   return NULL;
!   }
  }
  
prec = TYPE_PRECISION (itype);
Index: gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c
===
*** /dev/null   2018-04-20 16:19:46.369131350 +0100
--- gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c2018-05-24 13:50:21.818134143 
+0100
***
*** 0 
--- 1,19 
+ /* { dg-do compile } */
+ /* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve } } */
+ 
+ int x[8];
+ 
+ void
+ f (void)
+ {
+   x[0] /= 2;
+   x[1] /= 3;
+   x[2] /= 4;
+   x[3] /= 5;
+   x[4] /= 6;
+   x[5] /= 7;
+   x[6] /= 8;
+   x[7] /= 9;
+ }
+ 
+ /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { xfail *-*-* 
} } } */


Re: Prefer open-coding vector integer division

2018-05-24 Thread Richard Biener
On Thu, May 24, 2018 at 10:31 AM Jakub Jelinek  wrote:

> On Thu, May 24, 2018 at 09:21:35AM +0100, Richard Sandiford wrote:
> > vect_recog_divmod_pattern currently bails out if the target has
> > native support for integer division, but I think in practice
> > it's always going to be better to open-code it anyway, just as
> > we usually open-code scalar divisions by constants.
> >
> > I think the only currently affected target is MIPS MSA, where for:

> Isn't powerpcspe affected too?  It has a divv2si3 pattern.

> > --- gcc/tree-vect-patterns.c  2018-05-16 12:48:59.115202362 +0100
> > +++ gcc/tree-vect-patterns.c  2018-05-24 09:18:10.445466941 +0100
> > @@ -2639,7 +2639,6 @@ vect_recog_divmod_pattern (vec
> >enum tree_code rhs_code;
> >stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
> >vec_info *vinfo = stmt_vinfo->vinfo;
> > -  optab optab;
> >tree q;
> >int dummy_int, prec;
> >stmt_vec_info def_stmt_vinfo;
> > @@ -2674,17 +2673,6 @@ vect_recog_divmod_pattern (vec
> >if (vectype == NULL_TREE)
> >  return NULL;
> >
> > -  /* If the target can handle vectorized division or modulo natively,
> > - don't attempt to optimize this.  */
> > -  optab = optab_for_tree_code (rhs_code, vectype, optab_default);
> > -  if (optab != unknown_optab)
> > -{
> > -  machine_mode vec_mode = TYPE_MODE (vectype);
> > -  int icode = (int) optab_handler (optab, vec_mode);
> > -  if (icode != CODE_FOR_nothing)
> > - return NULL;
> > -}
> > -

> Shouldn't we instead keep it, but only do it if
> optimize_bb_for_size_p (gimple_bb (last_stmt)) ?
> I mean, a hw division is very likely shorter than what we replace it
with...

Good idea.  While loops are checked for optimize_loop_nest_for_speed if
not forced vect BB vectorization has no such predicate.

Richard.

>  Jakub


Re: Prefer open-coding vector integer division

2018-05-24 Thread Jakub Jelinek
On Thu, May 24, 2018 at 09:21:35AM +0100, Richard Sandiford wrote:
> vect_recog_divmod_pattern currently bails out if the target has
> native support for integer division, but I think in practice
> it's always going to be better to open-code it anyway, just as
> we usually open-code scalar divisions by constants.
> 
> I think the only currently affected target is MIPS MSA, where for:

Isn't powerpcspe affected too?  It has a divv2si3 pattern.

> --- gcc/tree-vect-patterns.c  2018-05-16 12:48:59.115202362 +0100
> +++ gcc/tree-vect-patterns.c  2018-05-24 09:18:10.445466941 +0100
> @@ -2639,7 +2639,6 @@ vect_recog_divmod_pattern (vec
>enum tree_code rhs_code;
>stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
>vec_info *vinfo = stmt_vinfo->vinfo;
> -  optab optab;
>tree q;
>int dummy_int, prec;
>stmt_vec_info def_stmt_vinfo;
> @@ -2674,17 +2673,6 @@ vect_recog_divmod_pattern (vec
>if (vectype == NULL_TREE)
>  return NULL;
>  
> -  /* If the target can handle vectorized division or modulo natively,
> - don't attempt to optimize this.  */
> -  optab = optab_for_tree_code (rhs_code, vectype, optab_default);
> -  if (optab != unknown_optab)
> -{
> -  machine_mode vec_mode = TYPE_MODE (vectype);
> -  int icode = (int) optab_handler (optab, vec_mode);
> -  if (icode != CODE_FOR_nothing)
> - return NULL;
> -}
> -

Shouldn't we instead keep it, but only do it if
optimize_bb_for_size_p (gimple_bb (last_stmt)) ?
I mean, a hw division is very likely shorter than what we replace it with...

Jakub