Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-28 Thread Segher Boessenkool
On Wed, May 27, 2020 at 10:20:18AM +0200, Richard Biener wrote:
> > How about "Var(flag_cunroll_grow_size) EnabledBy(funroll-loops ||
> > funroll-all-loops || fpeel-loops)" Or flag_cunroll_allow_grow_size?
> >
> > And then using this flags as:
> >   unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
> >|| optimize >= 3, true);
> >
> > And we do not need to enable this flag at -O2.
> 
> Sure this works for me.  Note I'd make funroll-loops enabled by
> funroll-all-loops so you could simplify the above.

But only do that on trunk?


Segher


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-28 Thread Jiufu Guo via Gcc-patches
Richard Biener  writes:

> On Wed, May 27, 2020 at 6:36 AM Jiufu Guo  wrote:
>>
>> Segher Boessenkool  writes:
>>
>> > Hi!
>> >
>> > On Tue, May 26, 2020 at 08:58:13AM +0200, Richard Biener wrote:
>> >> On Mon, May 25, 2020 at 7:44 PM Segher Boessenkool
>> >>  wrote:
>> >> > Yes, cunroll does not have its own option, and that is a problem.  But
>> >> > that is easy to fix!  Either with an option, or just with params (the
>> >> > option wouldn't do more than set params anyway?)
>> >>
>> >> Well, given coming up with different names for essentially the same
>> >> transform is going to be challenging how about sth like
>> >>
>> >> -funroll-loops={early,late,static,dynamic}[insert better names here]
>> >
>> > User interface is hard :-)  I think luckily we don't need to change
>> > anything there yet, just have an internal flag?
>> >
>> > But complete unrolling is something quite different, so it should have
>> > its own flag anyway (at least internally).
>> >
>> >> note there's also -fpeel-loops which may match the transform
>> >> done on GIMPLE better?
>> >
>> > Peeling and unrolling are the same thing, if doing complete unrolling
>> > (or complete peeling), followed by DCE in both cases.  Peeling is a
>> > nicer name here I think, yeah.
>> >
>> >> I'm not sure which are the technically
>> >> correct terms for unrollings that elide the loop (the backedge).
>> >
>> > I don't know a better term than "complete", I don't remember ever seeing
>> > something else either.
>>
>> How about "Var(flag_cunroll_grow_size) EnabledBy(funroll-loops ||
>> funroll-all-loops || fpeel-loops)" Or flag_cunroll_allow_grow_size?
>>
>> And then using this flags as:
>>   unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
>>|| optimize >= 3, true);
>>
>> And we do not need to enable this flag at -O2.
>
> Sure this works for me.  Note I'd make funroll-loops enabled by
> funroll-all-loops so you could simplify the above.

After some checking, I did not use 'EnabledBy(funroll-all-loops)' for
funroll-loops, because of special behavior on -fno-unroll-all-loops.
Command line -fno-unroll-all-loops turns off flag_unroll_loops, if
flag_unroll_loops is enabled implicitly(not through command line). 

While the orignal logic is: only possitive flag_unroll_all_loops
overrides flag_unroll_loops. 
"if (flag_unroll_all_loops) flag_unroll_loops = 1;" in process_options.

Thanks.
Jiufu

>
> Richard.
>
>> Thanks for all your helpful comments again!
>>
>> Jiufu
>>
>> >
>> >> We're doing such kind of unrolling even if we cannot statically
>> >> decide which of a set of possible exits we take (and internally
>> >> call that peeling, if we can statically decide we call it complete
>> >> unrolling).
>> >
>> > "Peeling" is placing some copies of the loop before the loop;
>> > "unrolling" is placing a few copies of the loop inside the loop body.
>> > Does that match usage here?
>> >
>> >> The RTL side OTOH only performs classical unrolling,
>> >> preserving the backedge with various strategies for the
>> >> remaining iterations.
>> >
>> > And if you do complete unrolling that way, the backedge can be removed,
>> > since it can be shown never to be taken.
>> >
>> >> As said, for the regression on the 10 branch with ppc I'd add
>> >> [a hidden] flag that controls the RTL unroller, also set by
>> >> -funroll-loops and triggered by the ppc specific heuristics.
>> >
>> > But the problem is in cunroll?  This is so backwards...  Because some
>> > other transform abuses the unroller flags, adding a second level flag
>> > with the same meaning :-(  It will work for fixing the regression,
>> > sure, and it is slightly less code as well.
>> >
>> >
>> > Segher


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-27 Thread Richard Biener via Gcc-patches
On Wed, May 27, 2020 at 6:36 AM Jiufu Guo  wrote:
>
> Segher Boessenkool  writes:
>
> > Hi!
> >
> > On Tue, May 26, 2020 at 08:58:13AM +0200, Richard Biener wrote:
> >> On Mon, May 25, 2020 at 7:44 PM Segher Boessenkool
> >>  wrote:
> >> > Yes, cunroll does not have its own option, and that is a problem.  But
> >> > that is easy to fix!  Either with an option, or just with params (the
> >> > option wouldn't do more than set params anyway?)
> >>
> >> Well, given coming up with different names for essentially the same
> >> transform is going to be challenging how about sth like
> >>
> >> -funroll-loops={early,late,static,dynamic}[insert better names here]
> >
> > User interface is hard :-)  I think luckily we don't need to change
> > anything there yet, just have an internal flag?
> >
> > But complete unrolling is something quite different, so it should have
> > its own flag anyway (at least internally).
> >
> >> note there's also -fpeel-loops which may match the transform
> >> done on GIMPLE better?
> >
> > Peeling and unrolling are the same thing, if doing complete unrolling
> > (or complete peeling), followed by DCE in both cases.  Peeling is a
> > nicer name here I think, yeah.
> >
> >> I'm not sure which are the technically
> >> correct terms for unrollings that elide the loop (the backedge).
> >
> > I don't know a better term than "complete", I don't remember ever seeing
> > something else either.
>
> How about "Var(flag_cunroll_grow_size) EnabledBy(funroll-loops ||
> funroll-all-loops || fpeel-loops)" Or flag_cunroll_allow_grow_size?
>
> And then using this flags as:
>   unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
>|| optimize >= 3, true);
>
> And we do not need to enable this flag at -O2.

Sure this works for me.  Note I'd make funroll-loops enabled by
funroll-all-loops so you could simplify the above.

Richard.

> Thanks for all your helpful comments again!
>
> Jiufu
>
> >
> >> We're doing such kind of unrolling even if we cannot statically
> >> decide which of a set of possible exits we take (and internally
> >> call that peeling, if we can statically decide we call it complete
> >> unrolling).
> >
> > "Peeling" is placing some copies of the loop before the loop;
> > "unrolling" is placing a few copies of the loop inside the loop body.
> > Does that match usage here?
> >
> >> The RTL side OTOH only performs classical unrolling,
> >> preserving the backedge with various strategies for the
> >> remaining iterations.
> >
> > And if you do complete unrolling that way, the backedge can be removed,
> > since it can be shown never to be taken.
> >
> >> As said, for the regression on the 10 branch with ppc I'd add
> >> [a hidden] flag that controls the RTL unroller, also set by
> >> -funroll-loops and triggered by the ppc specific heuristics.
> >
> > But the problem is in cunroll?  This is so backwards...  Because some
> > other transform abuses the unroller flags, adding a second level flag
> > with the same meaning :-(  It will work for fixing the regression,
> > sure, and it is slightly less code as well.
> >
> >
> > Segher


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-26 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool  writes:

> Hi!
>
> On Tue, May 26, 2020 at 08:58:13AM +0200, Richard Biener wrote:
>> On Mon, May 25, 2020 at 7:44 PM Segher Boessenkool
>>  wrote:
>> > Yes, cunroll does not have its own option, and that is a problem.  But
>> > that is easy to fix!  Either with an option, or just with params (the
>> > option wouldn't do more than set params anyway?)
>> 
>> Well, given coming up with different names for essentially the same
>> transform is going to be challenging how about sth like
>> 
>> -funroll-loops={early,late,static,dynamic}[insert better names here]
>
> User interface is hard :-)  I think luckily we don't need to change
> anything there yet, just have an internal flag?
>
> But complete unrolling is something quite different, so it should have
> its own flag anyway (at least internally).
>
>> note there's also -fpeel-loops which may match the transform
>> done on GIMPLE better?
>
> Peeling and unrolling are the same thing, if doing complete unrolling
> (or complete peeling), followed by DCE in both cases.  Peeling is a
> nicer name here I think, yeah.
>
>> I'm not sure which are the technically
>> correct terms for unrollings that elide the loop (the backedge).
>
> I don't know a better term than "complete", I don't remember ever seeing
> something else either.

How about "Var(flag_cunroll_grow_size) EnabledBy(funroll-loops ||
funroll-all-loops || fpeel-loops)" Or flag_cunroll_allow_grow_size?

And then using this flags as:
  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
   || optimize >= 3, true);

And we do not need to enable this flag at -O2.

Thanks for all your helpful comments again!

Jiufu

>
>> We're doing such kind of unrolling even if we cannot statically
>> decide which of a set of possible exits we take (and internally
>> call that peeling, if we can statically decide we call it complete
>> unrolling).
>
> "Peeling" is placing some copies of the loop before the loop;
> "unrolling" is placing a few copies of the loop inside the loop body.
> Does that match usage here?
>
>> The RTL side OTOH only performs classical unrolling,
>> preserving the backedge with various strategies for the
>> remaining iterations.
>
> And if you do complete unrolling that way, the backedge can be removed,
> since it can be shown never to be taken.
>
>> As said, for the regression on the 10 branch with ppc I'd add
>> [a hidden] flag that controls the RTL unroller, also set by
>> -funroll-loops and triggered by the ppc specific heuristics.
>
> But the problem is in cunroll?  This is so backwards...  Because some
> other transform abuses the unroller flags, adding a second level flag
> with the same meaning :-(  It will work for fixing the regression,
> sure, and it is slightly less code as well.
>
>
> Segher


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-26 Thread Segher Boessenkool
Hi!

On Tue, May 26, 2020 at 08:58:13AM +0200, Richard Biener wrote:
> On Mon, May 25, 2020 at 7:44 PM Segher Boessenkool
>  wrote:
> > Yes, cunroll does not have its own option, and that is a problem.  But
> > that is easy to fix!  Either with an option, or just with params (the
> > option wouldn't do more than set params anyway?)
> 
> Well, given coming up with different names for essentially the same
> transform is going to be challenging how about sth like
> 
> -funroll-loops={early,late,static,dynamic}[insert better names here]

User interface is hard :-)  I think luckily we don't need to change
anything there yet, just have an internal flag?

But complete unrolling is something quite different, so it should have
its own flag anyway (at least internally).

> note there's also -fpeel-loops which may match the transform
> done on GIMPLE better?

Peeling and unrolling are the same thing, if doing complete unrolling
(or complete peeling), followed by DCE in both cases.  Peeling is a
nicer name here I think, yeah.

> I'm not sure which are the technically
> correct terms for unrollings that elide the loop (the backedge).

I don't know a better term than "complete", I don't remember ever seeing
something else either.

> We're doing such kind of unrolling even if we cannot statically
> decide which of a set of possible exits we take (and internally
> call that peeling, if we can statically decide we call it complete
> unrolling).

"Peeling" is placing some copies of the loop before the loop;
"unrolling" is placing a few copies of the loop inside the loop body.
Does that match usage here?

> The RTL side OTOH only performs classical unrolling,
> preserving the backedge with various strategies for the
> remaining iterations.

And if you do complete unrolling that way, the backedge can be removed,
since it can be shown never to be taken.

> As said, for the regression on the 10 branch with ppc I'd add
> [a hidden] flag that controls the RTL unroller, also set by
> -funroll-loops and triggered by the ppc specific heuristics.

But the problem is in cunroll?  This is so backwards...  Because some
other transform abuses the unroller flags, adding a second level flag
with the same meaning :-(  It will work for fixing the regression,
sure, and it is slightly less code as well.


Segher


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-26 Thread Jiufu Guo via Gcc-patches
Richard Biener  writes:

> On Mon, May 25, 2020 at 7:44 PM Segher Boessenkool
>  wrote:
>>
>> On Mon, May 25, 2020 at 02:39:54PM +0200, Richard Biener wrote:
>> > On Fri, May 22, 2020 at 6:54 PM Segher Boessenkool
>> >  wrote:
>> > > > The split above allows the "bug" to be fixed (even on the branch)
>> > > > without introducing even more target specialities.
>> > >
>> > > So does any split.  Or I don't see what you mean?
>> >
>> > Well, a split that does not affect behavior for non-ppc architectures
>> > when the flags by users are unchanged.  Because that allows
>> > the ppc regression to be fixed on the branch.
>> >
>> > Then, on trunk, we can think of a better overall flag design.  Note
>>
>> Oh, as just a (very) temporary thing, it is fine of course (it should
>> say it is then though).
>>
>> > that cunroll/cunrolli are not controlled by a flag currently, they
>> > are gated on optimize >= [2|3] - it's just that either -funroll-loops
>> > or -fpeel-loops causes its heuristics to allow code-size growth
>> > by its own metrics according to the unroll --params.
>> >
>> > So it's a bit difficult to retrofit the heuristic behavior onto new
>> > flags unless we want to completely move that over to a --param
>> > that may be gets adjusted by -funroll-loops.
>>
>> Yes, cunroll does not have its own option, and that is a problem.  But
>> that is easy to fix!  Either with an option, or just with params (the
>> option wouldn't do more than set params anyway?)
>
> Well, given coming up with different names for essentially the same
> transform is going to be challenging how about sth like
>
> -funroll-loops={early,late,static,dynamic}[insert better names here]
>
> note there's also -fpeel-loops which may match the transform
> done on GIMPLE better?  I'm not sure which are the technically
> correct terms for unrollings that elide the loop (the backedge).
> We're doing such kind of unrolling even if we cannot statically
> decide which of a set of possible exits we take (and internally
> call that peeling, if we can statically decide we call it complete
> unrolling).  The RTL side OTOH only performs classical unrolling,
> preserving the backedge with various strategies for the
> remaining iterations.
>
> As said, for the regression on the 10 branch with ppc I'd add
> [a hidden] flag that controls the RTL unroller, also set by
> -funroll-loops and triggered by the ppc specific heuristics.

This way would enable rtl unroller at -O2 instead enable -funroll-loops,
and then cunroll will not unroll/peel the loop if there is potential
size increasing.  This could avoid the negative affect on the loop which
mentioned in PR95018 code.

While, I still hope to tune cunroll at -O2 for ppc (or general
platforms) with keeping possitive and avoid negative affects. Yes, this
may align with what Honza plan to do.

BR,
Jiufu

>
> Richard.
>
>>
>> Segher


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-26 Thread Richard Biener via Gcc-patches
On Mon, May 25, 2020 at 7:44 PM Segher Boessenkool
 wrote:
>
> On Mon, May 25, 2020 at 02:39:54PM +0200, Richard Biener wrote:
> > On Fri, May 22, 2020 at 6:54 PM Segher Boessenkool
> >  wrote:
> > > > The split above allows the "bug" to be fixed (even on the branch)
> > > > without introducing even more target specialities.
> > >
> > > So does any split.  Or I don't see what you mean?
> >
> > Well, a split that does not affect behavior for non-ppc architectures
> > when the flags by users are unchanged.  Because that allows
> > the ppc regression to be fixed on the branch.
> >
> > Then, on trunk, we can think of a better overall flag design.  Note
>
> Oh, as just a (very) temporary thing, it is fine of course (it should
> say it is then though).
>
> > that cunroll/cunrolli are not controlled by a flag currently, they
> > are gated on optimize >= [2|3] - it's just that either -funroll-loops
> > or -fpeel-loops causes its heuristics to allow code-size growth
> > by its own metrics according to the unroll --params.
> >
> > So it's a bit difficult to retrofit the heuristic behavior onto new
> > flags unless we want to completely move that over to a --param
> > that may be gets adjusted by -funroll-loops.
>
> Yes, cunroll does not have its own option, and that is a problem.  But
> that is easy to fix!  Either with an option, or just with params (the
> option wouldn't do more than set params anyway?)

Well, given coming up with different names for essentially the same
transform is going to be challenging how about sth like

-funroll-loops={early,late,static,dynamic}[insert better names here]

note there's also -fpeel-loops which may match the transform
done on GIMPLE better?  I'm not sure which are the technically
correct terms for unrollings that elide the loop (the backedge).
We're doing such kind of unrolling even if we cannot statically
decide which of a set of possible exits we take (and internally
call that peeling, if we can statically decide we call it complete
unrolling).  The RTL side OTOH only performs classical unrolling,
preserving the backedge with various strategies for the
remaining iterations.

As said, for the regression on the 10 branch with ppc I'd add
[a hidden] flag that controls the RTL unroller, also set by
-funroll-loops and triggered by the ppc specific heuristics.

Richard.

>
> Segher


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-25 Thread Segher Boessenkool
On Mon, May 25, 2020 at 02:39:54PM +0200, Richard Biener wrote:
> On Fri, May 22, 2020 at 6:54 PM Segher Boessenkool
>  wrote:
> > > The split above allows the "bug" to be fixed (even on the branch)
> > > without introducing even more target specialities.
> >
> > So does any split.  Or I don't see what you mean?
> 
> Well, a split that does not affect behavior for non-ppc architectures
> when the flags by users are unchanged.  Because that allows
> the ppc regression to be fixed on the branch.
> 
> Then, on trunk, we can think of a better overall flag design.  Note

Oh, as just a (very) temporary thing, it is fine of course (it should
say it is then though).

> that cunroll/cunrolli are not controlled by a flag currently, they
> are gated on optimize >= [2|3] - it's just that either -funroll-loops
> or -fpeel-loops causes its heuristics to allow code-size growth
> by its own metrics according to the unroll --params.
> 
> So it's a bit difficult to retrofit the heuristic behavior onto new
> flags unless we want to completely move that over to a --param
> that may be gets adjusted by -funroll-loops.

Yes, cunroll does not have its own option, and that is a problem.  But
that is easy to fix!  Either with an option, or just with params (the
option wouldn't do more than set params anyway?)


Segher


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-25 Thread Richard Biener via Gcc-patches
On Fri, May 22, 2020 at 6:54 PM Segher Boessenkool
 wrote:
>
> On Fri, May 22, 2020 at 01:22:10PM +0200, Richard Biener wrote:
> > On Wed, May 20, 2020 at 10:37 PM Segher Boessenkool
> >  wrote:
> > >
> > > On Wed, May 20, 2020 at 12:30:30PM +0200, Richard Biener wrote:
> > > > I think this is the wrong way to approach this.  You're doing too many
> > > > things at once.  Try to fix the powerpc regression with the extra
> > > > flag_rtl_unroll_loops, that could be backported.  Then you can
> > > > independently see whether enabling more unrolling at -O2 makes
> > > > sense.  Because currently we _do_ unroll at -O2 when it does
> > > > not increase size.  Its just your patches make this as aggressive
> > > > as -O3.
> > >
> > > Just do a separate flag (and option) for cunroll, instead?
> > >
> > > The RTL unroller is *the* unroller, and has been since forever.
> >
> > Sorry but that ship has sailed - I don't think we should make
> > -O2 -funroll-loops no longer affect GIMPLE.
>
> Of course, and it would certainly be good if we could do some non-
> trivial unrolling earlier.  My point is, why rename all RTL stuff
> now?  That is just churn.
>
> (And we will need to keep unrolling in RTL for a long time still, if
> not forever).
>
> > But sure, what I am proposing is have
> >
> > -frtl-unroll-loops
> > -fgimple-unroll-loops
> >
> > with obvious semantics and -funroll-loops enabling both.
> > Users should not really care about what is RTL or GIMPLE.
>
> Yeah -- so this shouldn't be part of the name at all.  What you care
> about are higher level characteristics of the unrolling, like
> "complete" or "early" or whatever.
>
> > The split above allows the "bug" to be fixed (even on the branch)
> > without introducing even more target specialities.
>
> So does any split.  Or I don't see what you mean?

Well, a split that does not affect behavior for non-ppc architectures
when the flags by users are unchanged.  Because that allows
the ppc regression to be fixed on the branch.

Then, on trunk, we can think of a better overall flag design.  Note
that cunroll/cunrolli are not controlled by a flag currently, they
are gated on optimize >= [2|3] - it's just that either -funroll-loops
or -fpeel-loops causes its heuristics to allow code-size growth
by its own metrics according to the unroll --params.

So it's a bit difficult to retrofit the heuristic behavior onto new
flags unless we want to completely move that over to a --param
that may be gets adjusted by -funroll-loops.

Richard.

>
> Segher


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-22 Thread Segher Boessenkool
On Fri, May 22, 2020 at 01:22:10PM +0200, Richard Biener wrote:
> On Wed, May 20, 2020 at 10:37 PM Segher Boessenkool
>  wrote:
> >
> > On Wed, May 20, 2020 at 12:30:30PM +0200, Richard Biener wrote:
> > > I think this is the wrong way to approach this.  You're doing too many
> > > things at once.  Try to fix the powerpc regression with the extra
> > > flag_rtl_unroll_loops, that could be backported.  Then you can
> > > independently see whether enabling more unrolling at -O2 makes
> > > sense.  Because currently we _do_ unroll at -O2 when it does
> > > not increase size.  Its just your patches make this as aggressive
> > > as -O3.
> >
> > Just do a separate flag (and option) for cunroll, instead?
> >
> > The RTL unroller is *the* unroller, and has been since forever.
> 
> Sorry but that ship has sailed - I don't think we should make
> -O2 -funroll-loops no longer affect GIMPLE.

Of course, and it would certainly be good if we could do some non-
trivial unrolling earlier.  My point is, why rename all RTL stuff
now?  That is just churn.

(And we will need to keep unrolling in RTL for a long time still, if
not forever).

> But sure, what I am proposing is have
> 
> -frtl-unroll-loops
> -fgimple-unroll-loops
> 
> with obvious semantics and -funroll-loops enabling both.
> Users should not really care about what is RTL or GIMPLE.

Yeah -- so this shouldn't be part of the name at all.  What you care
about are higher level characteristics of the unrolling, like
"complete" or "early" or whatever.

> The split above allows the "bug" to be fixed (even on the branch)
> without introducing even more target specialities.

So does any split.  Or I don't see what you mean?


Segher


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-22 Thread Richard Biener via Gcc-patches
On Wed, May 20, 2020 at 10:37 PM Segher Boessenkool
 wrote:
>
> On Wed, May 20, 2020 at 12:30:30PM +0200, Richard Biener wrote:
> > I think this is the wrong way to approach this.  You're doing too many
> > things at once.  Try to fix the powerpc regression with the extra
> > flag_rtl_unroll_loops, that could be backported.  Then you can
> > independently see whether enabling more unrolling at -O2 makes
> > sense.  Because currently we _do_ unroll at -O2 when it does
> > not increase size.  Its just your patches make this as aggressive
> > as -O3.
>
> Just do a separate flag (and option) for cunroll, instead?
>
> The RTL unroller is *the* unroller, and has been since forever.

Sorry but that ship has sailed - I don't think we should make
-O2 -funroll-loops no longer affect GIMPLE.  But sure, what I
am proposing is have

-frtl-unroll-loops
-fgimple-unroll-loops

with obvious semantics and -funroll-loops enabling both.
Users should not really care about what is RTL or GIMPLE.
The split above allows the "bug" to be fixed (even on the branch)
without introducing even more target specialities.

Richard.

>
> Segher


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-21 Thread Jiufu Guo via Gcc-patches
Jiufu Guo  writes:

> Jan Hubicka  writes:
>
>>> Segher Boessenkool  writes:
>>> 
>>> > On Wed, May 20, 2020 at 12:30:30PM +0200, Richard Biener wrote:
>>> >> I think this is the wrong way to approach this.  You're doing too many
>>> >> things at once.  Try to fix the powerpc regression with the extra
>>> >> flag_rtl_unroll_loops, that could be backported.  Then you can
>>> 
>>> Or flag_complete_unroll_loops(-fcomplete-unroll-loops) for GIMPLE
>>> cunroll?
>>> >> independently see whether enabling more unrolling at -O2 makes
>>> >> sense.  Because currently we _do_ unroll at -O2 when it does
>>> >> not increase size.  Its just your patches make this as aggressive
>>> >> as -O3.
>>> 
>>> I'm also thinking about enabling more cunroll at -O2 even with some size
>>> increasing.  Full cunroll enablement make it like -O3. As some
>>> discussion in PRs (e.g. PR88760), small/simple loops unrolling may be in
>>> favor of some platforms (but not for all platforms, like x86_64?).  This
>>> would make us to have target specified hook.  Or do some generic
>>> setting: accept to unroll/peel limit times if the loop body is simple
>>> and small, together with target specific hook.
>>
>> We now have --params that can be tuned differently for -O2 and -O3 so
One thing about tunning --param based on optimization level, some times
difference function may has different optimization level.  While
--params may not be set per function, if so, --param may not work as
expect at some functions.  Not sure if this is an issue you may concern
about.

Thanks!
Jiufu

>> looking into cunroll was one of my todo for GCC 10 -O2 retuning but i did
>> not get any very conclusive benchmark results outside SPEC. 
>> I planned to return to it next stage1, so it may be good time.
>> Do you have any benchmarks on ppc?
>
> 541.leela_r, 548.exchange2_r and 557.xz_r from SPEC2017 are visbily
> affected by cunroll.  They can be used to tune cunroll, I think. 
>
>> Of couse there is no need to keep same defaults for all targets, but in
>> general having target specific defaults increases number of knobs we
>> need to check and keep up to date.
>
> Thanks,
> Jiufu
>
>>
>> Honza
>
>>> 
>>> Any comments? Thanks!
>>> Jiufu
>>> 
>>> >
>>> > Just do a separate flag (and option) for cunroll, instead?
>>> >
>>> > The RTL unroller is *the* unroller, and has been since forever.
>>> >
>>> >
>>> > Segher


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-21 Thread Jiufu Guo via Gcc-patches
Jan Hubicka  writes:

>> Segher Boessenkool  writes:
>> 
>> > On Wed, May 20, 2020 at 12:30:30PM +0200, Richard Biener wrote:
>> >> I think this is the wrong way to approach this.  You're doing too many
>> >> things at once.  Try to fix the powerpc regression with the extra
>> >> flag_rtl_unroll_loops, that could be backported.  Then you can
>> 
>> Or flag_complete_unroll_loops(-fcomplete-unroll-loops) for GIMPLE
>> cunroll?
>> >> independently see whether enabling more unrolling at -O2 makes
>> >> sense.  Because currently we _do_ unroll at -O2 when it does
>> >> not increase size.  Its just your patches make this as aggressive
>> >> as -O3.
>> 
>> I'm also thinking about enabling more cunroll at -O2 even with some size
>> increasing.  Full cunroll enablement make it like -O3. As some
>> discussion in PRs (e.g. PR88760), small/simple loops unrolling may be in
>> favor of some platforms (but not for all platforms, like x86_64?).  This
>> would make us to have target specified hook.  Or do some generic
>> setting: accept to unroll/peel limit times if the loop body is simple
>> and small, together with target specific hook.
>
> We now have --params that can be tuned differently for -O2 and -O3 so
> looking into cunroll was one of my todo for GCC 10 -O2 retuning but i did
> not get any very conclusive benchmark results outside SPEC. 
> I planned to return to it next stage1, so it may be good time.
> Do you have any benchmarks on ppc?

541.leela_r, 548.exchange2_r and 557.xz_r from SPEC2017 are visbily
affected by cunroll.  They can be used to tune cunroll, I think. 

> Of couse there is no need to keep same defaults for all targets, but in
> general having target specific defaults increases number of knobs we
> need to check and keep up to date.

Thanks,
Jiufu

>
> Honza

>> 
>> Any comments? Thanks!
>> Jiufu
>> 
>> >
>> > Just do a separate flag (and option) for cunroll, instead?
>> >
>> > The RTL unroller is *the* unroller, and has been since forever.
>> >
>> >
>> > Segher


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-21 Thread Jan Hubicka
> Segher Boessenkool  writes:
> 
> > On Wed, May 20, 2020 at 12:30:30PM +0200, Richard Biener wrote:
> >> I think this is the wrong way to approach this.  You're doing too many
> >> things at once.  Try to fix the powerpc regression with the extra
> >> flag_rtl_unroll_loops, that could be backported.  Then you can
> 
> Or flag_complete_unroll_loops(-fcomplete-unroll-loops) for GIMPLE
> cunroll?
> >> independently see whether enabling more unrolling at -O2 makes
> >> sense.  Because currently we _do_ unroll at -O2 when it does
> >> not increase size.  Its just your patches make this as aggressive
> >> as -O3.
> 
> I'm also thinking about enabling more cunroll at -O2 even with some size
> increasing.  Full cunroll enablement make it like -O3. As some
> discussion in PRs (e.g. PR88760), small/simple loops unrolling may be in
> favor of some platforms (but not for all platforms, like x86_64?).  This
> would make us to have target specified hook.  Or do some generic
> setting: accept to unroll/peel limit times if the loop body is simple
> and small, together with target specific hook.

We now have --params that can be tuned differently for -O2 and -O3 so
looking into cunroll was one of my todo for GCC 10 -O2 retuning but i did
not get any very conclusive benchmark results outside SPEC. 
I planned to return to it next stage1, so it may be good time.
Do you have any benchmarks on ppc?
Of couse there is no need to keep same defaults for all targets, but in
general having target specific defaults increases number of knobs we
need to check and keep up to date.

Honza
> 
> Any comments? Thanks!
> Jiufu
> 
> >
> > Just do a separate flag (and option) for cunroll, instead?
> >
> > The RTL unroller is *the* unroller, and has been since forever.
> >
> >
> > Segher


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-20 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool  writes:

> On Wed, May 20, 2020 at 12:30:30PM +0200, Richard Biener wrote:
>> I think this is the wrong way to approach this.  You're doing too many
>> things at once.  Try to fix the powerpc regression with the extra
>> flag_rtl_unroll_loops, that could be backported.  Then you can

Or flag_complete_unroll_loops(-fcomplete-unroll-loops) for GIMPLE
cunroll?
>> independently see whether enabling more unrolling at -O2 makes
>> sense.  Because currently we _do_ unroll at -O2 when it does
>> not increase size.  Its just your patches make this as aggressive
>> as -O3.

I'm also thinking about enabling more cunroll at -O2 even with some size
increasing.  Full cunroll enablement make it like -O3. As some
discussion in PRs (e.g. PR88760), small/simple loops unrolling may be in
favor of some platforms (but not for all platforms, like x86_64?).  This
would make us to have target specified hook.  Or do some generic
setting: accept to unroll/peel limit times if the loop body is simple
and small, together with target specific hook.

Any comments? Thanks!
Jiufu

>
> Just do a separate flag (and option) for cunroll, instead?
>
> The RTL unroller is *the* unroller, and has been since forever.
>
>
> Segher


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-20 Thread Segher Boessenkool
On Wed, May 20, 2020 at 12:30:30PM +0200, Richard Biener wrote:
> I think this is the wrong way to approach this.  You're doing too many
> things at once.  Try to fix the powerpc regression with the extra
> flag_rtl_unroll_loops, that could be backported.  Then you can
> independently see whether enabling more unrolling at -O2 makes
> sense.  Because currently we _do_ unroll at -O2 when it does
> not increase size.  Its just your patches make this as aggressive
> as -O3.

Just do a separate flag (and option) for cunroll, instead?

The RTL unroller is *the* unroller, and has been since forever.


Segher


Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-20 Thread Richard Biener via Gcc-patches
On Wed, May 20, 2020 at 11:08 AM Jiufu Guo  wrote:
>
> Richard Biener  writes:
>
> > On Wed, May 20, 2020 at 5:56 AM Jiufu Guo via Gcc-patches
> >  wrote:
> >>
> >> Hi,
> >>
> >> In r10-4525, and r10-4161, loop unroller was enabled for simply loops at 
> >> -O2.
> >> At the same time, the GIMPLE cunroll is also enabled, while it is not only 
> >> for
> >> simple loops.  This patch introduces a hook to check if a loop is suitable 
> >> to
> >> unroll completely.  The hook can be used to check if a loop is simply 
> >> enough
> >> for complete unroll, then avoid negative effects like size increasing or
> >> complex loop unrolling at -O2.
> >>
> >> This patch could help to handle the code in PR95018 which contains complex
> >> loop, and does not cause obvious performance change on SPEC2017.
> >>
> >> Bootstrap/regtest pass on powerpc64le. OK for trunk?
> >
> > I fear you're just digging the rs6000 specific hole you dug up deeper
> > with this ...
>
> I'm also thinking to add a check to cunroll directly without digging to
> platforms. As patch below.  This may not harm for platforms.
> Is this acceptable?
>
> Thanks,
> Jiufu
>
> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
> index 8ab6ab3330c..67335b8a911 100644
> --- a/gcc/tree-ssa-loop-ivcanon.c
> +++ b/gcc/tree-ssa-loop-ivcanon.c
> @@ -1231,6 +1231,11 @@ canonicalize_loop_induction_variables (class loop 
> *loop,
>/* Force re-computation of loop bounds so we can remove redundant exits.  
> */
>maxiter = max_loop_iterations_int (loop);
>
> +  /* For optimize level 2, unroll and peel only for simple loops. */
> +  if (allow_peel && optimize == 2 && maxiter >= 4
> +  && !(TREE_CODE (niter) == INTEGER_CST && exit))
> +return false;
> +

I think this is the wrong way to approach this.  You're doing too many
things at once.  Try to fix the powerpc regression with the extra
flag_rtl_unroll_loops, that could be backported.  Then you can
independently see whether enabling more unrolling at -O2 makes
sense.  Because currently we _do_ unroll at -O2 when it does
not increase size.  Its just your patches make this as aggressive
as -O3.

Richard.

>if (dump_file && (dump_flags & TDF_DETAILS)
>&& TREE_CODE (niter) == INTEGER_CST)
>  {
>
>
> >
> > Richard.
> >
> >> Jiufu
> >>
> >> 2020-05-20  Jiufu Guo   
> >>
> >> PR target/95018
> >> * target.def (loop_allow_unroll_completely_peel): New hook.
> >> * doc/tm.texi (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New hook.
> >> * doc/tm.texi.in (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New 
> >> hook.
> >> * config/rs6000/rs6000.c 
> >> (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New hook.
> >> (rs6000_loop_allow_unroll_completely_peel): New hook 
> >> implementation.
> >> * tree-ssa-loop-ivcanon.c (canonicalize_loop_induction_variables): 
> >> Call
> >> hook loop_allow_unroll_completely_peel.
> >>
> >>
> >> ---
> >>  gcc/config/rs6000/rs6000.c  | 19 +++
> >>  gcc/doc/tm.texi |  9 +
> >>  gcc/doc/tm.texi.in  |  2 ++
> >>  gcc/target.def  | 12 
> >>  gcc/tree-ssa-loop-ivcanon.c |  5 +
> >>  5 files changed, 47 insertions(+)
> >>
> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> >> index 8435bc15d72..a1a3f9cb583 100644
> >> --- a/gcc/config/rs6000/rs6000.c
> >> +++ b/gcc/config/rs6000/rs6000.c
> >> @@ -1433,6 +1433,9 @@ static const struct attribute_spec 
> >> rs6000_attribute_table[] =
> >>  #undef TARGET_LOOP_UNROLL_ADJUST
> >>  #define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
> >>
> >> +#undef TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL
> >> +#define TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL 
> >> rs6000_loop_allow_unroll_completely_peel
> >> +
> >>  #undef TARGET_INIT_BUILTINS
> >>  #define TARGET_INIT_BUILTINS rs6000_init_builtins
> >>  #undef TARGET_BUILTIN_DECL
> >> @@ -5139,6 +5142,22 @@ rs6000_loop_unroll_adjust (unsigned nunroll, struct 
> >> loop *loop)
> >>return nunroll;
> >>  }
> >>
> >> +/* Implement targetm.loop_allow_unroll_completely_peel.  */
> >> +
> >> +static bool
> >> +rs6000_loop_allow_unroll_completely_peel (HOST_WIDE_INT maxiter, tree 
> >> niter,
> >> + struct loop *loop)
> >> +{
> >> +  if (unroll_only_small_loops && optimize == 2)
> >> +{
> >> +  if (maxiter >= 4
> >> + && !(TREE_CODE (niter) == INTEGER_CST && single_exit (loop)))
> >> +   return false;
> >> +}
> >> +
> >> +  return true;
> >> +}
> >> +
> >>  /* Handler for the Mathematical Acceleration Subsystem (mass) interface 
> >> to a
> >> library with vectorized intrinsics.  */
> >>
> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> >> index 6e7d9dc54a9..f7419872c2f 100644
> >> --- a/gcc/doc/tm.texi
> >> +++ b/gcc/doc/tm.texi
> >> @@ -11893,6 +11893,15 @@ is required only when the target has special 
> >> constraints like maximum
> >>  number of 

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-20 Thread Jiufu Guo via Gcc-patches
Richard Biener  writes:

> On Wed, May 20, 2020 at 5:56 AM Jiufu Guo via Gcc-patches
>  wrote:
>>
>> Hi,
>>
>> In r10-4525, and r10-4161, loop unroller was enabled for simply loops at -O2.
>> At the same time, the GIMPLE cunroll is also enabled, while it is not only 
>> for
>> simple loops.  This patch introduces a hook to check if a loop is suitable to
>> unroll completely.  The hook can be used to check if a loop is simply enough
>> for complete unroll, then avoid negative effects like size increasing or
>> complex loop unrolling at -O2.
>>
>> This patch could help to handle the code in PR95018 which contains complex
>> loop, and does not cause obvious performance change on SPEC2017.
>>
>> Bootstrap/regtest pass on powerpc64le. OK for trunk?
>
> I fear you're just digging the rs6000 specific hole you dug up deeper
> with this ...

I'm also thinking to add a check to cunroll directly without digging to
platforms. As patch below.  This may not harm for platforms.
Is this acceptable?

Thanks,
Jiufu

diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 8ab6ab3330c..67335b8a911 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -1231,6 +1231,11 @@ canonicalize_loop_induction_variables (class loop *loop,
   /* Force re-computation of loop bounds so we can remove redundant exits.  */
   maxiter = max_loop_iterations_int (loop);
 
+  /* For optimize level 2, unroll and peel only for simple loops. */
+  if (allow_peel && optimize == 2 && maxiter >= 4
+  && !(TREE_CODE (niter) == INTEGER_CST && exit))
+return false;
+
   if (dump_file && (dump_flags & TDF_DETAILS)
   && TREE_CODE (niter) == INTEGER_CST)
 {


>
> Richard.
>
>> Jiufu
>>
>> 2020-05-20  Jiufu Guo   
>>
>> PR target/95018
>> * target.def (loop_allow_unroll_completely_peel): New hook.
>> * doc/tm.texi (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New hook.
>> * doc/tm.texi.in (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New 
>> hook.
>> * config/rs6000/rs6000.c (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): 
>> New hook.
>> (rs6000_loop_allow_unroll_completely_peel): New hook implementation.
>> * tree-ssa-loop-ivcanon.c (canonicalize_loop_induction_variables): 
>> Call
>> hook loop_allow_unroll_completely_peel.
>>
>>
>> ---
>>  gcc/config/rs6000/rs6000.c  | 19 +++
>>  gcc/doc/tm.texi |  9 +
>>  gcc/doc/tm.texi.in  |  2 ++
>>  gcc/target.def  | 12 
>>  gcc/tree-ssa-loop-ivcanon.c |  5 +
>>  5 files changed, 47 insertions(+)
>>
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index 8435bc15d72..a1a3f9cb583 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -1433,6 +1433,9 @@ static const struct attribute_spec 
>> rs6000_attribute_table[] =
>>  #undef TARGET_LOOP_UNROLL_ADJUST
>>  #define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
>>
>> +#undef TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL
>> +#define TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL 
>> rs6000_loop_allow_unroll_completely_peel
>> +
>>  #undef TARGET_INIT_BUILTINS
>>  #define TARGET_INIT_BUILTINS rs6000_init_builtins
>>  #undef TARGET_BUILTIN_DECL
>> @@ -5139,6 +5142,22 @@ rs6000_loop_unroll_adjust (unsigned nunroll, struct 
>> loop *loop)
>>return nunroll;
>>  }
>>
>> +/* Implement targetm.loop_allow_unroll_completely_peel.  */
>> +
>> +static bool
>> +rs6000_loop_allow_unroll_completely_peel (HOST_WIDE_INT maxiter, tree niter,
>> + struct loop *loop)
>> +{
>> +  if (unroll_only_small_loops && optimize == 2)
>> +{
>> +  if (maxiter >= 4
>> + && !(TREE_CODE (niter) == INTEGER_CST && single_exit (loop)))
>> +   return false;
>> +}
>> +
>> +  return true;
>> +}
>> +
>>  /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
>> library with vectorized intrinsics.  */
>>
>> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> index 6e7d9dc54a9..f7419872c2f 100644
>> --- a/gcc/doc/tm.texi
>> +++ b/gcc/doc/tm.texi
>> @@ -11893,6 +11893,15 @@ is required only when the target has special 
>> constraints like maximum
>>  number of memory accesses.
>>  @end deftypefn
>>
>> +@deftypefn {Target Hook} bool TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL 
>> (HOST_WIDE_INT @var{maxiter}, tree @var{niter}, class loop *@var{loop})
>> +This target hook check if the loop @var{loop} is suitable to unroll
>> +completely on the target. The parameter @var{maxiter} is the possible max
>> +bound of iterations. The parameter @var{niter} is the number expression of
>> +iterations the loop is executed. The parameter @var{loop} is a pointer to
>> +the loop. This target hook is required only when the target has special
>> +constraints.
>> +@end deftypefn
>> +
>>  @defmac POWI_MAX_MULTS
>>  If defined, this macro is interpreted as a signed integer C expression
>>  that specifies the 

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-20 Thread Jiufu Guo via Gcc-patches
Richard Biener  writes:

> On Wed, May 20, 2020 at 10:27 AM Jiufu Guo  wrote:
>>
>> Richard Biener  writes:
>>
>> > On Wed, May 20, 2020 at 5:56 AM Jiufu Guo via Gcc-patches
>> >  wrote:
>> >>
>> >> Hi,
>> >>
>> >> In r10-4525, and r10-4161, loop unroller was enabled for simply loops at 
>> >> -O2.
>> >> At the same time, the GIMPLE cunroll is also enabled, while it is not 
>> >> only for
>> >> simple loops.  This patch introduces a hook to check if a loop is 
>> >> suitable to
>> >> unroll completely.  The hook can be used to check if a loop is simply 
>> >> enough
>> >> for complete unroll, then avoid negative effects like size increasing or
>> >> complex loop unrolling at -O2.
>> >>
>> >> This patch could help to handle the code in PR95018 which contains complex
>> >> loop, and does not cause obvious performance change on SPEC2017.
>> >>
>> >> Bootstrap/regtest pass on powerpc64le. OK for trunk?
>> >
>> > I fear you're just digging the rs6000 specific hole you dug up deeper
>> > with this ...
>>
>> Yes, this is because r10-4525 and r10-4161 adjust RTL unroller hook for
>> rs6000, and they have the effect to enable GIMPLE cunroll at O2 for all
>> loops.  So, I introduce this patch to adjust GIMPLE cunroll similarly.
>
> I think a better approach may be to leave GIMPLE unroll untouched by
> changing the earlier patch to set a new flag_rtl_unroll_loops
> (where -funroll-loops would enable -frtl-unroll-loops) and gate the RTL
> unroller on flag_rtl_unroll_loops which you can enable separately
> on rs6000.
>
> That is, not add even more rs6000 specialities but restrict the earlier
> one to what it was supposed to do - the GIMPLE effects were clearly
> not your goal.

I also lean to enable GIMPLE cunroll for O2 (at that time only for
rs6000), especially for those loops which are simple and not increase
size too much.

Thanks Richard.

>
> Richard.
>
>> Thanks,
>> Jiufu
>>
>> >
>> > Richard.
>> >
>> >> Jiufu
>> >>
>> >> 2020-05-20  Jiufu Guo   
>> >>
>> >> PR target/95018
>> >> * target.def (loop_allow_unroll_completely_peel): New hook.
>> >> * doc/tm.texi (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New 
>> >> hook.
>> >> * doc/tm.texi.in (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New 
>> >> hook.
>> >> * config/rs6000/rs6000.c 
>> >> (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New hook.
>> >> (rs6000_loop_allow_unroll_completely_peel): New hook 
>> >> implementation.
>> >> * tree-ssa-loop-ivcanon.c 
>> >> (canonicalize_loop_induction_variables): Call
>> >> hook loop_allow_unroll_completely_peel.
>> >>
>> >>
>> >> ---
>> >>  gcc/config/rs6000/rs6000.c  | 19 +++
>> >>  gcc/doc/tm.texi |  9 +
>> >>  gcc/doc/tm.texi.in  |  2 ++
>> >>  gcc/target.def  | 12 
>> >>  gcc/tree-ssa-loop-ivcanon.c |  5 +
>> >>  5 files changed, 47 insertions(+)
>> >>
>> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> >> index 8435bc15d72..a1a3f9cb583 100644
>> >> --- a/gcc/config/rs6000/rs6000.c
>> >> +++ b/gcc/config/rs6000/rs6000.c
>> >> @@ -1433,6 +1433,9 @@ static const struct attribute_spec 
>> >> rs6000_attribute_table[] =
>> >>  #undef TARGET_LOOP_UNROLL_ADJUST
>> >>  #define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
>> >>
>> >> +#undef TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL
>> >> +#define TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL 
>> >> rs6000_loop_allow_unroll_completely_peel
>> >> +
>> >>  #undef TARGET_INIT_BUILTINS
>> >>  #define TARGET_INIT_BUILTINS rs6000_init_builtins
>> >>  #undef TARGET_BUILTIN_DECL
>> >> @@ -5139,6 +5142,22 @@ rs6000_loop_unroll_adjust (unsigned nunroll, 
>> >> struct loop *loop)
>> >>return nunroll;
>> >>  }
>> >>
>> >> +/* Implement targetm.loop_allow_unroll_completely_peel.  */
>> >> +
>> >> +static bool
>> >> +rs6000_loop_allow_unroll_completely_peel (HOST_WIDE_INT maxiter, tree 
>> >> niter,
>> >> + struct loop *loop)
>> >> +{
>> >> +  if (unroll_only_small_loops && optimize == 2)
>> >> +{
>> >> +  if (maxiter >= 4
>> >> + && !(TREE_CODE (niter) == INTEGER_CST && single_exit (loop)))
>> >> +   return false;
>> >> +}
>> >> +
>> >> +  return true;
>> >> +}
>> >> +
>> >>  /* Handler for the Mathematical Acceleration Subsystem (mass) interface 
>> >> to a
>> >> library with vectorized intrinsics.  */
>> >>
>> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> >> index 6e7d9dc54a9..f7419872c2f 100644
>> >> --- a/gcc/doc/tm.texi
>> >> +++ b/gcc/doc/tm.texi
>> >> @@ -11893,6 +11893,15 @@ is required only when the target has special 
>> >> constraints like maximum
>> >>  number of memory accesses.
>> >>  @end deftypefn
>> >>
>> >> +@deftypefn {Target Hook} bool TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL 
>> >> (HOST_WIDE_INT @var{maxiter}, tree @var{niter}, class loop *@var{loop})
>> >> +This target hook check if the loop @var{loop} is suitable to unroll
>> >> 

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-20 Thread Richard Biener via Gcc-patches
On Wed, May 20, 2020 at 10:27 AM Jiufu Guo  wrote:
>
> Richard Biener  writes:
>
> > On Wed, May 20, 2020 at 5:56 AM Jiufu Guo via Gcc-patches
> >  wrote:
> >>
> >> Hi,
> >>
> >> In r10-4525, and r10-4161, loop unroller was enabled for simply loops at 
> >> -O2.
> >> At the same time, the GIMPLE cunroll is also enabled, while it is not only 
> >> for
> >> simple loops.  This patch introduces a hook to check if a loop is suitable 
> >> to
> >> unroll completely.  The hook can be used to check if a loop is simply 
> >> enough
> >> for complete unroll, then avoid negative effects like size increasing or
> >> complex loop unrolling at -O2.
> >>
> >> This patch could help to handle the code in PR95018 which contains complex
> >> loop, and does not cause obvious performance change on SPEC2017.
> >>
> >> Bootstrap/regtest pass on powerpc64le. OK for trunk?
> >
> > I fear you're just digging the rs6000 specific hole you dug up deeper
> > with this ...
>
> Yes, this is because r10-4525 and r10-4161 adjust RTL unroller hook for
> rs6000, and they have the effect to enable GIMPLE cunroll at O2 for all
> loops.  So, I introduce this patch to adjust GIMPLE cunroll similarly.

I think a better approach may be to leave GIMPLE unroll untouched by
changing the earlier patch to set a new flag_rtl_unroll_loops
(where -funroll-loops would enable -frtl-unroll-loops) and gate the RTL
unroller on flag_rtl_unroll_loops which you can enable separately
on rs6000.

That is, not add even more rs6000 specialities but restrict the earlier
one to what it was supposed to do - the GIMPLE effects were clearly
not your goal.

Richard.

> Thanks,
> Jiufu
>
> >
> > Richard.
> >
> >> Jiufu
> >>
> >> 2020-05-20  Jiufu Guo   
> >>
> >> PR target/95018
> >> * target.def (loop_allow_unroll_completely_peel): New hook.
> >> * doc/tm.texi (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New hook.
> >> * doc/tm.texi.in (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New 
> >> hook.
> >> * config/rs6000/rs6000.c 
> >> (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New hook.
> >> (rs6000_loop_allow_unroll_completely_peel): New hook 
> >> implementation.
> >> * tree-ssa-loop-ivcanon.c (canonicalize_loop_induction_variables): 
> >> Call
> >> hook loop_allow_unroll_completely_peel.
> >>
> >>
> >> ---
> >>  gcc/config/rs6000/rs6000.c  | 19 +++
> >>  gcc/doc/tm.texi |  9 +
> >>  gcc/doc/tm.texi.in  |  2 ++
> >>  gcc/target.def  | 12 
> >>  gcc/tree-ssa-loop-ivcanon.c |  5 +
> >>  5 files changed, 47 insertions(+)
> >>
> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> >> index 8435bc15d72..a1a3f9cb583 100644
> >> --- a/gcc/config/rs6000/rs6000.c
> >> +++ b/gcc/config/rs6000/rs6000.c
> >> @@ -1433,6 +1433,9 @@ static const struct attribute_spec 
> >> rs6000_attribute_table[] =
> >>  #undef TARGET_LOOP_UNROLL_ADJUST
> >>  #define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
> >>
> >> +#undef TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL
> >> +#define TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL 
> >> rs6000_loop_allow_unroll_completely_peel
> >> +
> >>  #undef TARGET_INIT_BUILTINS
> >>  #define TARGET_INIT_BUILTINS rs6000_init_builtins
> >>  #undef TARGET_BUILTIN_DECL
> >> @@ -5139,6 +5142,22 @@ rs6000_loop_unroll_adjust (unsigned nunroll, struct 
> >> loop *loop)
> >>return nunroll;
> >>  }
> >>
> >> +/* Implement targetm.loop_allow_unroll_completely_peel.  */
> >> +
> >> +static bool
> >> +rs6000_loop_allow_unroll_completely_peel (HOST_WIDE_INT maxiter, tree 
> >> niter,
> >> + struct loop *loop)
> >> +{
> >> +  if (unroll_only_small_loops && optimize == 2)
> >> +{
> >> +  if (maxiter >= 4
> >> + && !(TREE_CODE (niter) == INTEGER_CST && single_exit (loop)))
> >> +   return false;
> >> +}
> >> +
> >> +  return true;
> >> +}
> >> +
> >>  /* Handler for the Mathematical Acceleration Subsystem (mass) interface 
> >> to a
> >> library with vectorized intrinsics.  */
> >>
> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> >> index 6e7d9dc54a9..f7419872c2f 100644
> >> --- a/gcc/doc/tm.texi
> >> +++ b/gcc/doc/tm.texi
> >> @@ -11893,6 +11893,15 @@ is required only when the target has special 
> >> constraints like maximum
> >>  number of memory accesses.
> >>  @end deftypefn
> >>
> >> +@deftypefn {Target Hook} bool TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL 
> >> (HOST_WIDE_INT @var{maxiter}, tree @var{niter}, class loop *@var{loop})
> >> +This target hook check if the loop @var{loop} is suitable to unroll
> >> +completely on the target. The parameter @var{maxiter} is the possible max
> >> +bound of iterations. The parameter @var{niter} is the number expression of
> >> +iterations the loop is executed. The parameter @var{loop} is a pointer to
> >> +the loop. This target hook is required only when the target has special
> >> +constraints.
> >> +@end 

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-20 Thread Jiufu Guo via Gcc-patches
Richard Biener  writes:

> On Wed, May 20, 2020 at 5:56 AM Jiufu Guo via Gcc-patches
>  wrote:
>>
>> Hi,
>>
>> In r10-4525, and r10-4161, loop unroller was enabled for simply loops at -O2.
>> At the same time, the GIMPLE cunroll is also enabled, while it is not only 
>> for
>> simple loops.  This patch introduces a hook to check if a loop is suitable to
>> unroll completely.  The hook can be used to check if a loop is simply enough
>> for complete unroll, then avoid negative effects like size increasing or
>> complex loop unrolling at -O2.
>>
>> This patch could help to handle the code in PR95018 which contains complex
>> loop, and does not cause obvious performance change on SPEC2017.
>>
>> Bootstrap/regtest pass on powerpc64le. OK for trunk?
>
> I fear you're just digging the rs6000 specific hole you dug up deeper
> with this ...

Yes, this is because r10-4525 and r10-4161 adjust RTL unroller hook for
rs6000, and they have the effect to enable GIMPLE cunroll at O2 for all
loops.  So, I introduce this patch to adjust GIMPLE cunroll similarly.

Thanks,
Jiufu

>
> Richard.
>
>> Jiufu
>>
>> 2020-05-20  Jiufu Guo   
>>
>> PR target/95018
>> * target.def (loop_allow_unroll_completely_peel): New hook.
>> * doc/tm.texi (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New hook.
>> * doc/tm.texi.in (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New 
>> hook.
>> * config/rs6000/rs6000.c (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): 
>> New hook.
>> (rs6000_loop_allow_unroll_completely_peel): New hook implementation.
>> * tree-ssa-loop-ivcanon.c (canonicalize_loop_induction_variables): 
>> Call
>> hook loop_allow_unroll_completely_peel.
>>
>>
>> ---
>>  gcc/config/rs6000/rs6000.c  | 19 +++
>>  gcc/doc/tm.texi |  9 +
>>  gcc/doc/tm.texi.in  |  2 ++
>>  gcc/target.def  | 12 
>>  gcc/tree-ssa-loop-ivcanon.c |  5 +
>>  5 files changed, 47 insertions(+)
>>
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index 8435bc15d72..a1a3f9cb583 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -1433,6 +1433,9 @@ static const struct attribute_spec 
>> rs6000_attribute_table[] =
>>  #undef TARGET_LOOP_UNROLL_ADJUST
>>  #define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
>>
>> +#undef TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL
>> +#define TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL 
>> rs6000_loop_allow_unroll_completely_peel
>> +
>>  #undef TARGET_INIT_BUILTINS
>>  #define TARGET_INIT_BUILTINS rs6000_init_builtins
>>  #undef TARGET_BUILTIN_DECL
>> @@ -5139,6 +5142,22 @@ rs6000_loop_unroll_adjust (unsigned nunroll, struct 
>> loop *loop)
>>return nunroll;
>>  }
>>
>> +/* Implement targetm.loop_allow_unroll_completely_peel.  */
>> +
>> +static bool
>> +rs6000_loop_allow_unroll_completely_peel (HOST_WIDE_INT maxiter, tree niter,
>> + struct loop *loop)
>> +{
>> +  if (unroll_only_small_loops && optimize == 2)
>> +{
>> +  if (maxiter >= 4
>> + && !(TREE_CODE (niter) == INTEGER_CST && single_exit (loop)))
>> +   return false;
>> +}
>> +
>> +  return true;
>> +}
>> +
>>  /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
>> library with vectorized intrinsics.  */
>>
>> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> index 6e7d9dc54a9..f7419872c2f 100644
>> --- a/gcc/doc/tm.texi
>> +++ b/gcc/doc/tm.texi
>> @@ -11893,6 +11893,15 @@ is required only when the target has special 
>> constraints like maximum
>>  number of memory accesses.
>>  @end deftypefn
>>
>> +@deftypefn {Target Hook} bool TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL 
>> (HOST_WIDE_INT @var{maxiter}, tree @var{niter}, class loop *@var{loop})
>> +This target hook check if the loop @var{loop} is suitable to unroll
>> +completely on the target. The parameter @var{maxiter} is the possible max
>> +bound of iterations. The parameter @var{niter} is the number expression of
>> +iterations the loop is executed. The parameter @var{loop} is a pointer to
>> +the loop. This target hook is required only when the target has special
>> +constraints.
>> +@end deftypefn
>> +
>>  @defmac POWI_MAX_MULTS
>>  If defined, this macro is interpreted as a signed integer C expression
>>  that specifies the maximum number of floating point multiplications
>> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>> index 3be984bbd5c..4ae079a650c 100644
>> --- a/gcc/doc/tm.texi.in
>> +++ b/gcc/doc/tm.texi.in
>> @@ -8010,6 +8010,8 @@ lists.
>>
>>  @hook TARGET_LOOP_UNROLL_ADJUST
>>
>> +@hook TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL
>> +
>>  @defmac POWI_MAX_MULTS
>>  If defined, this macro is interpreted as a signed integer C expression
>>  that specifies the maximum number of floating point multiplications
>> diff --git a/gcc/target.def b/gcc/target.def
>> index 07059a87caf..3842ba611a2 100644
>> --- a/gcc/target.def
>> 

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-20 Thread Richard Biener via Gcc-patches
On Wed, May 20, 2020 at 5:56 AM Jiufu Guo via Gcc-patches
 wrote:
>
> Hi,
>
> In r10-4525, and r10-4161, loop unroller was enabled for simply loops at -O2.
> At the same time, the GIMPLE cunroll is also enabled, while it is not only for
> simple loops.  This patch introduces a hook to check if a loop is suitable to
> unroll completely.  The hook can be used to check if a loop is simply enough
> for complete unroll, then avoid negative effects like size increasing or
> complex loop unrolling at -O2.
>
> This patch could help to handle the code in PR95018 which contains complex
> loop, and does not cause obvious performance change on SPEC2017.
>
> Bootstrap/regtest pass on powerpc64le. OK for trunk?

I fear you're just digging the rs6000 specific hole you dug up deeper
with this ...

Richard.

> Jiufu
>
> 2020-05-20  Jiufu Guo   
>
> PR target/95018
> * target.def (loop_allow_unroll_completely_peel): New hook.
> * doc/tm.texi (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New hook.
> * doc/tm.texi.in (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New hook.
> * config/rs6000/rs6000.c (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): 
> New hook.
> (rs6000_loop_allow_unroll_completely_peel): New hook implementation.
> * tree-ssa-loop-ivcanon.c (canonicalize_loop_induction_variables): 
> Call
> hook loop_allow_unroll_completely_peel.
>
>
> ---
>  gcc/config/rs6000/rs6000.c  | 19 +++
>  gcc/doc/tm.texi |  9 +
>  gcc/doc/tm.texi.in  |  2 ++
>  gcc/target.def  | 12 
>  gcc/tree-ssa-loop-ivcanon.c |  5 +
>  5 files changed, 47 insertions(+)
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 8435bc15d72..a1a3f9cb583 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -1433,6 +1433,9 @@ static const struct attribute_spec 
> rs6000_attribute_table[] =
>  #undef TARGET_LOOP_UNROLL_ADJUST
>  #define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
>
> +#undef TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL
> +#define TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL 
> rs6000_loop_allow_unroll_completely_peel
> +
>  #undef TARGET_INIT_BUILTINS
>  #define TARGET_INIT_BUILTINS rs6000_init_builtins
>  #undef TARGET_BUILTIN_DECL
> @@ -5139,6 +5142,22 @@ rs6000_loop_unroll_adjust (unsigned nunroll, struct 
> loop *loop)
>return nunroll;
>  }
>
> +/* Implement targetm.loop_allow_unroll_completely_peel.  */
> +
> +static bool
> +rs6000_loop_allow_unroll_completely_peel (HOST_WIDE_INT maxiter, tree niter,
> + struct loop *loop)
> +{
> +  if (unroll_only_small_loops && optimize == 2)
> +{
> +  if (maxiter >= 4
> + && !(TREE_CODE (niter) == INTEGER_CST && single_exit (loop)))
> +   return false;
> +}
> +
> +  return true;
> +}
> +
>  /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
> library with vectorized intrinsics.  */
>
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 6e7d9dc54a9..f7419872c2f 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -11893,6 +11893,15 @@ is required only when the target has special 
> constraints like maximum
>  number of memory accesses.
>  @end deftypefn
>
> +@deftypefn {Target Hook} bool TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL 
> (HOST_WIDE_INT @var{maxiter}, tree @var{niter}, class loop *@var{loop})
> +This target hook check if the loop @var{loop} is suitable to unroll
> +completely on the target. The parameter @var{maxiter} is the possible max
> +bound of iterations. The parameter @var{niter} is the number expression of
> +iterations the loop is executed. The parameter @var{loop} is a pointer to
> +the loop. This target hook is required only when the target has special
> +constraints.
> +@end deftypefn
> +
>  @defmac POWI_MAX_MULTS
>  If defined, this macro is interpreted as a signed integer C expression
>  that specifies the maximum number of floating point multiplications
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 3be984bbd5c..4ae079a650c 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -8010,6 +8010,8 @@ lists.
>
>  @hook TARGET_LOOP_UNROLL_ADJUST
>
> +@hook TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL
> +
>  @defmac POWI_MAX_MULTS
>  If defined, this macro is interpreted as a signed integer C expression
>  that specifies the maximum number of floating point multiplications
> diff --git a/gcc/target.def b/gcc/target.def
> index 07059a87caf..3842ba611a2 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -2709,6 +2709,18 @@ number of memory accesses.",
>   unsigned, (unsigned nunroll, class loop *loop),
>   NULL)
>
> +/* Return if a loop is suitable for complete unroll.  */
> +DEFHOOK
> +(loop_allow_unroll_completely_peel,
> + "This target hook check if the loop @var{loop} is suitable to unroll\n\
> +completely on the target. The parameter @var{maxiter} is the possible 

[PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-19 Thread Jiufu Guo via Gcc-patches
Hi,

In r10-4525, and r10-4161, loop unroller was enabled for simply loops at -O2.
At the same time, the GIMPLE cunroll is also enabled, while it is not only for
simple loops.  This patch introduces a hook to check if a loop is suitable to
unroll completely.  The hook can be used to check if a loop is simply enough
for complete unroll, then avoid negative effects like size increasing or
complex loop unrolling at -O2.

This patch could help to handle the code in PR95018 which contains complex
loop, and does not cause obvious performance change on SPEC2017.

Bootstrap/regtest pass on powerpc64le. OK for trunk?

Jiufu

2020-05-20  Jiufu Guo   

PR target/95018
* target.def (loop_allow_unroll_completely_peel): New hook.
* doc/tm.texi (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New hook.
* doc/tm.texi.in (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New hook.
* config/rs6000/rs6000.c (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): 
New hook.
(rs6000_loop_allow_unroll_completely_peel): New hook implementation.
* tree-ssa-loop-ivcanon.c (canonicalize_loop_induction_variables): Call
hook loop_allow_unroll_completely_peel.


---
 gcc/config/rs6000/rs6000.c  | 19 +++
 gcc/doc/tm.texi |  9 +
 gcc/doc/tm.texi.in  |  2 ++
 gcc/target.def  | 12 
 gcc/tree-ssa-loop-ivcanon.c |  5 +
 5 files changed, 47 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8435bc15d72..a1a3f9cb583 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1433,6 +1433,9 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #undef TARGET_LOOP_UNROLL_ADJUST
 #define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
 
+#undef TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL
+#define TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL 
rs6000_loop_allow_unroll_completely_peel
+
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS rs6000_init_builtins
 #undef TARGET_BUILTIN_DECL
@@ -5139,6 +5142,22 @@ rs6000_loop_unroll_adjust (unsigned nunroll, struct loop 
*loop)
   return nunroll;
 }
 
+/* Implement targetm.loop_allow_unroll_completely_peel.  */
+
+static bool
+rs6000_loop_allow_unroll_completely_peel (HOST_WIDE_INT maxiter, tree niter,
+ struct loop *loop)
+{
+  if (unroll_only_small_loops && optimize == 2)
+{
+  if (maxiter >= 4
+ && !(TREE_CODE (niter) == INTEGER_CST && single_exit (loop)))
+   return false;
+}
+
+  return true;
+}
+
 /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
library with vectorized intrinsics.  */
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 6e7d9dc54a9..f7419872c2f 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11893,6 +11893,15 @@ is required only when the target has special 
constraints like maximum
 number of memory accesses.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL 
(HOST_WIDE_INT @var{maxiter}, tree @var{niter}, class loop *@var{loop})
+This target hook check if the loop @var{loop} is suitable to unroll
+completely on the target. The parameter @var{maxiter} is the possible max
+bound of iterations. The parameter @var{niter} is the number expression of
+iterations the loop is executed. The parameter @var{loop} is a pointer to
+the loop. This target hook is required only when the target has special
+constraints.
+@end deftypefn
+
 @defmac POWI_MAX_MULTS
 If defined, this macro is interpreted as a signed integer C expression
 that specifies the maximum number of floating point multiplications
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 3be984bbd5c..4ae079a650c 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8010,6 +8010,8 @@ lists.
 
 @hook TARGET_LOOP_UNROLL_ADJUST
 
+@hook TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL
+
 @defmac POWI_MAX_MULTS
 If defined, this macro is interpreted as a signed integer C expression
 that specifies the maximum number of floating point multiplications
diff --git a/gcc/target.def b/gcc/target.def
index 07059a87caf..3842ba611a2 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2709,6 +2709,18 @@ number of memory accesses.",
  unsigned, (unsigned nunroll, class loop *loop),
  NULL)
 
+/* Return if a loop is suitable for complete unroll.  */
+DEFHOOK
+(loop_allow_unroll_completely_peel,
+ "This target hook check if the loop @var{loop} is suitable to unroll\n\
+completely on the target. The parameter @var{maxiter} is the possible max\n\
+bound of iterations. The parameter @var{niter} is the number expression of\n\
+iterations the loop is executed. The parameter @var{loop} is a pointer to\n\
+the loop. This target hook is required only when the target has special\n\
+constraints.",
+ bool, (HOST_WIDE_INT maxiter, tree niter, class loop *loop),
+ NULL)
+
 /* True if X is a legitimate MODE-mode immediate operand.  */