Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-09-14 Thread Kewen.Lin
on 2019/9/12 下午4:14, Richard Biener wrote:
> On Wed, 11 Sep 2019, Kewen.Lin wrote:
> 
>> Hi,
>>
>> Sorry for the late update.  I've updated the words of target hooks part.
>>
>> Could someone help to review it?  Thanks in advance!
>>
>> By the way, as previous emails in this thread, Bin has approved the IVOPTs
>> part, while Segher has approved the rs6000 part.
> 
> The target hooks part is OK.  I guess we'll have to extend it eventually
> in case other targets want to make use of it.
> 

Thanks Richard!  Committed by r275713.

Yes, it's enough when doloop IV costs zero or infinite for generic/address use,
but if one target wants some other values, we may have to take it as one common
cost shared for all generic/address uses.  It's like IV candidate cost but not
the same since it's only needed when doloop IV is used for generic/address uses,
I guess it requires some changes in candidate set cost calculation.  I chose to
keep it simple at the first place, but radar on for any other target adoptions.


Thanks,
Kewen

> Thanks,
> Richard.
> 
>>
>> Thanks,
>> Kewen
>>
>> -
>>
>> gcc/ChangeLog
>>
>> 2019-09-11  Kewen Lin  
>>
>>  PR middle-end/80791
>>  * config/rs6000/rs6000.c (TARGET_HAVE_COUNT_REG_DECR_P): New macro.
>>  (TARGET_DOLOOP_COST_FOR_GENERIC): Likewise.
>>  (TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise.
>>  * target.def (have_count_reg_decr_p): New hook.
>>  (doloop_cost_for_generic): Likewise.
>>  (doloop_cost_for_address): Likewise.
>>  * doc/tm.texi.in (TARGET_HAVE_COUNT_REG_DECR_P): Likewise.
>>  (TARGET_DOLOOP_COST_FOR_GENERIC): Likewise.
>>  (TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise.
>>  * doc/tm.texi: Regenerate.
>>  * tree-ssa-loop-ivopts.c (comp_cost::operator+=): Consider infinite cost
>>  addend.
>>  (record_group): Init doloop_p.
>>  (add_candidate_1): Add optional argument doloop, change the handlings
>>  accordingly.
>>  (add_candidate): Likewise.
>>  (generic_predict_doloop_p): Update attribute.
>>  (force_expr_to_var_cost): Add costing for expressions COND_EXPR/LT_EXPR/
>>  LE_EXPR/GT_EXPR/GE_EXPR/EQ_EXPR/NE_EXPR/UNORDERED_EXPR/ORDERED_EXPR/
>>  UNLT_EXPR/UNLE_EXPR/UNGT_EXPR/UNGE_EXPR/UNEQ_EXPR/LTGT_EXPR/MAX_EXPR/
>>  MIN_EXPR.
>>  (get_computation_cost): Update for doloop IV cand extra cost.   
>>  (determine_group_iv_cost_cond): Update for doloop IV cand.
>>  (determine_iv_cost): Likewise.
>>  (ivopts_estimate_reg_pressure): Likewise.
>>  (may_eliminate_iv): Update handlings for doloop IV cand.
>>  (add_iv_candidate_for_doloop): New function.
>>  (find_iv_candidates): Call function add_iv_candidate_for_doloop.
>>  (iv_ca_set_no_cp): Update for doloop IV cand.
>>  (iv_ca_set_cp): Likewise.
>>  (iv_ca_dump): Dump register cost.
>>  (find_doloop_use): New function.
>>  (analyze_and_mark_doloop_use): Likewise.
>>  (tree_ssa_iv_optimize_loop): Call function analyze_and_mark_doloop_use.
>>
>> gcc/testsuite/ChangeLog
>>
>> 2019-09-11  Kewen Lin  
>>
>>  PR middle-end/80791
>>  * gcc.dg/tree-ssa/ivopts-3.c: Adjust for doloop change.
>>  * gcc.dg/tree-ssa/ivopts-lt.c: Likewise.
>>  * gcc.dg/tree-ssa/pr32044.c: Likewise.
>>
>>
>> on 2019/8/23 下午6:18, Segher Boessenkool wrote:
>>> Hi!
>>>
>>> On Fri, Aug 23, 2019 at 05:43:32PM +0800, Bin.Cheng wrote:
 On Fri, Aug 23, 2019 at 4:27 PM Kewen.Lin  wrote:
 Not sure if non-ivopts parts are already approved?  If so, the patch
 is okay with above issues addressed.
>>>
>>> The rs6000 part is fine.  The target.def entries need some spell check
>>> and copy-editing, but are obvious and trivial otherwise, and/or you can
>>> approve it as ivopts maintainer.
>>>
 Thanks very much for your time!
>>>
>>> And thank you as well Bin :-)
>>>
>>>
>>> Segher
>>>
>>
> 



Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-09-12 Thread Richard Biener
On Wed, 11 Sep 2019, Kewen.Lin wrote:

> Hi,
> 
> Sorry for the late update.  I've updated the words of target hooks part.
> 
> Could someone help to review it?  Thanks in advance!
> 
> By the way, as previous emails in this thread, Bin has approved the IVOPTs
> part, while Segher has approved the rs6000 part.

The target hooks part is OK.  I guess we'll have to extend it eventually
in case other targets want to make use of it.

Thanks,
Richard.

> 
> Thanks,
> Kewen
> 
> -
> 
> gcc/ChangeLog
> 
> 2019-09-11  Kewen Lin  
> 
>   PR middle-end/80791
>   * config/rs6000/rs6000.c (TARGET_HAVE_COUNT_REG_DECR_P): New macro.
>   (TARGET_DOLOOP_COST_FOR_GENERIC): Likewise.
>   (TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise.
>   * target.def (have_count_reg_decr_p): New hook.
>   (doloop_cost_for_generic): Likewise.
>   (doloop_cost_for_address): Likewise.
>   * doc/tm.texi.in (TARGET_HAVE_COUNT_REG_DECR_P): Likewise.
>   (TARGET_DOLOOP_COST_FOR_GENERIC): Likewise.
>   (TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise.
>   * doc/tm.texi: Regenerate.
>   * tree-ssa-loop-ivopts.c (comp_cost::operator+=): Consider infinite cost
>   addend.
>   (record_group): Init doloop_p.
>   (add_candidate_1): Add optional argument doloop, change the handlings
>   accordingly.
>   (add_candidate): Likewise.
>   (generic_predict_doloop_p): Update attribute.
>   (force_expr_to_var_cost): Add costing for expressions COND_EXPR/LT_EXPR/
>   LE_EXPR/GT_EXPR/GE_EXPR/EQ_EXPR/NE_EXPR/UNORDERED_EXPR/ORDERED_EXPR/
>   UNLT_EXPR/UNLE_EXPR/UNGT_EXPR/UNGE_EXPR/UNEQ_EXPR/LTGT_EXPR/MAX_EXPR/
>   MIN_EXPR.
>   (get_computation_cost): Update for doloop IV cand extra cost.   
>   (determine_group_iv_cost_cond): Update for doloop IV cand.
>   (determine_iv_cost): Likewise.
>   (ivopts_estimate_reg_pressure): Likewise.
>   (may_eliminate_iv): Update handlings for doloop IV cand.
>   (add_iv_candidate_for_doloop): New function.
>   (find_iv_candidates): Call function add_iv_candidate_for_doloop.
>   (iv_ca_set_no_cp): Update for doloop IV cand.
>   (iv_ca_set_cp): Likewise.
>   (iv_ca_dump): Dump register cost.
>   (find_doloop_use): New function.
>   (analyze_and_mark_doloop_use): Likewise.
>   (tree_ssa_iv_optimize_loop): Call function analyze_and_mark_doloop_use.
> 
> gcc/testsuite/ChangeLog
> 
> 2019-09-11  Kewen Lin  
> 
>   PR middle-end/80791
>   * gcc.dg/tree-ssa/ivopts-3.c: Adjust for doloop change.
>   * gcc.dg/tree-ssa/ivopts-lt.c: Likewise.
>   * gcc.dg/tree-ssa/pr32044.c: Likewise.
> 
> 
> on 2019/8/23 下午6:18, Segher Boessenkool wrote:
> > Hi!
> > 
> > On Fri, Aug 23, 2019 at 05:43:32PM +0800, Bin.Cheng wrote:
> >> On Fri, Aug 23, 2019 at 4:27 PM Kewen.Lin  wrote:
> >> Not sure if non-ivopts parts are already approved?  If so, the patch
> >> is okay with above issues addressed.
> > 
> > The rs6000 part is fine.  The target.def entries need some spell check
> > and copy-editing, but are obvious and trivial otherwise, and/or you can
> > approve it as ivopts maintainer.
> > 
> >> Thanks very much for your time!
> > 
> > And thank you as well Bin :-)
> > 
> > 
> > Segher
> > 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 247165 (AG München)

Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-09-11 Thread Kewen.Lin
Hi,

Sorry for the late update.  I've updated the words of target hooks part.

Could someone help to review it?  Thanks in advance!

By the way, as previous emails in this thread, Bin has approved the IVOPTs
part, while Segher has approved the rs6000 part.


Thanks,
Kewen

-

gcc/ChangeLog

2019-09-11  Kewen Lin  

PR middle-end/80791
* config/rs6000/rs6000.c (TARGET_HAVE_COUNT_REG_DECR_P): New macro.
(TARGET_DOLOOP_COST_FOR_GENERIC): Likewise.
(TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise.
* target.def (have_count_reg_decr_p): New hook.
(doloop_cost_for_generic): Likewise.
(doloop_cost_for_address): Likewise.
* doc/tm.texi.in (TARGET_HAVE_COUNT_REG_DECR_P): Likewise.
(TARGET_DOLOOP_COST_FOR_GENERIC): Likewise.
(TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise.
* doc/tm.texi: Regenerate.
* tree-ssa-loop-ivopts.c (comp_cost::operator+=): Consider infinite cost
addend.
(record_group): Init doloop_p.
(add_candidate_1): Add optional argument doloop, change the handlings
accordingly.
(add_candidate): Likewise.
(generic_predict_doloop_p): Update attribute.
(force_expr_to_var_cost): Add costing for expressions COND_EXPR/LT_EXPR/
LE_EXPR/GT_EXPR/GE_EXPR/EQ_EXPR/NE_EXPR/UNORDERED_EXPR/ORDERED_EXPR/
UNLT_EXPR/UNLE_EXPR/UNGT_EXPR/UNGE_EXPR/UNEQ_EXPR/LTGT_EXPR/MAX_EXPR/
MIN_EXPR.
(get_computation_cost): Update for doloop IV cand extra cost.   
(determine_group_iv_cost_cond): Update for doloop IV cand.
(determine_iv_cost): Likewise.
(ivopts_estimate_reg_pressure): Likewise.
(may_eliminate_iv): Update handlings for doloop IV cand.
(add_iv_candidate_for_doloop): New function.
(find_iv_candidates): Call function add_iv_candidate_for_doloop.
(iv_ca_set_no_cp): Update for doloop IV cand.
(iv_ca_set_cp): Likewise.
(iv_ca_dump): Dump register cost.
(find_doloop_use): New function.
(analyze_and_mark_doloop_use): Likewise.
(tree_ssa_iv_optimize_loop): Call function analyze_and_mark_doloop_use.

gcc/testsuite/ChangeLog

2019-09-11  Kewen Lin  

PR middle-end/80791
* gcc.dg/tree-ssa/ivopts-3.c: Adjust for doloop change.
* gcc.dg/tree-ssa/ivopts-lt.c: Likewise.
* gcc.dg/tree-ssa/pr32044.c: Likewise.


on 2019/8/23 下午6:18, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Aug 23, 2019 at 05:43:32PM +0800, Bin.Cheng wrote:
>> On Fri, Aug 23, 2019 at 4:27 PM Kewen.Lin  wrote:
>> Not sure if non-ivopts parts are already approved?  If so, the patch
>> is okay with above issues addressed.
> 
> The rs6000 part is fine.  The target.def entries need some spell check
> and copy-editing, but are obvious and trivial otherwise, and/or you can
> approve it as ivopts maintainer.
> 
>> Thanks very much for your time!
> 
> And thank you as well Bin :-)
> 
> 
> Segher
> 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6667cd0..5eccbdc 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1912,6 +1912,16 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #undef TARGET_PREDICT_DOLOOP_P
 #define TARGET_PREDICT_DOLOOP_P rs6000_predict_doloop_p
 
+#undef TARGET_HAVE_COUNT_REG_DECR_P
+#define TARGET_HAVE_COUNT_REG_DECR_P true
+
+/* 10 is infinite cost in IVOPTs.  */
+#undef TARGET_DOLOOP_COST_FOR_GENERIC
+#define TARGET_DOLOOP_COST_FOR_GENERIC 10
+
+#undef TARGET_DOLOOP_COST_FOR_ADDRESS
+#define TARGET_DOLOOP_COST_FOR_ADDRESS 10
+
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c2aa4d0..2d3015c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11618,6 +11618,36 @@ loops, and will help ivopts to make some decisions.
 The default version of this hook returns false.
 @end deftypefn
 
+@deftypevr {Target Hook} bool TARGET_HAVE_COUNT_REG_DECR_P
+Return true if the target supports hardware count register for decrement
+and branch.
+The default value is false.
+@end deftypevr
+
+@deftypevr {Target Hook} int64_t TARGET_DOLOOP_COST_FOR_GENERIC
+One IV candidate dedicated for doloop is introduced in IVOPTs, we can
+calculate the computation cost of adopting it to any generic IV use by
+function get_computation_cost as before.  But for targets which have
+hardware count register support for decrement and branch, it may have to
+move IV value from hardware count register to general purpose register
+while doloop IV candidate is used for generic IV uses.  It probably takes
+expensive penalty.  This hook allows target owners to define the cost for
+this especially for generic IV uses.
+The default value is zero.
+@end deftypevr
+
+@deftypevr {Target Hook} int64_t TARGET_DOLOOP_COST_FOR_ADDRESS
+One IV candidate dedicated for doloop is introduced in IVOPTs, 

Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-08-24 Thread Kewen.Lin
Hi Bin,

on 2019/8/23 下午5:43, Bin.Cheng wrote:
> On Fri, Aug 23, 2019 at 4:27 PM Kewen.Lin  wrote:
>>
>> Hi Bin
>>
>> on 2019/8/23 上午10:19, Bin.Cheng wrote:
>>> On Thu, Aug 22, 2019 at 3:09 PM Kewen.Lin  wrote:

 Hi Bin,

 on 2019/8/22 下午1:46, Bin.Cheng wrote:
> On Thu, Aug 22, 2019 at 11:18 AM Kewen.Lin  wrote:
>>
>> Hi Bin,
>>
>> Thanks for your time!
>>
>> on 2019/8/21 下午8:32, Bin.Cheng wrote:
>>> On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin  wrote:

 Hi!

 Comparing to the previous versions of implementation mainly based on 
 the
 existing IV cands but zeroing the related group/use cost, this new one 
 is based
 on Richard and Segher's suggestion introducing one doloop dedicated IV 
 cand.

 Some key points are listed below:
   1) New field doloop_p in struct iv_cand to indicate doloop dedicated 
 IV cand.
   2) Special name "doloop" assigned.
   3) Doloop IV cand with form (niter+1, +, -1)
   4) For doloop IV cand, no extra one cost like BIV, assign zero cost 
 for step.
   5) Support may_be_zero (regressed PR is in this case), the base of 
 doloop IV
  can be COND_EXPR, add handlings in cand_value_at and 
 may_eliminate_iv.
   6) Add more expr support in force_expr_to_var_cost for reasonable 
 cost
  calculation on the IV base with may_be_zero (like COND_EXPR).
   7) Set zero cost when using doloop IV cand for doloop use.
   8) Add three hooks (should we merge _generic and _address?).
 *) have_count_reg_decr_p, is to indicate the target has special 
 hardware
count register, we shouldn't consider the impact of doloop IV 
 when
calculating register pressures.
 *) doloop_cost_for_generic, is the extra cost when using doloop IV 
 cand for
generic type IV use.
 *) doloop_cost_for_address, is the extra cost when using doloop IV 
 cand for
address type IV use.
>>
 The new patch addressing the comments is attached.
 Could you please have a look again?  Thanks in advance!
>>> Thanks for working on this.  A bit more nit-pickings.
>>>
>>> -add_candidate_1 (data, base, step, important,
>>> -IP_NORMAL, use, NULL, orig_iv);
>>> -  if (ip_end_pos (data->current_loop)
>>> +add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, 
>>> doloop,
>>> +orig_iv);
>>> +  if (!doloop && ip_end_pos (data->current_loop)
>>> Could you add some comments elaborating why ip_end_pos candidate
>>> shouldn't be added for doloop case?  Because the increment position is
>>> wrong.
>>>
>>> Also if you make doloop the last default parameter of add_candidate_1,
>>> you can save more unnecessary changes to calls to add_candidate?
>>>
>>> -cost = get_computation_cost (data, use, cand, false,
>>> -_vars, NULL, _expr);
>>> +{
>>> +  cost = get_computation_cost (data, use, cand, false, _vars, NULL,
>>> +  _expr);
>>> +  if (cand->doloop_p)
>>> +   cost += targetm.doloop_cost_for_generic;
>>> +}
>>> This adjustment
>>>
>>>cost = get_computation_cost (data, use, cand, true,
>>>_vars, _autoinc, _expr);
>>>
>>> +  if (cand->doloop_p)
>>> +cost += targetm.doloop_cost_for_address;
>>> +
>>> and this adjustment can be moved into get_computation_cost where all
>>> cost adjustments are done.
>>>
>>> +  /* For doloop candidate/use pair, adjust to zero cost.  */
>>> +  if (group->doloop_p && cand->doloop_p)
>>> +   cost = no_cost;
>>> Note above code handles comparing against zero case and decreases the
>>> cost by one (which prefers the same kind candidate as doloop one),
>>> it's very possible to have -1 cost for doloop cand here.  how about
>>> just set to no_cost if it's positive?  Your call.
>>>
>>> +/* For the targets which support doloop, to predict whether later RTL 
>>> doloop
>>> +   transformation will perform on this loop, further detect the doloop use 
>>> and
>>> +   mark the flag doloop_use_p if predicted.  */
>>> +
>>> +void
>>> +predict_and_process_doloop (struct ivopts_data *data)
>>> A better name here? Sorry I don't have another candidate in mind...
>>>
>>> +  data->doloop_use_p = false;
>>> This can be moved to the beginning of above
>>> 'predict_and_process_doloop' function.
>>>
>>> Lastly, could you please add some brief description/comment about
>>> doloop handling as a subsection in the file head comment?
>>>
>>> Otherwise, the ivopt changes look good to me.
>>>
>>> Thanks,
>>> bin
>>>
>>
>> Thanks for your prompt reply!  I've updated the code as your comments,
>> the updated version is attached.  Looking forward to your review again.
> 
> 

Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-08-23 Thread Segher Boessenkool
Hi!

On Fri, Aug 23, 2019 at 05:43:32PM +0800, Bin.Cheng wrote:
> On Fri, Aug 23, 2019 at 4:27 PM Kewen.Lin  wrote:
> Not sure if non-ivopts parts are already approved?  If so, the patch
> is okay with above issues addressed.

The rs6000 part is fine.  The target.def entries need some spell check
and copy-editing, but are obvious and trivial otherwise, and/or you can
approve it as ivopts maintainer.

> Thanks very much for your time!

And thank you as well Bin :-)


Segher


Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-08-23 Thread Bin.Cheng
On Fri, Aug 23, 2019 at 4:27 PM Kewen.Lin  wrote:
>
> Hi Bin
>
> on 2019/8/23 上午10:19, Bin.Cheng wrote:
> > On Thu, Aug 22, 2019 at 3:09 PM Kewen.Lin  wrote:
> >>
> >> Hi Bin,
> >>
> >> on 2019/8/22 下午1:46, Bin.Cheng wrote:
> >>> On Thu, Aug 22, 2019 at 11:18 AM Kewen.Lin  wrote:
> 
>  Hi Bin,
> 
>  Thanks for your time!
> 
>  on 2019/8/21 下午8:32, Bin.Cheng wrote:
> > On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin  wrote:
> >>
> >> Hi!
> >>
> >> Comparing to the previous versions of implementation mainly based on 
> >> the
> >> existing IV cands but zeroing the related group/use cost, this new one 
> >> is based
> >> on Richard and Segher's suggestion introducing one doloop dedicated IV 
> >> cand.
> >>
> >> Some key points are listed below:
> >>   1) New field doloop_p in struct iv_cand to indicate doloop dedicated 
> >> IV cand.
> >>   2) Special name "doloop" assigned.
> >>   3) Doloop IV cand with form (niter+1, +, -1)
> >>   4) For doloop IV cand, no extra one cost like BIV, assign zero cost 
> >> for step.
> >>   5) Support may_be_zero (regressed PR is in this case), the base of 
> >> doloop IV
> >>  can be COND_EXPR, add handlings in cand_value_at and 
> >> may_eliminate_iv.
> >>   6) Add more expr support in force_expr_to_var_cost for reasonable 
> >> cost
> >>  calculation on the IV base with may_be_zero (like COND_EXPR).
> >>   7) Set zero cost when using doloop IV cand for doloop use.
> >>   8) Add three hooks (should we merge _generic and _address?).
> >> *) have_count_reg_decr_p, is to indicate the target has special 
> >> hardware
> >>count register, we shouldn't consider the impact of doloop IV 
> >> when
> >>calculating register pressures.
> >> *) doloop_cost_for_generic, is the extra cost when using doloop IV 
> >> cand for
> >>generic type IV use.
> >> *) doloop_cost_for_address, is the extra cost when using doloop IV 
> >> cand for
> >>address type IV use.
>
> >> The new patch addressing the comments is attached.
> >> Could you please have a look again?  Thanks in advance!
> > Thanks for working on this.  A bit more nit-pickings.
> >
> > -add_candidate_1 (data, base, step, important,
> > -IP_NORMAL, use, NULL, orig_iv);
> > -  if (ip_end_pos (data->current_loop)
> > +add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, 
> > doloop,
> > +orig_iv);
> > +  if (!doloop && ip_end_pos (data->current_loop)
> > Could you add some comments elaborating why ip_end_pos candidate
> > shouldn't be added for doloop case?  Because the increment position is
> > wrong.
> >
> > Also if you make doloop the last default parameter of add_candidate_1,
> > you can save more unnecessary changes to calls to add_candidate?
> >
> > -cost = get_computation_cost (data, use, cand, false,
> > -_vars, NULL, _expr);
> > +{
> > +  cost = get_computation_cost (data, use, cand, false, _vars, NULL,
> > +  _expr);
> > +  if (cand->doloop_p)
> > +   cost += targetm.doloop_cost_for_generic;
> > +}
> > This adjustment
> >
> >cost = get_computation_cost (data, use, cand, true,
> >_vars, _autoinc, _expr);
> >
> > +  if (cand->doloop_p)
> > +cost += targetm.doloop_cost_for_address;
> > +
> > and this adjustment can be moved into get_computation_cost where all
> > cost adjustments are done.
> >
> > +  /* For doloop candidate/use pair, adjust to zero cost.  */
> > +  if (group->doloop_p && cand->doloop_p)
> > +   cost = no_cost;
> > Note above code handles comparing against zero case and decreases the
> > cost by one (which prefers the same kind candidate as doloop one),
> > it's very possible to have -1 cost for doloop cand here.  how about
> > just set to no_cost if it's positive?  Your call.
> >
> > +/* For the targets which support doloop, to predict whether later RTL 
> > doloop
> > +   transformation will perform on this loop, further detect the doloop use 
> > and
> > +   mark the flag doloop_use_p if predicted.  */
> > +
> > +void
> > +predict_and_process_doloop (struct ivopts_data *data)
> > A better name here? Sorry I don't have another candidate in mind...
> >
> > +  data->doloop_use_p = false;
> > This can be moved to the beginning of above
> > 'predict_and_process_doloop' function.
> >
> > Lastly, could you please add some brief description/comment about
> > doloop handling as a subsection in the file head comment?
> >
> > Otherwise, the ivopt changes look good to me.
> >
> > Thanks,
> > bin
> >
>
> Thanks for your prompt reply!  I've updated the code as your comments,
> the updated version is attached.  Looking forward to your review again.

Sorry to bother.

-  return 

Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-08-23 Thread Kewen.Lin
Hi Bin

on 2019/8/23 上午10:19, Bin.Cheng wrote:
> On Thu, Aug 22, 2019 at 3:09 PM Kewen.Lin  wrote:
>>
>> Hi Bin,
>>
>> on 2019/8/22 下午1:46, Bin.Cheng wrote:
>>> On Thu, Aug 22, 2019 at 11:18 AM Kewen.Lin  wrote:

 Hi Bin,

 Thanks for your time!

 on 2019/8/21 下午8:32, Bin.Cheng wrote:
> On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin  wrote:
>>
>> Hi!
>>
>> Comparing to the previous versions of implementation mainly based on the
>> existing IV cands but zeroing the related group/use cost, this new one 
>> is based
>> on Richard and Segher's suggestion introducing one doloop dedicated IV 
>> cand.
>>
>> Some key points are listed below:
>>   1) New field doloop_p in struct iv_cand to indicate doloop dedicated 
>> IV cand.
>>   2) Special name "doloop" assigned.
>>   3) Doloop IV cand with form (niter+1, +, -1)
>>   4) For doloop IV cand, no extra one cost like BIV, assign zero cost 
>> for step.
>>   5) Support may_be_zero (regressed PR is in this case), the base of 
>> doloop IV
>>  can be COND_EXPR, add handlings in cand_value_at and 
>> may_eliminate_iv.
>>   6) Add more expr support in force_expr_to_var_cost for reasonable cost
>>  calculation on the IV base with may_be_zero (like COND_EXPR).
>>   7) Set zero cost when using doloop IV cand for doloop use.
>>   8) Add three hooks (should we merge _generic and _address?).
>> *) have_count_reg_decr_p, is to indicate the target has special 
>> hardware
>>count register, we shouldn't consider the impact of doloop IV when
>>calculating register pressures.
>> *) doloop_cost_for_generic, is the extra cost when using doloop IV 
>> cand for
>>generic type IV use.
>> *) doloop_cost_for_address, is the extra cost when using doloop IV 
>> cand for
>>address type IV use.

>> The new patch addressing the comments is attached.
>> Could you please have a look again?  Thanks in advance!
> Thanks for working on this.  A bit more nit-pickings.
> 
> -add_candidate_1 (data, base, step, important,
> -IP_NORMAL, use, NULL, orig_iv);
> -  if (ip_end_pos (data->current_loop)
> +add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, 
> doloop,
> +orig_iv);
> +  if (!doloop && ip_end_pos (data->current_loop)
> Could you add some comments elaborating why ip_end_pos candidate
> shouldn't be added for doloop case?  Because the increment position is
> wrong.
> 
> Also if you make doloop the last default parameter of add_candidate_1,
> you can save more unnecessary changes to calls to add_candidate?
> 
> -cost = get_computation_cost (data, use, cand, false,
> -_vars, NULL, _expr);
> +{
> +  cost = get_computation_cost (data, use, cand, false, _vars, NULL,
> +  _expr);
> +  if (cand->doloop_p)
> +   cost += targetm.doloop_cost_for_generic;
> +}
> This adjustment
> 
>cost = get_computation_cost (data, use, cand, true,
>_vars, _autoinc, _expr);
> 
> +  if (cand->doloop_p)
> +cost += targetm.doloop_cost_for_address;
> +
> and this adjustment can be moved into get_computation_cost where all
> cost adjustments are done.
> 
> +  /* For doloop candidate/use pair, adjust to zero cost.  */
> +  if (group->doloop_p && cand->doloop_p)
> +   cost = no_cost;
> Note above code handles comparing against zero case and decreases the
> cost by one (which prefers the same kind candidate as doloop one),
> it's very possible to have -1 cost for doloop cand here.  how about
> just set to no_cost if it's positive?  Your call.
> 
> +/* For the targets which support doloop, to predict whether later RTL doloop
> +   transformation will perform on this loop, further detect the doloop use 
> and
> +   mark the flag doloop_use_p if predicted.  */
> +
> +void
> +predict_and_process_doloop (struct ivopts_data *data)
> A better name here? Sorry I don't have another candidate in mind...
> 
> +  data->doloop_use_p = false;
> This can be moved to the beginning of above
> 'predict_and_process_doloop' function.
> 
> Lastly, could you please add some brief description/comment about
> doloop handling as a subsection in the file head comment?
> 
> Otherwise, the ivopt changes look good to me.
> 
> Thanks,
> bin
> 

Thanks for your prompt reply!  I've updated the code as your comments,
the updated version is attached.  Looking forward to your review again.


Thanks,
Kewen

-

gcc/ChangeLog

2019-08-23  Kewen Lin  

PR middle-end/80791
* config/rs6000/rs6000.c (TARGET_HAVE_COUNT_REG_DECR_P): New macro.
(TARGET_DOLOOP_COST_FOR_GENERIC): Likewise.
(TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise.
* target.def (have_count_reg_decr_p): New hook.
(doloop_cost_for_generic): Likewise.
 

Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-08-22 Thread Bin.Cheng
On Thu, Aug 22, 2019 at 3:09 PM Kewen.Lin  wrote:
>
> Hi Bin,
>
> on 2019/8/22 下午1:46, Bin.Cheng wrote:
> > On Thu, Aug 22, 2019 at 11:18 AM Kewen.Lin  wrote:
> >>
> >> Hi Bin,
> >>
> >> Thanks for your time!
> >>
> >> on 2019/8/21 下午8:32, Bin.Cheng wrote:
> >>> On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin  wrote:
> 
>  Hi!
> 
>  Comparing to the previous versions of implementation mainly based on the
>  existing IV cands but zeroing the related group/use cost, this new one 
>  is based
>  on Richard and Segher's suggestion introducing one doloop dedicated IV 
>  cand.
> 
>  Some key points are listed below:
>    1) New field doloop_p in struct iv_cand to indicate doloop dedicated 
>  IV cand.
>    2) Special name "doloop" assigned.
>    3) Doloop IV cand with form (niter+1, +, -1)
>    4) For doloop IV cand, no extra one cost like BIV, assign zero cost 
>  for step.
>    5) Support may_be_zero (regressed PR is in this case), the base of 
>  doloop IV
>   can be COND_EXPR, add handlings in cand_value_at and 
>  may_eliminate_iv.
>    6) Add more expr support in force_expr_to_var_cost for reasonable cost
>   calculation on the IV base with may_be_zero (like COND_EXPR).
>    7) Set zero cost when using doloop IV cand for doloop use.
>    8) Add three hooks (should we merge _generic and _address?).
>  *) have_count_reg_decr_p, is to indicate the target has special 
>  hardware
> count register, we shouldn't consider the impact of doloop IV when
> calculating register pressures.
>  *) doloop_cost_for_generic, is the extra cost when using doloop IV 
>  cand for
> generic type IV use.
>  *) doloop_cost_for_address, is the extra cost when using doloop IV 
>  cand for
> address type IV use.
> >>> What will happen if doloop IV cand be used for generic/address type iv
> >>> use?  Can RTL doloop can still perform doloop optimization in this
> >>> case?
> >>>
> >>
> >> On Power, we put the iteration count into hardware count register, it 
> >> takes very
> >> high cost to move the count to GPR, so the cost is set as INF to make it 
> >> impossible
> >> to use it for generic/address type iv use.  But as some discussion before, 
> >> on some
> >> targets using GPR instead of hardware count register, they probably want 
> >> to use this
> >> doloop iv used for other uses if profitable.  These two hooks offer the 
> >> possibility.
> >> In that case, I think RTL doloop can still perform since it can still get 
> >> the
> >> pattern and transform.  The generic/address uses can still use it.
> 
>  Bootstrapped on powerpc64le-linux-gnu and regression testing passed 
>  excepting
>  for one failure on gcc/testsuite/gcc.dg/guality/loop-1.c at -O3 which is 
>  tracked
>  by PR89983.
> 
>  Any comments and suggestions are highly appreciated.  Thanks!
> >>> Not sure if I understand the patch correctly, some comments embedded.
> >>>
> >>> +  /* The number of doloop candidate in the set.  */
> >>> +  unsigned n_doloop_cands;
> >>> +
> >>> This is unnecessary.  See below comments.
> >>>
> >>> -add_candidate_1 (data, base, step, important,
> >>> -IP_NORMAL, use, NULL, orig_iv);
> >>> +add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, 
> >>> doloop,
> >>> +orig_iv);
> >>>if (ip_end_pos (data->current_loop)
> >>>&& allow_ip_end_pos_p (data->current_loop))
> >>> -add_candidate_1 (data, base, step, important, IP_END, use, NULL, 
> >>> orig_iv);
> >>> +add_candidate_1 (data, base, step, important, IP_END, use, NULL, 
> >>> doloop,
> >>> +orig_iv);
> >>> Do we need to skip ip_end_pos case for doloop candidate?  Because the
> >>> candidate increment will be inserted in latch, i.e, increment position
> >>> is after exit condition.
> >>>
> >>
> >> Yes, we should skip it.  Currently function find_doloop_use has the check 
> >> on an
> >> empty latch and gimple_cond to latch, partially excluding it.  But it's 
> >> still good
> >> to guard it directly here.
> >>
> >>> -  tree_to_aff_combination (iv->base, type, val);
> >>> +  tree base = iv->base;
> >>> +  /* See add_iv_candidate_for_doloop, if may_be_zero is set, we want to 
> >>> extract
> >>> + the value under !may_be_zero to get the compact bound which also 
> >>> well fits
> >>> + for may_be_zero since we ensure the value for it is const one.  */
> >>> +  if (cand->doloop_p && desc->may_be_zero && !integer_zerop
> >>> (desc->may_be_zero))
> >>> +base = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
> >>> +   unshare_expr (rewrite_to_non_trapping_overflow 
> >>> (niter)),
> >>> +   build_int_cst (TREE_TYPE (niter), 1));
> >>> +  tree_to_aff_combination (base, type, val);
> >>> I don't quite follow here.  The iv->base is 

Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-08-22 Thread Kewen.Lin
Hi Bin,

on 2019/8/22 下午1:46, Bin.Cheng wrote:
> On Thu, Aug 22, 2019 at 11:18 AM Kewen.Lin  wrote:
>>
>> Hi Bin,
>>
>> Thanks for your time!
>>
>> on 2019/8/21 下午8:32, Bin.Cheng wrote:
>>> On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin  wrote:

 Hi!

 Comparing to the previous versions of implementation mainly based on the
 existing IV cands but zeroing the related group/use cost, this new one is 
 based
 on Richard and Segher's suggestion introducing one doloop dedicated IV 
 cand.

 Some key points are listed below:
   1) New field doloop_p in struct iv_cand to indicate doloop dedicated IV 
 cand.
   2) Special name "doloop" assigned.
   3) Doloop IV cand with form (niter+1, +, -1)
   4) For doloop IV cand, no extra one cost like BIV, assign zero cost for 
 step.
   5) Support may_be_zero (regressed PR is in this case), the base of 
 doloop IV
  can be COND_EXPR, add handlings in cand_value_at and may_eliminate_iv.
   6) Add more expr support in force_expr_to_var_cost for reasonable cost
  calculation on the IV base with may_be_zero (like COND_EXPR).
   7) Set zero cost when using doloop IV cand for doloop use.
   8) Add three hooks (should we merge _generic and _address?).
 *) have_count_reg_decr_p, is to indicate the target has special 
 hardware
count register, we shouldn't consider the impact of doloop IV when
calculating register pressures.
 *) doloop_cost_for_generic, is the extra cost when using doloop IV 
 cand for
generic type IV use.
 *) doloop_cost_for_address, is the extra cost when using doloop IV 
 cand for
address type IV use.
>>> What will happen if doloop IV cand be used for generic/address type iv
>>> use?  Can RTL doloop can still perform doloop optimization in this
>>> case?
>>>
>>
>> On Power, we put the iteration count into hardware count register, it takes 
>> very
>> high cost to move the count to GPR, so the cost is set as INF to make it 
>> impossible
>> to use it for generic/address type iv use.  But as some discussion before, 
>> on some
>> targets using GPR instead of hardware count register, they probably want to 
>> use this
>> doloop iv used for other uses if profitable.  These two hooks offer the 
>> possibility.
>> In that case, I think RTL doloop can still perform since it can still get the
>> pattern and transform.  The generic/address uses can still use it.

 Bootstrapped on powerpc64le-linux-gnu and regression testing passed 
 excepting
 for one failure on gcc/testsuite/gcc.dg/guality/loop-1.c at -O3 which is 
 tracked
 by PR89983.

 Any comments and suggestions are highly appreciated.  Thanks!
>>> Not sure if I understand the patch correctly, some comments embedded.
>>>
>>> +  /* The number of doloop candidate in the set.  */
>>> +  unsigned n_doloop_cands;
>>> +
>>> This is unnecessary.  See below comments.
>>>
>>> -add_candidate_1 (data, base, step, important,
>>> -IP_NORMAL, use, NULL, orig_iv);
>>> +add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, 
>>> doloop,
>>> +orig_iv);
>>>if (ip_end_pos (data->current_loop)
>>>&& allow_ip_end_pos_p (data->current_loop))
>>> -add_candidate_1 (data, base, step, important, IP_END, use, NULL, 
>>> orig_iv);
>>> +add_candidate_1 (data, base, step, important, IP_END, use, NULL, 
>>> doloop,
>>> +orig_iv);
>>> Do we need to skip ip_end_pos case for doloop candidate?  Because the
>>> candidate increment will be inserted in latch, i.e, increment position
>>> is after exit condition.
>>>
>>
>> Yes, we should skip it.  Currently function find_doloop_use has the check on 
>> an
>> empty latch and gimple_cond to latch, partially excluding it.  But it's 
>> still good
>> to guard it directly here.
>>
>>> -  tree_to_aff_combination (iv->base, type, val);
>>> +  tree base = iv->base;
>>> +  /* See add_iv_candidate_for_doloop, if may_be_zero is set, we want to 
>>> extract
>>> + the value under !may_be_zero to get the compact bound which also well 
>>> fits
>>> + for may_be_zero since we ensure the value for it is const one.  */
>>> +  if (cand->doloop_p && desc->may_be_zero && !integer_zerop
>>> (desc->may_be_zero))
>>> +base = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
>>> +   unshare_expr (rewrite_to_non_trapping_overflow 
>>> (niter)),
>>> +   build_int_cst (TREE_TYPE (niter), 1));
>>> +  tree_to_aff_combination (base, type, val);
>>> I don't quite follow here.  The iv->base is computed from niter, I
>>> suppose compact bound is for cheaper candidate initialization?  Why
>>> it's possible to extract !may_be_zero niter for may_be_zero here?  The
>>> niter under !may_be_zero has no indication about the real niter under
>>> may_be_zero.
>>>
>>
>> As you note below, the 

Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-08-21 Thread Bin.Cheng
On Thu, Aug 22, 2019 at 11:18 AM Kewen.Lin  wrote:
>
> Hi Bin,
>
> Thanks for your time!
>
> on 2019/8/21 下午8:32, Bin.Cheng wrote:
> > On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin  wrote:
> >>
> >> Hi!
> >>
> >> Comparing to the previous versions of implementation mainly based on the
> >> existing IV cands but zeroing the related group/use cost, this new one is 
> >> based
> >> on Richard and Segher's suggestion introducing one doloop dedicated IV 
> >> cand.
> >>
> >> Some key points are listed below:
> >>   1) New field doloop_p in struct iv_cand to indicate doloop dedicated IV 
> >> cand.
> >>   2) Special name "doloop" assigned.
> >>   3) Doloop IV cand with form (niter+1, +, -1)
> >>   4) For doloop IV cand, no extra one cost like BIV, assign zero cost for 
> >> step.
> >>   5) Support may_be_zero (regressed PR is in this case), the base of 
> >> doloop IV
> >>  can be COND_EXPR, add handlings in cand_value_at and may_eliminate_iv.
> >>   6) Add more expr support in force_expr_to_var_cost for reasonable cost
> >>  calculation on the IV base with may_be_zero (like COND_EXPR).
> >>   7) Set zero cost when using doloop IV cand for doloop use.
> >>   8) Add three hooks (should we merge _generic and _address?).
> >> *) have_count_reg_decr_p, is to indicate the target has special 
> >> hardware
> >>count register, we shouldn't consider the impact of doloop IV when
> >>calculating register pressures.
> >> *) doloop_cost_for_generic, is the extra cost when using doloop IV 
> >> cand for
> >>generic type IV use.
> >> *) doloop_cost_for_address, is the extra cost when using doloop IV 
> >> cand for
> >>address type IV use.
> > What will happen if doloop IV cand be used for generic/address type iv
> > use?  Can RTL doloop can still perform doloop optimization in this
> > case?
> >
>
> On Power, we put the iteration count into hardware count register, it takes 
> very
> high cost to move the count to GPR, so the cost is set as INF to make it 
> impossible
> to use it for generic/address type iv use.  But as some discussion before, on 
> some
> targets using GPR instead of hardware count register, they probably want to 
> use this
> doloop iv used for other uses if profitable.  These two hooks offer the 
> possibility.
> In that case, I think RTL doloop can still perform since it can still get the
> pattern and transform.  The generic/address uses can still use it.
> >>
> >> Bootstrapped on powerpc64le-linux-gnu and regression testing passed 
> >> excepting
> >> for one failure on gcc/testsuite/gcc.dg/guality/loop-1.c at -O3 which is 
> >> tracked
> >> by PR89983.
> >>
> >> Any comments and suggestions are highly appreciated.  Thanks!
> > Not sure if I understand the patch correctly, some comments embedded.
> >
> > +  /* The number of doloop candidate in the set.  */
> > +  unsigned n_doloop_cands;
> > +
> > This is unnecessary.  See below comments.
> >
> > -add_candidate_1 (data, base, step, important,
> > -IP_NORMAL, use, NULL, orig_iv);
> > +add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, 
> > doloop,
> > +orig_iv);
> >if (ip_end_pos (data->current_loop)
> >&& allow_ip_end_pos_p (data->current_loop))
> > -add_candidate_1 (data, base, step, important, IP_END, use, NULL, 
> > orig_iv);
> > +add_candidate_1 (data, base, step, important, IP_END, use, NULL, 
> > doloop,
> > +orig_iv);
> > Do we need to skip ip_end_pos case for doloop candidate?  Because the
> > candidate increment will be inserted in latch, i.e, increment position
> > is after exit condition.
> >
>
> Yes, we should skip it.  Currently function find_doloop_use has the check on 
> an
> empty latch and gimple_cond to latch, partially excluding it.  But it's still 
> good
> to guard it directly here.
>
> > -  tree_to_aff_combination (iv->base, type, val);
> > +  tree base = iv->base;
> > +  /* See add_iv_candidate_for_doloop, if may_be_zero is set, we want to 
> > extract
> > + the value under !may_be_zero to get the compact bound which also well 
> > fits
> > + for may_be_zero since we ensure the value for it is const one.  */
> > +  if (cand->doloop_p && desc->may_be_zero && !integer_zerop
> > (desc->may_be_zero))
> > +base = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
> > +   unshare_expr (rewrite_to_non_trapping_overflow 
> > (niter)),
> > +   build_int_cst (TREE_TYPE (niter), 1));
> > +  tree_to_aff_combination (base, type, val);
> > I don't quite follow here.  The iv->base is computed from niter, I
> > suppose compact bound is for cheaper candidate initialization?  Why
> > it's possible to extract !may_be_zero niter for may_be_zero here?  The
> > niter under !may_be_zero has no indication about the real niter under
> > may_be_zero.
> >
>
> As you note below, the cand_value for doloop would be zero, but for the case
> may_be_zero set, the 

Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-08-21 Thread Kewen.Lin
Hi Bin,

Thanks for your time!

on 2019/8/21 下午8:32, Bin.Cheng wrote:
> On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin  wrote:
>>
>> Hi!
>>
>> Comparing to the previous versions of implementation mainly based on the
>> existing IV cands but zeroing the related group/use cost, this new one is 
>> based
>> on Richard and Segher's suggestion introducing one doloop dedicated IV cand.
>>
>> Some key points are listed below:
>>   1) New field doloop_p in struct iv_cand to indicate doloop dedicated IV 
>> cand.
>>   2) Special name "doloop" assigned.
>>   3) Doloop IV cand with form (niter+1, +, -1)
>>   4) For doloop IV cand, no extra one cost like BIV, assign zero cost for 
>> step.
>>   5) Support may_be_zero (regressed PR is in this case), the base of doloop 
>> IV
>>  can be COND_EXPR, add handlings in cand_value_at and may_eliminate_iv.
>>   6) Add more expr support in force_expr_to_var_cost for reasonable cost
>>  calculation on the IV base with may_be_zero (like COND_EXPR).
>>   7) Set zero cost when using doloop IV cand for doloop use.
>>   8) Add three hooks (should we merge _generic and _address?).
>> *) have_count_reg_decr_p, is to indicate the target has special hardware
>>count register, we shouldn't consider the impact of doloop IV when
>>calculating register pressures.
>> *) doloop_cost_for_generic, is the extra cost when using doloop IV cand 
>> for
>>generic type IV use.
>> *) doloop_cost_for_address, is the extra cost when using doloop IV cand 
>> for
>>address type IV use.
> What will happen if doloop IV cand be used for generic/address type iv
> use?  Can RTL doloop can still perform doloop optimization in this
> case?
> 

On Power, we put the iteration count into hardware count register, it takes very
high cost to move the count to GPR, so the cost is set as INF to make it 
impossible
to use it for generic/address type iv use.  But as some discussion before, on 
some
targets using GPR instead of hardware count register, they probably want to use 
this
doloop iv used for other uses if profitable.  These two hooks offer the 
possibility.
In that case, I think RTL doloop can still perform since it can still get the 
pattern and transform.  The generic/address uses can still use it.
>>
>> Bootstrapped on powerpc64le-linux-gnu and regression testing passed excepting
>> for one failure on gcc/testsuite/gcc.dg/guality/loop-1.c at -O3 which is 
>> tracked
>> by PR89983.
>>
>> Any comments and suggestions are highly appreciated.  Thanks!
> Not sure if I understand the patch correctly, some comments embedded.
> 
> +  /* The number of doloop candidate in the set.  */
> +  unsigned n_doloop_cands;
> +
> This is unnecessary.  See below comments.
> 
> -add_candidate_1 (data, base, step, important,
> -IP_NORMAL, use, NULL, orig_iv);
> +add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, 
> doloop,
> +orig_iv);
>if (ip_end_pos (data->current_loop)
>&& allow_ip_end_pos_p (data->current_loop))
> -add_candidate_1 (data, base, step, important, IP_END, use, NULL, 
> orig_iv);
> +add_candidate_1 (data, base, step, important, IP_END, use, NULL, doloop,
> +orig_iv);
> Do we need to skip ip_end_pos case for doloop candidate?  Because the
> candidate increment will be inserted in latch, i.e, increment position
> is after exit condition.
> 

Yes, we should skip it.  Currently function find_doloop_use has the check on an
empty latch and gimple_cond to latch, partially excluding it.  But it's still 
good
to guard it directly here.

> -  tree_to_aff_combination (iv->base, type, val);
> +  tree base = iv->base;
> +  /* See add_iv_candidate_for_doloop, if may_be_zero is set, we want to 
> extract
> + the value under !may_be_zero to get the compact bound which also well 
> fits
> + for may_be_zero since we ensure the value for it is const one.  */
> +  if (cand->doloop_p && desc->may_be_zero && !integer_zerop
> (desc->may_be_zero))
> +base = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
> +   unshare_expr (rewrite_to_non_trapping_overflow 
> (niter)),
> +   build_int_cst (TREE_TYPE (niter), 1));
> +  tree_to_aff_combination (base, type, val);
> I don't quite follow here.  The iv->base is computed from niter, I
> suppose compact bound is for cheaper candidate initialization?  Why
> it's possible to extract !may_be_zero niter for may_be_zero here?  The
> niter under !may_be_zero has no indication about the real niter under
> may_be_zero.
> 

As you note below, the cand_value for doloop would be zero, but for the case
may_be_zero set, the current calculation would take care of the whole niter
expression including the cond_expr introduced by may_be_zero check, it's 
unexpected.  The purpose is to use the value under condition !may_be_zero
for the calculation, and yes, to get expected zero finally.

> -  cand_value_at (loop, cand, 

Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-08-21 Thread Bin.Cheng
On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin  wrote:
>
> Hi!
>
> Comparing to the previous versions of implementation mainly based on the
> existing IV cands but zeroing the related group/use cost, this new one is 
> based
> on Richard and Segher's suggestion introducing one doloop dedicated IV cand.
>
> Some key points are listed below:
>   1) New field doloop_p in struct iv_cand to indicate doloop dedicated IV 
> cand.
>   2) Special name "doloop" assigned.
>   3) Doloop IV cand with form (niter+1, +, -1)
>   4) For doloop IV cand, no extra one cost like BIV, assign zero cost for 
> step.
>   5) Support may_be_zero (regressed PR is in this case), the base of doloop IV
>  can be COND_EXPR, add handlings in cand_value_at and may_eliminate_iv.
>   6) Add more expr support in force_expr_to_var_cost for reasonable cost
>  calculation on the IV base with may_be_zero (like COND_EXPR).
>   7) Set zero cost when using doloop IV cand for doloop use.
>   8) Add three hooks (should we merge _generic and _address?).
> *) have_count_reg_decr_p, is to indicate the target has special hardware
>count register, we shouldn't consider the impact of doloop IV when
>calculating register pressures.
> *) doloop_cost_for_generic, is the extra cost when using doloop IV cand 
> for
>generic type IV use.
> *) doloop_cost_for_address, is the extra cost when using doloop IV cand 
> for
>address type IV use.
What will happen if doloop IV cand be used for generic/address type iv
use?  Can RTL doloop can still perform doloop optimization in this
case?

>
> Bootstrapped on powerpc64le-linux-gnu and regression testing passed excepting
> for one failure on gcc/testsuite/gcc.dg/guality/loop-1.c at -O3 which is 
> tracked
> by PR89983.
>
> Any comments and suggestions are highly appreciated.  Thanks!
Not sure if I understand the patch correctly, some comments embedded.

+  /* The number of doloop candidate in the set.  */
+  unsigned n_doloop_cands;
+
This is unnecessary.  See below comments.

-add_candidate_1 (data, base, step, important,
-IP_NORMAL, use, NULL, orig_iv);
+add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, doloop,
+orig_iv);
   if (ip_end_pos (data->current_loop)
   && allow_ip_end_pos_p (data->current_loop))
-add_candidate_1 (data, base, step, important, IP_END, use, NULL, orig_iv);
+add_candidate_1 (data, base, step, important, IP_END, use, NULL, doloop,
+orig_iv);
Do we need to skip ip_end_pos case for doloop candidate?  Because the
candidate increment will be inserted in latch, i.e, increment position
is after exit condition.

-  tree_to_aff_combination (iv->base, type, val);
+  tree base = iv->base;
+  /* See add_iv_candidate_for_doloop, if may_be_zero is set, we want to extract
+ the value under !may_be_zero to get the compact bound which also well fits
+ for may_be_zero since we ensure the value for it is const one.  */
+  if (cand->doloop_p && desc->may_be_zero && !integer_zerop
(desc->may_be_zero))
+base = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
+   unshare_expr (rewrite_to_non_trapping_overflow (niter)),
+   build_int_cst (TREE_TYPE (niter), 1));
+  tree_to_aff_combination (base, type, val);
I don't quite follow here.  The iv->base is computed from niter, I
suppose compact bound is for cheaper candidate initialization?  Why
it's possible to extract !may_be_zero niter for may_be_zero here?  The
niter under !may_be_zero has no indication about the real niter under
may_be_zero.

-  cand_value_at (loop, cand, use->stmt, desc->niter, );
+  cand_value_at (loop, cand, use->stmt, desc, );
If I understand correctly, doloop use/cand will only be
identified/added for single exit loop, and there will be only one
cond(doloop) iv_use and only one doloop cand for doloop loop.  So the
cand_value at niter at use position would be 0.  If that's the case,
we can skip calling cand_value_at here for doloop cand.  The change to
cand_value_at would be unnecessary neither.

-  expensive.  */
-  if (!integer_zerop (desc->may_be_zero))
+  expensive.
+
+ For doloop candidate, we have considered MAY_BE_ZERO for IV base, need to
+ support MAY_BE_ZERO ? 0 : NITER, so simply bypass this check.  */
+  if (!integer_zerop (desc->may_be_zero) && !cand->doloop_p)
 return iv_elimination_compare_lt (data, cand, comp, desc);
And we can early return before this?

+  if (may_be_zero)
+{
+  if (COMPARISON_CLASS_P (may_be_zero))
+   {
+ niter = fold_build3 (COND_EXPR, ntype, may_be_zero,
+  build_int_cst (ntype, 0),
+  rewrite_to_non_trapping_overflow (niter));
+   }
+  /* Don't try to obtain the iteration count expression when may_be_zero is
+integer_nonzerop (actually iteration count is one) or else.  */
+  else
+   return;
+}

[PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-08-14 Thread Kewen.Lin
Hi!

Comparing to the previous versions of implementation mainly based on the 
existing IV cands but zeroing the related group/use cost, this new one is based
on Richard and Segher's suggestion introducing one doloop dedicated IV cand.  

Some key points are listed below:
  1) New field doloop_p in struct iv_cand to indicate doloop dedicated IV cand.
  2) Special name "doloop" assigned.
  3) Doloop IV cand with form (niter+1, +, -1)
  4) For doloop IV cand, no extra one cost like BIV, assign zero cost for step.
  5) Support may_be_zero (regressed PR is in this case), the base of doloop IV
 can be COND_EXPR, add handlings in cand_value_at and may_eliminate_iv.
  6) Add more expr support in force_expr_to_var_cost for reasonable cost
 calculation on the IV base with may_be_zero (like COND_EXPR).
  7) Set zero cost when using doloop IV cand for doloop use.
  8) Add three hooks (should we merge _generic and _address?).
*) have_count_reg_decr_p, is to indicate the target has special hardware
   count register, we shouldn't consider the impact of doloop IV when
   calculating register pressures.
*) doloop_cost_for_generic, is the extra cost when using doloop IV cand for
   generic type IV use.
*) doloop_cost_for_address, is the extra cost when using doloop IV cand for
   address type IV use.

Bootstrapped on powerpc64le-linux-gnu and regression testing passed excepting
for one failure on gcc/testsuite/gcc.dg/guality/loop-1.c at -O3 which is tracked
by PR89983.

Any comments and suggestions are highly appreciated.  Thanks!

Kewen

-

gcc/ChangeLog

2019-08-14  Kewen Lin  

PR middle-end/80791
* config/rs6000/rs6000.c (TARGET_HAVE_COUNT_REG_DECR_P): New macro.
(TARGET_DOLOOP_COST_FOR_GENERIC): Likewise.
(TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise.
* target.def (have_count_reg_decr_p): New hook.
(doloop_cost_for_generic): Likewise.
(doloop_cost_for_address): Likewise.
* doc/tm.texi.in (TARGET_HAVE_COUNT_REG_DECR_P): Likewise.
(TARGET_DOLOOP_COST_FOR_GENERIC): Likewise.
(TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise.
* doc/tm.texi: Regenerate.
* tree-ssa-loop-ivopts.c (comp_cost::operator+=): Consider infinite cost
addend.
(record_group): Init doloop_p.
(add_candidate_1): Add optional argument doloop, change the handlings
accordingly.
(add_candidate): Likewise.
(add_iv_candidate_for_biv): Update the call to add_candidate.
(generic_predict_doloop_p): Update attribute.
(force_expr_to_var_cost): Add costing for expressions COND_EXPR/LT_EXPR/
LE_EXPR/GT_EXPR/GE_EXPR/EQ_EXPR/NE_EXPR/UNORDERED_EXPR/ORDERED_EXPR/
UNLT_EXPR/UNLE_EXPR/UNGT_EXPR/UNGE_EXPR/UNEQ_EXPR/LTGT_EXPR/MAX_EXPR/
MIN_EXPR.
(determine_group_iv_cost_generic): Update for doloop IV cand.
(determine_group_iv_cost_address): Likewise.
(determine_group_iv_cost_cond): Likewise.
(determine_iv_cost): Likewise.
(ivopts_estimate_reg_pressure): Likewise.
(cand_value_at): Update argument niter type to struct tree_niter_desc*,
consider doloop IV cand and may_be_zero.
(may_eliminate_iv): Update the call to cand_value_at, consider doloop
IV cand and may_be_zero.
(add_iv_candidate_for_doloop): New function.
(find_iv_candidates): Call function add_iv_candidate_for_doloop.
(determine_set_costs): Update the call to ivopts_estimate_reg_pressure.
(iv_ca_recount_cost): Likewise.
(iv_ca_new): Init n_doloop_cands.
(iv_ca_set_no_cp): Update n_doloop_cands.
(iv_ca_set_cp): Likewise.
(iv_ca_dump): Dump register cost.
(find_doloop_use): Likewise.
(tree_ssa_iv_optimize_loop): Call function generic_predict_doloop_p and
find_doloop_use.

gcc/testsuite/ChangeLog

2019-08-14  Kewen Lin  

PR middle-end/80791
* gcc.dg/tree-ssa/ivopts-3.c: Adjust for doloop change.
* gcc.dg/tree-ssa/ivopts-lt.c: Likewise.
* gcc.dg/tree-ssa/pr32044.c: Likewise.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6667cd0..5eccbdc 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1912,6 +1912,16 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #undef TARGET_PREDICT_DOLOOP_P
 #define TARGET_PREDICT_DOLOOP_P rs6000_predict_doloop_p
 
+#undef TARGET_HAVE_COUNT_REG_DECR_P
+#define TARGET_HAVE_COUNT_REG_DECR_P true
+
+/* 10 is infinite cost in IVOPTs.  */
+#undef TARGET_DOLOOP_COST_FOR_GENERIC
+#define TARGET_DOLOOP_COST_FOR_GENERIC 10
+
+#undef TARGET_DOLOOP_COST_FOR_ADDRESS
+#define TARGET_DOLOOP_COST_FOR_ADDRESS 10
+
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c2aa4d0..9f3a08a 100644
---