Re: [PATCH] Add vector cost model density heuristic

2012-06-21 Thread William J. Schmidt
On Tue, 2012-06-19 at 16:20 +0200, Richard Guenther wrote:
> On Tue, 19 Jun 2012, William J. Schmidt wrote:
> 
> > On Tue, 2012-06-19 at 14:48 +0200, Richard Guenther wrote:
> > > On Tue, 19 Jun 2012, William J. Schmidt wrote:
> > > 
> > > > I remember having this discussion, and I was looking for it to check on
> > > > the details, but I can't seem to find it either in my inbox or in the
> > > > archives.  Can you please point me to that again?  Sorry for the bother.
> > > 
> > > It was in the "Correct cost model for strided loads" thread.
> > 
> > Ah, right, thanks.  I think it will be best to make that a separate
> > patch in the series.  Like so:
> > 
> > (1) Add calls to the new interface without disturbing existing logic;
> > modify the profitability algorithms to query the new model for inside
> > costs.  Default algorithm for the model is to just sum costs as is done
> > today.

Just FYI, this is not quite as straightforward as I thought.  There is
some code in tree-vect-data-refs.c that computes costs for various
peeling options and picks one of them.  In most other places we can just
pass the instructions to the back end at the same place that the costs
are currently calculated, but not here.  This will require some more
major surgery to save the instructions needed from each peeling option
and only pass along the ones that end up being chosen.

The upside is the same sort of "delayed emit" is needed for the SLP
ordering problem, so the infrastructure for this will be reusable for
that problem.

Grumble.

Bill

> > (1a) Split up the cost hooks (one for loads/stores with misalign parm,
> > one for vector_stmt with tree_code, etc.).
> > (x) Add heuristics to target models as desired.
> > (2) Handle the SLP ordering problem.
> > (3) Handle outside costs in the target model.
> > (4) Remove the now unnecessary cost fields and the calls that set them.
> > 
> > I'll start work on this series of patches as I have time between other
> > projects.
> 
> Thanks!
> Richard.
> 



Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread Richard Guenther
On Tue, 19 Jun 2012, William J. Schmidt wrote:

> On Tue, 2012-06-19 at 14:48 +0200, Richard Guenther wrote:
> > On Tue, 19 Jun 2012, William J. Schmidt wrote:
> > 
> > > I remember having this discussion, and I was looking for it to check on
> > > the details, but I can't seem to find it either in my inbox or in the
> > > archives.  Can you please point me to that again?  Sorry for the bother.
> > 
> > It was in the "Correct cost model for strided loads" thread.
> 
> Ah, right, thanks.  I think it will be best to make that a separate
> patch in the series.  Like so:
> 
> (1) Add calls to the new interface without disturbing existing logic;
> modify the profitability algorithms to query the new model for inside
> costs.  Default algorithm for the model is to just sum costs as is done
> today.
> (1a) Split up the cost hooks (one for loads/stores with misalign parm,
> one for vector_stmt with tree_code, etc.).
> (x) Add heuristics to target models as desired.
> (2) Handle the SLP ordering problem.
> (3) Handle outside costs in the target model.
> (4) Remove the now unnecessary cost fields and the calls that set them.
> 
> I'll start work on this series of patches as I have time between other
> projects.

Thanks!
Richard.


Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread William J. Schmidt
On Tue, 2012-06-19 at 14:48 +0200, Richard Guenther wrote:
> On Tue, 19 Jun 2012, William J. Schmidt wrote:
> 
> > I remember having this discussion, and I was looking for it to check on
> > the details, but I can't seem to find it either in my inbox or in the
> > archives.  Can you please point me to that again?  Sorry for the bother.
> 
> It was in the "Correct cost model for strided loads" thread.

Ah, right, thanks.  I think it will be best to make that a separate
patch in the series.  Like so:

(1) Add calls to the new interface without disturbing existing logic;
modify the profitability algorithms to query the new model for inside
costs.  Default algorithm for the model is to just sum costs as is done
today.
(1a) Split up the cost hooks (one for loads/stores with misalign parm,
one for vector_stmt with tree_code, etc.).
(x) Add heuristics to target models as desired.
(2) Handle the SLP ordering problem.
(3) Handle outside costs in the target model.
(4) Remove the now unnecessary cost fields and the calls that set them.

I'll start work on this series of patches as I have time between other
projects.

Thanks,
Bill

> 
> Richard.
> 



Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread Richard Guenther
On Tue, 19 Jun 2012, William J. Schmidt wrote:

> On Tue, 2012-06-19 at 12:10 +0200, Richard Guenther wrote:
> > On Mon, 18 Jun 2012, William J. Schmidt wrote:
> > 
> > > On Mon, 2012-06-18 at 13:49 -0500, William J. Schmidt wrote:
> > > > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > > > > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> > > > > 
> > > > 
> > > > > 
> > > > > Hmm.  I don't like this patch or its general idea too much.  Instead
> > > > > I'd like us to move more of the cost model detail to the target, 
> > > > > giving
> > > > > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > > > > posting the overall idea at some point, but let me repeat it here 
> > > > > instead
> > > > > of trying to find that e-mail.
> > > > > 
> > > > > The basic interface of the cost model should be, in targetm.vectorize
> > > > > 
> > > > >   /* Tell the target to start cost analysis of a loop or a basic-block
> > > > >  (if the loop argument is NULL).  Returns an opaque pointer to
> > > > >  target-private data.  */
> > > > >   void *init_cost (struct loop *loop);
> > > > > 
> > > > >   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  
> > > > > */
> > > > >   void add_stmt_cost (void *data, unsigned n,
> > > > > vectorized-stmt-kind,
> > > > >   enum machine_mode vector_mode);
> > > > > 
> > > > >   /* Tell the target to compute and return the cost of the accumulated
> > > > >  statements and free any target-private data.  */
> > > > >   unsigned finish_cost (void *data);
> > > 
> > > By the way, I don't see much point in passing the void *data around
> > > here.  Too many levels of interfaces that we'd have to pass it around in
> > > the vectorizer, so it would just sit in a static variable.  Might as
> > > well let the data be wholly private to the target.
> > 
> > Ok, so you'd have void init_cost (struct loop *) and
> > unsigned finish_cost (void); then?  Static variables are of couse
> > not properly "abstracted" so we can't ever compute two set of costs
> > at the same time ... but that's true all-over-the-place in GCC ...
> 
> It's a fair point, and perhaps I'll decide to pass the data pointer
> around anyway to keep that option open.  We'll see which looks uglier.
> 
> > 
> > With previous discussion the add_stmt_cost hook would be split up
> > to also allow passing the operation code for example.
> 
> I remember having this discussion, and I was looking for it to check on
> the details, but I can't seem to find it either in my inbox or in the
> archives.  Can you please point me to that again?  Sorry for the bother.

It was in the "Correct cost model for strided loads" thread.

Richard.


Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread William J. Schmidt
On Tue, 2012-06-19 at 12:10 +0200, Richard Guenther wrote:
> On Mon, 18 Jun 2012, William J. Schmidt wrote:
> 
> > On Mon, 2012-06-18 at 13:49 -0500, William J. Schmidt wrote:
> > > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > > > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> > > > 
> > > 
> > > > 
> > > > Hmm.  I don't like this patch or its general idea too much.  Instead
> > > > I'd like us to move more of the cost model detail to the target, giving
> > > > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > > > posting the overall idea at some point, but let me repeat it here 
> > > > instead
> > > > of trying to find that e-mail.
> > > > 
> > > > The basic interface of the cost model should be, in targetm.vectorize
> > > > 
> > > >   /* Tell the target to start cost analysis of a loop or a basic-block
> > > >  (if the loop argument is NULL).  Returns an opaque pointer to
> > > >  target-private data.  */
> > > >   void *init_cost (struct loop *loop);
> > > > 
> > > >   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
> > > >   void add_stmt_cost (void *data, unsigned n,
> > > >   vectorized-stmt-kind,
> > > >   enum machine_mode vector_mode);
> > > > 
> > > >   /* Tell the target to compute and return the cost of the accumulated
> > > >  statements and free any target-private data.  */
> > > >   unsigned finish_cost (void *data);
> > 
> > By the way, I don't see much point in passing the void *data around
> > here.  Too many levels of interfaces that we'd have to pass it around in
> > the vectorizer, so it would just sit in a static variable.  Might as
> > well let the data be wholly private to the target.
> 
> Ok, so you'd have void init_cost (struct loop *) and
> unsigned finish_cost (void); then?  Static variables are of couse
> not properly "abstracted" so we can't ever compute two set of costs
> at the same time ... but that's true all-over-the-place in GCC ...

It's a fair point, and perhaps I'll decide to pass the data pointer
around anyway to keep that option open.  We'll see which looks uglier.

> 
> With previous discussion the add_stmt_cost hook would be split up
> to also allow passing the operation code for example.

I remember having this discussion, and I was looking for it to check on
the details, but I can't seem to find it either in my inbox or in the
archives.  Can you please point me to that again?  Sorry for the bother.

Thanks,
Bill

> 
> Richard.
> 



Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread Richard Guenther
On Tue, 19 Jun 2012, William J. Schmidt wrote:

> On Tue, 2012-06-19 at 12:08 +0200, Richard Guenther wrote:
> > On Mon, 18 Jun 2012, William J. Schmidt wrote:
> > 
> > > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > > > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> > > > 
> > > 
> > > > 
> > > > Hmm.  I don't like this patch or its general idea too much.  Instead
> > > > I'd like us to move more of the cost model detail to the target, giving
> > > > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > > > posting the overall idea at some point, but let me repeat it here 
> > > > instead
> > > > of trying to find that e-mail.
> > > > 
> > > > The basic interface of the cost model should be, in targetm.vectorize
> > > > 
> > > >   /* Tell the target to start cost analysis of a loop or a basic-block
> > > >  (if the loop argument is NULL).  Returns an opaque pointer to
> > > >  target-private data.  */
> > > >   void *init_cost (struct loop *loop);
> > > > 
> > > >   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
> > > >   void add_stmt_cost (void *data, unsigned n,
> > > >   vectorized-stmt-kind,
> > > >   enum machine_mode vector_mode);
> > > > 
> > > >   /* Tell the target to compute and return the cost of the accumulated
> > > >  statements and free any target-private data.  */
> > > >   unsigned finish_cost (void *data);
> > > > 
> > > > with eventually slightly different signatures for add_stmt_cost
> > > > (like pass in the original scalar stmt?).
> > > > 
> > > > It allows the target, at finish_cost time, to evaluate things like
> > > > register pressure and resource utilization.
> > > > 
> > > > Thanks,
> > > > Richard.
> > > 
> > > I've been looking at this in between other projects.  I wanted to be
> > > sure I understood the SLP infrastructure and whether it would cause any
> > > problems.  It looks to me like it will be mostly ok.  One issue I
> > > noticed is a possible difference in the order in which SLP instructions
> > > are analyzed and the order in which the instructions are "issued" during
> > > transformation.
> > > 
> > > For both loop analysis and basic block analysis, SLP trees are
> > > constructed and analyzed prior to examining other vectorizable
> > > instructions.  Their costs are calculated and stored in the SLP trees at
> > > this time.  Later, when transforming statements to their vector
> > > equivalents, instructions in the block (or loop body) are processed in
> > > order until the first instruction that's part of an SLP tree is
> > > encountered.  At that point, every instruction that's part of any SLP
> > > tree is transformed; then the vectorizer continues with the remaining
> > > non-SLP vectorizable statements.
> > > 
> > > So if we do the natural and easy thing of placing calls to add_stmt_cost
> > > everywhere that costs are calculated today, the order that those costs
> > > are presented to the back end model will possibly be different than the
> > > order they are actually "emitted."
> > 
> > Interesting.  But I suppose this is similar to how pattern statements
> > are handled?  Thus, the whole pattern sequence is processed when
> > we encounter the "main" pattern statement?
> 
> Yes, but the difference is that both vect_analyze_stmt and
> vect_transform_loop handle the pattern statements in the same order
> (thankfully -- I would hate to have to deal with the pattern mess).
> With SLP, all SLP statements are analyzed ahead of time, but they aren't
> transformed until one of them is encountered in the statement walk.

Ah, ok.  I suppose we can simply declare that when we register
vectorized stmts with the backend they are in arbitrary oder.
After all this is not supposed to be another machine dependent reorg
phase (to quote David).

> > 
> > > For a first cut at this, I suggest ignoring the problem other than to
> > > document it as an opportunity for improvement.  Later we could improve
> > > it by using an add_stmt_slp_cost () interface (or adding an is_slp
> > > flag), and another interface to be called at the time during analysis
> > > when the SLP statements will be issued during transformation.  This
> > > would allow the back end model to queue up the SLP costs in a separate
> > > vector and later place them in its internal structures at the
> > > appropriate place.
> > >
> > > It should eventually be possible to remove these fields/accessors:
> > > 
> > >  * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST
> > >  * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST
> > >  * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST
> > > 
> > > However, I think this should be delayed until we have the basic
> > > infrastructure in place for the new model and well-tested.
> > 
> > Indeed.
> > 
> > > The other issue is that we should have the model track both the inside
> > > and outside costs if we're going to get everything into the target
> > > model.  For a first pass we can ignore this 

Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread William J. Schmidt
On Tue, 2012-06-19 at 12:08 +0200, Richard Guenther wrote:
> On Mon, 18 Jun 2012, William J. Schmidt wrote:
> 
> > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> > > 
> > 
> > > 
> > > Hmm.  I don't like this patch or its general idea too much.  Instead
> > > I'd like us to move more of the cost model detail to the target, giving
> > > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > > posting the overall idea at some point, but let me repeat it here instead
> > > of trying to find that e-mail.
> > > 
> > > The basic interface of the cost model should be, in targetm.vectorize
> > > 
> > >   /* Tell the target to start cost analysis of a loop or a basic-block
> > >  (if the loop argument is NULL).  Returns an opaque pointer to
> > >  target-private data.  */
> > >   void *init_cost (struct loop *loop);
> > > 
> > >   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
> > >   void add_stmt_cost (void *data, unsigned n,
> > > vectorized-stmt-kind,
> > >   enum machine_mode vector_mode);
> > > 
> > >   /* Tell the target to compute and return the cost of the accumulated
> > >  statements and free any target-private data.  */
> > >   unsigned finish_cost (void *data);
> > > 
> > > with eventually slightly different signatures for add_stmt_cost
> > > (like pass in the original scalar stmt?).
> > > 
> > > It allows the target, at finish_cost time, to evaluate things like
> > > register pressure and resource utilization.
> > > 
> > > Thanks,
> > > Richard.
> > 
> > I've been looking at this in between other projects.  I wanted to be
> > sure I understood the SLP infrastructure and whether it would cause any
> > problems.  It looks to me like it will be mostly ok.  One issue I
> > noticed is a possible difference in the order in which SLP instructions
> > are analyzed and the order in which the instructions are "issued" during
> > transformation.
> > 
> > For both loop analysis and basic block analysis, SLP trees are
> > constructed and analyzed prior to examining other vectorizable
> > instructions.  Their costs are calculated and stored in the SLP trees at
> > this time.  Later, when transforming statements to their vector
> > equivalents, instructions in the block (or loop body) are processed in
> > order until the first instruction that's part of an SLP tree is
> > encountered.  At that point, every instruction that's part of any SLP
> > tree is transformed; then the vectorizer continues with the remaining
> > non-SLP vectorizable statements.
> > 
> > So if we do the natural and easy thing of placing calls to add_stmt_cost
> > everywhere that costs are calculated today, the order that those costs
> > are presented to the back end model will possibly be different than the
> > order they are actually "emitted."
> 
> Interesting.  But I suppose this is similar to how pattern statements
> are handled?  Thus, the whole pattern sequence is processed when
> we encounter the "main" pattern statement?

Yes, but the difference is that both vect_analyze_stmt and
vect_transform_loop handle the pattern statements in the same order
(thankfully -- I would hate to have to deal with the pattern mess).
With SLP, all SLP statements are analyzed ahead of time, but they aren't
transformed until one of them is encountered in the statement walk.

> 
> > For a first cut at this, I suggest ignoring the problem other than to
> > document it as an opportunity for improvement.  Later we could improve
> > it by using an add_stmt_slp_cost () interface (or adding an is_slp
> > flag), and another interface to be called at the time during analysis
> > when the SLP statements will be issued during transformation.  This
> > would allow the back end model to queue up the SLP costs in a separate
> > vector and later place them in its internal structures at the
> > appropriate place.
> >
> > It should eventually be possible to remove these fields/accessors:
> > 
> >  * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST
> >  * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST
> >  * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST
> > 
> > However, I think this should be delayed until we have the basic
> > infrastructure in place for the new model and well-tested.
> 
> Indeed.
> 
> > The other issue is that we should have the model track both the inside
> > and outside costs if we're going to get everything into the target
> > model.  For a first pass we can ignore this and keep the existing logic
> > for the outside costs.  Later we should add some interfaces analogous to
> > add_stmt_cost such as add_stmt_prolog_cost and add_stmt_epilog_cost so
> > the model can track this stuff as carefully as it wants to.
> 
> Outside costs are merely added to the niter * inner-cost metric to
> be compared with the scalar cost niter * scalar-cost, right?  Thus
> they would be tracked completely separate - eventually similar to
> how we compute 

Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread Richard Guenther
On Mon, 18 Jun 2012, William J. Schmidt wrote:

> On Mon, 2012-06-18 at 13:49 -0500, William J. Schmidt wrote:
> > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> > > 
> > 
> > > 
> > > Hmm.  I don't like this patch or its general idea too much.  Instead
> > > I'd like us to move more of the cost model detail to the target, giving
> > > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > > posting the overall idea at some point, but let me repeat it here instead
> > > of trying to find that e-mail.
> > > 
> > > The basic interface of the cost model should be, in targetm.vectorize
> > > 
> > >   /* Tell the target to start cost analysis of a loop or a basic-block
> > >  (if the loop argument is NULL).  Returns an opaque pointer to
> > >  target-private data.  */
> > >   void *init_cost (struct loop *loop);
> > > 
> > >   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
> > >   void add_stmt_cost (void *data, unsigned n,
> > > vectorized-stmt-kind,
> > >   enum machine_mode vector_mode);
> > > 
> > >   /* Tell the target to compute and return the cost of the accumulated
> > >  statements and free any target-private data.  */
> > >   unsigned finish_cost (void *data);
> 
> By the way, I don't see much point in passing the void *data around
> here.  Too many levels of interfaces that we'd have to pass it around in
> the vectorizer, so it would just sit in a static variable.  Might as
> well let the data be wholly private to the target.

Ok, so you'd have void init_cost (struct loop *) and
unsigned finish_cost (void); then?  Static variables are of couse
not properly "abstracted" so we can't ever compute two set of costs
at the same time ... but that's true all-over-the-place in GCC ...

With previous discussion the add_stmt_cost hook would be split up
to also allow passing the operation code for example.

Richard.


Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread Richard Guenther
On Mon, 18 Jun 2012, William J. Schmidt wrote:

> On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> > 
> 
> > 
> > Hmm.  I don't like this patch or its general idea too much.  Instead
> > I'd like us to move more of the cost model detail to the target, giving
> > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > posting the overall idea at some point, but let me repeat it here instead
> > of trying to find that e-mail.
> > 
> > The basic interface of the cost model should be, in targetm.vectorize
> > 
> >   /* Tell the target to start cost analysis of a loop or a basic-block
> >  (if the loop argument is NULL).  Returns an opaque pointer to
> >  target-private data.  */
> >   void *init_cost (struct loop *loop);
> > 
> >   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
> >   void add_stmt_cost (void *data, unsigned n,
> >   vectorized-stmt-kind,
> >   enum machine_mode vector_mode);
> > 
> >   /* Tell the target to compute and return the cost of the accumulated
> >  statements and free any target-private data.  */
> >   unsigned finish_cost (void *data);
> > 
> > with eventually slightly different signatures for add_stmt_cost
> > (like pass in the original scalar stmt?).
> > 
> > It allows the target, at finish_cost time, to evaluate things like
> > register pressure and resource utilization.
> > 
> > Thanks,
> > Richard.
> 
> I've been looking at this in between other projects.  I wanted to be
> sure I understood the SLP infrastructure and whether it would cause any
> problems.  It looks to me like it will be mostly ok.  One issue I
> noticed is a possible difference in the order in which SLP instructions
> are analyzed and the order in which the instructions are "issued" during
> transformation.
> 
> For both loop analysis and basic block analysis, SLP trees are
> constructed and analyzed prior to examining other vectorizable
> instructions.  Their costs are calculated and stored in the SLP trees at
> this time.  Later, when transforming statements to their vector
> equivalents, instructions in the block (or loop body) are processed in
> order until the first instruction that's part of an SLP tree is
> encountered.  At that point, every instruction that's part of any SLP
> tree is transformed; then the vectorizer continues with the remaining
> non-SLP vectorizable statements.
> 
> So if we do the natural and easy thing of placing calls to add_stmt_cost
> everywhere that costs are calculated today, the order that those costs
> are presented to the back end model will possibly be different than the
> order they are actually "emitted."

Interesting.  But I suppose this is similar to how pattern statements
are handled?  Thus, the whole pattern sequence is processed when
we encounter the "main" pattern statement?

> For a first cut at this, I suggest ignoring the problem other than to
> document it as an opportunity for improvement.  Later we could improve
> it by using an add_stmt_slp_cost () interface (or adding an is_slp
> flag), and another interface to be called at the time during analysis
> when the SLP statements will be issued during transformation.  This
> would allow the back end model to queue up the SLP costs in a separate
> vector and later place them in its internal structures at the
> appropriate place.
>
> It should eventually be possible to remove these fields/accessors:
> 
>  * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST
>  * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST
>  * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST
> 
> However, I think this should be delayed until we have the basic
> infrastructure in place for the new model and well-tested.

Indeed.
 
> The other issue is that we should have the model track both the inside
> and outside costs if we're going to get everything into the target
> model.  For a first pass we can ignore this and keep the existing logic
> for the outside costs.  Later we should add some interfaces analogous to
> add_stmt_cost such as add_stmt_prolog_cost and add_stmt_epilog_cost so
> the model can track this stuff as carefully as it wants to.

Outside costs are merely added to the niter * inner-cost metric to
be compared with the scalar cost niter * scalar-cost, right?  Thus
they would be tracked completely separate - eventually similar to
how we compute the cost of the scalar loop.

> So, I'd propose going at this in several phases:
> 
> (1) Add calls to the new interface without disturbing existing logic;
> modify the profitability algorithms to query the new model for inside
> costs.  Default algorithm for the model is to just sum costs as is done
> today.

Right.

> (x) Add heuristics to target models as desired.
> (2) Handle the SLP ordering problem.
> (3) Handle outside costs in the target model.
> (4) Remove the now unnecessary cost fields and the calls that set them.
> 
> Item (x) can happen anytime after item (1).
> 
>

Re: [PATCH] Add vector cost model density heuristic

2012-06-18 Thread William J. Schmidt
On Mon, 2012-06-18 at 13:49 -0500, William J. Schmidt wrote:
> On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> > 
> 
> > 
> > Hmm.  I don't like this patch or its general idea too much.  Instead
> > I'd like us to move more of the cost model detail to the target, giving
> > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > posting the overall idea at some point, but let me repeat it here instead
> > of trying to find that e-mail.
> > 
> > The basic interface of the cost model should be, in targetm.vectorize
> > 
> >   /* Tell the target to start cost analysis of a loop or a basic-block
> >  (if the loop argument is NULL).  Returns an opaque pointer to
> >  target-private data.  */
> >   void *init_cost (struct loop *loop);
> > 
> >   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
> >   void add_stmt_cost (void *data, unsigned n,
> >   vectorized-stmt-kind,
> >   enum machine_mode vector_mode);
> > 
> >   /* Tell the target to compute and return the cost of the accumulated
> >  statements and free any target-private data.  */
> >   unsigned finish_cost (void *data);

By the way, I don't see much point in passing the void *data around
here.  Too many levels of interfaces that we'd have to pass it around in
the vectorizer, so it would just sit in a static variable.  Might as
well let the data be wholly private to the target.
> > 
> > with eventually slightly different signatures for add_stmt_cost
> > (like pass in the original scalar stmt?).
> > 
> > It allows the target, at finish_cost time, to evaluate things like
> > register pressure and resource utilization.
> > 
> > Thanks,
> > Richard.




Re: [PATCH] Add vector cost model density heuristic

2012-06-18 Thread William J. Schmidt
On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> On Fri, 8 Jun 2012, William J. Schmidt wrote:
> 

> 
> Hmm.  I don't like this patch or its general idea too much.  Instead
> I'd like us to move more of the cost model detail to the target, giving
> it a chance to look at the whole loop before deciding on a cost.  ISTR
> posting the overall idea at some point, but let me repeat it here instead
> of trying to find that e-mail.
> 
> The basic interface of the cost model should be, in targetm.vectorize
> 
>   /* Tell the target to start cost analysis of a loop or a basic-block
>  (if the loop argument is NULL).  Returns an opaque pointer to
>  target-private data.  */
>   void *init_cost (struct loop *loop);
> 
>   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
>   void add_stmt_cost (void *data, unsigned n,
> vectorized-stmt-kind,
>   enum machine_mode vector_mode);
> 
>   /* Tell the target to compute and return the cost of the accumulated
>  statements and free any target-private data.  */
>   unsigned finish_cost (void *data);
> 
> with eventually slightly different signatures for add_stmt_cost
> (like pass in the original scalar stmt?).
> 
> It allows the target, at finish_cost time, to evaluate things like
> register pressure and resource utilization.
> 
> Thanks,
> Richard.

I've been looking at this in between other projects.  I wanted to be
sure I understood the SLP infrastructure and whether it would cause any
problems.  It looks to me like it will be mostly ok.  One issue I
noticed is a possible difference in the order in which SLP instructions
are analyzed and the order in which the instructions are "issued" during
transformation.

For both loop analysis and basic block analysis, SLP trees are
constructed and analyzed prior to examining other vectorizable
instructions.  Their costs are calculated and stored in the SLP trees at
this time.  Later, when transforming statements to their vector
equivalents, instructions in the block (or loop body) are processed in
order until the first instruction that's part of an SLP tree is
encountered.  At that point, every instruction that's part of any SLP
tree is transformed; then the vectorizer continues with the remaining
non-SLP vectorizable statements.

So if we do the natural and easy thing of placing calls to add_stmt_cost
everywhere that costs are calculated today, the order that those costs
are presented to the back end model will possibly be different than the
order they are actually "emitted."

For a first cut at this, I suggest ignoring the problem other than to
document it as an opportunity for improvement.  Later we could improve
it by using an add_stmt_slp_cost () interface (or adding an is_slp
flag), and another interface to be called at the time during analysis
when the SLP statements will be issued during transformation.  This
would allow the back end model to queue up the SLP costs in a separate
vector and later place them in its internal structures at the
appropriate place.

It should eventually be possible to remove these fields/accessors:

 * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST
 * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST
 * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST

However, I think this should be delayed until we have the basic
infrastructure in place for the new model and well-tested.

The other issue is that we should have the model track both the inside
and outside costs if we're going to get everything into the target
model.  For a first pass we can ignore this and keep the existing logic
for the outside costs.  Later we should add some interfaces analogous to
add_stmt_cost such as add_stmt_prolog_cost and add_stmt_epilog_cost so
the model can track this stuff as carefully as it wants to.

So, I'd propose going at this in several phases:

(1) Add calls to the new interface without disturbing existing logic;
modify the profitability algorithms to query the new model for inside
costs.  Default algorithm for the model is to just sum costs as is done
today.
(x) Add heuristics to target models as desired.
(2) Handle the SLP ordering problem.
(3) Handle outside costs in the target model.
(4) Remove the now unnecessary cost fields and the calls that set them.

Item (x) can happen anytime after item (1).

I don't think this work is terribly difficult, just a bit tedious.  The
only really time-consuming aspect of it will be in very careful testing
to keep from changing existing behavior.

All comments welcome -- please let me know what you think.

Thanks,
Bill





Re: [PATCH] Add vector cost model density heuristic

2012-06-11 Thread Richard Guenther
On Mon, Jun 11, 2012 at 5:38 PM, William J. Schmidt
 wrote:
>
>
> On Mon, 2012-06-11 at 11:09 -0400, David Edelsohn wrote:
>> On Mon, Jun 11, 2012 at 10:55 AM, Richard Guenther  wrote:
>>
>> > Well, they are at least magic numbers and heuristics that apply
>> > generally and not only to the single issue in sphinx.  And in
>> > fact how it works for sphinx _is_ magic.
>> >
>> >> Second, I suggest that you need to rephrase "I can make you" and
>> >> re-send your reply.
>> >
>> > Sorry for my bad english.  Consider it meaning that I'd rather have
>> > you think about a more proper solution.  That's what patch review
>> > is about after all, no?  Sometimes a complete re-write (which gets
>> > more difficult which each of the patches "enhancing" the not ideal
>> > current state) is the best thing to do.
>>
>> Richard,
>>
>> The values of the heuristics may be "magic", but Bill believes the
>> heuristics are testing the important characteristics.  The heuristics
>> themselves are controlled by hooks, so the target can set the correct
>> values for their own requirements.
>>
>> The concern is that a general cost infrastructure is too general.
>> And, based on history, all ports simply will copy the boilerplate from
>> the first implementation. It also may cause more problems because the
>> target has relatively little information to be able to judge
>> heuristics at that point in the middle-end. If the targets start to
>> get too "cute" or too complicated, it may cause more problems or more
>> confusion about why more complicated heuristics are not effective and
>> not producing the expected results.
>>
>> I worry about creating another machine dependent reorg catch-all pass.
>>
>> Maybe an incremental pre- and/or post- cost hook would be more
>> effective. I will let Bill comment.
>
> Thanks David,
>
> I can see both sides of this, and it's hard to judge the future from
> where I stand.  My belief is that the number of heuristics targets will
> implement will be fairly limited, since judgments about cycle-level
> costs are not accurately predictable during the middle end.  All we can
> do is come up with a few things that seem to make sense.  Doing too much
> in the back end seems impractical.
>
> The interesting question to me is whether cost model heuristics are
> general enough to be reusable.  What I saw in this case was what I
> considered to be a somewhat target-neutral problem:  overwhelming those
> assets of the processor that implement vectorization.  It seemed
> reasonable to provide hooks for others to use the idea if they encounter
> similar issues.  If reusing the heuristic is useful, then having to copy
> the logic from one target to another isn't the best approach.  If nobody
> else will ever use it, then embedding it in the back end is reasonable.
> Unfortunately my crystal ball has been on the fritz for several decades,
> so I can't tell you for sure which is right...
>
> Richard, my biggest question is whether you think other targets are
> likely to take advantage of a more general back-end interface, or
> whether this will end up just being a PowerPC wart.  If you know of ways
> this will be useful for i386, that would be helpful to know.  Perhaps
> this requires your crystal ball as well; not sure how well yours
> works...
>
> If we look at just this one issue in isolation, then changing all the
> code in the vectorizer that calculates inside/outside loop costs and
> moving it to targetm seems more invasive than adding the few hooks.  But
> if this will really be a useful feature for the community as a whole I
> am certainly willing to tackle it.

Well.  At the moment the cost model is purely "local" in that the target
receives no context about the cost it has to return for (part of) a vectorized
operation.  In fact the vectorizer already creates multiple
instructions (and thus
calls to the cost target hook) for a single scalar statement in some cases.
There are two ways to increase the context the target can look at - invent
more kinds of "statements" we pass to the cost model hook, and thus
arrive at a point where for each group of stmts the vectorizer emits and
that possibly has a more efficient target implementation we have a different
statement kind (only for the cost model, we continue to generate multiple
GIMPLE stmts).  The other way is to give the target the ability to see context,
thus relation of the (sub-)statements the vectorizer asks the target to compute
costs for.  I've run into the situation where there was missing context on x86,
too (but I forgot the specific case), and that's where I came up with
the general idea.
Of course both work, but for your kind of heuristic only looking at
context works.
In the scheme I suggested the fini_cost hook would be able to do the counting
(as would a simple counter - I don't believe that this would not work - you do
counting, too, after all, simply make the fini hook able to reject the
vectorization).
If powerpc is issue-limited then you h

Re: [PATCH] Add vector cost model density heuristic

2012-06-11 Thread William J. Schmidt


On Mon, 2012-06-11 at 11:09 -0400, David Edelsohn wrote:
> On Mon, Jun 11, 2012 at 10:55 AM, Richard Guenther  wrote:
> 
> > Well, they are at least magic numbers and heuristics that apply
> > generally and not only to the single issue in sphinx.  And in
> > fact how it works for sphinx _is_ magic.
> >
> >> Second, I suggest that you need to rephrase "I can make you" and
> >> re-send your reply.
> >
> > Sorry for my bad english.  Consider it meaning that I'd rather have
> > you think about a more proper solution.  That's what patch review
> > is about after all, no?  Sometimes a complete re-write (which gets
> > more difficult which each of the patches "enhancing" the not ideal
> > current state) is the best thing to do.
> 
> Richard,
> 
> The values of the heuristics may be "magic", but Bill believes the
> heuristics are testing the important characteristics.  The heuristics
> themselves are controlled by hooks, so the target can set the correct
> values for their own requirements.
> 
> The concern is that a general cost infrastructure is too general.
> And, based on history, all ports simply will copy the boilerplate from
> the first implementation. It also may cause more problems because the
> target has relatively little information to be able to judge
> heuristics at that point in the middle-end. If the targets start to
> get too "cute" or too complicated, it may cause more problems or more
> confusion about why more complicated heuristics are not effective and
> not producing the expected results.
> 
> I worry about creating another machine dependent reorg catch-all pass.
> 
> Maybe an incremental pre- and/or post- cost hook would be more
> effective. I will let Bill comment.

Thanks David,

I can see both sides of this, and it's hard to judge the future from
where I stand.  My belief is that the number of heuristics targets will
implement will be fairly limited, since judgments about cycle-level
costs are not accurately predictable during the middle end.  All we can
do is come up with a few things that seem to make sense.  Doing too much
in the back end seems impractical.

The interesting question to me is whether cost model heuristics are
general enough to be reusable.  What I saw in this case was what I
considered to be a somewhat target-neutral problem:  overwhelming those
assets of the processor that implement vectorization.  It seemed
reasonable to provide hooks for others to use the idea if they encounter
similar issues.  If reusing the heuristic is useful, then having to copy
the logic from one target to another isn't the best approach.  If nobody
else will ever use it, then embedding it in the back end is reasonable.
Unfortunately my crystal ball has been on the fritz for several decades,
so I can't tell you for sure which is right...

Richard, my biggest question is whether you think other targets are
likely to take advantage of a more general back-end interface, or
whether this will end up just being a PowerPC wart.  If you know of ways
this will be useful for i386, that would be helpful to know.  Perhaps
this requires your crystal ball as well; not sure how well yours
works...

If we look at just this one issue in isolation, then changing all the
code in the vectorizer that calculates inside/outside loop costs and
moving it to targetm seems more invasive than adding the few hooks.  But
if this will really be a useful feature for the community as a whole I
am certainly willing to tackle it.

Thanks,
Bill

> 
> Thanks, David
> 



Re: [PATCH] Add vector cost model density heuristic

2012-06-11 Thread William J. Schmidt
On Mon, 2012-06-11 at 16:58 +0200, Richard Guenther wrote:
> On Mon, 11 Jun 2012, Richard Guenther wrote:
> 
> > On Mon, 11 Jun 2012, William J. Schmidt wrote:
> > 
> > > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > > > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> > > > 
> > > > > This patch adds a heuristic to the vectorizer when estimating the
> > > > > minimum profitable number of iterations.  The heuristic is
> > > > > target-dependent, and is currently disabled for all targets except
> > > > > PowerPC.  However, the intent is to make it general enough to be 
> > > > > useful
> > > > > for other targets that want to opt in.
> > > > > 
> > > > > A previous patch addressed some PowerPC SPEC degradations by modifying
> > > > > the vector cost model values for vec_perm and vec_promote_demote.  The
> > > > > values were set a little higher than their natural values because the
> > > > > natural values were not sufficient to prevent a poor vectorization
> > > > > choice.  However, this is not the right long-term solution, since it 
> > > > > can
> > > > > unnecessarily constrain other vectorization choices involving permute
> > > > > instructions.
> > > > > 
> > > > > Analysis of the badly vectorized loop (in sphinx3) showed that the
> > > > > problem was overcommitment of vector resources -- too many vector
> > > > > instructions issued without enough non-vector instructions available 
> > > > > to
> > > > > cover the delays.  The vector cost model assumes that instructions
> > > > > always have a constant cost, and doesn't have a way of judging this 
> > > > > kind
> > > > > of "density" of vector instructions.
> > > > > 
> > > > > The present patch adds a heuristic to recognize when a loop is likely 
> > > > > to
> > > > > overcommit resources, and adds a small penalty to the inside-loop cost
> > > > > to account for the expected stalls.  The heuristic is parameterized 
> > > > > with
> > > > > three target-specific values:
> > > > > 
> > > > >  * Density threshold: The heuristic will apply only when the
> > > > >percentage of inside-loop cost attributable to vectorized
> > > > >instructions exceeds this value.
> > > > > 
> > > > >  * Size threshold: The heuristic will apply only when the
> > > > >inside-loop cost exceeds this value.
> > > > > 
> > > > >  * Penalty: The inside-loop cost will be increased by this
> > > > >percentage value when the heuristic applies.
> > > > > 
> > > > > Thus only reasonably large loop bodies that are mostly vectorized
> > > > > instructions will be affected.
> > > > > 
> > > > > By applying only a small percentage bump to the inside-loop cost, the
> > > > > heuristic will only turn off vectorization for loops that were
> > > > > considered "barely profitable" to begin with (such as the sphinx3 
> > > > > loop).
> > > > > So the heuristic is quite conservative and should not affect the vast
> > > > > majority of vectorization decisions.
> > > > > 
> > > > > Together with the new heuristic, this patch reduces the vec_perm and
> > > > > vec_promote_demote costs for PowerPC to their natural values.
> > > > > 
> > > > > I've regstrapped this with no regressions on 
> > > > > powerpc64-unknown-linux-gnu
> > > > > and verified that no performance regressions occur on SPEC cpu2006.  
> > > > > Is
> > > > > this ok for trunk?
> > > > 
> > > > Hmm.  I don't like this patch or its general idea too much.  Instead
> > > > I'd like us to move more of the cost model detail to the target, giving
> > > > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > > > posting the overall idea at some point, but let me repeat it here 
> > > > instead
> > > > of trying to find that e-mail.
> > > > 
> > > > The basic interface of the cost model should be, in targetm.vectorize
> > > > 
> > > >   /* Tell the target to start cost analysis of a loop or a basic-block
> > > >  (if the loop argument is NULL).  Returns an opaque pointer to
> > > >  target-private data.  */
> > > >   void *init_cost (struct loop *loop);
> > > > 
> > > >   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
> > > >   void add_stmt_cost (void *data, unsigned n,
> > > >   vectorized-stmt-kind,
> > > >   enum machine_mode vector_mode);
> > > > 
> > > >   /* Tell the target to compute and return the cost of the accumulated
> > > >  statements and free any target-private data.  */
> > > >   unsigned finish_cost (void *data);
> > > > 
> > > > with eventually slightly different signatures for add_stmt_cost
> > > > (like pass in the original scalar stmt?).
> > > > 
> > > > It allows the target, at finish_cost time, to evaluate things like
> > > > register pressure and resource utilization.
> > > 
> > > OK, I'm trying to understand how you would want this built into the
> > > present structure.  Taking just the loop case for now:
> > > 
> > > Judging by your suggested API, we would have to call add_stmt_cost 

Re: [PATCH] Add vector cost model density heuristic

2012-06-11 Thread David Edelsohn
On Mon, Jun 11, 2012 at 10:55 AM, Richard Guenther  wrote:

> Well, they are at least magic numbers and heuristics that apply
> generally and not only to the single issue in sphinx.  And in
> fact how it works for sphinx _is_ magic.
>
>> Second, I suggest that you need to rephrase "I can make you" and
>> re-send your reply.
>
> Sorry for my bad english.  Consider it meaning that I'd rather have
> you think about a more proper solution.  That's what patch review
> is about after all, no?  Sometimes a complete re-write (which gets
> more difficult which each of the patches "enhancing" the not ideal
> current state) is the best thing to do.

Richard,

The values of the heuristics may be "magic", but Bill believes the
heuristics are testing the important characteristics.  The heuristics
themselves are controlled by hooks, so the target can set the correct
values for their own requirements.

The concern is that a general cost infrastructure is too general.
And, based on history, all ports simply will copy the boilerplate from
the first implementation. It also may cause more problems because the
target has relatively little information to be able to judge
heuristics at that point in the middle-end. If the targets start to
get too "cute" or too complicated, it may cause more problems or more
confusion about why more complicated heuristics are not effective and
not producing the expected results.

I worry about creating another machine dependent reorg catch-all pass.

Maybe an incremental pre- and/or post- cost hook would be more
effective. I will let Bill comment.

Thanks, David


Re: [PATCH] Add vector cost model density heuristic

2012-06-11 Thread Richard Guenther
On Mon, 11 Jun 2012, Richard Guenther wrote:

> On Mon, 11 Jun 2012, William J. Schmidt wrote:
> 
> > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> > > 
> > > > This patch adds a heuristic to the vectorizer when estimating the
> > > > minimum profitable number of iterations.  The heuristic is
> > > > target-dependent, and is currently disabled for all targets except
> > > > PowerPC.  However, the intent is to make it general enough to be useful
> > > > for other targets that want to opt in.
> > > > 
> > > > A previous patch addressed some PowerPC SPEC degradations by modifying
> > > > the vector cost model values for vec_perm and vec_promote_demote.  The
> > > > values were set a little higher than their natural values because the
> > > > natural values were not sufficient to prevent a poor vectorization
> > > > choice.  However, this is not the right long-term solution, since it can
> > > > unnecessarily constrain other vectorization choices involving permute
> > > > instructions.
> > > > 
> > > > Analysis of the badly vectorized loop (in sphinx3) showed that the
> > > > problem was overcommitment of vector resources -- too many vector
> > > > instructions issued without enough non-vector instructions available to
> > > > cover the delays.  The vector cost model assumes that instructions
> > > > always have a constant cost, and doesn't have a way of judging this kind
> > > > of "density" of vector instructions.
> > > > 
> > > > The present patch adds a heuristic to recognize when a loop is likely to
> > > > overcommit resources, and adds a small penalty to the inside-loop cost
> > > > to account for the expected stalls.  The heuristic is parameterized with
> > > > three target-specific values:
> > > > 
> > > >  * Density threshold: The heuristic will apply only when the
> > > >percentage of inside-loop cost attributable to vectorized
> > > >instructions exceeds this value.
> > > > 
> > > >  * Size threshold: The heuristic will apply only when the
> > > >inside-loop cost exceeds this value.
> > > > 
> > > >  * Penalty: The inside-loop cost will be increased by this
> > > >percentage value when the heuristic applies.
> > > > 
> > > > Thus only reasonably large loop bodies that are mostly vectorized
> > > > instructions will be affected.
> > > > 
> > > > By applying only a small percentage bump to the inside-loop cost, the
> > > > heuristic will only turn off vectorization for loops that were
> > > > considered "barely profitable" to begin with (such as the sphinx3 loop).
> > > > So the heuristic is quite conservative and should not affect the vast
> > > > majority of vectorization decisions.
> > > > 
> > > > Together with the new heuristic, this patch reduces the vec_perm and
> > > > vec_promote_demote costs for PowerPC to their natural values.
> > > > 
> > > > I've regstrapped this with no regressions on powerpc64-unknown-linux-gnu
> > > > and verified that no performance regressions occur on SPEC cpu2006.  Is
> > > > this ok for trunk?
> > > 
> > > Hmm.  I don't like this patch or its general idea too much.  Instead
> > > I'd like us to move more of the cost model detail to the target, giving
> > > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > > posting the overall idea at some point, but let me repeat it here instead
> > > of trying to find that e-mail.
> > > 
> > > The basic interface of the cost model should be, in targetm.vectorize
> > > 
> > >   /* Tell the target to start cost analysis of a loop or a basic-block
> > >  (if the loop argument is NULL).  Returns an opaque pointer to
> > >  target-private data.  */
> > >   void *init_cost (struct loop *loop);
> > > 
> > >   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
> > >   void add_stmt_cost (void *data, unsigned n,
> > > vectorized-stmt-kind,
> > >   enum machine_mode vector_mode);
> > > 
> > >   /* Tell the target to compute and return the cost of the accumulated
> > >  statements and free any target-private data.  */
> > >   unsigned finish_cost (void *data);
> > > 
> > > with eventually slightly different signatures for add_stmt_cost
> > > (like pass in the original scalar stmt?).
> > > 
> > > It allows the target, at finish_cost time, to evaluate things like
> > > register pressure and resource utilization.
> > 
> > OK, I'm trying to understand how you would want this built into the
> > present structure.  Taking just the loop case for now:
> > 
> > Judging by your suggested API, we would have to call add_stmt_cost ()
> > everywhere that we now call stmt_vinfo_set_inside_of_loop_cost ().  For
> > now this would be an additional call, not a replacement, though maybe
> > the other goes away eventually.  This allows the target to save more
> > data about the vectorized instructions than just an accumulated cost
> > number (order and quantity of various kinds of instr

Re: [PATCH] Add vector cost model density heuristic

2012-06-11 Thread Richard Guenther
On Mon, 11 Jun 2012, David Edelsohn wrote:

> On Mon, Jun 11, 2012 at 10:48 AM, Richard Guenther  wrote:
> > On Mon, 11 Jun 2012, David Edelsohn wrote:
> >
> >> On Mon, Jun 11, 2012 at 7:40 AM, Richard Guenther  
> >> wrote:
> >>
> >> > Hmm.  I don't like this patch or its general idea too much.  Instead
> >> > I'd like us to move more of the cost model detail to the target, giving
> >> > it a chance to look at the whole loop before deciding on a cost.  ISTR
> >> > posting the overall idea at some point, but let me repeat it here instead
> >> > of trying to find that e-mail.
> >>
> >> Richard,
> >>
> >> Allowing the target to see the entire loop and have more control over
> >> the cost model is a fine goal, but the current code is using incorrect
> >> heuristics. Requiring a complete re-write of the auto-vectorizer cost
> >> model should not be a requirement for fixing the current code.
> >
> > If fixing the current code adds more "magic" hooks at the wrong spot
> > then yes, in stage1 I can make you think about a more proper solution
> > to the issue.
> 
> First, these are not magic hooks.

Well, they are at least magic numbers and heuristics that apply
generally and not only to the single issue in sphinx.  And in
fact how it works for sphinx _is_ magic.

> Second, I suggest that you need to rephrase "I can make you" and
> re-send your reply.

Sorry for my bad english.  Consider it meaning that I'd rather have
you think about a more proper solution.  That's what patch review
is about after all, no?  Sometimes a complete re-write (which gets
more difficult which each of the patches "enhancing" the not ideal
current state) is the best thing to do.

Richard.

Re: [PATCH] Add vector cost model density heuristic

2012-06-11 Thread David Edelsohn
On Mon, Jun 11, 2012 at 10:48 AM, Richard Guenther  wrote:
> On Mon, 11 Jun 2012, David Edelsohn wrote:
>
>> On Mon, Jun 11, 2012 at 7:40 AM, Richard Guenther  wrote:
>>
>> > Hmm.  I don't like this patch or its general idea too much.  Instead
>> > I'd like us to move more of the cost model detail to the target, giving
>> > it a chance to look at the whole loop before deciding on a cost.  ISTR
>> > posting the overall idea at some point, but let me repeat it here instead
>> > of trying to find that e-mail.
>>
>> Richard,
>>
>> Allowing the target to see the entire loop and have more control over
>> the cost model is a fine goal, but the current code is using incorrect
>> heuristics. Requiring a complete re-write of the auto-vectorizer cost
>> model should not be a requirement for fixing the current code.
>
> If fixing the current code adds more "magic" hooks at the wrong spot
> then yes, in stage1 I can make you think about a more proper solution
> to the issue.

First, these are not magic hooks.

Second, I suggest that you need to rephrase "I can make you" and
re-send your reply.

Thanks, David


Re: [PATCH] Add vector cost model density heuristic

2012-06-11 Thread Richard Guenther
On Mon, 11 Jun 2012, David Edelsohn wrote:

> On Mon, Jun 11, 2012 at 7:40 AM, Richard Guenther  wrote:
> 
> > Hmm.  I don't like this patch or its general idea too much.  Instead
> > I'd like us to move more of the cost model detail to the target, giving
> > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > posting the overall idea at some point, but let me repeat it here instead
> > of trying to find that e-mail.
> 
> Richard,
> 
> Allowing the target to see the entire loop and have more control over
> the cost model is a fine goal, but the current code is using incorrect
> heuristics. Requiring a complete re-write of the auto-vectorizer cost
> model should not be a requirement for fixing the current code.

If fixing the current code adds more "magic" hooks at the wrong spot
then yes, in stage1 I can make you think about a more proper solution
to the issue.

Richard.

Re: [PATCH] Add vector cost model density heuristic

2012-06-11 Thread Richard Guenther
On Mon, 11 Jun 2012, William J. Schmidt wrote:

> On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> > 
> > > This patch adds a heuristic to the vectorizer when estimating the
> > > minimum profitable number of iterations.  The heuristic is
> > > target-dependent, and is currently disabled for all targets except
> > > PowerPC.  However, the intent is to make it general enough to be useful
> > > for other targets that want to opt in.
> > > 
> > > A previous patch addressed some PowerPC SPEC degradations by modifying
> > > the vector cost model values for vec_perm and vec_promote_demote.  The
> > > values were set a little higher than their natural values because the
> > > natural values were not sufficient to prevent a poor vectorization
> > > choice.  However, this is not the right long-term solution, since it can
> > > unnecessarily constrain other vectorization choices involving permute
> > > instructions.
> > > 
> > > Analysis of the badly vectorized loop (in sphinx3) showed that the
> > > problem was overcommitment of vector resources -- too many vector
> > > instructions issued without enough non-vector instructions available to
> > > cover the delays.  The vector cost model assumes that instructions
> > > always have a constant cost, and doesn't have a way of judging this kind
> > > of "density" of vector instructions.
> > > 
> > > The present patch adds a heuristic to recognize when a loop is likely to
> > > overcommit resources, and adds a small penalty to the inside-loop cost
> > > to account for the expected stalls.  The heuristic is parameterized with
> > > three target-specific values:
> > > 
> > >  * Density threshold: The heuristic will apply only when the
> > >percentage of inside-loop cost attributable to vectorized
> > >instructions exceeds this value.
> > > 
> > >  * Size threshold: The heuristic will apply only when the
> > >inside-loop cost exceeds this value.
> > > 
> > >  * Penalty: The inside-loop cost will be increased by this
> > >percentage value when the heuristic applies.
> > > 
> > > Thus only reasonably large loop bodies that are mostly vectorized
> > > instructions will be affected.
> > > 
> > > By applying only a small percentage bump to the inside-loop cost, the
> > > heuristic will only turn off vectorization for loops that were
> > > considered "barely profitable" to begin with (such as the sphinx3 loop).
> > > So the heuristic is quite conservative and should not affect the vast
> > > majority of vectorization decisions.
> > > 
> > > Together with the new heuristic, this patch reduces the vec_perm and
> > > vec_promote_demote costs for PowerPC to their natural values.
> > > 
> > > I've regstrapped this with no regressions on powerpc64-unknown-linux-gnu
> > > and verified that no performance regressions occur on SPEC cpu2006.  Is
> > > this ok for trunk?
> > 
> > Hmm.  I don't like this patch or its general idea too much.  Instead
> > I'd like us to move more of the cost model detail to the target, giving
> > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > posting the overall idea at some point, but let me repeat it here instead
> > of trying to find that e-mail.
> > 
> > The basic interface of the cost model should be, in targetm.vectorize
> > 
> >   /* Tell the target to start cost analysis of a loop or a basic-block
> >  (if the loop argument is NULL).  Returns an opaque pointer to
> >  target-private data.  */
> >   void *init_cost (struct loop *loop);
> > 
> >   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
> >   void add_stmt_cost (void *data, unsigned n,
> >   vectorized-stmt-kind,
> >   enum machine_mode vector_mode);
> > 
> >   /* Tell the target to compute and return the cost of the accumulated
> >  statements and free any target-private data.  */
> >   unsigned finish_cost (void *data);
> > 
> > with eventually slightly different signatures for add_stmt_cost
> > (like pass in the original scalar stmt?).
> > 
> > It allows the target, at finish_cost time, to evaluate things like
> > register pressure and resource utilization.
> 
> OK, I'm trying to understand how you would want this built into the
> present structure.  Taking just the loop case for now:
> 
> Judging by your suggested API, we would have to call add_stmt_cost ()
> everywhere that we now call stmt_vinfo_set_inside_of_loop_cost ().  For
> now this would be an additional call, not a replacement, though maybe
> the other goes away eventually.  This allows the target to save more
> data about the vectorized instructions than just an accumulated cost
> number (order and quantity of various kinds of instructions can be
> maintained for better modeling).  Presumably the call to finish_cost
> would be done within vect_estimate_min_profitable_iters () to produce
> the final value of inside_cost for the loop.

Yes.  I didn't look in det

Re: [PATCH] Add vector cost model density heuristic

2012-06-11 Thread David Edelsohn
On Mon, Jun 11, 2012 at 7:40 AM, Richard Guenther  wrote:

> Hmm.  I don't like this patch or its general idea too much.  Instead
> I'd like us to move more of the cost model detail to the target, giving
> it a chance to look at the whole loop before deciding on a cost.  ISTR
> posting the overall idea at some point, but let me repeat it here instead
> of trying to find that e-mail.

Richard,

Allowing the target to see the entire loop and have more control over
the cost model is a fine goal, but the current code is using incorrect
heuristics. Requiring a complete re-write of the auto-vectorizer cost
model should not be a requirement for fixing the current code.

Thanks, David


Re: [PATCH] Add vector cost model density heuristic

2012-06-11 Thread William J. Schmidt
On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> On Fri, 8 Jun 2012, William J. Schmidt wrote:
> 
> > This patch adds a heuristic to the vectorizer when estimating the
> > minimum profitable number of iterations.  The heuristic is
> > target-dependent, and is currently disabled for all targets except
> > PowerPC.  However, the intent is to make it general enough to be useful
> > for other targets that want to opt in.
> > 
> > A previous patch addressed some PowerPC SPEC degradations by modifying
> > the vector cost model values for vec_perm and vec_promote_demote.  The
> > values were set a little higher than their natural values because the
> > natural values were not sufficient to prevent a poor vectorization
> > choice.  However, this is not the right long-term solution, since it can
> > unnecessarily constrain other vectorization choices involving permute
> > instructions.
> > 
> > Analysis of the badly vectorized loop (in sphinx3) showed that the
> > problem was overcommitment of vector resources -- too many vector
> > instructions issued without enough non-vector instructions available to
> > cover the delays.  The vector cost model assumes that instructions
> > always have a constant cost, and doesn't have a way of judging this kind
> > of "density" of vector instructions.
> > 
> > The present patch adds a heuristic to recognize when a loop is likely to
> > overcommit resources, and adds a small penalty to the inside-loop cost
> > to account for the expected stalls.  The heuristic is parameterized with
> > three target-specific values:
> > 
> >  * Density threshold: The heuristic will apply only when the
> >percentage of inside-loop cost attributable to vectorized
> >instructions exceeds this value.
> > 
> >  * Size threshold: The heuristic will apply only when the
> >inside-loop cost exceeds this value.
> > 
> >  * Penalty: The inside-loop cost will be increased by this
> >percentage value when the heuristic applies.
> > 
> > Thus only reasonably large loop bodies that are mostly vectorized
> > instructions will be affected.
> > 
> > By applying only a small percentage bump to the inside-loop cost, the
> > heuristic will only turn off vectorization for loops that were
> > considered "barely profitable" to begin with (such as the sphinx3 loop).
> > So the heuristic is quite conservative and should not affect the vast
> > majority of vectorization decisions.
> > 
> > Together with the new heuristic, this patch reduces the vec_perm and
> > vec_promote_demote costs for PowerPC to their natural values.
> > 
> > I've regstrapped this with no regressions on powerpc64-unknown-linux-gnu
> > and verified that no performance regressions occur on SPEC cpu2006.  Is
> > this ok for trunk?
> 
> Hmm.  I don't like this patch or its general idea too much.  Instead
> I'd like us to move more of the cost model detail to the target, giving
> it a chance to look at the whole loop before deciding on a cost.  ISTR
> posting the overall idea at some point, but let me repeat it here instead
> of trying to find that e-mail.
> 
> The basic interface of the cost model should be, in targetm.vectorize
> 
>   /* Tell the target to start cost analysis of a loop or a basic-block
>  (if the loop argument is NULL).  Returns an opaque pointer to
>  target-private data.  */
>   void *init_cost (struct loop *loop);
> 
>   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
>   void add_stmt_cost (void *data, unsigned n,
> vectorized-stmt-kind,
>   enum machine_mode vector_mode);
> 
>   /* Tell the target to compute and return the cost of the accumulated
>  statements and free any target-private data.  */
>   unsigned finish_cost (void *data);
> 
> with eventually slightly different signatures for add_stmt_cost
> (like pass in the original scalar stmt?).
> 
> It allows the target, at finish_cost time, to evaluate things like
> register pressure and resource utilization.

OK, I'm trying to understand how you would want this built into the
present structure.  Taking just the loop case for now:

Judging by your suggested API, we would have to call add_stmt_cost ()
everywhere that we now call stmt_vinfo_set_inside_of_loop_cost ().  For
now this would be an additional call, not a replacement, though maybe
the other goes away eventually.  This allows the target to save more
data about the vectorized instructions than just an accumulated cost
number (order and quantity of various kinds of instructions can be
maintained for better modeling).  Presumably the call to finish_cost
would be done within vect_estimate_min_profitable_iters () to produce
the final value of inside_cost for the loop.

The default target hook for add_stmt_cost would duplicate what we
currently do for calculating the inside_cost of a statement, and the
default target hook for finish_cost would just return the sum.

I'll have to go hunting where the similar code would fit for SL

Re: [PATCH] Add vector cost model density heuristic

2012-06-11 Thread Richard Guenther
On Fri, 8 Jun 2012, William J. Schmidt wrote:

> This patch adds a heuristic to the vectorizer when estimating the
> minimum profitable number of iterations.  The heuristic is
> target-dependent, and is currently disabled for all targets except
> PowerPC.  However, the intent is to make it general enough to be useful
> for other targets that want to opt in.
> 
> A previous patch addressed some PowerPC SPEC degradations by modifying
> the vector cost model values for vec_perm and vec_promote_demote.  The
> values were set a little higher than their natural values because the
> natural values were not sufficient to prevent a poor vectorization
> choice.  However, this is not the right long-term solution, since it can
> unnecessarily constrain other vectorization choices involving permute
> instructions.
> 
> Analysis of the badly vectorized loop (in sphinx3) showed that the
> problem was overcommitment of vector resources -- too many vector
> instructions issued without enough non-vector instructions available to
> cover the delays.  The vector cost model assumes that instructions
> always have a constant cost, and doesn't have a way of judging this kind
> of "density" of vector instructions.
> 
> The present patch adds a heuristic to recognize when a loop is likely to
> overcommit resources, and adds a small penalty to the inside-loop cost
> to account for the expected stalls.  The heuristic is parameterized with
> three target-specific values:
> 
>  * Density threshold: The heuristic will apply only when the
>percentage of inside-loop cost attributable to vectorized
>instructions exceeds this value.
> 
>  * Size threshold: The heuristic will apply only when the
>inside-loop cost exceeds this value.
> 
>  * Penalty: The inside-loop cost will be increased by this
>percentage value when the heuristic applies.
> 
> Thus only reasonably large loop bodies that are mostly vectorized
> instructions will be affected.
> 
> By applying only a small percentage bump to the inside-loop cost, the
> heuristic will only turn off vectorization for loops that were
> considered "barely profitable" to begin with (such as the sphinx3 loop).
> So the heuristic is quite conservative and should not affect the vast
> majority of vectorization decisions.
> 
> Together with the new heuristic, this patch reduces the vec_perm and
> vec_promote_demote costs for PowerPC to their natural values.
> 
> I've regstrapped this with no regressions on powerpc64-unknown-linux-gnu
> and verified that no performance regressions occur on SPEC cpu2006.  Is
> this ok for trunk?

Hmm.  I don't like this patch or its general idea too much.  Instead
I'd like us to move more of the cost model detail to the target, giving
it a chance to look at the whole loop before deciding on a cost.  ISTR
posting the overall idea at some point, but let me repeat it here instead
of trying to find that e-mail.

The basic interface of the cost model should be, in targetm.vectorize

  /* Tell the target to start cost analysis of a loop or a basic-block
 (if the loop argument is NULL).  Returns an opaque pointer to
 target-private data.  */
  void *init_cost (struct loop *loop);

  /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
  void add_stmt_cost (void *data, unsigned n,
  vectorized-stmt-kind,
  enum machine_mode vector_mode);

  /* Tell the target to compute and return the cost of the accumulated
 statements and free any target-private data.  */
  unsigned finish_cost (void *data);

with eventually slightly different signatures for add_stmt_cost
(like pass in the original scalar stmt?).

It allows the target, at finish_cost time, to evaluate things like
register pressure and resource utilization.

Thanks,
Richard.

> Thanks,
> Bill
> 
> 
> 2012-06-08  Bill Schmidt  
> 
>   * doc/tm.texi.in: Add vectorization density hooks.
>   * doc/tm.texi: Regenerate.
>   * targhooks.c (default_density_pct_threshold): New.
>   (default_density_size_threshold): New.
>   (default_density_penalty): New.
>   * targhooks.h: New decls for new targhooks.c functions.
>   * target.def (density_pct_threshold): New DEF_HOOK.
>   (density_size_threshold): Likewise.
>   (density_penalty): Likewise.
>   * tree-vect-loop.c (accum_stmt_cost): New.
>   (vect_estimate_min_profitable_iters): Perform density test.
>   * config/rs6000/rs6000.c (TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD):
>   New macro definition.
>   (TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD): Likewise.
>   (TARGET_VECTORIZE_DENSITY_PENALTY): Likewise.
>   (rs6000_builtin_vectorization_cost): Reduce costs of vec_perm and
>   vec_promote_demote to correct values.
>   (rs6000_density_pct_threshold): New.
>   (rs6000_density_size_threshold): New.
>   (rs6000_density_penalty): New.
> 
> 
> Index: gcc/doc/tm.texi
> ==

[PATCH] Add vector cost model density heuristic

2012-06-08 Thread William J. Schmidt
This patch adds a heuristic to the vectorizer when estimating the
minimum profitable number of iterations.  The heuristic is
target-dependent, and is currently disabled for all targets except
PowerPC.  However, the intent is to make it general enough to be useful
for other targets that want to opt in.

A previous patch addressed some PowerPC SPEC degradations by modifying
the vector cost model values for vec_perm and vec_promote_demote.  The
values were set a little higher than their natural values because the
natural values were not sufficient to prevent a poor vectorization
choice.  However, this is not the right long-term solution, since it can
unnecessarily constrain other vectorization choices involving permute
instructions.

Analysis of the badly vectorized loop (in sphinx3) showed that the
problem was overcommitment of vector resources -- too many vector
instructions issued without enough non-vector instructions available to
cover the delays.  The vector cost model assumes that instructions
always have a constant cost, and doesn't have a way of judging this kind
of "density" of vector instructions.

The present patch adds a heuristic to recognize when a loop is likely to
overcommit resources, and adds a small penalty to the inside-loop cost
to account for the expected stalls.  The heuristic is parameterized with
three target-specific values:

 * Density threshold: The heuristic will apply only when the
   percentage of inside-loop cost attributable to vectorized
   instructions exceeds this value.

 * Size threshold: The heuristic will apply only when the
   inside-loop cost exceeds this value.

 * Penalty: The inside-loop cost will be increased by this
   percentage value when the heuristic applies.

Thus only reasonably large loop bodies that are mostly vectorized
instructions will be affected.

By applying only a small percentage bump to the inside-loop cost, the
heuristic will only turn off vectorization for loops that were
considered "barely profitable" to begin with (such as the sphinx3 loop).
So the heuristic is quite conservative and should not affect the vast
majority of vectorization decisions.

Together with the new heuristic, this patch reduces the vec_perm and
vec_promote_demote costs for PowerPC to their natural values.

I've regstrapped this with no regressions on powerpc64-unknown-linux-gnu
and verified that no performance regressions occur on SPEC cpu2006.  Is
this ok for trunk?

Thanks,
Bill


2012-06-08  Bill Schmidt  

* doc/tm.texi.in: Add vectorization density hooks.
* doc/tm.texi: Regenerate.
* targhooks.c (default_density_pct_threshold): New.
(default_density_size_threshold): New.
(default_density_penalty): New.
* targhooks.h: New decls for new targhooks.c functions.
* target.def (density_pct_threshold): New DEF_HOOK.
(density_size_threshold): Likewise.
(density_penalty): Likewise.
* tree-vect-loop.c (accum_stmt_cost): New.
(vect_estimate_min_profitable_iters): Perform density test.
* config/rs6000/rs6000.c (TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD):
New macro definition.
(TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD): Likewise.
(TARGET_VECTORIZE_DENSITY_PENALTY): Likewise.
(rs6000_builtin_vectorization_cost): Reduce costs of vec_perm and
vec_promote_demote to correct values.
(rs6000_density_pct_threshold): New.
(rs6000_density_size_threshold): New.
(rs6000_density_penalty): New.


Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi (revision 188305)
+++ gcc/doc/tm.texi (working copy)
@@ -5798,6 +5798,27 @@ The default is @code{NULL_TREE} which means to not
 loads.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD (void)
+This hook should return the maximum density, expressed in percent, for
+which autovectorization of loops with large bodies should be constrained.
+See also @code{TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD}.  The default
+is to return 100, which disables the density test.
+@end deftypefn
+
+@deftypefn {Target Hook} int TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD (void)
+This hook should return the minimum estimated size of a vectorized
+loop body for which the density test should apply.  See also
+@code{TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD}.  The default is set
+to the unreasonable value of 100, which effectively disables 
+the density test.
+@end deftypefn
+
+@deftypefn {Target Hook} int TARGET_VECTORIZE_DENSITY_PENALTY (void)
+This hook should return the penalty, expressed in percent, to be applied
+to the inside-of-loop vectorization costs for a loop failing the density
+test.  The default is 10.
+@end deftypefn
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
Index: gcc/doc/tm.texi.in
===
--- gcc/doc/tm.te