Re: [PATCH] Add vector cost model density heuristic
On Tue, 2012-06-19 at 16:20 +0200, Richard Guenther wrote: > On Tue, 19 Jun 2012, William J. Schmidt wrote: > > > On Tue, 2012-06-19 at 14:48 +0200, Richard Guenther wrote: > > > On Tue, 19 Jun 2012, William J. Schmidt wrote: > > > > > > > I remember having this discussion, and I was looking for it to check on > > > > the details, but I can't seem to find it either in my inbox or in the > > > > archives. Can you please point me to that again? Sorry for the bother. > > > > > > It was in the "Correct cost model for strided loads" thread. > > > > Ah, right, thanks. I think it will be best to make that a separate > > patch in the series. Like so: > > > > (1) Add calls to the new interface without disturbing existing logic; > > modify the profitability algorithms to query the new model for inside > > costs. Default algorithm for the model is to just sum costs as is done > > today. Just FYI, this is not quite as straightforward as I thought. There is some code in tree-vect-data-refs.c that computes costs for various peeling options and picks one of them. In most other places we can just pass the instructions to the back end at the same place that the costs are currently calculated, but not here. This will require some more major surgery to save the instructions needed from each peeling option and only pass along the ones that end up being chosen. The upside is the same sort of "delayed emit" is needed for the SLP ordering problem, so the infrastructure for this will be reusable for that problem. Grumble. Bill > > (1a) Split up the cost hooks (one for loads/stores with misalign parm, > > one for vector_stmt with tree_code, etc.). > > (x) Add heuristics to target models as desired. > > (2) Handle the SLP ordering problem. > > (3) Handle outside costs in the target model. > > (4) Remove the now unnecessary cost fields and the calls that set them. > > > > I'll start work on this series of patches as I have time between other > > projects. > > Thanks! > Richard. >
Re: [PATCH] Add vector cost model density heuristic
On Tue, 19 Jun 2012, William J. Schmidt wrote: > On Tue, 2012-06-19 at 14:48 +0200, Richard Guenther wrote: > > On Tue, 19 Jun 2012, William J. Schmidt wrote: > > > > > I remember having this discussion, and I was looking for it to check on > > > the details, but I can't seem to find it either in my inbox or in the > > > archives. Can you please point me to that again? Sorry for the bother. > > > > It was in the "Correct cost model for strided loads" thread. > > Ah, right, thanks. I think it will be best to make that a separate > patch in the series. Like so: > > (1) Add calls to the new interface without disturbing existing logic; > modify the profitability algorithms to query the new model for inside > costs. Default algorithm for the model is to just sum costs as is done > today. > (1a) Split up the cost hooks (one for loads/stores with misalign parm, > one for vector_stmt with tree_code, etc.). > (x) Add heuristics to target models as desired. > (2) Handle the SLP ordering problem. > (3) Handle outside costs in the target model. > (4) Remove the now unnecessary cost fields and the calls that set them. > > I'll start work on this series of patches as I have time between other > projects. Thanks! Richard.
Re: [PATCH] Add vector cost model density heuristic
On Tue, 2012-06-19 at 14:48 +0200, Richard Guenther wrote: > On Tue, 19 Jun 2012, William J. Schmidt wrote: > > > I remember having this discussion, and I was looking for it to check on > > the details, but I can't seem to find it either in my inbox or in the > > archives. Can you please point me to that again? Sorry for the bother. > > It was in the "Correct cost model for strided loads" thread. Ah, right, thanks. I think it will be best to make that a separate patch in the series. Like so: (1) Add calls to the new interface without disturbing existing logic; modify the profitability algorithms to query the new model for inside costs. Default algorithm for the model is to just sum costs as is done today. (1a) Split up the cost hooks (one for loads/stores with misalign parm, one for vector_stmt with tree_code, etc.). (x) Add heuristics to target models as desired. (2) Handle the SLP ordering problem. (3) Handle outside costs in the target model. (4) Remove the now unnecessary cost fields and the calls that set them. I'll start work on this series of patches as I have time between other projects. Thanks, Bill > > Richard. >
Re: [PATCH] Add vector cost model density heuristic
On Tue, 19 Jun 2012, William J. Schmidt wrote: > On Tue, 2012-06-19 at 12:10 +0200, Richard Guenther wrote: > > On Mon, 18 Jun 2012, William J. Schmidt wrote: > > > > > On Mon, 2012-06-18 at 13:49 -0500, William J. Schmidt wrote: > > > > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: > > > > > On Fri, 8 Jun 2012, William J. Schmidt wrote: > > > > > > > > > > > > > > > > > > > Hmm. I don't like this patch or its general idea too much. Instead > > > > > I'd like us to move more of the cost model detail to the target, > > > > > giving > > > > > it a chance to look at the whole loop before deciding on a cost. ISTR > > > > > posting the overall idea at some point, but let me repeat it here > > > > > instead > > > > > of trying to find that e-mail. > > > > > > > > > > The basic interface of the cost model should be, in targetm.vectorize > > > > > > > > > > /* Tell the target to start cost analysis of a loop or a basic-block > > > > > (if the loop argument is NULL). Returns an opaque pointer to > > > > > target-private data. */ > > > > > void *init_cost (struct loop *loop); > > > > > > > > > > /* Add cost for N vectorized-stmt-kind statements in vector_mode. > > > > > */ > > > > > void add_stmt_cost (void *data, unsigned n, > > > > > vectorized-stmt-kind, > > > > > enum machine_mode vector_mode); > > > > > > > > > > /* Tell the target to compute and return the cost of the accumulated > > > > > statements and free any target-private data. */ > > > > > unsigned finish_cost (void *data); > > > > > > By the way, I don't see much point in passing the void *data around > > > here. Too many levels of interfaces that we'd have to pass it around in > > > the vectorizer, so it would just sit in a static variable. Might as > > > well let the data be wholly private to the target. > > > > Ok, so you'd have void init_cost (struct loop *) and > > unsigned finish_cost (void); then? Static variables are of couse > > not properly "abstracted" so we can't ever compute two set of costs > > at the same time ... but that's true all-over-the-place in GCC ... > > It's a fair point, and perhaps I'll decide to pass the data pointer > around anyway to keep that option open. We'll see which looks uglier. > > > > > With previous discussion the add_stmt_cost hook would be split up > > to also allow passing the operation code for example. > > I remember having this discussion, and I was looking for it to check on > the details, but I can't seem to find it either in my inbox or in the > archives. Can you please point me to that again? Sorry for the bother. It was in the "Correct cost model for strided loads" thread. Richard.
Re: [PATCH] Add vector cost model density heuristic
On Tue, 2012-06-19 at 12:10 +0200, Richard Guenther wrote: > On Mon, 18 Jun 2012, William J. Schmidt wrote: > > > On Mon, 2012-06-18 at 13:49 -0500, William J. Schmidt wrote: > > > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: > > > > On Fri, 8 Jun 2012, William J. Schmidt wrote: > > > > > > > > > > > > > > > Hmm. I don't like this patch or its general idea too much. Instead > > > > I'd like us to move more of the cost model detail to the target, giving > > > > it a chance to look at the whole loop before deciding on a cost. ISTR > > > > posting the overall idea at some point, but let me repeat it here > > > > instead > > > > of trying to find that e-mail. > > > > > > > > The basic interface of the cost model should be, in targetm.vectorize > > > > > > > > /* Tell the target to start cost analysis of a loop or a basic-block > > > > (if the loop argument is NULL). Returns an opaque pointer to > > > > target-private data. */ > > > > void *init_cost (struct loop *loop); > > > > > > > > /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ > > > > void add_stmt_cost (void *data, unsigned n, > > > > vectorized-stmt-kind, > > > > enum machine_mode vector_mode); > > > > > > > > /* Tell the target to compute and return the cost of the accumulated > > > > statements and free any target-private data. */ > > > > unsigned finish_cost (void *data); > > > > By the way, I don't see much point in passing the void *data around > > here. Too many levels of interfaces that we'd have to pass it around in > > the vectorizer, so it would just sit in a static variable. Might as > > well let the data be wholly private to the target. > > Ok, so you'd have void init_cost (struct loop *) and > unsigned finish_cost (void); then? Static variables are of couse > not properly "abstracted" so we can't ever compute two set of costs > at the same time ... but that's true all-over-the-place in GCC ... It's a fair point, and perhaps I'll decide to pass the data pointer around anyway to keep that option open. We'll see which looks uglier. > > With previous discussion the add_stmt_cost hook would be split up > to also allow passing the operation code for example. I remember having this discussion, and I was looking for it to check on the details, but I can't seem to find it either in my inbox or in the archives. Can you please point me to that again? Sorry for the bother. Thanks, Bill > > Richard. >
Re: [PATCH] Add vector cost model density heuristic
On Tue, 19 Jun 2012, William J. Schmidt wrote: > On Tue, 2012-06-19 at 12:08 +0200, Richard Guenther wrote: > > On Mon, 18 Jun 2012, William J. Schmidt wrote: > > > > > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: > > > > On Fri, 8 Jun 2012, William J. Schmidt wrote: > > > > > > > > > > > > > > > Hmm. I don't like this patch or its general idea too much. Instead > > > > I'd like us to move more of the cost model detail to the target, giving > > > > it a chance to look at the whole loop before deciding on a cost. ISTR > > > > posting the overall idea at some point, but let me repeat it here > > > > instead > > > > of trying to find that e-mail. > > > > > > > > The basic interface of the cost model should be, in targetm.vectorize > > > > > > > > /* Tell the target to start cost analysis of a loop or a basic-block > > > > (if the loop argument is NULL). Returns an opaque pointer to > > > > target-private data. */ > > > > void *init_cost (struct loop *loop); > > > > > > > > /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ > > > > void add_stmt_cost (void *data, unsigned n, > > > > vectorized-stmt-kind, > > > > enum machine_mode vector_mode); > > > > > > > > /* Tell the target to compute and return the cost of the accumulated > > > > statements and free any target-private data. */ > > > > unsigned finish_cost (void *data); > > > > > > > > with eventually slightly different signatures for add_stmt_cost > > > > (like pass in the original scalar stmt?). > > > > > > > > It allows the target, at finish_cost time, to evaluate things like > > > > register pressure and resource utilization. > > > > > > > > Thanks, > > > > Richard. > > > > > > I've been looking at this in between other projects. I wanted to be > > > sure I understood the SLP infrastructure and whether it would cause any > > > problems. It looks to me like it will be mostly ok. One issue I > > > noticed is a possible difference in the order in which SLP instructions > > > are analyzed and the order in which the instructions are "issued" during > > > transformation. > > > > > > For both loop analysis and basic block analysis, SLP trees are > > > constructed and analyzed prior to examining other vectorizable > > > instructions. Their costs are calculated and stored in the SLP trees at > > > this time. Later, when transforming statements to their vector > > > equivalents, instructions in the block (or loop body) are processed in > > > order until the first instruction that's part of an SLP tree is > > > encountered. At that point, every instruction that's part of any SLP > > > tree is transformed; then the vectorizer continues with the remaining > > > non-SLP vectorizable statements. > > > > > > So if we do the natural and easy thing of placing calls to add_stmt_cost > > > everywhere that costs are calculated today, the order that those costs > > > are presented to the back end model will possibly be different than the > > > order they are actually "emitted." > > > > Interesting. But I suppose this is similar to how pattern statements > > are handled? Thus, the whole pattern sequence is processed when > > we encounter the "main" pattern statement? > > Yes, but the difference is that both vect_analyze_stmt and > vect_transform_loop handle the pattern statements in the same order > (thankfully -- I would hate to have to deal with the pattern mess). > With SLP, all SLP statements are analyzed ahead of time, but they aren't > transformed until one of them is encountered in the statement walk. Ah, ok. I suppose we can simply declare that when we register vectorized stmts with the backend they are in arbitrary oder. After all this is not supposed to be another machine dependent reorg phase (to quote David). > > > > > For a first cut at this, I suggest ignoring the problem other than to > > > document it as an opportunity for improvement. Later we could improve > > > it by using an add_stmt_slp_cost () interface (or adding an is_slp > > > flag), and another interface to be called at the time during analysis > > > when the SLP statements will be issued during transformation. This > > > would allow the back end model to queue up the SLP costs in a separate > > > vector and later place them in its internal structures at the > > > appropriate place. > > > > > > It should eventually be possible to remove these fields/accessors: > > > > > > * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST > > > * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST > > > * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST > > > > > > However, I think this should be delayed until we have the basic > > > infrastructure in place for the new model and well-tested. > > > > Indeed. > > > > > The other issue is that we should have the model track both the inside > > > and outside costs if we're going to get everything into the target > > > model. For a first pass we can ignore this
Re: [PATCH] Add vector cost model density heuristic
On Tue, 2012-06-19 at 12:08 +0200, Richard Guenther wrote: > On Mon, 18 Jun 2012, William J. Schmidt wrote: > > > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: > > > On Fri, 8 Jun 2012, William J. Schmidt wrote: > > > > > > > > > > > Hmm. I don't like this patch or its general idea too much. Instead > > > I'd like us to move more of the cost model detail to the target, giving > > > it a chance to look at the whole loop before deciding on a cost. ISTR > > > posting the overall idea at some point, but let me repeat it here instead > > > of trying to find that e-mail. > > > > > > The basic interface of the cost model should be, in targetm.vectorize > > > > > > /* Tell the target to start cost analysis of a loop or a basic-block > > > (if the loop argument is NULL). Returns an opaque pointer to > > > target-private data. */ > > > void *init_cost (struct loop *loop); > > > > > > /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ > > > void add_stmt_cost (void *data, unsigned n, > > > vectorized-stmt-kind, > > > enum machine_mode vector_mode); > > > > > > /* Tell the target to compute and return the cost of the accumulated > > > statements and free any target-private data. */ > > > unsigned finish_cost (void *data); > > > > > > with eventually slightly different signatures for add_stmt_cost > > > (like pass in the original scalar stmt?). > > > > > > It allows the target, at finish_cost time, to evaluate things like > > > register pressure and resource utilization. > > > > > > Thanks, > > > Richard. > > > > I've been looking at this in between other projects. I wanted to be > > sure I understood the SLP infrastructure and whether it would cause any > > problems. It looks to me like it will be mostly ok. One issue I > > noticed is a possible difference in the order in which SLP instructions > > are analyzed and the order in which the instructions are "issued" during > > transformation. > > > > For both loop analysis and basic block analysis, SLP trees are > > constructed and analyzed prior to examining other vectorizable > > instructions. Their costs are calculated and stored in the SLP trees at > > this time. Later, when transforming statements to their vector > > equivalents, instructions in the block (or loop body) are processed in > > order until the first instruction that's part of an SLP tree is > > encountered. At that point, every instruction that's part of any SLP > > tree is transformed; then the vectorizer continues with the remaining > > non-SLP vectorizable statements. > > > > So if we do the natural and easy thing of placing calls to add_stmt_cost > > everywhere that costs are calculated today, the order that those costs > > are presented to the back end model will possibly be different than the > > order they are actually "emitted." > > Interesting. But I suppose this is similar to how pattern statements > are handled? Thus, the whole pattern sequence is processed when > we encounter the "main" pattern statement? Yes, but the difference is that both vect_analyze_stmt and vect_transform_loop handle the pattern statements in the same order (thankfully -- I would hate to have to deal with the pattern mess). With SLP, all SLP statements are analyzed ahead of time, but they aren't transformed until one of them is encountered in the statement walk. > > > For a first cut at this, I suggest ignoring the problem other than to > > document it as an opportunity for improvement. Later we could improve > > it by using an add_stmt_slp_cost () interface (or adding an is_slp > > flag), and another interface to be called at the time during analysis > > when the SLP statements will be issued during transformation. This > > would allow the back end model to queue up the SLP costs in a separate > > vector and later place them in its internal structures at the > > appropriate place. > > > > It should eventually be possible to remove these fields/accessors: > > > > * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST > > * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST > > * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST > > > > However, I think this should be delayed until we have the basic > > infrastructure in place for the new model and well-tested. > > Indeed. > > > The other issue is that we should have the model track both the inside > > and outside costs if we're going to get everything into the target > > model. For a first pass we can ignore this and keep the existing logic > > for the outside costs. Later we should add some interfaces analogous to > > add_stmt_cost such as add_stmt_prolog_cost and add_stmt_epilog_cost so > > the model can track this stuff as carefully as it wants to. > > Outside costs are merely added to the niter * inner-cost metric to > be compared with the scalar cost niter * scalar-cost, right? Thus > they would be tracked completely separate - eventually similar to > how we compute
Re: [PATCH] Add vector cost model density heuristic
On Mon, 18 Jun 2012, William J. Schmidt wrote: > On Mon, 2012-06-18 at 13:49 -0500, William J. Schmidt wrote: > > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: > > > On Fri, 8 Jun 2012, William J. Schmidt wrote: > > > > > > > > > > > Hmm. I don't like this patch or its general idea too much. Instead > > > I'd like us to move more of the cost model detail to the target, giving > > > it a chance to look at the whole loop before deciding on a cost. ISTR > > > posting the overall idea at some point, but let me repeat it here instead > > > of trying to find that e-mail. > > > > > > The basic interface of the cost model should be, in targetm.vectorize > > > > > > /* Tell the target to start cost analysis of a loop or a basic-block > > > (if the loop argument is NULL). Returns an opaque pointer to > > > target-private data. */ > > > void *init_cost (struct loop *loop); > > > > > > /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ > > > void add_stmt_cost (void *data, unsigned n, > > > vectorized-stmt-kind, > > > enum machine_mode vector_mode); > > > > > > /* Tell the target to compute and return the cost of the accumulated > > > statements and free any target-private data. */ > > > unsigned finish_cost (void *data); > > By the way, I don't see much point in passing the void *data around > here. Too many levels of interfaces that we'd have to pass it around in > the vectorizer, so it would just sit in a static variable. Might as > well let the data be wholly private to the target. Ok, so you'd have void init_cost (struct loop *) and unsigned finish_cost (void); then? Static variables are of couse not properly "abstracted" so we can't ever compute two set of costs at the same time ... but that's true all-over-the-place in GCC ... With previous discussion the add_stmt_cost hook would be split up to also allow passing the operation code for example. Richard.
Re: [PATCH] Add vector cost model density heuristic
On Mon, 18 Jun 2012, William J. Schmidt wrote: > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: > > On Fri, 8 Jun 2012, William J. Schmidt wrote: > > > > > > > Hmm. I don't like this patch or its general idea too much. Instead > > I'd like us to move more of the cost model detail to the target, giving > > it a chance to look at the whole loop before deciding on a cost. ISTR > > posting the overall idea at some point, but let me repeat it here instead > > of trying to find that e-mail. > > > > The basic interface of the cost model should be, in targetm.vectorize > > > > /* Tell the target to start cost analysis of a loop or a basic-block > > (if the loop argument is NULL). Returns an opaque pointer to > > target-private data. */ > > void *init_cost (struct loop *loop); > > > > /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ > > void add_stmt_cost (void *data, unsigned n, > > vectorized-stmt-kind, > > enum machine_mode vector_mode); > > > > /* Tell the target to compute and return the cost of the accumulated > > statements and free any target-private data. */ > > unsigned finish_cost (void *data); > > > > with eventually slightly different signatures for add_stmt_cost > > (like pass in the original scalar stmt?). > > > > It allows the target, at finish_cost time, to evaluate things like > > register pressure and resource utilization. > > > > Thanks, > > Richard. > > I've been looking at this in between other projects. I wanted to be > sure I understood the SLP infrastructure and whether it would cause any > problems. It looks to me like it will be mostly ok. One issue I > noticed is a possible difference in the order in which SLP instructions > are analyzed and the order in which the instructions are "issued" during > transformation. > > For both loop analysis and basic block analysis, SLP trees are > constructed and analyzed prior to examining other vectorizable > instructions. Their costs are calculated and stored in the SLP trees at > this time. Later, when transforming statements to their vector > equivalents, instructions in the block (or loop body) are processed in > order until the first instruction that's part of an SLP tree is > encountered. At that point, every instruction that's part of any SLP > tree is transformed; then the vectorizer continues with the remaining > non-SLP vectorizable statements. > > So if we do the natural and easy thing of placing calls to add_stmt_cost > everywhere that costs are calculated today, the order that those costs > are presented to the back end model will possibly be different than the > order they are actually "emitted." Interesting. But I suppose this is similar to how pattern statements are handled? Thus, the whole pattern sequence is processed when we encounter the "main" pattern statement? > For a first cut at this, I suggest ignoring the problem other than to > document it as an opportunity for improvement. Later we could improve > it by using an add_stmt_slp_cost () interface (or adding an is_slp > flag), and another interface to be called at the time during analysis > when the SLP statements will be issued during transformation. This > would allow the back end model to queue up the SLP costs in a separate > vector and later place them in its internal structures at the > appropriate place. > > It should eventually be possible to remove these fields/accessors: > > * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST > * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST > * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST > > However, I think this should be delayed until we have the basic > infrastructure in place for the new model and well-tested. Indeed. > The other issue is that we should have the model track both the inside > and outside costs if we're going to get everything into the target > model. For a first pass we can ignore this and keep the existing logic > for the outside costs. Later we should add some interfaces analogous to > add_stmt_cost such as add_stmt_prolog_cost and add_stmt_epilog_cost so > the model can track this stuff as carefully as it wants to. Outside costs are merely added to the niter * inner-cost metric to be compared with the scalar cost niter * scalar-cost, right? Thus they would be tracked completely separate - eventually similar to how we compute the cost of the scalar loop. > So, I'd propose going at this in several phases: > > (1) Add calls to the new interface without disturbing existing logic; > modify the profitability algorithms to query the new model for inside > costs. Default algorithm for the model is to just sum costs as is done > today. Right. > (x) Add heuristics to target models as desired. > (2) Handle the SLP ordering problem. > (3) Handle outside costs in the target model. > (4) Remove the now unnecessary cost fields and the calls that set them. > > Item (x) can happen anytime after item (1). > >
Re: [PATCH] Add vector cost model density heuristic
On Mon, 2012-06-18 at 13:49 -0500, William J. Schmidt wrote: > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: > > On Fri, 8 Jun 2012, William J. Schmidt wrote: > > > > > > > Hmm. I don't like this patch or its general idea too much. Instead > > I'd like us to move more of the cost model detail to the target, giving > > it a chance to look at the whole loop before deciding on a cost. ISTR > > posting the overall idea at some point, but let me repeat it here instead > > of trying to find that e-mail. > > > > The basic interface of the cost model should be, in targetm.vectorize > > > > /* Tell the target to start cost analysis of a loop or a basic-block > > (if the loop argument is NULL). Returns an opaque pointer to > > target-private data. */ > > void *init_cost (struct loop *loop); > > > > /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ > > void add_stmt_cost (void *data, unsigned n, > > vectorized-stmt-kind, > > enum machine_mode vector_mode); > > > > /* Tell the target to compute and return the cost of the accumulated > > statements and free any target-private data. */ > > unsigned finish_cost (void *data); By the way, I don't see much point in passing the void *data around here. Too many levels of interfaces that we'd have to pass it around in the vectorizer, so it would just sit in a static variable. Might as well let the data be wholly private to the target. > > > > with eventually slightly different signatures for add_stmt_cost > > (like pass in the original scalar stmt?). > > > > It allows the target, at finish_cost time, to evaluate things like > > register pressure and resource utilization. > > > > Thanks, > > Richard.
Re: [PATCH] Add vector cost model density heuristic
On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: > On Fri, 8 Jun 2012, William J. Schmidt wrote: > > > Hmm. I don't like this patch or its general idea too much. Instead > I'd like us to move more of the cost model detail to the target, giving > it a chance to look at the whole loop before deciding on a cost. ISTR > posting the overall idea at some point, but let me repeat it here instead > of trying to find that e-mail. > > The basic interface of the cost model should be, in targetm.vectorize > > /* Tell the target to start cost analysis of a loop or a basic-block > (if the loop argument is NULL). Returns an opaque pointer to > target-private data. */ > void *init_cost (struct loop *loop); > > /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ > void add_stmt_cost (void *data, unsigned n, > vectorized-stmt-kind, > enum machine_mode vector_mode); > > /* Tell the target to compute and return the cost of the accumulated > statements and free any target-private data. */ > unsigned finish_cost (void *data); > > with eventually slightly different signatures for add_stmt_cost > (like pass in the original scalar stmt?). > > It allows the target, at finish_cost time, to evaluate things like > register pressure and resource utilization. > > Thanks, > Richard. I've been looking at this in between other projects. I wanted to be sure I understood the SLP infrastructure and whether it would cause any problems. It looks to me like it will be mostly ok. One issue I noticed is a possible difference in the order in which SLP instructions are analyzed and the order in which the instructions are "issued" during transformation. For both loop analysis and basic block analysis, SLP trees are constructed and analyzed prior to examining other vectorizable instructions. Their costs are calculated and stored in the SLP trees at this time. Later, when transforming statements to their vector equivalents, instructions in the block (or loop body) are processed in order until the first instruction that's part of an SLP tree is encountered. At that point, every instruction that's part of any SLP tree is transformed; then the vectorizer continues with the remaining non-SLP vectorizable statements. So if we do the natural and easy thing of placing calls to add_stmt_cost everywhere that costs are calculated today, the order that those costs are presented to the back end model will possibly be different than the order they are actually "emitted." For a first cut at this, I suggest ignoring the problem other than to document it as an opportunity for improvement. Later we could improve it by using an add_stmt_slp_cost () interface (or adding an is_slp flag), and another interface to be called at the time during analysis when the SLP statements will be issued during transformation. This would allow the back end model to queue up the SLP costs in a separate vector and later place them in its internal structures at the appropriate place. It should eventually be possible to remove these fields/accessors: * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST However, I think this should be delayed until we have the basic infrastructure in place for the new model and well-tested. The other issue is that we should have the model track both the inside and outside costs if we're going to get everything into the target model. For a first pass we can ignore this and keep the existing logic for the outside costs. Later we should add some interfaces analogous to add_stmt_cost such as add_stmt_prolog_cost and add_stmt_epilog_cost so the model can track this stuff as carefully as it wants to. So, I'd propose going at this in several phases: (1) Add calls to the new interface without disturbing existing logic; modify the profitability algorithms to query the new model for inside costs. Default algorithm for the model is to just sum costs as is done today. (x) Add heuristics to target models as desired. (2) Handle the SLP ordering problem. (3) Handle outside costs in the target model. (4) Remove the now unnecessary cost fields and the calls that set them. Item (x) can happen anytime after item (1). I don't think this work is terribly difficult, just a bit tedious. The only really time-consuming aspect of it will be in very careful testing to keep from changing existing behavior. All comments welcome -- please let me know what you think. Thanks, Bill
Re: [PATCH] Add vector cost model density heuristic
On Mon, Jun 11, 2012 at 5:38 PM, William J. Schmidt wrote: > > > On Mon, 2012-06-11 at 11:09 -0400, David Edelsohn wrote: >> On Mon, Jun 11, 2012 at 10:55 AM, Richard Guenther wrote: >> >> > Well, they are at least magic numbers and heuristics that apply >> > generally and not only to the single issue in sphinx. And in >> > fact how it works for sphinx _is_ magic. >> > >> >> Second, I suggest that you need to rephrase "I can make you" and >> >> re-send your reply. >> > >> > Sorry for my bad english. Consider it meaning that I'd rather have >> > you think about a more proper solution. That's what patch review >> > is about after all, no? Sometimes a complete re-write (which gets >> > more difficult which each of the patches "enhancing" the not ideal >> > current state) is the best thing to do. >> >> Richard, >> >> The values of the heuristics may be "magic", but Bill believes the >> heuristics are testing the important characteristics. The heuristics >> themselves are controlled by hooks, so the target can set the correct >> values for their own requirements. >> >> The concern is that a general cost infrastructure is too general. >> And, based on history, all ports simply will copy the boilerplate from >> the first implementation. It also may cause more problems because the >> target has relatively little information to be able to judge >> heuristics at that point in the middle-end. If the targets start to >> get too "cute" or too complicated, it may cause more problems or more >> confusion about why more complicated heuristics are not effective and >> not producing the expected results. >> >> I worry about creating another machine dependent reorg catch-all pass. >> >> Maybe an incremental pre- and/or post- cost hook would be more >> effective. I will let Bill comment. > > Thanks David, > > I can see both sides of this, and it's hard to judge the future from > where I stand. My belief is that the number of heuristics targets will > implement will be fairly limited, since judgments about cycle-level > costs are not accurately predictable during the middle end. All we can > do is come up with a few things that seem to make sense. Doing too much > in the back end seems impractical. > > The interesting question to me is whether cost model heuristics are > general enough to be reusable. What I saw in this case was what I > considered to be a somewhat target-neutral problem: overwhelming those > assets of the processor that implement vectorization. It seemed > reasonable to provide hooks for others to use the idea if they encounter > similar issues. If reusing the heuristic is useful, then having to copy > the logic from one target to another isn't the best approach. If nobody > else will ever use it, then embedding it in the back end is reasonable. > Unfortunately my crystal ball has been on the fritz for several decades, > so I can't tell you for sure which is right... > > Richard, my biggest question is whether you think other targets are > likely to take advantage of a more general back-end interface, or > whether this will end up just being a PowerPC wart. If you know of ways > this will be useful for i386, that would be helpful to know. Perhaps > this requires your crystal ball as well; not sure how well yours > works... > > If we look at just this one issue in isolation, then changing all the > code in the vectorizer that calculates inside/outside loop costs and > moving it to targetm seems more invasive than adding the few hooks. But > if this will really be a useful feature for the community as a whole I > am certainly willing to tackle it. Well. At the moment the cost model is purely "local" in that the target receives no context about the cost it has to return for (part of) a vectorized operation. In fact the vectorizer already creates multiple instructions (and thus calls to the cost target hook) for a single scalar statement in some cases. There are two ways to increase the context the target can look at - invent more kinds of "statements" we pass to the cost model hook, and thus arrive at a point where for each group of stmts the vectorizer emits and that possibly has a more efficient target implementation we have a different statement kind (only for the cost model, we continue to generate multiple GIMPLE stmts). The other way is to give the target the ability to see context, thus relation of the (sub-)statements the vectorizer asks the target to compute costs for. I've run into the situation where there was missing context on x86, too (but I forgot the specific case), and that's where I came up with the general idea. Of course both work, but for your kind of heuristic only looking at context works. In the scheme I suggested the fini_cost hook would be able to do the counting (as would a simple counter - I don't believe that this would not work - you do counting, too, after all, simply make the fini hook able to reject the vectorization). If powerpc is issue-limited then you h
Re: [PATCH] Add vector cost model density heuristic
On Mon, 2012-06-11 at 11:09 -0400, David Edelsohn wrote: > On Mon, Jun 11, 2012 at 10:55 AM, Richard Guenther wrote: > > > Well, they are at least magic numbers and heuristics that apply > > generally and not only to the single issue in sphinx. And in > > fact how it works for sphinx _is_ magic. > > > >> Second, I suggest that you need to rephrase "I can make you" and > >> re-send your reply. > > > > Sorry for my bad english. Consider it meaning that I'd rather have > > you think about a more proper solution. That's what patch review > > is about after all, no? Sometimes a complete re-write (which gets > > more difficult which each of the patches "enhancing" the not ideal > > current state) is the best thing to do. > > Richard, > > The values of the heuristics may be "magic", but Bill believes the > heuristics are testing the important characteristics. The heuristics > themselves are controlled by hooks, so the target can set the correct > values for their own requirements. > > The concern is that a general cost infrastructure is too general. > And, based on history, all ports simply will copy the boilerplate from > the first implementation. It also may cause more problems because the > target has relatively little information to be able to judge > heuristics at that point in the middle-end. If the targets start to > get too "cute" or too complicated, it may cause more problems or more > confusion about why more complicated heuristics are not effective and > not producing the expected results. > > I worry about creating another machine dependent reorg catch-all pass. > > Maybe an incremental pre- and/or post- cost hook would be more > effective. I will let Bill comment. Thanks David, I can see both sides of this, and it's hard to judge the future from where I stand. My belief is that the number of heuristics targets will implement will be fairly limited, since judgments about cycle-level costs are not accurately predictable during the middle end. All we can do is come up with a few things that seem to make sense. Doing too much in the back end seems impractical. The interesting question to me is whether cost model heuristics are general enough to be reusable. What I saw in this case was what I considered to be a somewhat target-neutral problem: overwhelming those assets of the processor that implement vectorization. It seemed reasonable to provide hooks for others to use the idea if they encounter similar issues. If reusing the heuristic is useful, then having to copy the logic from one target to another isn't the best approach. If nobody else will ever use it, then embedding it in the back end is reasonable. Unfortunately my crystal ball has been on the fritz for several decades, so I can't tell you for sure which is right... Richard, my biggest question is whether you think other targets are likely to take advantage of a more general back-end interface, or whether this will end up just being a PowerPC wart. If you know of ways this will be useful for i386, that would be helpful to know. Perhaps this requires your crystal ball as well; not sure how well yours works... If we look at just this one issue in isolation, then changing all the code in the vectorizer that calculates inside/outside loop costs and moving it to targetm seems more invasive than adding the few hooks. But if this will really be a useful feature for the community as a whole I am certainly willing to tackle it. Thanks, Bill > > Thanks, David >
Re: [PATCH] Add vector cost model density heuristic
On Mon, 2012-06-11 at 16:58 +0200, Richard Guenther wrote: > On Mon, 11 Jun 2012, Richard Guenther wrote: > > > On Mon, 11 Jun 2012, William J. Schmidt wrote: > > > > > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: > > > > On Fri, 8 Jun 2012, William J. Schmidt wrote: > > > > > > > > > This patch adds a heuristic to the vectorizer when estimating the > > > > > minimum profitable number of iterations. The heuristic is > > > > > target-dependent, and is currently disabled for all targets except > > > > > PowerPC. However, the intent is to make it general enough to be > > > > > useful > > > > > for other targets that want to opt in. > > > > > > > > > > A previous patch addressed some PowerPC SPEC degradations by modifying > > > > > the vector cost model values for vec_perm and vec_promote_demote. The > > > > > values were set a little higher than their natural values because the > > > > > natural values were not sufficient to prevent a poor vectorization > > > > > choice. However, this is not the right long-term solution, since it > > > > > can > > > > > unnecessarily constrain other vectorization choices involving permute > > > > > instructions. > > > > > > > > > > Analysis of the badly vectorized loop (in sphinx3) showed that the > > > > > problem was overcommitment of vector resources -- too many vector > > > > > instructions issued without enough non-vector instructions available > > > > > to > > > > > cover the delays. The vector cost model assumes that instructions > > > > > always have a constant cost, and doesn't have a way of judging this > > > > > kind > > > > > of "density" of vector instructions. > > > > > > > > > > The present patch adds a heuristic to recognize when a loop is likely > > > > > to > > > > > overcommit resources, and adds a small penalty to the inside-loop cost > > > > > to account for the expected stalls. The heuristic is parameterized > > > > > with > > > > > three target-specific values: > > > > > > > > > > * Density threshold: The heuristic will apply only when the > > > > >percentage of inside-loop cost attributable to vectorized > > > > >instructions exceeds this value. > > > > > > > > > > * Size threshold: The heuristic will apply only when the > > > > >inside-loop cost exceeds this value. > > > > > > > > > > * Penalty: The inside-loop cost will be increased by this > > > > >percentage value when the heuristic applies. > > > > > > > > > > Thus only reasonably large loop bodies that are mostly vectorized > > > > > instructions will be affected. > > > > > > > > > > By applying only a small percentage bump to the inside-loop cost, the > > > > > heuristic will only turn off vectorization for loops that were > > > > > considered "barely profitable" to begin with (such as the sphinx3 > > > > > loop). > > > > > So the heuristic is quite conservative and should not affect the vast > > > > > majority of vectorization decisions. > > > > > > > > > > Together with the new heuristic, this patch reduces the vec_perm and > > > > > vec_promote_demote costs for PowerPC to their natural values. > > > > > > > > > > I've regstrapped this with no regressions on > > > > > powerpc64-unknown-linux-gnu > > > > > and verified that no performance regressions occur on SPEC cpu2006. > > > > > Is > > > > > this ok for trunk? > > > > > > > > Hmm. I don't like this patch or its general idea too much. Instead > > > > I'd like us to move more of the cost model detail to the target, giving > > > > it a chance to look at the whole loop before deciding on a cost. ISTR > > > > posting the overall idea at some point, but let me repeat it here > > > > instead > > > > of trying to find that e-mail. > > > > > > > > The basic interface of the cost model should be, in targetm.vectorize > > > > > > > > /* Tell the target to start cost analysis of a loop or a basic-block > > > > (if the loop argument is NULL). Returns an opaque pointer to > > > > target-private data. */ > > > > void *init_cost (struct loop *loop); > > > > > > > > /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ > > > > void add_stmt_cost (void *data, unsigned n, > > > > vectorized-stmt-kind, > > > > enum machine_mode vector_mode); > > > > > > > > /* Tell the target to compute and return the cost of the accumulated > > > > statements and free any target-private data. */ > > > > unsigned finish_cost (void *data); > > > > > > > > with eventually slightly different signatures for add_stmt_cost > > > > (like pass in the original scalar stmt?). > > > > > > > > It allows the target, at finish_cost time, to evaluate things like > > > > register pressure and resource utilization. > > > > > > OK, I'm trying to understand how you would want this built into the > > > present structure. Taking just the loop case for now: > > > > > > Judging by your suggested API, we would have to call add_stmt_cost
Re: [PATCH] Add vector cost model density heuristic
On Mon, Jun 11, 2012 at 10:55 AM, Richard Guenther wrote: > Well, they are at least magic numbers and heuristics that apply > generally and not only to the single issue in sphinx. And in > fact how it works for sphinx _is_ magic. > >> Second, I suggest that you need to rephrase "I can make you" and >> re-send your reply. > > Sorry for my bad english. Consider it meaning that I'd rather have > you think about a more proper solution. That's what patch review > is about after all, no? Sometimes a complete re-write (which gets > more difficult which each of the patches "enhancing" the not ideal > current state) is the best thing to do. Richard, The values of the heuristics may be "magic", but Bill believes the heuristics are testing the important characteristics. The heuristics themselves are controlled by hooks, so the target can set the correct values for their own requirements. The concern is that a general cost infrastructure is too general. And, based on history, all ports simply will copy the boilerplate from the first implementation. It also may cause more problems because the target has relatively little information to be able to judge heuristics at that point in the middle-end. If the targets start to get too "cute" or too complicated, it may cause more problems or more confusion about why more complicated heuristics are not effective and not producing the expected results. I worry about creating another machine dependent reorg catch-all pass. Maybe an incremental pre- and/or post- cost hook would be more effective. I will let Bill comment. Thanks, David
Re: [PATCH] Add vector cost model density heuristic
On Mon, 11 Jun 2012, Richard Guenther wrote: > On Mon, 11 Jun 2012, William J. Schmidt wrote: > > > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: > > > On Fri, 8 Jun 2012, William J. Schmidt wrote: > > > > > > > This patch adds a heuristic to the vectorizer when estimating the > > > > minimum profitable number of iterations. The heuristic is > > > > target-dependent, and is currently disabled for all targets except > > > > PowerPC. However, the intent is to make it general enough to be useful > > > > for other targets that want to opt in. > > > > > > > > A previous patch addressed some PowerPC SPEC degradations by modifying > > > > the vector cost model values for vec_perm and vec_promote_demote. The > > > > values were set a little higher than their natural values because the > > > > natural values were not sufficient to prevent a poor vectorization > > > > choice. However, this is not the right long-term solution, since it can > > > > unnecessarily constrain other vectorization choices involving permute > > > > instructions. > > > > > > > > Analysis of the badly vectorized loop (in sphinx3) showed that the > > > > problem was overcommitment of vector resources -- too many vector > > > > instructions issued without enough non-vector instructions available to > > > > cover the delays. The vector cost model assumes that instructions > > > > always have a constant cost, and doesn't have a way of judging this kind > > > > of "density" of vector instructions. > > > > > > > > The present patch adds a heuristic to recognize when a loop is likely to > > > > overcommit resources, and adds a small penalty to the inside-loop cost > > > > to account for the expected stalls. The heuristic is parameterized with > > > > three target-specific values: > > > > > > > > * Density threshold: The heuristic will apply only when the > > > >percentage of inside-loop cost attributable to vectorized > > > >instructions exceeds this value. > > > > > > > > * Size threshold: The heuristic will apply only when the > > > >inside-loop cost exceeds this value. > > > > > > > > * Penalty: The inside-loop cost will be increased by this > > > >percentage value when the heuristic applies. > > > > > > > > Thus only reasonably large loop bodies that are mostly vectorized > > > > instructions will be affected. > > > > > > > > By applying only a small percentage bump to the inside-loop cost, the > > > > heuristic will only turn off vectorization for loops that were > > > > considered "barely profitable" to begin with (such as the sphinx3 loop). > > > > So the heuristic is quite conservative and should not affect the vast > > > > majority of vectorization decisions. > > > > > > > > Together with the new heuristic, this patch reduces the vec_perm and > > > > vec_promote_demote costs for PowerPC to their natural values. > > > > > > > > I've regstrapped this with no regressions on powerpc64-unknown-linux-gnu > > > > and verified that no performance regressions occur on SPEC cpu2006. Is > > > > this ok for trunk? > > > > > > Hmm. I don't like this patch or its general idea too much. Instead > > > I'd like us to move more of the cost model detail to the target, giving > > > it a chance to look at the whole loop before deciding on a cost. ISTR > > > posting the overall idea at some point, but let me repeat it here instead > > > of trying to find that e-mail. > > > > > > The basic interface of the cost model should be, in targetm.vectorize > > > > > > /* Tell the target to start cost analysis of a loop or a basic-block > > > (if the loop argument is NULL). Returns an opaque pointer to > > > target-private data. */ > > > void *init_cost (struct loop *loop); > > > > > > /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ > > > void add_stmt_cost (void *data, unsigned n, > > > vectorized-stmt-kind, > > > enum machine_mode vector_mode); > > > > > > /* Tell the target to compute and return the cost of the accumulated > > > statements and free any target-private data. */ > > > unsigned finish_cost (void *data); > > > > > > with eventually slightly different signatures for add_stmt_cost > > > (like pass in the original scalar stmt?). > > > > > > It allows the target, at finish_cost time, to evaluate things like > > > register pressure and resource utilization. > > > > OK, I'm trying to understand how you would want this built into the > > present structure. Taking just the loop case for now: > > > > Judging by your suggested API, we would have to call add_stmt_cost () > > everywhere that we now call stmt_vinfo_set_inside_of_loop_cost (). For > > now this would be an additional call, not a replacement, though maybe > > the other goes away eventually. This allows the target to save more > > data about the vectorized instructions than just an accumulated cost > > number (order and quantity of various kinds of instr
Re: [PATCH] Add vector cost model density heuristic
On Mon, 11 Jun 2012, David Edelsohn wrote: > On Mon, Jun 11, 2012 at 10:48 AM, Richard Guenther wrote: > > On Mon, 11 Jun 2012, David Edelsohn wrote: > > > >> On Mon, Jun 11, 2012 at 7:40 AM, Richard Guenther > >> wrote: > >> > >> > Hmm. I don't like this patch or its general idea too much. Instead > >> > I'd like us to move more of the cost model detail to the target, giving > >> > it a chance to look at the whole loop before deciding on a cost. ISTR > >> > posting the overall idea at some point, but let me repeat it here instead > >> > of trying to find that e-mail. > >> > >> Richard, > >> > >> Allowing the target to see the entire loop and have more control over > >> the cost model is a fine goal, but the current code is using incorrect > >> heuristics. Requiring a complete re-write of the auto-vectorizer cost > >> model should not be a requirement for fixing the current code. > > > > If fixing the current code adds more "magic" hooks at the wrong spot > > then yes, in stage1 I can make you think about a more proper solution > > to the issue. > > First, these are not magic hooks. Well, they are at least magic numbers and heuristics that apply generally and not only to the single issue in sphinx. And in fact how it works for sphinx _is_ magic. > Second, I suggest that you need to rephrase "I can make you" and > re-send your reply. Sorry for my bad english. Consider it meaning that I'd rather have you think about a more proper solution. That's what patch review is about after all, no? Sometimes a complete re-write (which gets more difficult which each of the patches "enhancing" the not ideal current state) is the best thing to do. Richard.
Re: [PATCH] Add vector cost model density heuristic
On Mon, Jun 11, 2012 at 10:48 AM, Richard Guenther wrote: > On Mon, 11 Jun 2012, David Edelsohn wrote: > >> On Mon, Jun 11, 2012 at 7:40 AM, Richard Guenther wrote: >> >> > Hmm. I don't like this patch or its general idea too much. Instead >> > I'd like us to move more of the cost model detail to the target, giving >> > it a chance to look at the whole loop before deciding on a cost. ISTR >> > posting the overall idea at some point, but let me repeat it here instead >> > of trying to find that e-mail. >> >> Richard, >> >> Allowing the target to see the entire loop and have more control over >> the cost model is a fine goal, but the current code is using incorrect >> heuristics. Requiring a complete re-write of the auto-vectorizer cost >> model should not be a requirement for fixing the current code. > > If fixing the current code adds more "magic" hooks at the wrong spot > then yes, in stage1 I can make you think about a more proper solution > to the issue. First, these are not magic hooks. Second, I suggest that you need to rephrase "I can make you" and re-send your reply. Thanks, David
Re: [PATCH] Add vector cost model density heuristic
On Mon, 11 Jun 2012, David Edelsohn wrote: > On Mon, Jun 11, 2012 at 7:40 AM, Richard Guenther wrote: > > > Hmm. I don't like this patch or its general idea too much. Instead > > I'd like us to move more of the cost model detail to the target, giving > > it a chance to look at the whole loop before deciding on a cost. ISTR > > posting the overall idea at some point, but let me repeat it here instead > > of trying to find that e-mail. > > Richard, > > Allowing the target to see the entire loop and have more control over > the cost model is a fine goal, but the current code is using incorrect > heuristics. Requiring a complete re-write of the auto-vectorizer cost > model should not be a requirement for fixing the current code. If fixing the current code adds more "magic" hooks at the wrong spot then yes, in stage1 I can make you think about a more proper solution to the issue. Richard.
Re: [PATCH] Add vector cost model density heuristic
On Mon, 11 Jun 2012, William J. Schmidt wrote: > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: > > On Fri, 8 Jun 2012, William J. Schmidt wrote: > > > > > This patch adds a heuristic to the vectorizer when estimating the > > > minimum profitable number of iterations. The heuristic is > > > target-dependent, and is currently disabled for all targets except > > > PowerPC. However, the intent is to make it general enough to be useful > > > for other targets that want to opt in. > > > > > > A previous patch addressed some PowerPC SPEC degradations by modifying > > > the vector cost model values for vec_perm and vec_promote_demote. The > > > values were set a little higher than their natural values because the > > > natural values were not sufficient to prevent a poor vectorization > > > choice. However, this is not the right long-term solution, since it can > > > unnecessarily constrain other vectorization choices involving permute > > > instructions. > > > > > > Analysis of the badly vectorized loop (in sphinx3) showed that the > > > problem was overcommitment of vector resources -- too many vector > > > instructions issued without enough non-vector instructions available to > > > cover the delays. The vector cost model assumes that instructions > > > always have a constant cost, and doesn't have a way of judging this kind > > > of "density" of vector instructions. > > > > > > The present patch adds a heuristic to recognize when a loop is likely to > > > overcommit resources, and adds a small penalty to the inside-loop cost > > > to account for the expected stalls. The heuristic is parameterized with > > > three target-specific values: > > > > > > * Density threshold: The heuristic will apply only when the > > >percentage of inside-loop cost attributable to vectorized > > >instructions exceeds this value. > > > > > > * Size threshold: The heuristic will apply only when the > > >inside-loop cost exceeds this value. > > > > > > * Penalty: The inside-loop cost will be increased by this > > >percentage value when the heuristic applies. > > > > > > Thus only reasonably large loop bodies that are mostly vectorized > > > instructions will be affected. > > > > > > By applying only a small percentage bump to the inside-loop cost, the > > > heuristic will only turn off vectorization for loops that were > > > considered "barely profitable" to begin with (such as the sphinx3 loop). > > > So the heuristic is quite conservative and should not affect the vast > > > majority of vectorization decisions. > > > > > > Together with the new heuristic, this patch reduces the vec_perm and > > > vec_promote_demote costs for PowerPC to their natural values. > > > > > > I've regstrapped this with no regressions on powerpc64-unknown-linux-gnu > > > and verified that no performance regressions occur on SPEC cpu2006. Is > > > this ok for trunk? > > > > Hmm. I don't like this patch or its general idea too much. Instead > > I'd like us to move more of the cost model detail to the target, giving > > it a chance to look at the whole loop before deciding on a cost. ISTR > > posting the overall idea at some point, but let me repeat it here instead > > of trying to find that e-mail. > > > > The basic interface of the cost model should be, in targetm.vectorize > > > > /* Tell the target to start cost analysis of a loop or a basic-block > > (if the loop argument is NULL). Returns an opaque pointer to > > target-private data. */ > > void *init_cost (struct loop *loop); > > > > /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ > > void add_stmt_cost (void *data, unsigned n, > > vectorized-stmt-kind, > > enum machine_mode vector_mode); > > > > /* Tell the target to compute and return the cost of the accumulated > > statements and free any target-private data. */ > > unsigned finish_cost (void *data); > > > > with eventually slightly different signatures for add_stmt_cost > > (like pass in the original scalar stmt?). > > > > It allows the target, at finish_cost time, to evaluate things like > > register pressure and resource utilization. > > OK, I'm trying to understand how you would want this built into the > present structure. Taking just the loop case for now: > > Judging by your suggested API, we would have to call add_stmt_cost () > everywhere that we now call stmt_vinfo_set_inside_of_loop_cost (). For > now this would be an additional call, not a replacement, though maybe > the other goes away eventually. This allows the target to save more > data about the vectorized instructions than just an accumulated cost > number (order and quantity of various kinds of instructions can be > maintained for better modeling). Presumably the call to finish_cost > would be done within vect_estimate_min_profitable_iters () to produce > the final value of inside_cost for the loop. Yes. I didn't look in det
Re: [PATCH] Add vector cost model density heuristic
On Mon, Jun 11, 2012 at 7:40 AM, Richard Guenther wrote: > Hmm. I don't like this patch or its general idea too much. Instead > I'd like us to move more of the cost model detail to the target, giving > it a chance to look at the whole loop before deciding on a cost. ISTR > posting the overall idea at some point, but let me repeat it here instead > of trying to find that e-mail. Richard, Allowing the target to see the entire loop and have more control over the cost model is a fine goal, but the current code is using incorrect heuristics. Requiring a complete re-write of the auto-vectorizer cost model should not be a requirement for fixing the current code. Thanks, David
Re: [PATCH] Add vector cost model density heuristic
On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: > On Fri, 8 Jun 2012, William J. Schmidt wrote: > > > This patch adds a heuristic to the vectorizer when estimating the > > minimum profitable number of iterations. The heuristic is > > target-dependent, and is currently disabled for all targets except > > PowerPC. However, the intent is to make it general enough to be useful > > for other targets that want to opt in. > > > > A previous patch addressed some PowerPC SPEC degradations by modifying > > the vector cost model values for vec_perm and vec_promote_demote. The > > values were set a little higher than their natural values because the > > natural values were not sufficient to prevent a poor vectorization > > choice. However, this is not the right long-term solution, since it can > > unnecessarily constrain other vectorization choices involving permute > > instructions. > > > > Analysis of the badly vectorized loop (in sphinx3) showed that the > > problem was overcommitment of vector resources -- too many vector > > instructions issued without enough non-vector instructions available to > > cover the delays. The vector cost model assumes that instructions > > always have a constant cost, and doesn't have a way of judging this kind > > of "density" of vector instructions. > > > > The present patch adds a heuristic to recognize when a loop is likely to > > overcommit resources, and adds a small penalty to the inside-loop cost > > to account for the expected stalls. The heuristic is parameterized with > > three target-specific values: > > > > * Density threshold: The heuristic will apply only when the > >percentage of inside-loop cost attributable to vectorized > >instructions exceeds this value. > > > > * Size threshold: The heuristic will apply only when the > >inside-loop cost exceeds this value. > > > > * Penalty: The inside-loop cost will be increased by this > >percentage value when the heuristic applies. > > > > Thus only reasonably large loop bodies that are mostly vectorized > > instructions will be affected. > > > > By applying only a small percentage bump to the inside-loop cost, the > > heuristic will only turn off vectorization for loops that were > > considered "barely profitable" to begin with (such as the sphinx3 loop). > > So the heuristic is quite conservative and should not affect the vast > > majority of vectorization decisions. > > > > Together with the new heuristic, this patch reduces the vec_perm and > > vec_promote_demote costs for PowerPC to their natural values. > > > > I've regstrapped this with no regressions on powerpc64-unknown-linux-gnu > > and verified that no performance regressions occur on SPEC cpu2006. Is > > this ok for trunk? > > Hmm. I don't like this patch or its general idea too much. Instead > I'd like us to move more of the cost model detail to the target, giving > it a chance to look at the whole loop before deciding on a cost. ISTR > posting the overall idea at some point, but let me repeat it here instead > of trying to find that e-mail. > > The basic interface of the cost model should be, in targetm.vectorize > > /* Tell the target to start cost analysis of a loop or a basic-block > (if the loop argument is NULL). Returns an opaque pointer to > target-private data. */ > void *init_cost (struct loop *loop); > > /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ > void add_stmt_cost (void *data, unsigned n, > vectorized-stmt-kind, > enum machine_mode vector_mode); > > /* Tell the target to compute and return the cost of the accumulated > statements and free any target-private data. */ > unsigned finish_cost (void *data); > > with eventually slightly different signatures for add_stmt_cost > (like pass in the original scalar stmt?). > > It allows the target, at finish_cost time, to evaluate things like > register pressure and resource utilization. OK, I'm trying to understand how you would want this built into the present structure. Taking just the loop case for now: Judging by your suggested API, we would have to call add_stmt_cost () everywhere that we now call stmt_vinfo_set_inside_of_loop_cost (). For now this would be an additional call, not a replacement, though maybe the other goes away eventually. This allows the target to save more data about the vectorized instructions than just an accumulated cost number (order and quantity of various kinds of instructions can be maintained for better modeling). Presumably the call to finish_cost would be done within vect_estimate_min_profitable_iters () to produce the final value of inside_cost for the loop. The default target hook for add_stmt_cost would duplicate what we currently do for calculating the inside_cost of a statement, and the default target hook for finish_cost would just return the sum. I'll have to go hunting where the similar code would fit for SL
Re: [PATCH] Add vector cost model density heuristic
On Fri, 8 Jun 2012, William J. Schmidt wrote: > This patch adds a heuristic to the vectorizer when estimating the > minimum profitable number of iterations. The heuristic is > target-dependent, and is currently disabled for all targets except > PowerPC. However, the intent is to make it general enough to be useful > for other targets that want to opt in. > > A previous patch addressed some PowerPC SPEC degradations by modifying > the vector cost model values for vec_perm and vec_promote_demote. The > values were set a little higher than their natural values because the > natural values were not sufficient to prevent a poor vectorization > choice. However, this is not the right long-term solution, since it can > unnecessarily constrain other vectorization choices involving permute > instructions. > > Analysis of the badly vectorized loop (in sphinx3) showed that the > problem was overcommitment of vector resources -- too many vector > instructions issued without enough non-vector instructions available to > cover the delays. The vector cost model assumes that instructions > always have a constant cost, and doesn't have a way of judging this kind > of "density" of vector instructions. > > The present patch adds a heuristic to recognize when a loop is likely to > overcommit resources, and adds a small penalty to the inside-loop cost > to account for the expected stalls. The heuristic is parameterized with > three target-specific values: > > * Density threshold: The heuristic will apply only when the >percentage of inside-loop cost attributable to vectorized >instructions exceeds this value. > > * Size threshold: The heuristic will apply only when the >inside-loop cost exceeds this value. > > * Penalty: The inside-loop cost will be increased by this >percentage value when the heuristic applies. > > Thus only reasonably large loop bodies that are mostly vectorized > instructions will be affected. > > By applying only a small percentage bump to the inside-loop cost, the > heuristic will only turn off vectorization for loops that were > considered "barely profitable" to begin with (such as the sphinx3 loop). > So the heuristic is quite conservative and should not affect the vast > majority of vectorization decisions. > > Together with the new heuristic, this patch reduces the vec_perm and > vec_promote_demote costs for PowerPC to their natural values. > > I've regstrapped this with no regressions on powerpc64-unknown-linux-gnu > and verified that no performance regressions occur on SPEC cpu2006. Is > this ok for trunk? Hmm. I don't like this patch or its general idea too much. Instead I'd like us to move more of the cost model detail to the target, giving it a chance to look at the whole loop before deciding on a cost. ISTR posting the overall idea at some point, but let me repeat it here instead of trying to find that e-mail. The basic interface of the cost model should be, in targetm.vectorize /* Tell the target to start cost analysis of a loop or a basic-block (if the loop argument is NULL). Returns an opaque pointer to target-private data. */ void *init_cost (struct loop *loop); /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ void add_stmt_cost (void *data, unsigned n, vectorized-stmt-kind, enum machine_mode vector_mode); /* Tell the target to compute and return the cost of the accumulated statements and free any target-private data. */ unsigned finish_cost (void *data); with eventually slightly different signatures for add_stmt_cost (like pass in the original scalar stmt?). It allows the target, at finish_cost time, to evaluate things like register pressure and resource utilization. Thanks, Richard. > Thanks, > Bill > > > 2012-06-08 Bill Schmidt > > * doc/tm.texi.in: Add vectorization density hooks. > * doc/tm.texi: Regenerate. > * targhooks.c (default_density_pct_threshold): New. > (default_density_size_threshold): New. > (default_density_penalty): New. > * targhooks.h: New decls for new targhooks.c functions. > * target.def (density_pct_threshold): New DEF_HOOK. > (density_size_threshold): Likewise. > (density_penalty): Likewise. > * tree-vect-loop.c (accum_stmt_cost): New. > (vect_estimate_min_profitable_iters): Perform density test. > * config/rs6000/rs6000.c (TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD): > New macro definition. > (TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD): Likewise. > (TARGET_VECTORIZE_DENSITY_PENALTY): Likewise. > (rs6000_builtin_vectorization_cost): Reduce costs of vec_perm and > vec_promote_demote to correct values. > (rs6000_density_pct_threshold): New. > (rs6000_density_size_threshold): New. > (rs6000_density_penalty): New. > > > Index: gcc/doc/tm.texi > ==
[PATCH] Add vector cost model density heuristic
This patch adds a heuristic to the vectorizer when estimating the minimum profitable number of iterations. The heuristic is target-dependent, and is currently disabled for all targets except PowerPC. However, the intent is to make it general enough to be useful for other targets that want to opt in. A previous patch addressed some PowerPC SPEC degradations by modifying the vector cost model values for vec_perm and vec_promote_demote. The values were set a little higher than their natural values because the natural values were not sufficient to prevent a poor vectorization choice. However, this is not the right long-term solution, since it can unnecessarily constrain other vectorization choices involving permute instructions. Analysis of the badly vectorized loop (in sphinx3) showed that the problem was overcommitment of vector resources -- too many vector instructions issued without enough non-vector instructions available to cover the delays. The vector cost model assumes that instructions always have a constant cost, and doesn't have a way of judging this kind of "density" of vector instructions. The present patch adds a heuristic to recognize when a loop is likely to overcommit resources, and adds a small penalty to the inside-loop cost to account for the expected stalls. The heuristic is parameterized with three target-specific values: * Density threshold: The heuristic will apply only when the percentage of inside-loop cost attributable to vectorized instructions exceeds this value. * Size threshold: The heuristic will apply only when the inside-loop cost exceeds this value. * Penalty: The inside-loop cost will be increased by this percentage value when the heuristic applies. Thus only reasonably large loop bodies that are mostly vectorized instructions will be affected. By applying only a small percentage bump to the inside-loop cost, the heuristic will only turn off vectorization for loops that were considered "barely profitable" to begin with (such as the sphinx3 loop). So the heuristic is quite conservative and should not affect the vast majority of vectorization decisions. Together with the new heuristic, this patch reduces the vec_perm and vec_promote_demote costs for PowerPC to their natural values. I've regstrapped this with no regressions on powerpc64-unknown-linux-gnu and verified that no performance regressions occur on SPEC cpu2006. Is this ok for trunk? Thanks, Bill 2012-06-08 Bill Schmidt * doc/tm.texi.in: Add vectorization density hooks. * doc/tm.texi: Regenerate. * targhooks.c (default_density_pct_threshold): New. (default_density_size_threshold): New. (default_density_penalty): New. * targhooks.h: New decls for new targhooks.c functions. * target.def (density_pct_threshold): New DEF_HOOK. (density_size_threshold): Likewise. (density_penalty): Likewise. * tree-vect-loop.c (accum_stmt_cost): New. (vect_estimate_min_profitable_iters): Perform density test. * config/rs6000/rs6000.c (TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD): New macro definition. (TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD): Likewise. (TARGET_VECTORIZE_DENSITY_PENALTY): Likewise. (rs6000_builtin_vectorization_cost): Reduce costs of vec_perm and vec_promote_demote to correct values. (rs6000_density_pct_threshold): New. (rs6000_density_size_threshold): New. (rs6000_density_penalty): New. Index: gcc/doc/tm.texi === --- gcc/doc/tm.texi (revision 188305) +++ gcc/doc/tm.texi (working copy) @@ -5798,6 +5798,27 @@ The default is @code{NULL_TREE} which means to not loads. @end deftypefn +@deftypefn {Target Hook} int TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD (void) +This hook should return the maximum density, expressed in percent, for +which autovectorization of loops with large bodies should be constrained. +See also @code{TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD}. The default +is to return 100, which disables the density test. +@end deftypefn + +@deftypefn {Target Hook} int TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD (void) +This hook should return the minimum estimated size of a vectorized +loop body for which the density test should apply. See also +@code{TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD}. The default is set +to the unreasonable value of 100, which effectively disables +the density test. +@end deftypefn + +@deftypefn {Target Hook} int TARGET_VECTORIZE_DENSITY_PENALTY (void) +This hook should return the penalty, expressed in percent, to be applied +to the inside-of-loop vectorization costs for a loop failing the density +test. The default is 10. +@end deftypefn + @node Anchored Addresses @section Anchored Addresses @cindex anchored addresses Index: gcc/doc/tm.texi.in === --- gcc/doc/tm.te