Re: [PING] Re: [RFC, ivopts] fix bugs in ivopts address cost computation

2012-07-25 Thread Sandra Loosemore

On 07/17/2012 05:22 AM, Richard Guenther wrote:

On Wed, Jul 4, 2012 at 6:35 PM, Sandra Loosemore
san...@codesourcery.com  wrote:


Ping?  Original post with patch is here:

http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00319.html


Can you update the patch and numbers based on what Bill did for
straight-line strength reduction which re-uses this analysis/caching part?


I will try to take another look at this once Bill has finished his work 
that touches on this; it's been hard for me to track a moving target.  I 
was wondering if it might be more consistent with Bill's work to defer 
some of the address cost computation to new target hooks, after all.


-Sandra



Re: [PING] Re: [RFC, ivopts] fix bugs in ivopts address cost computation

2012-07-25 Thread William J. Schmidt
On Wed, 2012-07-25 at 13:39 -0600, Sandra Loosemore wrote:
 On 07/17/2012 05:22 AM, Richard Guenther wrote:
  On Wed, Jul 4, 2012 at 6:35 PM, Sandra Loosemore
  san...@codesourcery.com  wrote:
 
  Ping?  Original post with patch is here:
 
  http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00319.html
 
  Can you update the patch and numbers based on what Bill did for
  straight-line strength reduction which re-uses this analysis/caching part?
 
 I will try to take another look at this once Bill has finished his work 
 that touches on this; it's been hard for me to track a moving target.  I 
 was wondering if it might be more consistent with Bill's work to defer 
 some of the address cost computation to new target hooks, after all.
 
 -Sandra
 

Hi Sandra,

I apologize for the mess.  I should be done causing distress to this
part of the code as soon as the patch I submitted today is committed.

Sorry!
Bill



Re: [PING] Re: [RFC, ivopts] fix bugs in ivopts address cost computation

2012-07-17 Thread Richard Guenther
On Wed, Jul 4, 2012 at 6:35 PM, Sandra Loosemore
san...@codesourcery.com wrote:
 On 06/05/2012 10:34 AM, Sandra Loosemore wrote:

 2012-06-05  Sandra Loosemore  <san...@codesourcery.com>

 gcc/
 * tree-ssa-loop-ivopts.c (comp_cost): Make complexity field signed.
 Update comments to indicate this is for addressing mode complexity.
 (new_cost): Make signedness of parameters match comp_cost fields.
 (compare_costs): Prefer higher complexity, not lower, per documentation
 of TARGET_ADDRESS_COST.
 (multiplier_allowed_in_address_p): Use (+ (* reg1 ratio) reg2) to
 probe for valid ratios, rather than just (* reg1 ratio).
 (get_address_cost): Rewrite to eliminate precomputation and caching.
 Use target's address cost for autoinc forms if possible.  Only attempt
 sym_present -> var_present cost conversion if the sym_present form
 is not legitimate; amortize setup cost over loop iterations.
 Adjust complexity computation.
 (get_computation_cost_at): Adjust call to get_address_cost.  Do not
 mess with complexity for non-address expressions.
 (determine_use_iv_cost_address): Initialize can_autoinc.
 (autoinc_possible_for_pair): Likewise.
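
A minimal sketch of the tie-break that the compare_costs entry describes: lower cost wins, and on equal cost the higher complexity wins. The struct layout and the names compare_costs_sketch and compare_costs_demo are assumptions made for illustration; this is not the code from the patch.

```c
#include <assert.h>

/* Toy stand-in for the cost pair named in the ChangeLog; the field
   names follow the ChangeLog, the layout is an assumption.  */
struct comp_cost
{
  int cost;        /* Estimated runtime cost.  */
  int complexity;  /* Addressing mode complexity; signed, per the patch.  */
};

/* Return <0 if A is preferable to B, >0 if B is preferable, 0 on a tie.
   Lower cost wins; on equal cost the HIGHER complexity wins.  */
int
compare_costs_sketch (struct comp_cost a, struct comp_cost b)
{
  if (a.cost != b.cost)
    return a.cost < b.cost ? -1 : 1;
  if (a.complexity != b.complexity)
    return a.complexity > b.complexity ? -1 : 1;
  return 0;
}

/* Usage: two address forms with the same cost.  */
void
compare_costs_demo (void)
{
  struct comp_cost reg_only = { 2, 0 };
  struct comp_cost reg_plus_index = { 2, 1 };
  /* Same cost, so the more complex address form is preferred.  */
  assert (compare_costs_sketch (reg_plus_index, reg_only) < 0);
}
```

Flipping the second comparison to prefer the lower complexity gives the behaviour the ChangeLog says is being replaced.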


 On 06/13/2012 01:52 PM, Sandra Loosemore wrote:


 Might somebody be willing to review the patch as posted?


 Ping?  Original post with patch is here:

 http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00319.html

Can you update the patch and numbers based on what Bill did for
straight-line strength reduction which re-uses this analysis/caching part?

Thanks,
Richard.

 -Sandra



Re: [RFC, ivopts] fix bugs in ivopts address cost computation

2012-07-05 Thread Jiangning Liu
Hi,

For the following code change,

@@ -4212,11 +4064,6 @@ get_computation_cost_at (struct ivopts_d
 cost.cost += adjust_setup_cost (data,
add_cost (TYPE_MODE (ctype), speed));

-  /* Having offset does not affect runtime cost in case it is added to
- symbol, but it increases complexity.  */
-  if (offset)
-cost.complexity++;
-
   cost.cost += add_cost (TYPE_MODE (ctype), speed);

   aratio = ratio > 0 ? ratio : -ratio;

I think this shouldn't be removed. The offset may depend on where the
reduction variable accumulation statement is inserted: a use before the
accumulation and a use after it are different cases, so the cost of
replacing the use point with the reduction variable should differ
accordingly.

BTW, I personally think the current ivopts cost modelling basically
works fine, although some tuning might be needed. The most difficult
part is that the choice of reduction variable candidates depends on
register pressure cost, while the register cost estimate is not
accurate enough at this stage because we don't have the back-end live
range interference graph at all. We will always be able to find holes
in some particular cases or benchmarks, but we shouldn't aim only for
an optimal result on those; the tuning needs to be backed by more
comprehensive results.

Thanks,
-Jiangning

2012/6/6 Sandra Loosemore san...@codesourcery.com:
 My colleagues and I have been working on the GCC port for the Qualcomm
 Hexagon.  Along the way I noticed that we were getting poor results
 from the ivopts pass no matter how we adjusted the target-specific RTX
 costs.  In many cases ivopts was coming up with candidate use costs
 that seemed completely inconsistent with the target cost model.  On
 further inspection, I found what appears to be a whole bunch of bugs
 in the way ivopts is computing address costs:

 (1) While the address cost computation is assuming in some situations
 that pre/post increment/decrement addressing will be used if
 supported by the target, it isn't actually using the target's address
 cost for such forms -- instead, just the cost of the form that would
 be used if autoinc weren't available/applicable.

 (2) The computation to determine which multiplier values are supported
 by target addressing modes is constructing an address rtx of the form
 (reg * ratio) to do the tests.  This isn't a valid address RTX on
 Hexagon, although both (reg + reg * ratio) and (sym + reg * ratio)
 are.  Because it's choosing the wrong address form to probe with, it
 thinks that the target doesn't support multipliers at all and is
 incorrectly tacking on an extra cost for them.  I also note that it's
 assuming that the same set of ratios are supported by all three
 address forms that can potentially include them, and that all valid
 ratios have the same cost.

 (3) The computation to determine the range of valid constant offsets
 for address forms that can include them is probing the upper end of
 the range using constants of the form ((1<<n) - 1).  On Hexagon, the
 offsets have to be aligned appropriately for the mode, so it's
 incorrectly rejecting all positive offsets for non-char modes.  And
 again, it's assuming that the same range of offsets are supported by
 all address forms that can legitimately include them, and that all
 valid offsets have the same cost.  The latter isn't true on Hexagon.

 (4) The cost adjustment for converting the symbol_present address to a
 var_present address seems overly optimistic in assuming that the
 symbol load will be hoisted outside the loop.  I looked at a lot of
 code where this was not happening no matter how expensive I made the
 absolute addressing forms in the target-specific costs.  Also, if
 subsequent passes actually do hoist the symbol load, this requires an
 additional register to be available over the entire loop, which ivopts
 isn't accounting for in any way.  It seems to me that this adjustment
 shouldn't be attempted when the symbol_present form is a legitimate
 address (because subsequent passes are not doing the anticipated
 optimization in that case).  There's also a bug present in the cost
 accounting: it's adding the full cost of the symbol + var addition,
 whereas it should be pro-rated across iterations (see the way this is
 handled for the non-address cases in get_computation_cost_at).

 (5) The documentation of TARGET_ADDRESS_COST says that when two
 (legitimate) address expressions have the same lowest cost, the one
 with the higher complexity is used.  But, the ivopts code is doing the
 opposite of this and choosing the lower complexity as the tie-breaker.
 (Actually, it's also computing the complexity without regard to
 whether the address rtx is even legitimate on the target.)

 (6) The way get_address_cost is precomputing and caching a complete
 set of cost data for each addressing mode seems incorrect to me, in
 the general case.  For example, consider MIPS where MIPS16 

[PING] Re: [RFC, ivopts] fix bugs in ivopts address cost computation

2012-07-04 Thread Sandra Loosemore

On 06/05/2012 10:34 AM, Sandra Loosemore wrote:


2012-06-05  Sandra Loosemore  <san...@codesourcery.com>

gcc/
* tree-ssa-loop-ivopts.c (comp_cost): Make complexity field signed.
Update comments to indicate this is for addressing mode complexity.
(new_cost): Make signedness of parameters match comp_cost fields.
(compare_costs): Prefer higher complexity, not lower, per documentation
of TARGET_ADDRESS_COST.
(multiplier_allowed_in_address_p): Use (+ (* reg1 ratio) reg2) to
probe for valid ratios, rather than just (* reg1 ratio).
(get_address_cost): Rewrite to eliminate precomputation and caching.
Use target's address cost for autoinc forms if possible.  Only attempt
sym_present -> var_present cost conversion if the sym_present form
is not legitimate; amortize setup cost over loop iterations.
Adjust complexity computation.
(get_computation_cost_at): Adjust call to get_address_cost.  Do not
mess with complexity for non-address expressions.
(determine_use_iv_cost_address): Initialize can_autoinc.
(autoinc_possible_for_pair): Likewise.
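
The get_address_cost entry mentions amortizing the setup cost over loop iterations. A rough sketch of that idea follows, assuming an average iteration count is available; the function names and the avg_niter parameter are hypothetical, and this is not the code from the patch.

```c
#include <assert.h>

/* Hypothetical helper: spread a one-time setup cost over the expected
   number of loop iterations.  AVG_NITER is an assumed estimate.  */
int
amortized_setup_cost (int setup_cost, int avg_niter)
{
  assert (avg_niter > 0);
  return setup_cost / avg_niter;
}

/* Per-iteration cost of an address use whose base (symbol + var) is
   computed once before the loop.  With AMORTIZE == 0 the full setup
   cost is charged to the use, which is the accounting bug described
   in the original post.  */
int
use_cost_sketch (int addr_cost, int setup_cost, int avg_niter, int amortize)
{
  int setup = amortize
	      ? amortized_setup_cost (setup_cost, avg_niter)
	      : setup_cost;
  return addr_cost + setup;
}
```

With an address cost of 1, a setup cost of 10 and 5 expected iterations, the unamortized figure is 11 and the amortized one is 3, which is enough to change which candidate looks cheaper.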


On 06/13/2012 01:52 PM, Sandra Loosemore wrote:


Might somebody be willing to review the patch as posted?


Ping?  Original post with patch is here:

http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00319.html

-Sandra



Re: [RFC, ivopts] fix bugs in ivopts address cost computation

2012-06-13 Thread Sandra Loosemore

On 06/06/2012 02:29 AM, Richard Guenther wrote:


Pre-computing and caching things is to avoid creating RTXen over and over.
As you have discarded this completely did you try to measure the cost
of doing so in terms of produced garbage and compile-time cost?  Did you
consider changing the target interface of IVOPTs to a (bunch of) new
target hooks that avoid the RTX generation (which in fact we are not sure
that we'll end up producing anyways in exactly that form due to subsequent
optimizations)?


Since there seemed to be resistance to removing the pre-computing of 
costs, I've spent much of the last week trying to glue fixes on the 
existing code while preserving the caching, and just am not happy with 
the result.  It makes the code too complicated, adds additional overhead 
by precomputing more things, and still does not fix the lurking bugs WRT 
differing costs for different values of constant offsets and the like. 
Basically, I don't want to put my name on anything that ugly.  :-P  So, 
I went back and did some compile time benchmarking on my previously 
posted patch instead.
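
To illustrate the "differing costs for different values of constant offsets" problem, here is a toy model of a cache that is keyed only on whether an offset is present. The target cost function, its thresholds and all names are made up for illustration; the real ivopts cache is more elaborate.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up target cost: small offsets are cheap, larger ones cost more.  */
int
toy_target_address_cost (long offset)
{
  return (offset >= -64 && offset < 64) ? 1 : 3;
}

/* Cache with one slot per "offset present?" flag, in the spirit of
   precomputing one cost per address shape.  */
static int cached_cost[2];
static bool cached_valid[2];

int
cached_address_cost (long offset)
{
  int key = offset != 0;
  if (!cached_valid[key])
    {
      cached_cost[key] = toy_target_address_cost (offset);
      cached_valid[key] = true;
    }
  return cached_cost[key];
}

/* Usage: the second lookup reuses the cost computed for a small offset.  */
void
cache_demo (void)
{
  assert (cached_address_cost (4) == 1);         /* fills the slot */
  assert (cached_address_cost (4096) == 1);      /* stale answer */
  assert (toy_target_address_cost (4096) == 3);  /* direct query */
}
```

Fixing this while keeping the cache means adding the offset value (or an offset class) to the key, which is the kind of extra precomputation the message above objects to.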


I used the bzip2 and gcc test programs available here:

http://people.csail.mit.edu/smcc/projects/single-file-programs/

These are respectively large and gigantic single-file programs, so you 
would expect the performance effects of caching to be particularly 
evident here as there would likely be many loops involving the same 
modes in each compilation unit.  I compiled them using a native x86_64 
build with "time gcc -c -O3", and ran each set of timings 3 times with 
an unmodified build and with my previously-posted patch.  And, it turns 
out there is no obvious difference in the results.


bzip2 base: 5.82, 5.82, 5.75
bzip2 patched: 5.73, 5.71, 5.85

gcc base: 4m44.390, 4m44.270, 4m44.060
gcc patched: 4m44.210, 4m44.530, 4m44.040
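
For readers who want to check the numbers, a small helper that averages the samples quoted above. The gcc timings are converted from minutes and seconds to seconds; the function and array names are chosen here for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of N timing samples, in seconds.  */
double
mean_seconds (const double *samples, size_t n)
{
  assert (n > 0);
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += samples[i];
  return sum / (double) n;
}

/* Timings quoted above; the gcc runs are converted from 4mSS.sss.  */
const double bzip2_base[]    = { 5.82, 5.82, 5.75 };
const double bzip2_patched[] = { 5.73, 5.71, 5.85 };
const double gcc_base[]      = { 284.390, 284.270, 284.060 };
const double gcc_patched[]   = { 284.210, 284.530, 284.040 };
```

The means come out to roughly 5.80 vs 5.76 seconds for bzip2 and 284.24 vs 284.26 seconds for gcc. Both differences are smaller than the spread between individual runs.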


Can you split the patch into pieces fixing the above bugs separately?
Removing the pre-compute and caching is the most questionable change,
the others look like real bugs (the symbol cost might be questionable as
well).


Given that most of the bugs were in the same function and were fixed by 
rewriting it completely, trying to split up the patch seems kind of 
pointless, to me.



CC-ing Zdenek for his opinions (disclaimer: I didn't look at the actual patch).


Might somebody be willing to review the patch as posted?

FWIW, I was doing some digging around in the mail archives to see if 
there was any discussion that would help me understand the rationale for 
the current cost model better.  What I found was that when the ivopts 
pass was originally added back in 2004, the costs computation was one of 
the things that was specifically mentioned as needing work at least to 
document it better, but the patch was rushed through and approved 
without any thorough review because the Stage 3 deadline was looming. 
Anyway, given that get_address_cost has had a big FIXME on it all these 
years, it seems to me like maybe it's time to try to fix it?


-Sandra



Re: [RFC, ivopts] fix bugs in ivopts address cost computation

2012-06-10 Thread Oleg Endo
On Fri, 2012-06-08 at 22:33 -0400, Hans-Peter Nilsson wrote:
 On Tue, 5 Jun 2012, Sandra Loosemore wrote:
 
  (1) While the address cost computation is assuming in some situations
  that pre/post increment/decrement addressing will be used if
  supported by the target, it isn't actually using the target's address
  cost for such forms -- instead, just the cost of the form that would
  be used if autoinc weren't available/applicable.
 
 There are lots of bugzilla entries complaining about bad
 autoinc/dec generation.  Maybe your patch solves some of them?
 

I've tried some of the cases mentioned in 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749
with Sandra's patch applied.  Unfortunately it didn't help much.  There
seem to be other things going wrong with auto-inc-dec.

BTW, auto-inc-dec uses 'set_src_cost' in 'attempt_change' to determine
the address costs.  At least the SH target will not respond to that
properly.  I was thinking of adding something to sh_rtx_costs to invoke
sh_address_cost as a fix for that, but on the other hand I was wondering
why the target's address cost function isn't used in auto-inc-dec
directly ... 

Cheers,
Oleg



inc-dec (was: Re: [RFC, ivopts] fix bugs in ivopts address cost computation)

2012-06-10 Thread Hans-Peter Nilsson
On Sun, 10 Jun 2012, Oleg Endo wrote:
 I've tried some of the cases mentioned in
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749
 with Sandra's patch applied.  Unfortunately it didn't help much.

But thanks for checking!

  There
 seem to be other things going wrong with auto-inc-dec.

Yeah, probably in more than one place.

 BTW, auto-inc-dec uses 'set_src_cost' in 'attempt_change' to determine
 the address costs.  At least the SH target will not respond to that
 properly.  I was thinking of adding something to sh_rtx_costs to invoke
 sh_address_cost as a fix for that, but on the other hand I was wondering
 why the target's address cost function isn't used in auto-inc-dec
 directly ...

Sounds like a bug.

TBH, I haven't dug into the real reason why
auto-inc-dec-generation is still poor (or whether it by magic
has improved dramatically recently), because every so often
there's some effort to improve that, alas I don't remember
seeing any improvement mentioned for any target I have interest
in.

brgds, H-P


Re: inc-dec (was: Re: [RFC, ivopts] fix bugs in ivopts address cost computation)

2012-06-10 Thread Oleg Endo
On Sun, 2012-06-10 at 13:50 -0400, Hans-Peter Nilsson wrote:
 On Sun, 10 Jun 2012, Oleg Endo wrote:
  I've tried some of the cases mentioned in
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749
  with Sandra's patch applied.  Unfortunately it didn't help much.
 
 But thanks for checking!
 

Sure thing.
I forgot to mention that when trying it, I also played a bit with the
address cost function, but it didn't have any effect either.

  BTW, auto-inc-dec uses 'set_src_cost' in 'attempt_change' to determine
  the address costs.  At least the SH target will not respond to that
  properly.  I was thinking of adding something to sh_rtx_costs to invoke
  sh_address_cost as a fix for that, but on the other hand I was wondering
  why the target's address cost function isn't used in auto-inc-dec
  directly ...
 
 Sounds like a bug.

Depends on the perspective ;)
I don't know on/for which target it was developed originally.  If this
target's rtx cost function meets the expectations, then there's a bug in
any other target ;)

 TBH, I haven't dug into the real reason why
 auto-inc-dec-generation is still poor (or whether it by magic
 has improved dramatically recently), because every so often
 there's some effort to improve that, alas I don't remember
 seeing any improvement mentioned for any target I have interest
 in.

At the time when I filed the aforementioned PR it was at least able to
find and generate the first post-inc addr (see the original description
of the PR).  Now it seems that it fails to do so.  So I guess some of
the pre-conditions for the auto-inc-dec pass have slightly changed since
then...

Cheers,
Oleg





Re: [RFC, ivopts] fix bugs in ivopts address cost computation

2012-06-08 Thread Hans-Peter Nilsson
On Tue, 5 Jun 2012, Sandra Loosemore wrote:

 (1) While the address cost computation is assuming in some situations
 that pre/post increment/decrement addressing will be used if
 supported by the target, it isn't actually using the target's address
 cost for such forms -- instead, just the cost of the form that would
 be used if autoinc weren't available/applicable.

There are lots of bugzilla entries complaining about bad
autoinc/dec generation.  Maybe your patch solves some of them?

 (2) The computation to determine which multiplier values are supported
 by target addressing modes is constructing an address rtx of the form
 (reg * ratio) to do the tests.  This isn't a valid address RTX on
 Hexagon, although both (reg + reg * ratio) and (sym + reg * ratio)
 are.

Yeah, I've spotted this one and (7), funny in a bad way.  It's
not a sane addressing mode except as a corner-case of (reg*ratio
+ constant) (e.g. constant=sym).  A value in a register, and
just multiply that by a constant to use as an address?  When
would that be useful?  Should a target include the corner-case
as a special-case addressing-mode just to appease ivopts?  Made
me think less of ivopts.  Dunno if I entered a PR, mea culpa
...doesn't seem so.
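
A toy illustration of the probing problem in (2). It assumes a target that accepts scaled indexing only together with a base register; the legitimacy predicate, the ratio set {1, 2, 4, 8} and all names are made up and do not describe any real back end.

```c
#include <assert.h>
#include <stdbool.h>

/* Two address shapes used as probes.  */
enum addr_form
{
  FORM_MUL,       /* (reg * ratio)        */
  FORM_PLUS_MUL   /* (reg + reg * ratio)  */
};

/* Made-up legitimacy test: scaled indexing is valid only with a base
   register, and only for a few ratios.  */
bool
toy_legitimate_address_p (enum addr_form form, int ratio)
{
  if (form != FORM_PLUS_MUL)
    return false;
  return ratio == 1 || ratio == 2 || ratio == 4 || ratio == 8;
}

/* Count the ratios in [-MAX_RATIO, MAX_RATIO] that the probe form accepts.  */
int
count_valid_ratios (enum addr_form probe, int max_ratio)
{
  assert (max_ratio > 0);
  int count = 0;
  for (int ratio = -max_ratio; ratio <= max_ratio; ratio++)
    if (toy_legitimate_address_p (probe, ratio))
      count++;
  return count;
}
```

On this toy target, count_valid_ratios (FORM_MUL, 64) finds no valid ratio at all, while count_valid_ratios (FORM_PLUS_MUL, 64) finds four. A probe built from the bare multiply therefore concludes that multipliers are unsupported.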

 I bootstrapped and regression-tested the patch on x86_64.  I haven't
 tried to benchmark the performance effect of the patch on anything
 other than Hexagon; there I found that, once ivopts actually started
 paying attention to the target address costs function, it needed to be
 re-tuned.  So, it wouldn't surprise me if other back ends could
 benefit from some tweaking as well, depending on the extent to which
 they're affected by the bugs I listed above.

Right, but the lesson learned is to just ignore effects on other
targets...  In all fairness, I don't think there's anything to
do regarding this patch in the default cost function, but it'd
be nice with a heads-up before committing the final version of this
patch for a change though, maybe even with rtx cost
tweaking-examples from a target of your choice (in the tree) if
I could wish.

 Comments, complaints, proposals for alternate fixes, etc?  Or OK to
 commit?

Thank you!  Others mentioned benchmarking on some major target,
so I'll just add a wish for some PR annotations, any target with
ivopts-related PR's.

brgds, H-P


Re: [RFC, ivopts] fix bugs in ivopts address cost computation

2012-06-06 Thread Richard Guenther
On Tue, Jun 5, 2012 at 6:34 PM, Sandra Loosemore
san...@codesourcery.com wrote:
 My colleagues and I have been working on the GCC port for the Qualcomm
 Hexagon.  Along the way I noticed that we were getting poor results
 from the ivopts pass no matter how we adjusted the target-specific RTX
 costs.  In many cases ivopts was coming up with candidate use costs
 that seemed completely inconsistent with the target cost model.  On
 further inspection, I found what appears to be a whole bunch of bugs
 in the way ivopts is computing address costs:

 (1) While the address cost computation is assuming in some situations
 that pre/post increment/decrement addressing will be used if
 supported by the target, it isn't actually using the target's address
 cost for such forms -- instead, just the cost of the form that would
 be used if autoinc weren't available/applicable.

 (2) The computation to determine which multiplier values are supported
 by target addressing modes is constructing an address rtx of the form
 (reg * ratio) to do the tests.  This isn't a valid address RTX on
 Hexagon, although both (reg + reg * ratio) and (sym + reg * ratio)
 are.  Because it's choosing the wrong address form to probe with, it
 thinks that the target doesn't support multipliers at all and is
 incorrectly tacking on an extra cost for them.  I also note that it's
 assuming that the same set of ratios are supported by all three
 address forms that can potentially include them, and that all valid
 ratios have the same cost.

 (3) The computation to determine the range of valid constant offsets
 for address forms that can include them is probing the upper end of
 the range using constants of the form ((1<<n) - 1).  On Hexagon, the
 offsets have to be aligned appropriately for the mode, so it's
 incorrectly rejecting all positive offsets for non-char modes.  And
 again, it's assuming that the same range of offsets are supported by
 all address forms that can legitimately include them, and that all
 valid offsets have the same cost.  The latter isn't true on Hexagon.

 (4) The cost adjustment for converting the symbol_present address to a
 var_present address seems overly optimistic in assuming that the
 symbol load will be hoisted outside the loop.  I looked at a lot of
 code where this was not happening no matter how expensive I made the
 absolute addressing forms in the target-specific costs.  Also, if
 subsequent passes actually do hoist the symbol load, this requires an
 additional register to be available over the entire loop, which ivopts
 isn't accounting for in any way.  It seems to me that this adjustment
 shouldn't be attempted when the symbol_present form is a legitimate
 address (because subsequent passes are not doing the anticipated
 optimization in that case).  There's also a bug present in the cost
 accounting: it's adding the full cost of the symbol + var addition,
 whereas it should be pro-rated across iterations (see the way this is
 handled for the non-address cases in get_computation_cost_at).

 (5) The documentation of TARGET_ADDRESS_COST says that when two
 (legitimate) address expressions have the same lowest cost, the one
 with the higher complexity is used.  But, the ivopts code is doing the
 opposite of this and choosing the lower complexity as the tie-breaker.
 (Actually, it's also computing the complexity without regard to
 whether the address rtx is even legitimate on the target.)

 (6) The way get_address_cost is precomputing and caching a complete
 set of cost data for each addressing mode seems incorrect to me, in
 the general case.  For example, consider MIPS where MIPS16 and MIPS32
 functions can be freely intermixed in the same compilation unit.  On
 Hexagon, as I've noted, the assumption that all valid multiplier and
 offset values have the same cost is also invalid.  The current code is
 already precomputing 16 sets of address costs for each mode, plus
 probing 128 addresses with multipliers for validity, and similarly
 probing up to 32 or so constant offsets for validity, so I'm pretty
 skeptical that precomputing and caching even more permutations would
 be worthwhile.

Pre-computing and caching things is to avoid creating RTXen over and over.
As you have discarded this completely did you try to measure the cost
of doing so in terms of produced garbage and compile-time cost?  Did you
consider changing the target interface of IVOPTs to a (bunch of) new
target hooks that avoid the RTX generation (which in fact we are not sure
that we'll end up producing anyways in exactly that form due to subsequent
optimizations)?

 (7) If the computed address cost turns out to be 0, the current code
 (for some unknown reason) is turning that into 1, which can screw up
 the relative costs of address computations vs other operations like
 addition.
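
 A small sketch of why the adjustment in (7) matters, using made-up costs. The function names and the clamp flag are hypothetical; only the "0 becomes 1" rule comes from the description above.

```c
#include <assert.h>

/* The adjustment described in item (7): a zero address cost becomes 1.  */
int
clamp_address_cost (int cost)
{
  assert (cost >= 0);
  return cost == 0 ? 1 : cost;
}

/* Total cost of N_USES memory references with the given address cost,
   with or without the clamp applied.  */
int
total_use_cost (int addr_cost, int n_uses, int clamp)
{
  int per_use = clamp ? clamp_address_cost (addr_cost) : addr_cost;
  return per_use * n_uses;
}
```

 With the clamp, three uses of a free addressing mode and three uses of a mode costing 1 both total 3, so the two can no longer be told apart; without it the totals are 0 and 3.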

 I've come up with the attached patch to try to fix these things.  The
 biggest change is that I have discarded the code for precomputing and
 caching costs and instead 

Re: [RFC, ivopts] fix bugs in ivopts address cost computation

2012-06-06 Thread Zdenek Dvorak
Hi,

  (7) If the computed address cost turns out to be 0, the current code
  (for some unknown reason) is turning that into 1, which can screw up
  the relative costs of address computations vs other operations like
  addition.
 
  I've come up with the attached patch to try to fix these things.  The
  biggest change is that I have discarded the code for precomputing and
  caching costs and instead go straight to querying the target back end
  for the cost for the specific address computation we're handed; this
  makes the code a lot simpler.  I would kind of like to get rid of
  multiplier_allowed_in_address_p too, but it's being used in a couple
  places other than the address computation and it seemed better not to
  mess with that for now.  The other fixes are pretty straightforward.
 
 Can you split the patch into pieces fixing the above bugs separately?
 Removing the pre-compute and caching is the most questionable change,
 the others look like real bugs (the symbol cost might be questionable as
 well).

the changes seem reasonable to me (with the caveat about caching).  On the
other hand, an important thing to keep in mind is that all these are just
heuristics that cannot model the actual costs very realistically.  Furthermore,
due to interference from further optimizations, there is no guarantee that say
the choices of the addressing modes by ivopts will actually be respected.

Consequently, improving the cost computation will not necessarily improve
the resulting code quality, and beyond some point, any such improvements
become essentially irrelevant.  So, any changes complicating the model
should be backed by benchmarking, as otherwise there is no way to say
whether they are actually beneficial or not.

Zdenek


[RFC, ivopts] fix bugs in ivopts address cost computation

2012-06-05 Thread Sandra Loosemore
My colleagues and I have been working on the GCC port for the Qualcomm
Hexagon.  Along the way I noticed that we were getting poor results
from the ivopts pass no matter how we adjusted the target-specific RTX
costs.  In many cases ivopts was coming up with candidate use costs
that seemed completely inconsistent with the target cost model.  On
further inspection, I found what appears to be a whole bunch of bugs
in the way ivopts is computing address costs:

(1) While the address cost computation is assuming in some situations
that pre/post increment/decrement addressing will be used if
supported by the target, it isn't actually using the target's address
cost for such forms -- instead, just the cost of the form that would
be used if autoinc weren't available/applicable.

(2) The computation to determine which multiplier values are supported
by target addressing modes is constructing an address rtx of the form
(reg * ratio) to do the tests.  This isn't a valid address RTX on
Hexagon, although both (reg + reg * ratio) and (sym + reg * ratio)
are.  Because it's choosing the wrong address form to probe with, it
thinks that the target doesn't support multipliers at all and is
incorrectly tacking on an extra cost for them.  I also note that it's
assuming that the same set of ratios are supported by all three
address forms that can potentially include them, and that all valid
ratios have the same cost.

(3) The computation to determine the range of valid constant offsets
for address forms that can include them is probing the upper end of
the range using constants of the form ((1<<n) - 1).  On Hexagon, the
offsets have to be aligned appropriately for the mode, so it's
incorrectly rejecting all positive offsets for non-char modes.  And
again, it's assuming that the same range of offsets are supported by
all address forms that can legitimately include them, and that all
valid offsets have the same cost.  The latter isn't true on Hexagon.

(4) The cost adjustment for converting the symbol_present address to a
var_present address seems overly optimistic in assuming that the
symbol load will be hoisted outside the loop.  I looked at a lot of
code where this was not happening no matter how expensive I made the
absolute addressing forms in the target-specific costs.  Also, if
subsequent passes actually do hoist the symbol load, this requires an
additional register to be available over the entire loop, which ivopts
isn't accounting for in any way.  It seems to me that this adjustment
shouldn't be attempted when the symbol_present form is a legitimate
address (because subsequent passes are not doing the anticipated
optimization in that case).  There's also a bug present in the cost
accounting: it's adding the full cost of the symbol + var addition,
whereas it should be pro-rated across iterations (see the way this is
handled for the non-address cases in get_computation_cost_at).

(5) The documentation of TARGET_ADDRESS_COST says that when two
(legitimate) address expressions have the same lowest cost, the one
with the higher complexity is used.  But, the ivopts code is doing the
opposite of this and choosing the lower complexity as the tie-breaker.
(Actually, it's also computing the complexity without regard to
whether the address rtx is even legitimate on the target.)

(6) The way get_address_cost is precomputing and caching a complete
set of cost data for each addressing mode seems incorrect to me, in
the general case.  For example, consider MIPS where MIPS16 and MIPS32
functions can be freely intermixed in the same compilation unit.  On
Hexagon, as I've noted, the assumption that all valid multiplier and
offset values have the same cost is also invalid.  The current code is
already precomputing 16 sets of address costs for each mode, plus
probing 128 addresses with multipliers for validity, and similarly
probing up to 32 or so constant offsets for validity, so I'm pretty
skeptical that precomputing and caching even more permutations would
be worthwhile.

(7) If the computed address cost turns out to be 0, the current code
(for some unknown reason) is turning that into 1, which can screw up
the relative costs of address computations vs other operations like
addition.  
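
The offset-range problem in (3) can be reproduced with a toy model. The sketch below assumes a signed offset field that must be a multiple of the access size; the predicate, the field width and all function names are made up for illustration. The aligned probe is one possible alternative and is not what the patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up legitimacy test: a signed OFFSET_BITS-bit offset field that
   must be a multiple of the access size.  */
bool
toy_offset_ok (long offset, int access_size, int offset_bits)
{
  long limit = 1L << (offset_bits - 1);
  return offset >= -limit && offset < limit && offset % access_size == 0;
}

/* Probe downward from bit MAX_BITS with constants (1 << n) - 1, as
   described in (3); return the largest accepted offset, or 0 if none.  */
long
max_offset_unaligned_probe (int access_size, int offset_bits, int max_bits)
{
  assert (max_bits > 0 && max_bits < 31);
  for (int n = max_bits; n >= 1; n--)
    {
      long off = (1L << n) - 1;
      if (toy_offset_ok (off, access_size, offset_bits))
	return off;
    }
  return 0;
}

/* Same probe, but rounded down to a multiple of the access size.  */
long
max_offset_aligned_probe (int access_size, int offset_bits, int max_bits)
{
  assert (max_bits > 0 && max_bits < 31);
  for (int n = max_bits; n >= 1; n--)
    {
      long off = ((1L << n) - 1) / access_size * access_size;
      if (off > 0 && toy_offset_ok (off, access_size, offset_bits))
	return off;
    }
  return 0;
}
```

With an 11-bit field, the unaligned probe finds 1023 for byte accesses but nothing for 4-byte accesses, because (1 << n) - 1 is always odd. The aligned probe finds 1020 for the 4-byte case.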

I've come up with the attached patch to try to fix these things.  The
biggest change is that I have discarded the code for precomputing and
caching costs and instead go straight to querying the target back end
for the cost for the specific address computation we're handed; this
makes the code a lot simpler.  I would kind of like to get rid of
multiplier_allowed_in_address_p too, but it's being used in a couple
places other than the address computation and it seemed better not to
mess with that for now.  The other fixes are pretty straightforward.

I bootstrapped and regression-tested the patch on x86_64.  I haven't
tried to benchmark the performance effect of the patch on anything
other than Hexagon; there I found that, once ivopts actually started
paying