http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52272
--- Comment #4 from Richard Guenther rguenth at gcc dot gnu.org 2012-02-16 12:40:06 UTC ---
Before the patch we chose:
Improved to:
cost: 128 (complexity 0)
cand_cost: 19
cand_use_cost: 28 (complexity 0)
candidates: 2, 4, 7
use:0 -- iv_cand:4, cost=(2,0)
use:1 -- iv_cand:4, cost=(2,0)
use:2 -- iv_cand:2, cost=(0,0)
use:3 -- iv_cand:7, cost=(0,0)
use:4 -- iv_cand:7, cost=(4,0)
use:5 -- iv_cand:7, cost=(4,0)
use:6 -- iv_cand:7, cost=(4,0)
use:7 -- iv_cand:7, cost=(4,0)
use:8 -- iv_cand:7, cost=(4,0)
use:9 -- iv_cand:7, cost=(4,0)
and now we no longer consider, for example, candidate 7 for use 4:
candidate 7
var_before ivtmp.190
var_after ivtmp.190
incremented before exit test
type character(kind=4)
base (character(kind=4)) (a_296(D) + (((sizetype) stride.88_9 + (sizetype) pretmp.141_661) + 1) * 8)
step 8
base object (void *) a_296(D)
use 4
generic
in statement D.2322_387 = axp_318(D) + D.2321_367;
at position
type real(kind=8)[0:D.1963] * restrict
base axp_318(D) + (((sizetype) stride.88_9 + (sizetype) pretmp.141_661) + 1) * 8
step 8
base object (void *) axp_318(D)
related candidates
and we really do not want to do that because of the wrong-code issue.
Instead, we end up with:
Improved to:
cost: 133 (complexity 7)
cand_cost: 13
cand_use_cost: 39 (complexity 7)
candidates: 4, 5
use:0 -- iv_cand:4, cost=(2,0)
use:1 -- iv_cand:4, cost=(2,0)
use:2 -- iv_cand:5, cost=(0,0)
use:3 -- iv_cand:5, cost=(5,1)
use:4 -- iv_cand:5, cost=(5,1)
use:5 -- iv_cand:5, cost=(5,1)
use:6 -- iv_cand:5, cost=(5,1)
use:7 -- iv_cand:5, cost=(5,1)
use:8 -- iv_cand:5, cost=(5,1)
use:9 -- iv_cand:5, cost=(5,1)
where
candidate 5 (important)
var_before ivtmp.188
var_after ivtmp.188
incremented before exit test
type sizetype
base 0
step 8
I think what we are missing in order to relate uses 4 to 9, which are all of the form
base parameter + (((sizetype) stride.88_9 + (sizetype) pretmp.141_661) + 1) * 8
is a candidate that has the base object stripped and thus only tracks
(((sizetype) stride.88_9 + (sizetype) pretmp.141_661) + 1) * 8
which we at least have as an IV:
ssa name D.2332_451
type sizetype
base (((sizetype) stride.88_9 + (sizetype) pretmp.141_661) + 1) * 8
step 8
and several redundant copies of it:
ssa name D.2354_680
type sizetype
base (((sizetype) stride.88_9 + (sizetype) pretmp.141_661) + 1) * 8
step 8
ssa name D.2343_692
type sizetype
base (((sizetype) stride.88_9 + (sizetype) pretmp.141_661) + 1) * 8
step 8
ssa name D.2365_752
type sizetype
base (((sizetype) stride.88_9 + (sizetype) pretmp.141_661) + 1) * 8
step 8
ssa name D.2376_763
type sizetype
base (((sizetype) stride.88_9 + (sizetype) pretmp.141_661) + 1) * 8
step 8
but no associated candidate(s). If we add a candidate for it (candidate 9), we end up with:
Improved to:
cost: 131 (complexity 0)
cand_cost: 15
cand_use_cost: 35 (complexity 0)
candidates: 4, 9
use:0 -- iv_cand:4, cost=(2,0)
use:1 -- iv_cand:4, cost=(2,0)
use:2 -- iv_cand:9, cost=(3,0)
use:3 -- iv_cand:9, cost=(4,0)
use:4 -- iv_cand:9, cost=(4,0)
use:5 -- iv_cand:9, cost=(4,0)
use:6 -- iv_cand:9, cost=(4,0)
use:7 -- iv_cand:9, cost=(4,0)
use:8 -- iv_cand:9, cost=(4,0)
use:9 -- iv_cand:9, cost=(4,0)
but with that change we now unroll the innermost loop twice, so I'm not
sure it will pay off. Even for the original patch that caused the
regression, the code generation differences are only in scheduling and
register allocation (so -fschedule-insns, or -fsched-pressure, may
recover it).