--- Comment #26 from hubicka at gcc dot gnu dot org 2008-09-06 12:00
---
IRA seems to nicely fix the remaining problem with the spill in the inner loop
on 32-bit, so we produce good scores for gzip compared to older GCC versions.
--- Comment #27 from hubicka at gcc dot gnu dot org 2008-09-06 12:02
---
Also just noticed that the offline copy of longest_match gets an extra move:
.L15:
        movzbl  2(%eax), %edi   #, tmp87
        leal    2(%eax), %ecx   #, scan.158
        movl    %edi, %edx      # tmp87,
--- Comment #25 from hubicka at gcc dot gnu dot org 2008-02-08 15:39
---
-fno-tree-dominator-opts -fno-tree-copyrename solve the coalescing problem
(the name is introduced by the second pass, the actual problematic pattern by
the first), saving roughly 1s at -O2 and 2s at -O3; -O3 is still
--- Comment #24 from hubicka at gcc dot gnu dot org 2008-02-08 15:11
---
Hi,
tonight's runs with the continue heuristics again show improvements on 64-bit
scores, but degradation on 32-bit scores. Looking into the loop, the real
trouble seems to be that the main loop has 6 loop carried
--- Comment #23 from hubicka at gcc dot gnu dot org 2008-02-07 12:30
---
Created an attachment (id=15115)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15115&action=view)
Annotated profile
I am attaching a dump with the profile read in. It shows the hot spots in
longest_match at least:
--- Comment #17 from hubicka at gcc dot gnu dot org 2008-02-06 13:28
---
One problem is the following:
do {
;
match = window + cur_match;
if (match[best_len] != scan_end ||
match[best_len-1] != scan_end1 ||
*match != *scan ||
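The excerpt above is cut off in the archive. As a rough illustration of the loop shape under discussion, here is a minimal, self-contained sketch: a candidate is rejected by a few cheap byte tests and the loop takes `continue` on almost every iteration, while several values (`cur_match`, `best_len`, `scan_end`, `scan_end1`) are carried from one iteration to the next. The function name `sketch_longest_match`, the `chain` array, and the `max_len` parameter are assumptions made for this example; this is not the gzip source.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a longest_match-style loop.
 * chain[i] gives the next (older) candidate position after i; 0 ends the
 * chain, so position 0 itself is never a candidate.
 * Assumes max_len >= 2 and that window holds at least scan_pos + max_len
 * bytes. Returns the best match length found (1 if nothing longer). */
size_t sketch_longest_match(const unsigned char *window, size_t scan_pos,
                            size_t cur_match, const size_t *chain,
                            size_t max_len)
{
    const unsigned char *scan = window + scan_pos;
    size_t best_len = 1;                          /* loop-carried */
    unsigned char scan_end1 = scan[best_len - 1]; /* loop-carried */
    unsigned char scan_end = scan[best_len];      /* loop-carried */

    do {
        const unsigned char *match = window + cur_match;
        size_t len;

        /* Cheap rejection tests: a candidate that cannot beat best_len
         * fails here and goes straight to the loop condition. */
        if (match[best_len] != scan_end ||
            match[best_len - 1] != scan_end1 ||
            *match != *scan)
            continue;

        /* Full comparison, reached only for promising candidates. */
        for (len = 1; len < max_len && match[len] == scan[len]; len++)
            ;

        if (len > best_len) {
            best_len = len;
            if (len >= max_len)
                break;
            scan_end1 = scan[best_len - 1];
            scan_end = scan[best_len];
        }
    } while ((cur_match = chain[cur_match]) != 0);

    return best_len;
}
```

In a loop like this, a branch predictor or static profile that treats the `continue` path as cold would steer register allocation toward the rarely executed comparison code, which is the kind of effect the comments in this thread describe.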
--- Comment #18 from hubicka at gcc dot gnu dot org 2008-02-06 16:44
---
Created an attachment (id=15107)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15107&action=view)
Patch to predict_paths_leading_to
Hi,
I've revived the continue heuristic patch. By itself it does not help
--- Comment #19 from hubicka at gcc dot gnu dot org 2008-02-06 16:56
---
Created an attachment (id=15108)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15108&action=view)
Complete continue heuristic patch
Hi,
this is the complete patch. With this patch we produce profile sane
--- Comment #20 from ubizjak at gmail dot com 2008-02-06 18:42 ---
Whoa, adding -fomit-frame-pointer brings us from
(gcc -O3 -m32)
user    0m41.031s
to
(gcc -O3 -m32 -fomit-frame-pointer)
user    0m30.006s
Since -fo-f-p adds another free reg, it looks that since inlining increases
--- Comment #21 from ubizjak at gmail dot com 2008-02-06 19:10 ---
(In reply to comment #20)
> Since -fo-f-p adds another free reg, it looks that since inlining increases
> register pressure some unlucky heavy-used variable gets allocated to the stack
> slot.
It is best_len (and
--- Comment #22 from hubicka at gcc dot gnu dot org 2008-02-06 19:22
---
Yes, there are a number of unlucky variables. However, the real source here
seems to be an always-wrong profile guiding regalloc to optimize for cold
portions of the function, rather than a real increase of register
--- Comment #15 from hubicka at gcc dot gnu dot org 2008-02-05 13:36
---
Thanks, looks comparable to K8 scores, except that -O3 is not actually that
much worse there. So it looks like there is more than just a random effect of
code layout involved; I will try to look into the assembly produced
--- Comment #16 from hubicka at gcc dot gnu dot org 2008-02-05 13:55
---
Thanks, looks comparable to K8 scores, except that -O3 is not actually that
much worse there. So it looks like there is more than just a random effect of
code layout involved; I will try to look into the assembly produced
--- Comment #13 from hubicka at gcc dot gnu dot org 2008-02-03 13:39
---
Tonight's runs on haydn with the patch in show a regression on gzip: 950-901 in
32-bit. FDO 64-bit runs are not affected.
This is the same score as we had in December; we improved a bit since then but
not enough to match
--- Comment #14 from ubizjak at gmail dot com 2008-02-03 17:35 ---
(In reply to comment #13)
> Uros, would be possible to give it a try on Core? That would help to figure
> out if it is code layout problem of K8.
Hm, the patch doesn't seem to help:
-m32 -O2: 32.434
-m32 -O2 (patched):
--- Comment #12 from hubicka at gcc dot gnu dot org 2008-02-02 16:22
---
Created an attachment (id=15079)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15079&action=view)
address accumulation patch
While working on PR17863 I wrote the attached patch to make fwprop combine
code
--- Comment #11 from hubicka at gcc dot gnu dot org 2008-01-16 16:46
---
Last time I looked into it, it was code
alignment affected by inlining in the string matching loop (longest_match).
This code is very atypical, since the internal loop
--- Comment #3 from rguenth at gcc dot gnu dot org 2007-12-10 10:52 ---
I don't think this qualifies as a 4.3 regression -
http://www.suse.de/~gcctest/SPEC/CINT/sb-haydn-head-64-32o-32bit/index.html
shows that while there were jumps, the numbers close to the 4.2 release are
actually
--- Comment #4 from ubizjak at gmail dot com 2007-12-10 12:31 ---
(In reply to comment #3)
> I don't think this qualifies as a 4.3 regression -
Fair enough. It looks that this problem is specific to Core2.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
--- Comment #5 from ubizjak at gmail dot com 2007-12-10 17:12 ---
(In reply to comment #4)
> Fair enough. It looks that this problem is specific to Core2.
Here are timings with 'gcc version 4.3.0 20071201 (experimental) [trunk
revision 130554] (GCC)' on
vendor_id : GenuineIntel
--- Comment #6 from rguenther at suse dot de 2007-12-10 17:13 ---
Subject: Re: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3
On Mon, 10 Dec 2007, ubizjak at gmail dot com wrote:
> (In reply to comment #4)
> > Fair enough. It looks that this problem is specific to
--- Comment #7 from ubizjak at gmail dot com 2007-12-10 17:26 ---
(In reply to comment #6)
> FSF GCC 4.1 does not have -mtune=generic.
OK, OK. Now with 'gcc version 4.1.3 20070716 (prerelease)':
-m32 -O2: 29.306s
-m32 -O3: 29.582s
I don't have 4.2 here.
--
22 matches