[Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
--- Comment #41 from rguenth at gcc dot gnu dot org 2008-12-07 13:00 --- There's not much to be done for aliasing - everything points to global memory and thus aliases. There may be some opportunities for offset-based disambiguations via pointers, but I didn't investigate in detail. Whoever wants someone to work on specific details needs to provide way shorter testcases ;) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
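A minimal sketch of the aliasing issue described in this comment, assuming a hypothetical pair of global pointers (heap_a, heap_b) that is not taken from direct.i. Because both pointers live in global memory, the compiler has to assume a store through one may change what the other points to, so the second load cannot be reused. The second function shows the kind of offset-based disambiguation mentioned: same base pointer, different constant offsets.

```c
#include <stddef.h>

/* Hypothetical globals for illustration only; not from the testcase.
   Both may point into the same storage, so they may alias. */
double *heap_a;
double *heap_b;

/* The store through heap_b may overwrite heap_a[i], so heap_a[i]
   has to be loaded a second time after the store. */
double sum_after_store(size_t i, size_t j, double v)
{
    double before = heap_a[i];   /* first load */
    heap_b[j] = v;               /* may or may not touch heap_a[i] */
    return before + heap_a[i];   /* reload required in general */
}

/* Offset-based disambiguation: p[0] and p[1] share a base and differ
   by a constant offset, so the store to p[1] cannot change p[0] and
   the first load may be reused. */
double sum_same_base(double *p, double v)
{
    double before = p[0];
    p[1] = v;
    return before + p[0];
}
```

When heap_a and heap_b really do point at the same element, sum_after_store returns a different value than it would if the first load were reused, which is why the compiler cannot drop the reload without proof.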
[Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
--- Comment #39 from lucier at math dot purdue dot edu 2008-12-06 16:37 ---

I may have narrowed down the problem a bit.

With this compiler (revision 118491):

    pythagoras-277% /tmp/lucier/install/bin/gcc -v
    Using built-in specs.
    Target: x86_64-unknown-linux-gnu
    Configured with: ../../mainline/configure --enable-checking=release --prefix=/tmp/lucier/install --enable-languages=c
    Thread model: posix
    gcc version 4.3.0 20061105 (experimental)

one gets (on a faster machine than previous reports)

    (time (direct-fft-recursive-4 a table))
        133 ms real time
        140 ms cpu time (140 user, 0 system)
        no collections
        64 bytes allocated
        no minor faults
        no major faults

With this compiler (revision 118474):

    pythagoras-24% /tmp/lucier/install/bin/gcc -v
    Using built-in specs.
    Target: x86_64-unknown-linux-gnu
    Configured with: ../../mainline/configure --enable-checking=release --prefix=/tmp/lucier/install --enable-languages=c
    Thread model: posix
    gcc version 4.3.0 20061104 (experimental)

one gets

    (time (direct-fft-recursive-4 a table))
        116 ms real time
        108 ms cpu time (108 user, 0 system)
        no collections
        64 bytes allocated
        no minor faults
        no major faults

and you see the typical problem with assembly code from direct.i with the later compiler.

Paolo may have been right about fwprop; this patch was installed that day:

    Author: bonzini
    Date: Sat Nov 4 08:36:45 2006
    New Revision: 118475

    URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=118475
    Log:
    2006-11-03  Paolo Bonzini  [EMAIL PROTECTED]
                Steven Bosscher  [EMAIL PROTECTED]

        * fwprop.c: New file.
        * Makefile.in: Add fwprop.o.
        * tree-pass.h (pass_rtl_fwprop, pass_rtl_fwprop_with_addr): New.
        * passes.c (init_optimization_passes): Schedule forward propagation.
        * rtlanal.c (loc_mentioned_in_p): Support NULL value of the second
        parameter.
        * timevar.def (TV_FWPROP): New.
        * common.opt (-fforward-propagate): New.
        * opts.c (decode_options): Enable forward propagation at -O2.
        * gcse.c (one_cprop_pass): Do not run local cprop unless touching jumps.
        * cse.c (fold_rtx_subreg, fold_rtx_mem, fold_rtx_mem_1, find_best_addr,
        canon_for_address, table_size): Remove.
        (new_basic_block, insert, remove_from_table): Remove references to
        table_size.
        (fold_rtx): Process SUBREGs and MEMs with equiv_constant, make
        simplification loop more straightforward by not calling fold_rtx
        recursively.
        (equiv_constant): Move here a small part of fold_rtx_subreg,
        do not call fold_rtx.  Call avoid_constant_pool_reference
        to process MEMs.
        * recog.c (canonicalize_change_group): New.
        * recog.h (canonicalize_change_group): New.
        * doc/invoke.texi (Optimization Options): Document fwprop.
        * doc/passes.texi (RTL passes): Document fwprop.

    Added:
        trunk/gcc/fwprop.c
    Modified:
        trunk/gcc/ChangeLog
        trunk/gcc/Makefile.in
        trunk/gcc/common.opt
        trunk/gcc/cse.c
        trunk/gcc/doc/invoke.texi
        trunk/gcc/doc/passes.texi
        trunk/gcc/gcse.c
        trunk/gcc/opts.c
        trunk/gcc/passes.c
        trunk/gcc/recog.c
        trunk/gcc/recog.h
        trunk/gcc/rtlanal.c
        trunk/gcc/timevar.def
        trunk/gcc/tree-pass.h

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
--- Comment #40 from bonzini at gnu dot org 2008-12-07 02:55 --- IIUC this is a typical case in which CSE was fixing something that earlier passes messed up. Unfortunately fwprop does (better) what CSE was meant to do, but does not do what I assumed was already done before CSE. If the problem is aliasing/FRE, then I think Richi is the one who could fix it for good in the tree passes. If there is more to it, however, I can take a look at why fwprop is generating the ugly code. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
--- Comment #36 from lucier at math dot purdue dot edu 2008-09-04 20:39 ---

I don't really understand the status of this bug. Before 4.3.0, it was P1, and Mark said he'd like to see us start to explain these kinds of dramatic performance changes.

There was quite a bit of detective work that ended with

    for some reason gcc-4.3 transforms only _some_ instructions (line 708+ in _.085t.fre dump)

Richard opined that it was an alias partitioning problem, but Uros noted that for the original code, as opposed to the reduced testcase, expanding some parameter to its maximum still doesn't fix the problem. So (a) we don't know what the current code is doing wrong, and (b) we don't know why 4.2 got it right.

So I don't think Mark got what he wanted, and now it's P2, and with each release the target release for fixing it gets pushed back.

I've been testing mainline on this bug sporadically, especially when an entry in gcc-patches mentions some words that also appear on this PR, to see if it's fixed. I'm a bit concerned that the target of 4.3.* is becoming increasingly out of reach, as changes committed to that branch seem to be more and more conservative because it's a release branch.

I don't think the code for this bug is terribly atypical for machine-generated code; it would be nice to be able to remove this performance regression. Unfortunately, I'm in no position to do so.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
--- Comment #37 from rguenth at gcc dot gnu dot org 2008-09-04 20:43 ---

We have to admit that this bug is unlikely to get fixed in the 4.3 series. It still lacks proper analysis, as unfortunately the analysis done on the shorter testcase was not valid. Analysis takes time, and honestly at this point I would rather spend time fixing wrong-code or ice-on-valid bugs.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
--- Comment #38 from lucier at math dot purdue dot edu 2008-09-04 20:49 ---

OK, but I was moved to write because Jakub's latest 4.4 status report requests

    Please concentrate now on fixing bugs, especially the performance regressions.

and this is a definite 4.3/4.4 performance regression from 4.2. (How many of the P1 PRs are performance regressions?)

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
--- Comment #35 from jsm28 at gcc dot gnu dot org 2008-08-27 22:02 ---

4.3.2 is released, changing milestones to 4.3.3.

-- jsm28 at gcc dot gnu dot org changed:

               What    |Removed    |Added
    ----------------------------------------
       Target Milestone|4.3.2      |4.3.3

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
--- Comment #34 from lucier at math dot purdue dot edu 2008-07-09 16:05 ---

Problem still exists with

    euler-18% /pkgs/gcc-mainline/bin/gcc -v
    Using built-in specs.
    Target: x86_64-unknown-linux-gnu
    Configured with: ../../mainline/configure --enable-checking=release --with-gmp=/pkgs/gmp-4.2.2/ --with-mpfr=/pkgs/gmp-4.2.2/ --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-gather-detailed-mem-stats
    Thread model: posix
    gcc version 4.4.0 20080708 (experimental) [trunk revision 137644] (GCC)

Just checking whether recent changes happened to fix it.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
--- Comment #32 from lucier at math dot purdue dot edu 2008-05-30 16:01 ---

I've decided to test the current ira branch with this problem. I used the build instructions in comment 24.

With -fno-ira I get the same results as with 4.3.0 (no surprise there). With -fira I get the time

    (time (direct-fft-recursive-4 a table))
        422 ms real time
        421 ms cpu time (421 user, 0 system)
        no collections
        64 bytes allocated
        no minor faults
        no major faults

which is an improvement, and the code at the beginning of the loop is

    .L7262:
            movq    %rdx, %rcx
            addq    (%rsi), %rcx
            leaq    4(%rdx), %r15
            movq    %rcx, (%rbx)
            addq    $4, %rcx
            movq    %rcx, (%rbp)
            movq    (%rbx), %rcx
            addq    (%rsi), %rcx
            movq    %rcx, (%rdi)
            addq    $4, %rcx
            movq    %rcx, (%r8)
            movq    (%rdi), %rcx
            addq    (%rsi), %rcx
            leaq    4(%rcx), %r10
            movq    %rcx, (%r9)
            movq    %r10, (%r13)
            movq    (%rax), %rcx
            addq    $7, %rcx
            movsd   (%rcx,%r10,2), %xmm4
            movq    (%r9), %r10
            leaq    (%rcx,%rdx,2), %r11
            addq    $8, %rdx
            movsd   (%r11), %xmm11
            movsd   (%rcx,%r10,2), %xmm5
            movq    (%r8), %r10
            movsd   (%rcx,%r10,2), %xmm6
            movq    (%rdi), %r10
            movsd   (%rcx,%r10,2), %xmm7
            movq    (%rbp), %r10
            movsd   (%rcx,%r10,2), %xmm8
            movq    (%rbx), %r10
            movapd  %xmm8, %xmm14
            movsd   (%rcx,%r10,2), %xmm9
            leaq    (%r15,%r15), %r10
            movsd   (%rcx,%r10), %xmm10
            movq    (%r12), %rcx
            movapd  %xmm9, %xmm15
            movsd   15(%rcx), %xmm1
            movsd   7(%rcx), %xmm2
            movapd  %xmm1, %xmm13
            movsd   31(%rcx), %xmm3
            movapd  %xmm2, %xmm12

which is also an improvement, but it still is nowhere near the result for 4.2.2. So, whatever is causing this problem, it appears the new register allocator isn't going to fix it.

The code generated by today's mainline (136210) isn't better than 4.3.0; the time is

    (time (direct-fft-recursive-4 a table))
        469 ms real time
        469 ms cpu time (469 user, 0 system)
        no collections
        64 bytes allocated
        no minor faults
        no major faults

and the code is essentially the same as for 4.3.0.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
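One way to read the start of the loop in the listing in this comment is that each computed index is stored to a memory slot and then loaded back from the same slot a few instructions later. The C sketch below mirrors that shape under stated assumptions: the function and parameter names (index_via_memory, index_via_locals, slot0, slot1, stride) are hypothetical and the code is not derived from direct.i. The second function shows what keeping the values in locals looks like at the source level.

```c
/* Hypothetical illustration; names and types are assumptions.
   Store an index to *slot0, store a derived index to *slot1, then
   read *slot0 back. The read-back is needed because slot1 may point
   at the same location as slot0. */
long index_via_memory(long *slot0, long *slot1, const long *stride, long base)
{
    *slot0 = base + *stride;
    *slot1 = *slot0 + 4;
    return *slot0 + *stride;   /* reloads *slot0 and *stride */
}

/* Same computation with the intermediate values held in locals, so
   nothing is read back from memory after it has been stored. This is
   only equivalent when slot0, slot1 and stride do not overlap. */
long index_via_locals(long *slot0, long *slot1, const long *stride, long base)
{
    long s = *stride;
    long t0 = base + s;
    *slot0 = t0;
    *slot1 = t0 + 4;
    return t0 + s;
}
```

The two versions return the same value when the slots are distinct and different values when slot0 and slot1 are the same pointer, which is the case a compiler must rule out before it can drop the reloads on its own.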
[Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
-- rguenth at gcc dot gnu dot org changed:

               What    |Removed    |Added
    ----------------------------------------
          Known to fail|           |4.3.0
       Target Milestone|4.3.0      |4.3.1

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928