[Bug target/32523] disastrous scheduling for POWER5
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32523

Christian Cornelssen ccorn at cs dot tu-berlin.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ccorn at cs dot tu-berlin.de

--- Comment #14 from Christian Cornelssen ccorn at cs dot tu-berlin.de 2010-11-03 19:32:24 UTC ---
Reproduced the problem on a PowerMac G5 with 2 PPC970MP (4 cores) under MacOS X 10.4.11 (Darwin 8.11.0).

Using the attachment of the original bug report, I compared

a) Apple's version of GCC-4.0 as provided by Xcode 2.5 as /usr/bin/gcc:
   powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5370)

b) GCC-4.4.5 as provided by MacPorts:
   gcc-mp-4.4 (GCC) 4.4.5

simply by issuing the command

make double GCC3=gcc-4.0 GCC4=gcc-mp-4.4

The performance drop is about one third with GCC-4.4.5 compared to Apple's version of GCC-4.0.1, but performance is almost restored when using -fno-schedule-insns -fno-rerun-loop-opt with GCC-4.4.5.
[Bug target/32523] disastrous scheduling for POWER5
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32523

--- Comment #9 from David Fang fang at csl dot cornell.edu 2010-09-29 21:36:02 UTC ---
Out of curiosity, any benchmark updates on more recent releases?
[Bug target/32523] disastrous scheduling for POWER5
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32523

--- Comment #10 from R. Clint Whaley whaley at cs dot utsa.edu 2010-09-29 22:22:22 UTC ---
> Out of curiosity, any benchmark updates on more recent releases?

Nope, after several rough experiences I've stopped reporting gcc bugs and problems. It usually takes weeks of my time, and I think only once or twice has the problem been fixed because of my report, which is typically reported as invalid by Pinski right up until it is fixed. Usually the problem gets fixed accidentally by other updates if it is ever fixed at all.

I've started to just rewrite things to ameliorate gcc problems. I'll only report problems if I can't get anything workable with this approach, since rewriting whole code generators is faster than getting anyone here to confirm, much less fix, gcc problems.

I've largely insulated myself from all the gcc performance regressions that used to cripple my library by extensive use of assembly, which allows me to help my users even while gcc remains terribly slow. I don't think I'm the only developer who has been forced to take this path.

Cheers,
Clint
[Bug target/32523] disastrous scheduling for POWER5
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32523

--- Comment #11 from Andrew Pinski pinskia at gcc dot gnu.org 2010-09-29 22:39:14 UTC ---
(In reply to comment #10)
> which is typically reported as invalid by Pinski right up until it is fixed.

I just looked into the bugs which you have filed and saw a different pattern. I think you are putting too much blame on me. This is ok, as I am the one who normally touches almost every bug.

In the bugs you filed, I noticed one where I made a comment which was supposed to be interpreted as an internal developer comment rather than one about your code. In another one (PR 30599), the problem was in your code, as you were requesting a truncation to happen; yes, we went back and forth on that one, but you requested the truncation and GCC actually did it in that case. In another it was about a warning generated because of glibc marking a function to be warned about. In another one, GCC did not build because of an older version of Xcode in Mac OS X. In another the bug was marked as won't fix in the end, but not by me.

So please be more careful when you say I close bugs as invalid right until they are fixed. Yes, it has happened to one bug in the past (though I think I would still say that bug was invalid; I cannot remember the number right now).

Really, I should have ignored this trolling.
[Bug target/32523] disastrous scheduling for POWER5
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32523

--- Comment #12 from R. Clint Whaley whaley at cs dot utsa.edu 2010-09-29 23:10:50 UTC ---
Andrew,

I'm certainly unsurprised that you disagree with me, since I don't think we have ever agreed on anything in something like 5 years. To get an idea of what I'm talking about, scope:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
where I have to show the problem affects almost every x86 architecture in use at that time until someone admits it is a problem (somewhere around comment #25, I think). I don't believe you ever said it was a problem.

How about this bug, still unconfirmed 3 years after I posted the benchmark showing it?

How about this beauty:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38496

Obviously, I disagree with your summary of our interactions on the x87 gcc arbitrary rounding bug, but at least people can scope the link to see if they agree with your description.

Unfortunately, several similar things I reported years ago have aged out of the system. If you could point out any report that I sent in where you agreed that it was a bug or a problem before someone else did, maybe we can dispel my feeling that you are someone who just routinely marks things as unimportant regardless of the facts.

Regards,
Clint
[Bug target/32523] disastrous scheduling for POWER5
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32523

--- Comment #13 from Andrew Pinski pinskia at gcc dot gnu.org 2010-09-29 23:20:29 UTC ---
(In reply to comment #12)
> Andrew,
>
> I'm certainly unsurprised that you disagree with me, since I don't think we have ever agreed on anything in something like 5 years. To get an idea of what I'm talking about, scope:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827

And if you look at the history of those two bugs, you will notice I did not close them as invalid at all. I might have suggested they were, but I never closed them as such. I had left them for people who would analyze them better. So you first said I marked it as invalid, which was not true, as the history on the bug report does not lie.

For this bug, the problem is that the first pass of the scheduler increases the live ranges of variables, which causes the register allocator not to do a good job. There are other bugs which record that fact already too (I don't know them currently, but you can find them by searching for -fno-schedule-insns). It is a well-known issue which has been improved, which I mentioned exactly in comment #2. Nobody might have tested your testcase again, which is why someone finally decided to ask you if you want to test it.

As I mentioned in this bug report, you were testing a heavily modified 3.3.3 (I know because unit-at-a-time was included in SUSE's 3.3).
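The live-range effect described in comment #13 can be illustrated with a toy model. The sketch below is not GCC's scheduler or register allocator; the function name `peak_live_values` and the instruction names (`a0`, `b0`, `m0`, ...) are hypothetical and chosen only for illustration. It counts how many values are simultaneously live for two orderings of the same loads and multiplies: one that consumes each pair of loads immediately, and one that hoists all loads ahead of the arithmetic, as a latency-hiding scheduler might.

```python
def peak_live_values(order, uses):
    """Return the largest number of values live at the same time.

    order: instruction names in issue order; each instruction defines
           one value that carries the instruction's own name.
    uses:  maps an instruction name to the values it reads.
    A value is live from just after its definition until its last use.
    """
    position = {name: i for i, name in enumerate(order)}

    # Find the last instruction that reads each value.
    last_use = {}
    for name in order:
        for operand in uses.get(name, ()):
            last_use[operand] = max(last_use.get(operand, -1), position[name])

    peak = 0
    for point in range(len(order)):
        # Count values already defined and still waiting for a later use.
        live = sum(
            1
            for value, defined_at in position.items()
            if defined_at <= point < last_use.get(value, defined_at)
        )
        peak = max(peak, live)
    return peak


# Four independent products m_k = a_k * b_k (hypothetical toy kernel).
uses = {f"m{k}": [f"a{k}", f"b{k}"] for k in range(4)}

# Each pair of loads is consumed right away.
interleaved = [n for k in range(4) for n in (f"a{k}", f"b{k}", f"m{k}")]

# All loads are moved ahead of the multiplies.
hoisted = [f"{r}{k}" for k in range(4) for r in "ab"] + [f"m{k}" for k in range(4)]

print(peak_live_values(interleaved, uses))  # 2
print(peak_live_values(hoisted, uses))      # 8
```

In this model the hoisted order needs four times as many registers at its peak. Once the peak exceeds what the target provides, an allocator has to spill, which is one plausible reading of why disabling the first scheduling pass helps in this report.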
[Bug target/32523] disastrous scheduling for POWER5
--- Comment #8 from whaley at cs dot utsa dot edu 2007-06-28 14:18 ---
I've been doing further testing on the g5 (the only machine where I have local and root access), and this problem does not occur with stock gcc 4.1.1 either. Therefore, whatever problem is avoided by throwing -fno-schedule-insns was not in 4.1.1.

BTW, as on the Power5, the best kernel does not get all its performance back by throwing this flag, even though the simplified example does.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32523
[Bug target/32523] disastrous scheduling for POWER5
--- Comment #2 from pinskia at gcc dot gnu dot org 2007-06-27 16:25 ---
PowerPC970FX is not a direct descendant of Power5. It is a descendant of the 970, which is a heavily modified Power4. Power5 is the direct descendant of the Power4 though, at least in terms of scheduling (I don't know if in terms of the hardware itself). So at best they are siblings rather than descendants of one another.

The main thing is that you turned off the first scheduling pass, which runs before the register allocator, so I think the register allocator is messing up (which is already known).

The other thing is: what options are you using to invoke GCC? Power5 support inside GCC was not added until at least 3.4 (maybe it was 4.0).

--

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |target

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32523
[Bug target/32523] disastrous scheduling for POWER5
--- Comment #3 from pinskia at gcc dot gnu dot org 2007-06-27 16:27 ---
> I have been trying to install gcc 4.2 on PowerPC970FX, but so far no luck (it doesn't seem to like MacOSX).

I have no problems installing GCC on Mac OS X 10.4.8/9/10.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32523
[Bug target/32523] disastrous scheduling for POWER5
--- Comment #5 from pinskia at gcc dot gnu dot org 2007-06-27 17:05 ---
Well, the 3.3.3 you are using is a heavily modified 3.3.3 which has the power5 support backported, along with many other things.

--

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |target

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32523
[Bug target/32523] disastrous scheduling for POWER5
--- Comment #6 from whaley at cs dot utsa dot edu 2007-06-27 19:09 ---
Andrew,

OK, I installed stock gnu gcc 3.4.6:

78n04 TEST/MMBENCH_PPC ~/local/gcc-3.4.6/bin/gcc -v
Reading specs from /u/noibm122/local/gcc-3.4.6/lib/gcc/powerpc64-unknown-linux-gnu/3.4.6/specs
Configured with: ../configure --prefix=/u/noibm122/local/gcc-3.4.6 --enable-languages=c
Thread model: posix
gcc version 3.4.6

and I get the exact same behavior as with the modified gcc 3 (it accepts the power5 flags and everything). So, it would seem something that used to work in the stock gcc is now broken . . .

Thanks,
Clint

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32523
[Bug target/32523] disastrous scheduling for POWER5
--- Comment #7 from whaley at cs dot utsa dot edu 2007-06-28 05:25 ---
This problem affects the g5/970 as well:

Darwin. uname -a
Darwin etl-g52.cs.utsa.edu 8.10.0 Darwin Kernel Version 8.10.0: Wed May 23 16:50:59 PDT 2007; root:xnu-792.21.3~1/RELEASE_PPC Power Macintosh powerpc

Darwin. make all
/usr/bin/gcc-3.3 -DREPS=1000 -DWALL -O3 -c mmbench.c
/usr/bin/gcc-3.3 -DREPS=1000 -DWALL -O3 -c dgemm_atlas.c
/usr/bin/gcc-3.3 -DREPS=1000 -DWALL -O3 -o xdmm_gcc3 mmbench.o dgemm_atlas.o
rm -f *.o
/Users/whaley/local/gcc-4.2/bin/gcc -DREPS=1000 -DWALL -mcpu=970 -mtune=970 -O3 -m64 -c mmbench.c
/Users/whaley/local/gcc-4.2/bin/gcc -DREPS=1000 -DWALL -mcpu=970 -mtune=970 -O3 -m64 -c dgemm_atlas.c
/Users/whaley/local/gcc-4.2/bin/gcc -DREPS=1000 -DWALL -mcpu=970 -mtune=970 -O3 -m64 -o xdmm_gcc4 mmbench.o dgemm_atlas.o
rm -f *.o
/Users/whaley/local/gcc-4.2/bin/gcc -DREPS=1000 -DWALL -mcpu=970 -mtune=970 -O3 -m64 -c mmbench.c
/Users/whaley/local/gcc-4.2/bin/gcc -DREPS=1000 -DWALL -mcpu=970 -mtune=970 -O3 -m64 -fno-schedule-insns -fno-rerun-loop-opt -c \
   dgemm_atlas.c
/Users/whaley/local/gcc-4.2/bin/gcc -DREPS=1000 -DWALL -mcpu=970 -mtune=970 -O3 -m64 -o xdmm_gcc4_nosched mmbench.o dgemm_atlas.o
rm -f *.o
echo GCC 3.x performance:
GCC 3.x performance:
./xdmm_gcc3
ALGORITHM  NB  REPS   TIME   MFLOPS
=========  ==  ====  =====  =======
  atlasmm  40  1000  0.021  6212.39

echo GCC 4.2 performance:
GCC 4.2 performance:
./xdmm_gcc4
ALGORITHM  NB  REPS   TIME   MFLOPS
=========  ==  ====  =====  =======
  atlasmm  40  1000  0.026  4905.34

echo GCC 4.2 w/o scheduling performance:
GCC 4.2 w/o scheduling performance:
./xdmm_gcc4_nosched
ALGORITHM  NB  REPS   TIME   MFLOPS
=========  ==  ====  =====  =======
  atlasmm  40  1000  0.020  6291.78

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32523
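To put the three MFLOPS figures from comment #7 on a common scale, the short sketch below normalizes them to the GCC 3.3 result. The numbers are copied from the benchmark output above; the dictionary keys simply reuse the executable names, and the variable names are otherwise illustrative.

```python
# MFLOPS as reported by the three runs in comment #7.
mflops = {
    "xdmm_gcc3": 6212.39,
    "xdmm_gcc4": 4905.34,
    "xdmm_gcc4_nosched": 6291.78,
}

baseline = mflops["xdmm_gcc3"]

# Throughput of each build relative to the GCC 3.3 build.
relative = {name: value / baseline for name, value in mflops.items()}

for name, ratio in relative.items():
    print(f"{name:18s} {ratio:6.1%} of GCC 3.3 throughput")
```

On this single run the GCC 4.2 build reaches roughly 79% of the GCC 3.3 throughput, and the build with -fno-schedule-insns -fno-rerun-loop-opt comes out slightly above the baseline. These are one-shot timings of about 0.02 s each, so small differences should be read with caution.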