[Bug bootstrap/37632] Darwin bootstrap failure, ld: bl out of range
--- Comment #11 from lucier at math dot purdue dot edu 2010-04-21 01:17 ---
Thank you for describing your way of building a 64-bit gcc. It has now worked for me using Apple's gcc-4.0.1, as you said, so I'll close this bug as WORKSFORME.

Brad

--
lucier at math dot purdue dot edu changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |WORKSFORME

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37632
[Bug bootstrap/37632] Darwin bootstrap failure, ld: bl out of range
--- Comment #10 from lucier at math dot purdue dot edu 2010-04-12 13:17 ---
Subject: Re: Darwin bootstrap failure, ld: bl out of range

On Sun, 2010-04-11 at 10:29 +, iains at gcc dot gnu dot org wrote:

> 2. As a matter of curiosity - do you see a big improvement in performance from building gcc 64bit? I normally build ppc-apple-darwin9 - since this is quite capable of generating m64 code should I have an app that requires it.

I build a 64-bit gcc so that I can compile codes that require gcc to use more than 4GB of memory.

It will take me a day or two before I can get back to your other comments.

Brad

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37632
[Bug bootstrap/37632] Darwin bootstrap failure, ld: bl out of range
--- Comment #4 from lucier at math dot purdue dot edu 2010-04-10 20:43 ---
I can't get it to bootstrap with the following:

[monster-mac:~/programs/gcc/gcc-4_4-branch] lucier% cat build-gcc
#!/bin/tcsh
/bin/rm -rf *; ../../gcc-4_4-branch/configure CC='/pkgs/gcc-4.3.2-64/bin/gcc -mcpu=970 -m64' --build=powerpc64-apple-darwin9.8.0 --host=powerpc64-apple-darwin9.8.0 --target=powerpc64-apple-darwin9.8.0 --prefix=/pkgs/gcc-4.4.4-64 --with-libiconv-prefix=/usr --with-system-zlib; make bootstrap BOOT_LDFLAGS='-Wl,-search_paths_first' build.log (make install) (make -k -j 8 check RUNTESTFLAGS=--target_board 'unix{-mcpu=970/-m64}' check.log ; make mail-report.log)

The error is

checking for flex... flex
checking lex output file root... configure: error: cannot find output from flex; giving up
make[2]: *** [configure-stage1-gmp] Error 1
make[1]: *** [stage1-bubble] Error 2
make: *** [bootstrap] Error 2

And I get the same error if I use your configure line. So I can't reproduce this working with

[monster-mac:~/programs/gcc/gcc-4_4-branch] lucier% head LAST_UPDATED gcc/BASE-VER
==> LAST_UPDATED <==
Sat Apr 10 16:26:49 EDT 2010
Sat Apr 10 20:26:49 UTC 2010 (revision 158195)

==> gcc/BASE-VER <==
4.4.4

and with in-source gmp, mpfr, and mpc directories.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37632
[Bug bootstrap/37632] Darwin bootstrap failure, ld: bl out of range
--- Comment #6 from lucier at math dot purdue dot edu 2010-04-10 21:18 ---
I wrote

> And I get the same error if I use your configure line.

which means using gcc-4.0.1; I used *exactly* your configure line.

Did you have the gmp and mpfr sources in the gcc-4_4-branch source directory?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37632
[Bug tree-optimization/26854] [4.3/4.4/4.5 Regression] Inordinate compile times on large routines
--- Comment #117 from lucier at math dot purdue dot edu 2010-03-27 16:38 ---
Subject: Re: [4.3/4.4/4.5 Regression] Inordinate compile times on large routines

On Mar 27, 2010, at 7:14 AM, rguenth at gcc dot gnu dot org wrote:

> I wonder if the parsing numbers are accurate as the initial report has like 9s parsing while the current ones are 200s. Can you explain that difference? (like, were you testing different source?)

Yes, different source (compiler.i instead of all.i), different (faster) machine. Perhaps gathering the detailed memory stats affects the parser time.

Here are times for the original source file all.i using the same machine and compiler as in the immediately previous report for compiler.i:

df live&initialized regs:  45.00 ( 8%) usr   0.00 ( 0%) sys  45.04 ( 8%) wall       0 kB ( 0%) ggc
parser                  :  19.60 ( 3%) usr   1.22 ( 7%) sys  21.25 ( 4%) wall   70217 kB ( 2%) ggc
scheduling              : 301.86 (52%) usr   0.00 ( 0%) sys 301.87 (51%) wall    8739 kB ( 0%) ggc
TOTAL                   : 579.88            17.55           597.65           3393985 kB

Glancing at top, the maximum reported memory usage was 13GB. I'll attach the detailed results for all.i next.

> As is the testcase(s) are an interesting source of information - maybe we should gather those up on a page in the wiki just in case we end up closing this bug at some point (I suggest not to at the moment, the parsing times look odd and 20GB memory use doesn't sound reasonable).
>
> Did you ever test other compilers and see how they perform with respect to memory usage and compile time?

No, none that were not a gcc derivative.

Brad

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854
[Bug tree-optimization/26854] [4.3/4.4/4.5 Regression] Inordinate compile times on large routines
--- Comment #118 from lucier at math dot purdue dot edu 2010-03-27 16:44 ---
Created an attachment (id=20224)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20224&action=view)
time/memory report compiling all.i with -O3

These are the detailed time and memory statistics reported when compiling all.i with -O3 -fschedule-insns on x86-64.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854
[Bug tree-optimization/26854] [4.3/4.4/4.5 Regression] Inordinate compile times on large routines
--- Comment #113 from lucier at math dot purdue dot edu 2010-03-27 04:27 ---
Created an attachment (id=20220)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20220&action=view)
time/mem report compiling compiler.i

This is the time and detailed memory report for 20100302 compiling compiler.i above with main optimization options -O1 -fschedule-insns2 (precise command line and configuration options are given at the top of the file).

With these optimization levels cpu time and memory don't look too bad to me. The main routines are

parser           : 320.93 (59%) usr   1.40 (27%) sys 322.62 (59%) wall  103143 kB (15%) ggc
tree CFG cleanup :  73.43 (14%) usr   0.01 ( 0%) sys  73.46 (13%) wall    1388 kB ( 0%) ggc

Nothing else is above 3%.

I'm building today's gcc on an X86-64 RHEL5 machine with more memory to test with -O3 -fschedule-insns, as this set of options now gives about 20% speedup on some of my codes of this type.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854
[Bug tree-optimization/26854] [4.3/4.4/4.5 Regression] Inordinate compile times on large routines
--- Comment #114 from lucier at math dot purdue dot edu 2010-03-27 04:59 ---
Created an attachment (id=20221)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20221&action=view)
time/mem report compiling compiler.i

This is the time and detailed memory report for compiling compiler.i with today's gcc and optimization level -O3 -fschedule-insns. Again, the detailed configuration information and command line are contained at the beginning of the file.

Except for taking 20GB of RAM, this doesn't look too bad, either. The passes taking the most time are:

parser           :  222.18 (21%) usr   2.95 (11%) sys  225.37 (21%) wall  103148 kB (11%) ggc
tree CFG cleanup :   63.67 ( 6%) usr   0.00 ( 0%) sys   63.60 ( 6%) wall    2467 kB ( 0%) ggc
scheduling       :  394.04 (37%) usr   0.00 ( 0%) sys  394.04 (36%) wall    5824 kB ( 1%) ggc
TOTAL            : 1056.69            26.47           1083.41            916872 kB

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854
[Bug tree-optimization/26854] [4.3/4.4/4.5 Regression] Inordinate compile times on large routines
--- Comment #115 from lucier at math dot purdue dot edu 2010-03-27 05:20 ---
Created an attachment (id=20222)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20222&action=view)
time/mem report compiling compiler.i with -O1

Here is the time and memory report with -O1 -fschedule-insns2 on the same machine as the -O3 -fschedule-insns report. The biggest times are:

parser           : 224.89 (54%) usr   2.61 (24%) sys 226.97 (53%) wall  103148 kB (15%) ggc
tree CFG cleanup :  60.61 (15%) usr   0.00 ( 0%) sys  60.58 (14%) wall    1388 kB ( 0%) ggc
reload           :  19.17 ( 5%) usr   0.00 ( 0%) sys  19.17 ( 5%) wall    4694 kB ( 1%) ggc
TOTAL            : 413.29            10.95           424.28            709657 kB

--
lucier at math dot purdue dot edu changed:

           What    |Removed                     |Added
  Attachment #20220|0                           |1
        is obsolete|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854
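For readers who want to rank passes across several of these reports, the rows can be pulled apart mechanically. The following is only a sketch: the row layout is inferred from the lines quoted in the comments above, and the names `ROW`, `parse_row`, and `top_passes` are made up for this illustration, not part of gcc.

```python
import re

# Assumed row layout, inferred from the rows quoted in the comments above:
#   name : USR (P%) usr  SYS (P%) sys  WALL (P%) wall  GGC kB (P%) ggc
# The TOTAL row has no percentages, so it deliberately does not match.
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall\s+"
    r"(?P<ggc_kb>\d+)\s*kB"
)


def parse_row(line):
    """Return a dict for one per-pass row, or None if the line is not one."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "name": m["name"],
        "usr": float(m["usr"]),
        "sys": float(m["sys"]),
        "wall": float(m["wall"]),
        "ggc_kb": int(m["ggc_kb"]),
    }


def top_passes(report_text, n=3):
    """Return the n per-pass rows with the largest user time."""
    rows = [r for r in map(parse_row, report_text.splitlines()) if r]
    return sorted(rows, key=lambda r: r["usr"], reverse=True)[:n]
```

Feeding it the three rows from the -O1 report above would rank parser first, then tree CFG cleanup, then reload.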
[Bug bootstrap/42002] Bootstrap failure: ld doesn't find 64-bit libelf on Fedora 11
--- Comment #2 from lucier at math dot purdue dot edu 2009-11-11 13:52 ---
Thanks a lot for the explanation!

I'm looking through the list of packages on Fedora with elfutils in the title; there is no elfutils-libelf-devel.ppc64, and the only ppc64 packages I can find are

elfutils-devel-0.142-1.fc11 (ppc64)

with file list

/usr/include/dwarf.h
/usr/include/elfutils
/usr/include/elfutils/elf-knowledge.h
/usr/include/elfutils/libasm.h
/usr/include/elfutils/libdw.h
/usr/include/elfutils/libdwfl.h
/usr/include/elfutils/libebl.h
/usr/include/elfutils/version.h
/usr/lib64/libasm.so
/usr/lib64/libdw.so
/usr/lib64/libebl.a

and

elfutils-libelf-0.142-1.fc11 (ppc64)

with file list

/usr/lib64/libelf-0.142.so
/usr/lib64/libelf.so.1

So I put in the link from libelf.so to libelf.so.1 by hand and the bootstrap is proceeding.

Should I file a bug report with Fedora? I was told Fedora 12 won't support ppc64, so maybe there's no point.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42002
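The hand-made link described above can be sketched in a few lines. This is an illustration only, not what was actually run: the helper name `ensure_dev_symlink` is hypothetical, and it assumes the versioned library already sits in the directory you pass in (on the reporter's machine that would be /usr/lib64, which needs root).

```python
import os
from pathlib import Path


def ensure_dev_symlink(libdir, soname="libelf.so.1", linkname="libelf.so"):
    """Create libdir/linkname -> soname if the unversioned link is missing.

    Returns True if a link was created and False if linkname already existed.
    Raises FileNotFoundError if soname is not present in libdir.
    """
    libdir = Path(libdir)
    target = libdir / soname
    link = libdir / linkname
    if not target.exists():
        raise FileNotFoundError(target)
    if link.exists() or link.is_symlink():
        return False
    # Relative link, the same shape as `ln -s libelf.so.1 libelf.so`
    # executed inside libdir.
    os.symlink(soname, link)
    return True
```

The linker looks for the unversioned name when given -lelf, which is why the runtime-only package (libelf.so.1 without libelf.so) was not enough for the bootstrap.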
[Bug bootstrap/42002] New: Bootstrap failure: ld doesn't find 64-bit libelf on Fedora 11
I configured today's mainline with ../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c,c++ --enable-stage1-languages=c,c++ --with-cpu=default64 --enable-checking=release and bootstrap fails with /home/lucier/programs/gcc/objdirs/mainline/./prev-gcc/xgcc -B/home/lucier/programs/gcc/objdirs/mainline/./prev-gcc/ -B/pkgs/gcc-mainline/powerpc64-unknown-linux-gnu/bin/ -B/pkgs/gcc-mainline/powerpc64-unknown-linux-gnu/bin/ -B/pkgs/gcc-mainline/powerpc64-unknown-linux-gnu/lib/ -isystem /pkgs/gcc-mainline/powerpc64-unknown-linux-gnu/include -isystem /pkgs/gcc-mainline/powerpc64-unknown-linux-gnu/sys-include -g -O2 -gtoggle -DIN_GCC -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -Wold-style-definition -Wc++-compat -DHAVE_CONFIG_H -o cc1-dummy c-lang.o stub-objc.o attribs.o c-errors.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o c-aux-info.o c-common.o c-opts.o c-format.o c-semantics.o c-ppoutput.o c-cppbuiltin.o c-objc-common.o c-dump.o c-pch.o c-parser.o rs6000-c.o c-gimplify.o tree-mudflap.o c-pretty-print.o c-omp.o \ dummy-checksum.o main.o libbackend.a ../libcpp/libcpp.a ../libdecnumber/libdecnumber.a ../libcpp/libcpp.a ../libiberty/libiberty.a ../libdecnumber/libdecnumber.a -L/home/lucier/programs/gcc/objdirs/mainline/./gmp/.libs -L/home/lucier/programs/gcc/objdirs/mainline/./gmp/_libs -L/home/lucier/programs/gcc/objdirs/mainline/./mpfr/.libs -L/home/lucier/programs/gcc/objdirs/mainline/./mpfr/_libs -lmpfr -lgmp -rdynamic -ldl -L../zlib -lz -lelf /usr/bin/ld: skipping incompatible /usr/lib/libelf.so when searching for -lelf /usr/bin/ld: cannot find -lelf collect2: ld returned 1 exit status The object files are 64-bit: [luc...@lambda-head mainline]$ file gcc/rs6000-c.o gcc/rs6000-c.o: ELF 64-bit MSB relocatable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), not stripped and a 64-bit libelf is installed: 
[luc...@lambda-head mainline]$ file /usr/lib64/libelf* /usr/lib64/libelf-0.142.so: ELF 64-bit MSB shared object, 64-bit PowerPC or cisco 7500, version 1 (SYSV), dynamically linked, stripped /usr/lib64/libelf.so.1: symbolic link to `libelf-0.142.so' but I don't know why it isn't being found. -- Summary: Bootstrap failure: ld doesn't find 64-bit libelf on Fedora 11 Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: bootstrap AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: lucier at math dot purdue dot edu GCC build triplet: powerpc64-unknown-linux-gnu/ GCC host triplet: powerpc64-unknown-linux-gnu GCC target triplet: powerpc64-unknown-linux-gnu/ http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42002
[Bug bootstrap/40968] [4.5 Regression] ICE when compiling O2g.gch; problem with --enable-gather-detailed-mem-stats
--- Comment #4 from lucier at math dot purdue dot edu 2009-11-10 00:28 --- This is fixed, at least by the time of gcc version 4.5.0 20091109 (experimental) [trunk revision 154037] (GCC) -- lucier at math dot purdue dot edu changed: What|Removed |Added Status|NEW |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40968
[Bug rtl-optimization/41891] [4.5 Regression] ICE in move_loop_invariants
--- Comment #3 from lucier at math dot purdue dot edu 2009-11-01 23:55 ---
This one works:

frying-pan:~/programs/gambc-v4_5_2-devel /pkgs/gcc-mainline/bin/gcc -march=core2 -msse4 -save-temps -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -c -o _io.o _io.i
frying-pan:~/programs/gambc-v4_5_2-devel /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-mainline/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-checking=release
Thread model: posix
gcc version 4.5.0 20091014 (experimental) [trunk revision 152748] (GCC)

This one fails:

frying-pan:~/programs/gambc-v4_5_2-devel /pkgs/gcc-mainline/bin/gcc -march=core2 -msse4 -save-temps -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -c -o _io.o _io.i
_io.i: In function ___H__23__23_read_2d_six_2d_datum_2d_or_2d_eof:
_io.i:15174:1: internal compiler error: Segmentation fault
Please submit a full bug report, with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
frying-pan:~/programs/gambc-v4_5_2-devel /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-mainline/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-checking=release
Thread model: posix
gcc version 4.5.0 20091015 (experimental) [trunk revision 152797] (GCC)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41891
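The two builds above bracket the regression between trunk revisions 152748 (works) and 152797 (fails). Narrowing that window further is a plain binary search, sketched below. The function name `first_bad_revision` and the `is_bad` callback are hypothetical; in practice `is_bad` would have to check out and build that revision and then run the failing command, and the search assumes the failure appears once and stays.

```python
def first_bad_revision(good, bad, is_bad):
    """Return the smallest revision in (good, bad] for which is_bad is true.

    Assumes is_bad(good) is False, is_bad(bad) is True, and that the
    result flips exactly once somewhere in between.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid       # failure already present at mid
        else:
            good = mid      # mid still works, look later
    return bad
```

With 49 revisions in the window this needs at most six builds, although not every revision number in that range necessarily touches trunk.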
[Bug c/41891] New: ICE in move_loop_invariants
With this compiler frying-pan:~/programs/gambc-v4_5_2-devel /pkgs/gcc-mainline/bin/gcc -v Using built-in specs. COLLECT_GCC=/pkgs/gcc-mainline/bin/gcc COLLECT_LTO_WRAPPER=/pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-checking=release --enable-languages=c Thread model: posix gcc version 4.5.0 20091031 (experimental) [trunk revision 153773] (GCC) I get an ICE: frying-pan:~/programs/gambc-v4_5_2-devel /pkgs/gcc-mainline/bin/gcc -march=core2 -msse4 -save-temps -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -c -o _io.o _io.i _io.i: In function ___H__23__23_read_2d_six_2d_datum_2d_or_2d_eof: _io.i:15174:1: internal compiler error: Segmentation fault In gdb I get frying-pan:~/programs/gambc-v4_5_2-devel gdb /pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/cc1 GNU gdb (GDB) 7.0-ubuntu Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type show copying and show warranty for details. This GDB was configured as x86_64-linux-gnu. For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/... Reading symbols from /pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/cc1...done. 
(gdb) run -march=core2 -msse4 -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp _io.i
Starting program: /pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/cc1 -march=core2 -msse4 -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp _io.i
 btowc wctob mbrlen __signbitf __signbit __signbitl ___H__23__23_read_2d_six_2d_datum_2d_or_2d_eof
Analyzing compilation unit
Performing interprocedural optimizations
 visibility early_local_cleanups whole-program inline static-var pure-const
Assembling functions:
 ___H__23__23_read_2d_six_2d_datum_2d_or_2d_eof

Program received signal SIGSEGV, Segmentation fault.
bitmap_clear (head=0x78) at ../../../mainline/gcc/bitmap.c:297
297       if (head->first)
(gdb) where
#0  bitmap_clear (head=0x78) at ../../../mainline/gcc/bitmap.c:297
#1  0x00622c78 in free_loop_data () at ../../../mainline/gcc/loop-invariant.c:1568
#2  move_loop_invariants () at ../../../mainline/gcc/loop-invariant.c:1906
#3  0x006206d7 in rtl_move_loop_invariants () at ../../../mainline/gcc/loop-init.c:254
#4  0x006544f0 in execute_one_pass (pass=0xf8fc60) at ../../../mainline/gcc/passes.c:1518
#5  0x00654705 in execute_pass_list (pass=0xf8fc60) at ../../../mainline/gcc/passes.c:1567
#6  0x00654717 in execute_pass_list (pass=0xf8fb40) at ../../../mainline/gcc/passes.c:1568
#7  0x00654717 in execute_pass_list (pass=0x1010d60) at ../../../mainline/gcc/passes.c:1568
#8  0x007263dc in tree_rest_of_compilation (fndecl=0x7713fe00) at ../../../mainline/gcc/tree-optimize.c:392
#9  0x00851b7c in cgraph_expand_function (node=0x7713fd00) at ../../../mainline/gcc/cgraphunit.c:1160
#10 0x00853485 in cgraph_expand_all_functions () at ../../../mainline/gcc/cgraphunit.c:1219
#11 cgraph_optimize () at ../../../mainline/gcc/cgraphunit.c:1465
#12 0x0085383f in cgraph_finalize_compilation_unit () at ../../../mainline/gcc/cgraphunit.c:1089
#13 0x0048e45b in c_write_global_declarations () at ../../../mainline/gcc/c-decl.c:9489
#14 0x006e98ac in compile_file (argc=15, argv=0x7fffe5d8) at ../../../mainline/gcc/toplev.c:1061
#15 do_compile (argc=15, argv=0x7fffe5d8) at ../../../mainline/gcc/toplev.c:2408
#16 toplev_main (argc=15, argv=0x7fffe5d8) at ../../../mainline/gcc/toplev.c:2450
#17 0x773d8abd in __libc_start_main () from /lib/libc.so.6
#18 0x0047af09 in _start () at ../sysdeps/x86_64/elf/start.S:113
(gdb) print head
$1 = (bitmap) 0x78

I'll add the (unfortunately very long) input file as an attachment.

Brad

--
           Summary: ICE in move_loop_invariants
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: lucier at math dot purdue dot edu
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41891
[Bug c/41891] ICE in move_loop_invariants
--- Comment #1 from lucier at math dot purdue dot edu 2009-10-31 16:56 ---
Created an attachment (id=18942)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18942&action=view)
test case

This is the test case. BTW, this works in 4.4.1.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41891
[Bug middle-end/41891] ICE in move_loop_invariants
--- Comment #2 from lucier at math dot purdue dot edu 2009-10-31 17:32 ---
There is no ICE with

heine:~/Desktop /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-mainline/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release --prefix=/pkgs/gcc-mainline --enable-languages=c --disable-multilib
Thread model: posix
gcc version 4.5.0 20091005 (experimental) [trunk revision 152459] (GCC)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41891
[Bug bootstrap/40968] [4.5 Regression] ICE when compiling O2g.gch; problem with --enable-gather-detailed-mem-stats
--- Comment #3 from lucier at math dot purdue dot edu 2009-10-06 00:51 ---
Now I'm getting comparison errors with [trunk revision 152459] and the same configuration:

Comparing stages 2 and 3
warning: gcc/cc1plus-checksum.o differs
warning: gcc/cc1-checksum.o differs
Bootstrap comparison failure!
x86_64-unknown-linux-gnu/libstdc++-v3/src/basic_file.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/future.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/basic_file.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/future.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/pool_allocator.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/debug.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/mt_allocator.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/locale_init.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/atomic.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/system_error.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/locale.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/pool_allocator.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/debug.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/mt_allocator.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/locale_init.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/atomic.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/system_error.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/locale.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/eh_alloc.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/vec.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/eh_globals.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/basic_file.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/future.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/basic_file.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/future.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/pool_allocator.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/debug.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/mt_allocator.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/locale_init.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/atomic.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/system_error.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/locale.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/pool_allocator.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/debug.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/mt_allocator.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/locale_init.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/atomic.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/system_error.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/locale.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/libsupc++/guard.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/libsupc++/eh_alloc.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/libsupc++/vec.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/libsupc++/eh_globals.o differs

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40968
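The idea behind the stage 2 versus stage 3 check is that a compiler built by itself should produce the same object files again. The sketch below is a simplified stand-in, not gcc's actual comparison rule from its build machinery: it does a plain byte-for-byte comparison, and the function name `differing_objects` and the two directory arguments are hypothetical.

```python
import filecmp
from pathlib import Path


def differing_objects(stage_a, stage_b, pattern="*.o"):
    """List object files under stage_a that are missing from, or differ
    byte-for-byte from, the file at the same relative path under stage_b."""
    stage_a, stage_b = Path(stage_a), Path(stage_b)
    diffs = []
    for obj in sorted(stage_a.rglob(pattern)):
        rel = obj.relative_to(stage_a)
        other = stage_b / rel
        # shallow=False forces a content comparison instead of trusting
        # size and modification time.
        if not other.is_file() or not filecmp.cmp(obj, other, shallow=False):
            diffs.append(rel.as_posix())
    return diffs
```

A long list like the one above, spread across many libstdc++ objects in both the 64-bit and 32-bit multilibs, suggests a difference in the compiler's output in general rather than in one unlucky file.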
[Bug target/41531] -O1 -fschedule-insns swscale error
--- Comment #3 from lucier at math dot purdue dot edu 2009-10-01 13:19 ---
This is not the same problem as 24319. Vlad thinks he fixed 24319, and indeed the problem in this bug report from 4.4 is gone. The reported problem in 4.5 is different.

Don't turn 24319 into a grab bag of any problem that arises when using -fschedule-insns.

And, again, I can't reopen this bug.

Brad

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41531
[Bug target/41176] ICE in reload_cse_simplify_operands at postreload.c:396
--- Comment #5 from lucier at math dot purdue dot edu 2009-10-01 19:43 --- No ICE with 4.3.3, either, but there is an ICE with Target: ppc64-redhat-linux gcc version 4.4.1 20090725 (Red Hat 4.4.1-2) (GCC) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41176
[Bug rtl-optimization/24319] [4.3/4.4/4.5 regression] amd64 register spill error with -fschedule-insns
--- Comment #23 from lucier at math dot purdue dot edu 2009-09-03 18:04 --- The gprof output on the _num.i example, with and without -fschedule-insns is at http://www.math.purdue.edu/~lucier/bugzilla/11/gprof.out-fschedule-insns.gz http://www.math.purdue.edu/~lucier/bugzilla/11/gprof.out-fnoschedule-insns.gz -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24319
[Bug rtl-optimization/24319] [4.3/4.4/4.5 regression] amd64 register spill error with -fschedule-insns
--- Comment #20 from lucier at math dot purdue dot edu 2009-09-02 16:52 ---
Vlad: Thank you for your reply.

The times I reported are for -fschedule-insns without -fsched-pressure. The times with the addition of -fsched-pressure are not much greater than with -fschedule-insns by itself:

With -fschedule-insns

scheduling    : 22.89 (41%) usr  0.02 ( 2%) sys 22.93 (40%) wall   2125 kB ( 1%) ggc
integrated RA :  9.15 (16%) usr  0.06 ( 6%) sys  9.21 (16%) wall   5488 kB ( 3%) ggc
scheduling 2  :  0.60 ( 1%) usr  0.00 ( 0%) sys  0.62 ( 1%) wall    422 kB ( 0%) ggc
TOTAL         : 55.67            0.93           56.66           180793 kB

with -fschedule-insns -fsched-pressure

scheduling    : 23.31 (42%) usr  0.02 ( 2%) sys 23.36 (41%) wall   2125 kB ( 1%) ggc
integrated RA :  9.18 (16%) usr  0.04 ( 4%) sys  9.22 (16%) wall   5517 kB ( 3%) ggc
scheduling 2  :  0.58 ( 1%) usr  0.01 ( 1%) sys  0.58 ( 1%) wall    251 kB ( 0%) ggc
TOTAL         : 55.77            1.00           56.89           179606 kB

and with neither -fschedule-insns nor -fsched-pressure:

integrated RA :  6.40 (21%) usr  0.05 ( 5%) sys  6.41 (21%) wall   5087 kB ( 3%) ggc
scheduling 2  :  0.58 ( 2%) usr  0.01 ( 1%) sys  0.60 ( 2%) wall    244 kB ( 0%) ggc
TOTAL         : 29.84            0.98           30.83           176587 kB

So pre-register-allocation instruction scheduling, even without the new register-pressure-aware algorithm, takes quite a bit of time.

I'll try to build a profiled gcc, and then if I find something I'll put it in a new PR.

Brad

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24319
[Bug rtl-optimization/24319] [4.3/4.4/4.5 regression] amd64 register spill error with -fschedule-insns
--- Comment #22 from lucier at math dot purdue dot edu 2009-09-02 17:24 ---
The output of gprof on this example is at

http://www.math.purdue.edu/~lucier/bugzilla/11/gprof.out.gz

Everything that takes more than a second is

Each sample counts as 0.01 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds     calls  s/call   s/call  name
 10.73      4.45     4.45     15565    0.00     0.00  pop_scope
  7.28      7.47     3.02 314259938    0.00     0.00  free_list
  7.04     10.39     2.92      5575    0.00     0.00  dfs_enumerate_from
  5.62     12.72     2.33 314988148    0.00     0.00  alloc_INSN_LIST
  5.28     14.91     2.19      5292    0.00     0.00  get_loop_exit_edges
  5.14     17.04     2.13 331244515    0.00     0.00  bitmap_set_bit
  3.28     18.40     1.36    135329    0.00     0.00  sched_analyze_insn
  3.09     19.68     1.28     29650    0.00     0.00  free_deps
  2.75     20.82     1.14  21773210    0.00     0.00  bitmap_bit_p
  2.35     21.80     0.98  14093247    0.00     0.00  dominated_by_p
  1.99     22.62     0.83   5357385    0.00     0.00  bitmap_ior_into
  1.88     23.40     0.78       199    0.00     0.00  inverted_post_order_compute
  1.57     24.05     0.65       342    0.00     0.01  df_worklist_dataflow
  1.37     24.62     0.57  51278357    0.00     0.00  decl_jump_unsafe
  1.35     25.18     0.56  26181017    0.00     0.00  flow_bb_inside_loop_p
  1.13     25.65     0.47       201    0.00     0.00  post_order_compute

Nothing immediately jumps out at me.

Brad

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24319
[Bug target/41176] ICE in reload_cse_simplify_operands at postreload.c:396
--- Comment #2 from lucier at math dot purdue dot edu 2009-09-03 02:37 --- I thought Vlad's scheduling/register allocation patch here http://gcc.gnu.org/ml/gcc-patches/2009-09/msg3.html which solves PR24319, might fix this problem, but it does not. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41176
[Bug rtl-optimization/24319] [4.3/4.4/4.5 regression] amd64 register spill error with -fschedule-insns
--- Comment #18 from lucier at math dot purdue dot edu 2009-09-02 02:54 ---
Vlad:

The patch works great in my tests so far, thanks.

After installing your patch on today's trunk so that -fschedule-insns actually works, I find it is quite expensive on large files. For example, with today's trunk with your patches applied, for the file

http://www.math.purdue.edu/~lucier/bugzilla/8/_num.i.gz

and the options

/pkgs/gcc-mainline-schedule/bin/gcc -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -ftime-report -c _num.i

total CPU time on my x86-64 box is

TOTAL      : 29.60            0.92           30.54           176587 kB

while with -fschedule-insns it is

scheduling : 23.03 (42%) usr  0.02 ( 2%) sys 23.07 (41%) wall   2125 kB ( 1%) ggc
TOTAL      : 55.47            1.03           56.57           180793 kB

I don't know whether you can make it go faster now, or whether that's unreasonable and I should just wait and file another PR.

Brad

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24319
[Bug rtl-optimization/24319] [4.3/4.4/4.5 regression] amd64 register spill error with -fschedule-insns
--- Comment #16 from lucier at math dot purdue dot edu 2009-08-28 16:54 ---
Re: Comment 7:

> Since end users will gain little benefit from being able to run the sched1 pass on x86 code, I don't think this is a serious problem.

PR33928 (comments 108 and 111) gives an example where -fschedule-insns on x86-64 gives a 14% speedup on some direct and inverse FFT codes, certainly not a trivial difference.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24319
[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #111 from lucier at math dot purdue dot edu 2009-08-27 17:02 ---
I can compile gambit 4.1.2 with -fschedule-insns except for the function noted in PR41164. On

model name : Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz

with

gcc version 4.5.0 20090803 (experimental) [trunk revision 150373] (GCC)

the times with -fschedule-insns are

(time (direct-fft-recursive-4 a table))
    144 ms cpu time (144 user, 0 system)
(time (inverse-fft-recursive-4 a table))
    136 ms cpu time (136 user, 0 system)

and the times without -fschedule-insns are

(time (direct-fft-recursive-4 a table))
    168 ms cpu time (168 user, 0 system)
(time (inverse-fft-recursive-4 a table))
    172 ms cpu time (172 user, 0 system)

That's a pretty big improvement.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug target/41176] New: ICE in reload_cse_simplify_operands at postreload.c:396
with this compiler: [luc...@lambda-head lib]$ /pkgs/gcc-mainline/bin/gcc -v Using built-in specs. Target: powerpc64-unknown-linux-gnu Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c,c++ --enable-stage1-languages=c,c++ --with-cpu=default64 Thread model: posix gcc version 4.5.0 20090825 (experimental) [trunk revision 151108] (GCC) and this command line /pkgs/gcc-mainline/libexec/gcc/powerpc64-unknown-linux-gnu/4.5.0/cc1 -fpreprocessed thread.i -quiet -mcpu=970 -m64 -O1 -Wno-unused -version -fschedule-insns -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common I get the following error: thread.i: In function ___H_make_2d_thread: thread.i:719:1: error: insn does not satisfy its constraints: (insn 625 411 219 26 thread.i:625 (set (reg:DF 19 19) (mem:DF (plus:DI (reg:DI 22 22 [orig:197 D.3836 ] [197]) (const_int 23 [0x17])) [0 S8 A64])) 357 {*movdf_hardfloat64} (nil)) thread.i:719:1: internal compiler error: in reload_cse_simplify_operands, at postreload.c:396 I apologize in advance for the size of the test case, which I will post next. -- Summary: ICE in reload_cse_simplify_operands at postreload.c:396 Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: lucier at math dot purdue dot edu GCC build triplet: powerpc64-unknown-linux-gnu GCC host triplet: powerpc64-unknown-linux-gnu GCC target triplet: powerpc64-unknown-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41176
[Bug target/41176] ICE in reload_cse_simplify_operands at postreload.c:396
--- Comment #1 from lucier at math dot purdue dot edu 2009-08-27 00:14 --- Created an attachment (id=18431) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18431&action=view) preprocessed source file I'm not having much luck cutting this down more, sorry. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41176
[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #108 from lucier at math dot purdue dot edu 2009-08-27 01:18 --- direct.c contains a direct FFT; I compiled the direct and inverse FFT and ran them on arrays with 2^23 double-precision complex elements, using heine:~/programs/gcc/objdirs/bench-mainline-on-fft /pkgs/gcc-mainline/bin/gcc -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../../mainline/configure --enable-checking=release --prefix=/pkgs/gcc-mainline --enable-languages=c,c++ -enable-stage1-languages=c,c++ Thread model: posix gcc version 4.5.0 20090803 (experimental) [trunk revision 150373] (GCC) The compile options were /pkgs/gcc-mainline/bin/gcc -save-temps -c -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -rdynamic -shared -fschedule-insns and the same without -fschedule-insns. The runtime for direct+inverse FFT with -fschedule-insns was 1.264 seconds and the time for direct+inverse FFT without -fschedule-insns was 1.444 seconds, which is a 14% speedup for that one compiler option. This is on a 2.33GHz Core 2 quad machine. I'll attach the inner loops of direct.c with and without -fschedule-insns. I haven't been able to compile the complete Gambit runtime with -fschedule-insns on either x86-64 or ppc64; I've filed PR41164 and PR41176 for those two different failures. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
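For reference, the 14% figure in the comment above is the ratio of the two run times. A minimal sketch in Python; the function name `speedup_percent` is invented for this illustration and is not part of any benchmark script attached to this PR:

```python
def speedup_percent(slow_seconds, fast_seconds):
    """Return how much faster the fast run is, as a percentage of the fast run's time."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


# Times reported above: 1.444 s without -fschedule-insns, 1.264 s with it.
print(f"{speedup_percent(1.444, 1.264):.1f}% speedup")
```

Expressed instead as a reduction in run time, (1.444 - 1.264) / 1.444, the same measurements give roughly 12.5%, so it is worth stating which convention a quoted percentage uses.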
[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #109 from lucier at math dot purdue dot edu 2009-08-27 01:22 --- Created an attachment (id=18432) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18432&action=view) inner loop of direct.c with -fschedule-insns -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #110 from lucier at math dot purdue dot edu 2009-08-27 01:22 --- Created an attachment (id=18433) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18433&action=view) inner loop of direct.c without -fschedule-insns -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug rtl-optimization/41164] New: Unable to find spill register
With this compiler: heine:~/programs/gambc-v4_5_1-devel /pkgs/gcc-mainline/bin/gcc -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../../mainline/configure --enable-checking=release --prefix=/pkgs/gcc-mainline --enable-languages=c,c++ -enable-stage1-languages=c,c++ Thread model: posix gcc version 4.5.0 20090803 (experimental) [trunk revision 150373] (GCC) with this command: heine:~/programs/gambc-v4_5_1-devel/lib /pkgs/gcc-mainline/bin/gcc -fschedule-insns -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -save-temps -c os_test.i fails with this error: os_base.c: In function ___os_err_code_to_string: os_base.c:1247:1: error: unable to find a register to spill in class DREG os_base.c:1247:1: error: this is the insn: (insn 264 280 266 41 os_test.i:158 (parallel [ (set (reg:SI 37 r8 [133]) (truncate:SI (lshiftrt:DI (mult:DI (sign_extend:DI (reg:SI 2 cx [130])) (sign_extend:DI (reg:SI 6 bp [185]))) (const_int 32 [0x20] (clobber (scratch:SI)) (clobber (reg:CC 17 flags)) ]) 347 {*smulsi3_highpart_insn} (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_EQUAL (truncate:SI (lshiftrt:DI (mult:DI (sign_extend:DI (reg:SI 2 cx [130])) (const_int 1717986919 [0x6667])) (const_int 32 [0x20]))) (nil os_base.c:1247: confused by earlier errors, bailing out I'll add the .i file next. What's interesting is that I get similar errors with heine:~/programs/gambc-v4_5_1-devel/lib gcc -v Using built-in specs. 
Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.3.3-5ubuntu4' --with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.3 --program-suffix=-4.3 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4) heine:~/programs/gambc-v4_5_1-devel/lib /pkgs/gcc-4.4-branch/bin/gcc -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../../gcc-4.4-branch/configure --prefix=/pkgs/gcc-4.4-branch --enable-languages=c --enable-checking=release --disable-multilib Thread model: posix gcc version 4.4.1 20090522 (prerelease) (GCC) and heine:~/programs/gambc-v4_5_1-devel/lib /pkgs/gcc-4.2.4/bin/gcc -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../../gcc-4.2.4/configure --prefix=/pkgs/gcc-4.2.4 --enable-languages=c --enable-checking=release --disable-multilib Thread model: posix gcc version 4.2.4 -- Summary: Unable to find spill register Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: lucier at math dot purdue dot edu GCC build triplet: x86_64-unknown-linux-gnu GCC host triplet: x86_64-unknown-linux-gnu GCC target triplet: x86_64-unknown-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41164
[Bug rtl-optimization/41164] Unable to find spill register
--- Comment #1 from lucier at math dot purdue dot edu 2009-08-25 14:57 --- Created an attachment (id=18423) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18423&action=view) test file that illustrates failure -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41164
[Bug libstdc++/40968] New: ICE including fenv.h when compiling O2g.gch
with this compiler: Mon Aug 3 16:57:15 UTC 2009 (revision 150373) with this configure and build: /bin/rm -rf *; ../../mainline/configure --enable-checking=release --prefix=/pkgs/gcc-mainline-mem-stats --enable-languages=c,c++ --enable-gather-detailed-mem-stats -enable-stage1-languages=c,c++; make -j 6 bootstrap build.log bootstrap fails with /home/lucier/programs/gcc/objdirs/mainline/./gcc/xgcc -shared-libgcc -B/home/lucier/programs/gcc/objdirs/mainline/./gcc -nostdinc++ -L/home/lucier/programs/gcc/objdirs/mainline/x86_64-unknown-linux-gnu/32/libstdc++-v3/src -L/home/lucier/programs/gcc/objdirs/mainline/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs -B/pkgs/gcc-mainline-mem-stats/x86_64-unknown-linux-gnu/bin/ -B/pkgs/gcc-mainline-mem-stats/x86_64-unknown-linux-gnu/lib/ -isystem /pkgs/gcc-mainline-mem-stats/x86_64-unknown-linux-gnu/include -isystem /pkgs/gcc-mainline-mem-stats/x86_64-unknown-linux-gnu/sys-include -m32 -x c++-header -D_GNU_SOURCE -m32 -I/home/lucier/programs/gcc/objdirs/mainline/x86_64-unknown-linux-gnu/32/libstdc++-v3/include/x86_64-unknown-linux-gnu -I/home/lucier/programs/gcc/objdirs/mainline/x86_64-unknown-linux-gnu/32/libstdc++-v3/include -I/home/lucier/programs/gcc/mainline/libstdc++-v3/libsupc++ -O2 -g /home/lucier/programs/gcc/mainline/libstdc++-v3/include/precompiled/stdtr1c++.h -o x86_64-unknown-linux-gnu/bits/stdtr1c++.h.gch/O2g.gch In file included from /home/lucier/programs/gcc/objdirs/mainline/x86_64-unknown-linux-gnu/32/libstdc++-v3/include/tr1/cfenv:36:0, from /home/lucier/programs/gcc/mainline/libstdc++-v3/include/precompiled/stdtr1c++.h:33: /home/lucier/programs/gcc/objdirs/mainline/x86_64-unknown-linux-gnu/32/libstdc++-v3/include/fenv.h:32:9: internal compiler error: Segmentation fault I'm sorry, but I don't really know how to go further in diagnosing this. 
-- Summary: ICE including fenv.h when compiling O2g.gch Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: lucier at math dot purdue dot edu GCC build triplet: x86_64-unknown-linux-gnu GCC host triplet: x86_64-unknown-linux-gnu GCC target triplet: x86_64-unknown-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40968
[Bug libstdc++/40968] ICE when compiling O2g.gch; problem with --enable-gather-detailed-mem-stats
--- Comment #1 from lucier at math dot purdue dot edu 2009-08-04 23:15 --- bootstrap completes without --enable-gather-detailed-mem-stats -- lucier at math dot purdue dot edu changed: What|Removed |Added Summary|ICE including fenv.h when |ICE when compiling O2g.gch; |compiling O2g.gch |problem with --enable- ||gather-detailed-mem-stats http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40968
[Bug bootstrap/40950] New: Bootstrap fails with in-tree gmp and without system C++ compiler
With this build script #!/bin/tcsh /bin/rm -rf *; ../../mainline/configure --enable-checking=release --prefix=/pkgs/gcc-mainline-mem-stats --enable-languages=c --enable-gather-detailed-mem-stats ; make -j 6 bootstrap build.log on this OS: heine:~/programs/gcc/objdirs/mainline uname -a Linux heine.math.purdue.edu 2.6.28-14-generic #47-Ubuntu SMP Sat Jul 25 01:19:55 UTC 2009 x86_64 GNU/Linux with gmp 4.2.4 and mpfr-2.3.2 added to the mainline tree with revision Mon Aug 3 12:57:15 EDT 2009 Mon Aug 3 16:57:15 UTC 2009 (revision 150373) bootstrap fails when configuring gmp with the stage1 compiler with the message checking how to run the C++ preprocessor... /lib/cpp configure: error: C++ preprocessor /lib/cpp fails sanity check See `config.log' for more details. make[2]: *** [configure-stage2-gmp] Error 1 make[2]: Leaving directory `/home/lucier/programs/gcc/objdirs/mainline' make[1]: *** [stage2-bubble] Error 2 make[1]: Leaving directory `/home/lucier/programs/gcc/objdirs/mainline' make: *** [bootstrap] Error 2 I'll attach build.log and gmp/config.log and without a -- Summary: Bootstrap fails with in-tree gmp and without system C++ compiler Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: bootstrap AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: lucier at math dot purdue dot edu GCC build triplet: x86_64-unknown-linux-gnu GCC host triplet: x86_64-unknown-linux-gnu GCC target triplet: x86_64-unknown-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40950
[Bug bootstrap/40950] Bootstrap fails with in-tree gmp and without system C++ compiler
--- Comment #1 from lucier at math dot purdue dot edu 2009-08-03 17:15 --- Created an attachment (id=18291) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18291&action=view) Build log of failed bootstrap -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40950
[Bug bootstrap/40950] Bootstrap fails with in-tree gmp and without system C++ compiler
--- Comment #2 from lucier at math dot purdue dot edu 2009-08-03 17:16 --- Created an attachment (id=18292) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18292&action=view) log of failed gmp configuration -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40950
[Bug bootstrap/40950] Bootstrap fails with in-tree gmp and without system C++ compiler
--- Comment #3 from lucier at math dot purdue dot edu 2009-08-03 17:17 --- Created an attachment (id=18293) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18293&action=view) build log with right content type -- lucier at math dot purdue dot edu changed: What|Removed |Added Attachment #18291|0 |1 is obsolete|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40950
[Bug bootstrap/37739] [4.4 Regression] bootstrap broken with core gcc gcc-4.2.x
--- Comment #16 from lucier at math dot purdue dot edu 2009-07-02 16:35 --- OK, so we've had several reliable reports that this bug still exists, but I'm not high enough in the GCC bugzilla hierarchy to reopen this bug (I just tried); perhaps Andreas or Jakub would like to do so. (Jakub, I've added your e-mail as a CC to this bug, sorry if that isn't appropriate.) -- lucier at math dot purdue dot edu changed: What|Removed |Added CC||jakub at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37739
[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #106 from lucier at math dot purdue dot edu 2009-06-16 07:24 --- This machine has 4ms ticks, so we're getting down to a few ticks difference with a benchmark of this size. It's 156ms with 4.2.4, 168ms with 4.5.0, and 164 ms when -frename-registers is added to the command line. It's not just scheduling, there are more memory accesses with 4.5.0. With a problem roughly 10 times as large, the times are 4.2.4: 2912ms 4.5.0: 3204ms 4.5.0: 3120ms (adding -frename-registers) So there's a 7% difference with -frename-registers. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
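To make the tick argument concrete, here is a small Python sketch that converts the timings quoted in the comment above into 4 ms ticks and into ratios against 4.2.4. The dictionary layout and the names `small`, `large` and `compare` are only for illustration:

```python
TICK_MS = 4  # timer resolution reported for this machine

small = {"4.2.4": 156, "4.5.0": 168, "4.5.0 -frename-registers": 164}
large = {"4.2.4": 2912, "4.5.0": 3204, "4.5.0 -frename-registers": 3120}


def compare(times_ms, baseline="4.2.4"):
    """Return {label: (difference in ticks, ratio to baseline)} for each entry."""
    base = times_ms[baseline]
    return {
        label: ((ms - base) // TICK_MS, ms / base)
        for label, ms in times_ms.items()
    }


for name, times in (("small", small), ("large", large)):
    for label, (ticks, ratio) in compare(times).items():
        print(f"{name:5s} {label:26s} +{ticks:3d} ticks  x{ratio:.3f}")
```

On the small benchmark the gap to 4.2.4 is only two or three ticks, which is why the run that is roughly ten times larger gives a more trustworthy basis for the 7% figure.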
[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #98 from lucier at math dot purdue dot edu 2009-06-15 16:11 --- I don't quite understand how you would like me to configure and run the test. First, I've applied your patches to speed up computing DF to my tree; do you want them included in the test, or should I use a pristine mainline? Second, when configuring mainline, should I include, or not include 1. --enable-gather-detailed-mem-stats 2. --enable-checking=release After that, I think you just want to run two compiles with and without -ftime-report, is that right? (Nothing about -fmem-report.) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug rtl-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #102 from lucier at math dot purdue dot edu 2009-06-15 19:57 --- Subject: Re: [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475 On Mon, 2009-06-15 at 16:20 +, paolo dot bonzini at gmail dot com wrote: Yes, and the output of -ftime-report is not needed. Just the time ./cc1 ... output for the two. Thanks! The two commands: time /pkgs/gcc-mainline/bin/gcc -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -c compiler.i 261.424u 1.184s 4:22.76 99.9% 0+0k 0+28456io 0pf+0w time /pkgs/gcc-mainline/bin/gcc -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -c compiler.i -ftime-report 263.424u 4.900s 4:28.68 99.8% 0+0k 0+28480io 0pf+0w -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
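The two timing lines above can be compared mechanically. The following Python sketch assumes the default tcsh `time` summary format (user seconds, system seconds, elapsed time, CPU percentage, then memory and I/O fields); the function name `parse_tcsh_time` is made up for this example:

```python
import re

TIME_RE = re.compile(
    r"(?P<user>[\d.]+)u (?P<sys>[\d.]+)s (?P<elapsed>[\d:.]+) (?P<cpu>[\d.]+)%"
)


def parse_tcsh_time(line):
    """Parse the leading fields of a tcsh `time` summary line into seconds."""
    m = TIME_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised time line: {line!r}")
    # Elapsed time is printed as [[h:]m:]s; fold the fields into seconds.
    elapsed = 0.0
    for part in m.group("elapsed").split(":"):
        elapsed = elapsed * 60 + float(part)
    return {
        "user": float(m.group("user")),
        "sys": float(m.group("sys")),
        "elapsed": elapsed,
    }


plain = parse_tcsh_time("261.424u 1.184s 4:22.76 99.9% 0+0k 0+28456io 0pf+0w")
report = parse_tcsh_time("263.424u 4.900s 4:28.68 99.8% 0+0k 0+28480io 0pf+0w")
for key in ("user", "sys", "elapsed"):
    print(f"{key:8s} {report[key] - plain[key]:+.3f} s with -ftime-report")
```

With these two runs the difference is a few seconds out of more than four minutes, so -ftime-report itself does not appear to distort the measurement much.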
[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #103 from lucier at math dot purdue dot edu 2009-06-15 20:21 --- Regarding comment #101 ... With heine:~/programs/gcc/objdirs/gsc-fft-tests/gambc-v4_1_2 /pkgs/gcc-mainline/bin/gcc -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --disable-multilib --enable-checking=release Thread model: posix gcc version 4.5.0 20090608 (experimental) [trunk revision 148276] (GCC) (and including Paolo's patch to speed up DF), the routine in direct.c takes 168 ms cpu time (168 user, 0 system) As reported here http://www.math.purdue.edu/~lucier/bugzilla/9/ with gcc-4.2.4, this routine takes 156 ms on the same machine. Comment #9 gives the code that 4.2.4 generates at the start of the main loop; the start of the main loop with the version of 4.5.0 I gave above is: .L2938: movq%rcx, %rdx addq8(%rax), %rdx leaq4(%rcx), %rbx movq%rdx, -8(%rax) leaq4(%rdx), %rdi addq8(%rax), %rdx movq%rdi, -16(%rax) movq%rdx, -24(%rax) leaq4(%rdx), %rdi addq8(%rax), %rdx movq%rdi, -32(%rax) movq%rdx, -40(%rax) leaq4(%rdx), %rdi movq40(%rax), %rdx movq%rdi, -48(%rax) movsd 7(%rdx,%rdi,2), %xmm7 movq-40(%rax), %rdi leaq7(%rdx,%rcx,2), %r8 addq$8, %rcx movsd (%r8), %xmm4 cmpq%rcx, %r13 movsd 7(%rdx,%rdi,2), %xmm10 movq-32(%rax), %rdi movsd 7(%rdx,%rdi,2), %xmm5 movq-24(%rax), %rdi movsd 7(%rdx,%rdi,2), %xmm6 movq-16(%rax), %rdi movsd 7(%rdx,%rdi,2), %xmm13 movq-8(%rax), %rdi movsd 7(%rdx,%rdi,2), %xmm11 leaq(%rbx,%rbx), %rdi movsd 7(%rdi,%rdx), %xmm9 movq24(%rax), %rdx movapd %xmm11, %xmm14 movsd 15(%rdx), %xmm1 movsd 7(%rdx), %xmm2 movapd %xmm1, %xmm8 movsd 31(%rdx), %xmm3 movapd %xmm2, %xmm12 mulsd %xmm10, %xmm8 mulsd %xmm7, %xmm12 mulsd %xmm2, %xmm10 mulsd %xmm1, %xmm7 movsd 23(%rdx), %xmm0 So, to my mind, this is still a 4.5 regression, as there is still a slow-down and the code is still much less optimized by 4.5.0 than by 4.2.4. 
168/156 ~ 1.08, so if you want to change the Summary of this bug to 8% regression, or some other things, that's fine, but I've changed this PR back to being a 4.5 regression. I was not really thrilled when Richard marked PR 39157 as a duplicate of this PR. To my mind, there are three more or less independent things---run time of Gambit-generated code, compile time of the code, and the space required to compile the code. This PR is about run time; PR 39157 was about space needed by the compiler; PR 26854 is about compile time. They seem to have all been mushed together. -- lucier at math dot purdue dot edu changed: What|Removed |Added Known to work|4.5.0 | Summary|[4.3/4.4 Regression] 30%|[4.3/4.4/4.5 Regression] 30% |performance slowdown in |performance slowdown in |floating-point code caused |floating-point code caused |by r118475 |by r118475 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #95 from lucier at math dot purdue dot edu 2009-06-14 14:59 --- The test case is compiler.i.gz -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #96 from lucier at math dot purdue dot edu 2009-06-14 15:02 --- Sorry, the gcc options are in comment 87 (the -fforward-propagate is now redundant), and without Paolo's recently proposed patch it requires about 9GB of memory to compile. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #91 from lucier at math dot purdue dot edu 2009-06-08 18:19 --- Created an attachment (id=17968) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17968&action=view) time and memory report for compiler.i after Paolo's patch The patch cut the total bitmaps used compiling compiler.i from 60GB to 3GB; maximum memory (just from top) was 1631MB. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug middle-end/39301] ICE in register_overhead, at bitmap.c:115
--- Comment #13 from lucier at math dot purdue dot edu 2009-05-16 14:37 --- Subject: Re: ICE in register_overhead, at bitmap.c:115 On May 13, 2009, at 9:32 PM, bje at gcc dot gnu dot org wrote: The test case does not run in a GB of RAM on my x86-64 system. It sends the system deep into swap until the out-of-memory manager kicks in. Ah, now that -fforward-propagate has been added to -O1 on mainline it takes a bit over 8GB of RAM to run instead of a GB. Sorry. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39301
[Bug middle-end/39301] ICE in register_overhead, at bitmap.c:115
--- Comment #15 from lucier at math dot purdue dot edu 2009-05-17 01:09 --- Fixed by http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147624 -- lucier at math dot purdue dot edu changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39301
[Bug middle-end/39301] ICE in register_overhead, at bitmap.c:115
--- Comment #8 from lucier at math dot purdue dot edu 2009-05-15 21:55 --- Created an attachment (id=17876) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17876&action=view) patch to use HOST_WIDEST_INT for bitmap statistics Here's a hack to use HOST_WIDEST_INT for bitmap statistics. I'll attach the report from the compiler.i test case. If you think the report is useful, perhaps you can use this as a starting point for a real patch and I'll bootstrap and test it. Brad -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39301
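As a rough illustration of why a wider integer type matters for these statistics: the bitmap total reported for compiler.i in PR 33928 is on the order of 60GB, which cannot be held in a 32-bit counter. The Python sketch below only simulates fixed-width accumulators with `ctypes`; it is not the attached patch and does not reproduce anything in bitmap.c. The 32-bit width of the narrow counter and the 1 MiB chunk size are assumptions chosen for the example.

```python
import ctypes

GIB = 1 << 30


def accumulate(total_bytes, chunk_bytes, counter_type):
    """Sum chunks into a fixed-width C integer; return (final value, went negative)."""
    counter = counter_type(0)
    went_negative = False
    for _ in range(total_bytes // chunk_bytes):
        # ctypes integers do no overflow checking, so out-of-range values wrap.
        counter.value += chunk_bytes
        if counter.value < 0:
            went_negative = True
    return counter.value, went_negative


total = 60 * GIB   # example figure: total bytes of bitmaps allocated
chunk = 1 << 20    # hypothetical allocation size of 1 MiB

narrow = accumulate(total, chunk, ctypes.c_int32)
wide = accumulate(total, chunk, ctypes.c_int64)
print("32-bit counter:", narrow)
print("64-bit counter:", wide)
```

The narrow counter goes negative after 2 GiB and ends at a meaningless value, while the 64-bit one holds the true total. In real C code signed overflow is undefined behaviour, so the wrap shown here is only what typically happens on two's-complement hardware.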
[Bug middle-end/39301] ICE in register_overhead, at bitmap.c:115
--- Comment #9 from lucier at math dot purdue dot edu 2009-05-15 21:57 --- Created an attachment (id=17877) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17877&action=view) memory and time report for compiler.i test case Here's the output for the test case. See if you like it. I used the following configure command and compiler version: pythagoras-147% /pkgs/gcc-mainline/bin/gcc -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: /tmp/lucier/gcc/mainline/configure --enable-checking=release --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-gather-detailed-mem-stats --disable-bootstrap Thread model: posix gcc version 4.5.0 20090515 (experimental) [trunk revision 147594] (GCC) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39301
[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #85 from lucier at math dot purdue dot edu 2009-05-16 00:20 --- Created an attachment (id=17878) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17878&action=view) Large test file for testing time and memory usage This is the file compiler.i used in the previous tests. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug middle-end/39301] ICE in register_overhead, at bitmap.c:115
--- Comment #6 from lucier at math dot purdue dot edu 2009-05-08 20:27 --- Just for more information, I now hit this on x86_64-unknown-linux-gnu with the compiler pythagoras-32% /pkgs/gcc-mainline/bin/gcc -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: /tmp/lucier/gcc/mainline/configure --enable-checking=release --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-gather-detailed-mem-stats Thread model: posix gcc version 4.5.0 20090508 (experimental) [trunk revision 147288] (GCC) on the compiler.i test case with /pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I. -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -frename-registers -fno-move-loop-invariants -fforward-propagate -DHAVE_CONFIG_H -D___PRIMAL -D___LIBRARY -c compiler.i -ftime-report -fmem-report rename-no-move-loop-invariants-forward-propagate-report-new -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39301
[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #71 from lucier at math dot purdue dot edu 2009-05-07 16:02 --- Created an attachment (id=17820) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17820&action=view) time for 31957, with rename-registers -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #75 from lucier at math dot purdue dot edu 2009-05-07 16:31 --- Subject: Re: [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475 On May 7, 2009, at 12:21 PM, bonzini at gnu dot org wrote: --- Comment #74 from bonzini at gnu dot org 2009-05-07 16:21 --- Ok. One step at a time. :-) To recap, here is the situation: - that scheduling is necessary now and not in 4.2.x, probably is just a matter of luck If you mean -fschedule-insns2, it has always been part of the options list. - at least we have a set of options providing good performance on this testcase, and guidance towards better tuning of the various problematic optimizations OK, but -fforward-propagate is not viable in general for these machine-generated codes. Brad -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #63 from lucier at math dot purdue dot edu 2009-05-06 19:57 --- Was the patch in comment 55 meant for me to bootstrap and test with today's mainline? It crashes at the gcc_assert at /* Subroutine of canon_reg. Pass *XLOC through canon_reg, and validate the result if necessary. INSN is as for canon_reg. */ static void validate_canon_reg (rtx *xloc, rtx insn) { if (*xloc) { rtx new_rtx = canon_reg (*xloc, insn); /* If replacing pseudo with hard reg or vice versa, ensure the insn remains valid. Likewise if the insn has MATCH_DUPs. */ gcc_assert (insn && new_rtx); validate_change (insn, xloc, new_rtx, 1); } } when building libgcc: /tmp/lucier/gcc/objdirs/mainline/./gcc/xgcc -B/tmp/lucier/gcc/objdirs/mainline/./gcc/ -B/pkgs/gcc-mainline/x86_64-unknown-linux-gnu/bin/ -B/pkgs/gcc-mainline/x86_64-unknown-linux-gnu/lib/ -isystem /pkgs/gcc-mainline/x86_64-unknown-linux-gnu/include -isystem /pkgs/gcc-mainline/x86_64-unknown-linux-gnu/sys-include -g -O2 -m32 -O2 -g -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wcast-qual -Wold-style-definition -isystem ./include -fPIC -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -I. -I. -I../../.././gcc -I../../../../../mainline/libgcc -I../../../../../mainline/libgcc/. -I../../../../../mainline/libgcc/../gcc -I../../../../../mainline/libgcc/../include -I../../../../../mainline/libgcc/config/libbid -DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS -DUSE_TLS -o _moddi3.o -MT _moddi3.o -MD -MP -MF _moddi3.dep -DL_moddi3 -c ../../../../../mainline/libgcc/../gcc/libgcc2.c \ -fexceptions -fnon-call-exceptions -fvisibility=hidden -DHIDE_EXPORTS ../../../../../mainline/libgcc/../gcc/libgcc2.c: In function â: ../../../../../mainline/libgcc/../gcc/libgcc2.c:1121: internal compiler error: in validate_canon_reg, at cse.c:2730 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #64 from lucier at math dot purdue dot edu 2009-05-06 20:43 --- In answer to comment 60, here's the command line where I added -fforward-propagate -fno-move-loop-invariants: /pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I. -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fforward-propagate -fno-move-loop-invariants -DHAVE_CONFIG_H -D___PRIMAL -D___LIBRARY -D___GAMBCDIR=\/usr/local/Gambit-C/v4.1.2\ -D___SYS_TYPE_CPU=\x86_64\ -D___SYS_TYPE_VENDOR=\unknown\ -D___SYS_TYPE_OS=\linux-gnu\ -c _num.c here's the compiler: /pkgs/gcc-mainline/bin/gcc -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: /tmp/lucier/gcc/mainline/configure --enable-checking=release --prefix=/pkgs/gcc-mainline --enable-languages=c Thread model: posix gcc version 4.5.0 20090506 (experimental) [trunk revision 147199] (GCC) and the runtime didn't change (substantially) 132 ms cpu time (132 user, 0 system) and the loop looks pretty much just as bad (it's 117 instructions long, by my count): .L2752: movq%rcx, %rdx addq8(%rax), %rdx leaq4(%rcx), %rdi movq%rdx, -8(%rax) leaq4(%rdx), %rbx addq8(%rax), %rdx movq%rbx, -16(%rax) movq%rdx, -24(%rax) leaq4(%rdx), %rbx addq8(%rax), %rdx movq%rbx, -32(%rax) movq%rdx, -40(%rax) leaq4(%rdx), %rbx movq40(%rax), %rdx movq%rbx, -48(%rax) movsd 7(%rdx,%rbx,2), %xmm9 movq-40(%rax), %rbx leaq7(%rdx,%rcx,2), %r8 addq$8, %rcx movsd (%r8), %xmm4 cmpq%rcx, %r13 movsd 7(%rdx,%rbx,2), %xmm11 movq-32(%rax), %rbx movsd 7(%rdx,%rbx,2), %xmm5 movq-24(%rax), %rbx movsd 7(%rdx,%rbx,2), %xmm7 movq-16(%rax), %rbx movsd 7(%rdx,%rbx,2), %xmm14 movq-8(%rax), %rbx movsd 7(%rdx,%rbx,2), %xmm6 leaq(%rdi,%rdi), %rbx movsd 7(%rbx,%rdx), %xmm8 movq24(%rax), %rdx movapd %xmm6, %xmm13 movsd 15(%rdx), %xmm1 movsd 7(%rdx), %xmm2 movapd %xmm1, %xmm10 movsd 31(%rdx), %xmm3 movapd %xmm2, %xmm12 mulsd %xmm11, %xmm10 mulsd %xmm9, %xmm12 mulsd %xmm2, 
%xmm11 mulsd %xmm1, %xmm9 movsd 23(%rdx), %xmm0 addsd %xmm12, %xmm10 movapd %xmm2, %xmm12 mulsd %xmm7, %xmm2 subsd %xmm9, %xmm11 movapd %xmm1, %xmm9 mulsd %xmm5, %xmm12 mulsd %xmm5, %xmm1 movapd %xmm8, %xmm5 mulsd %xmm7, %xmm9 movapd %xmm4, %xmm7 subsd %xmm11, %xmm13 addsd %xmm6, %xmm11 movsd .LC5(%rip), %xmm6 subsd %xmm1, %xmm2 movapd %xmm0, %xmm1 addsd %xmm12, %xmm9 movapd %xmm14, %xmm12 xorpd %xmm3, %xmm6 subsd %xmm10, %xmm12 mulsd %xmm13, %xmm1 subsd %xmm2, %xmm7 addsd %xmm4, %xmm2 movapd %xmm6, %xmm4 addsd %xmm14, %xmm10 mulsd %xmm13, %xmm6 mulsd %xmm12, %xmm4 subsd %xmm9, %xmm5 mulsd %xmm0, %xmm12 addsd %xmm8, %xmm9 movapd %xmm0, %xmm8 mulsd %xmm11, %xmm0 addsd %xmm1, %xmm4 movapd %xmm3, %xmm1 mulsd %xmm10, %xmm3 subsd %xmm12, %xmm6 mulsd %xmm11, %xmm1 mulsd %xmm10, %xmm8 subsd %xmm3, %xmm0 addsd %xmm1, %xmm8 movapd %xmm2, %xmm1 addsd %xmm0, %xmm1 subsd %xmm0, %xmm2 movapd %xmm7, %xmm0 subsd %xmm6, %xmm7 addsd %xmm6, %xmm0 movsd %xmm1, (%r8) movapd %xmm9, %xmm1 movq40(%rax), %rdx subsd %xmm8, %xmm9 addsd %xmm8, %xmm1 movsd %xmm1, 7(%rbx,%rdx) movq-8(%rax), %rbx movq40(%rax), %rdx movsd %xmm2, 7(%rdx,%rbx,2) movq-16(%rax), %rbx movq40(%rax), %rdx movsd %xmm9, 7(%rdx,%rbx,2) movq-24(%rax), %rbx movq40(%rax), %rdx movsd %xmm0, 7(%rdx,%rbx,2) movapd %xmm5, %xmm0 movq-32(%rax), %rbx movq40(%rax), %rdx subsd %xmm4, %xmm5 addsd %xmm4, %xmm0 movsd %xmm0, 7(%rdx,%rbx,2) movq-40(%rax), %rbx movq40(%rax), %rdx movsd %xmm7, 7(%rdx,%rbx,2) movq-48(%rax), %rbx movq40(%rax), %rdx movsd %xmm5, 7(%rdx,%rbx,2) jg .L2752 movq%rdi, %r13 .L2751: -- lucier at math dot purdue dot edu changed: What|Removed |Added
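Counting the instructions in a loop like the one above by hand is error-prone. A minimal sketch of doing it mechanically in Python, assuming a `gcc -S` style listing with one instruction per line, labels ending in `:` and assembler directives starting with `.`; the function name and the tiny sample listing are invented for the illustration and are not the loop from this comment:

```python
def count_loop_instructions(asm_text, label):
    """Count instructions from `label:` up to and including the first jump back to it."""
    count = 0
    inside = False
    for raw in asm_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        if line.endswith(":"):
            if line[:-1] == label:
                inside = True
            continue                           # labels are not instructions
        if not inside:
            continue
        if line.startswith("."):               # assembler directives such as .align
            continue
        count += 1
        fields = line.split()
        if fields[0].startswith("j") and fields[-1] == label:
            break                              # backward branch closes the loop
    return count


sample = """
.L2752:
\tmovq\t%rcx, %rdx
\taddq\t8(%rax), %rdx
\taddq\t$8, %rcx
\tcmpq\t%rcx, %r13
\tjg\t.L2752
\tmovq\t%rdi, %r13
.L2751:
"""
print(count_loop_instructions(sample, ".L2752"))
```

Run over the `.s` files produced by -save-temps, the same function could be used to compare loop lengths between compiler versions or option sets without recounting by eye.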
[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #66 from lucier at math dot purdue dot edu 2009-05-07 05:27 --- Adding -frename-registers gives a significant speedup (sometimes as fast as 4.1.2 on this shared machine, i.e., it sometimes hits 108 ms instead of 132-140ms); the command line with -fforward-propagate -fno-move-loop-invariants -frename-registers is /pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I. -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fforward-propagate -fno-move-loop-invariants -frename-registers -DHAVE_CONFIG_H -D___PRIMAL -D___LIBRARY -D___GAMBCDIR=\/usr/local/Gambit-C/v4.1.2\ -D___SYS_TYPE_CPU=\x86_64\ -D___SYS_TYPE_VENDOR=\unknown\ -D___SYS_TYPE_OS=\linux-gnu\ -c _num.c and the loop is .L2752: movq%rcx, %r12 addq8(%rax), %r12 leaq4(%rcx), %rdi movq%r12, -8(%rax) leaq4(%r12), %r8 addq8(%rax), %r12 movq%r8, -16(%rax) movq-8(%rax), %r8 movq-16(%rax), %rdx movq%r12, -24(%rax) leaq4(%r12), %rbx addq8(%rax), %r12 movq-24(%rax), %r9 movq%rbx, -32(%rax) movq24(%rax), %rbx movq-32(%rax), %r10 leaq4(%r12), %r11 movq%r12, -40(%rax) movq40(%rax), %r12 movq-40(%rax), %r14 movq%r11, -48(%rax) movsd 15(%rbx), %xmm1 movsd 7(%rbx), %xmm2 movsd 7(%r12,%r11,2), %xmm9 movapd %xmm1, %xmm3 movsd 7(%r12,%r14,2), %xmm11 leaq7(%r12,%rcx,2), %r11 movapd %xmm2, %xmm10 leaq(%rdi,%rdi), %r14 mulsd %xmm11, %xmm3 movapd %xmm2, %xmm12 mulsd %xmm9, %xmm10 addq$8, %rcx mulsd %xmm1, %xmm9 cmpq%rcx, %r13 mulsd %xmm2, %xmm11 movsd 7(%r12,%r10,2), %xmm5 movsd 7(%r12,%r9,2), %xmm7 addsd %xmm10, %xmm3 movsd 7(%r12,%r8,2), %xmm6 subsd %xmm9, %xmm11 mulsd %xmm7, %xmm2 movapd %xmm1, %xmm9 mulsd %xmm5, %xmm1 movapd %xmm6, %xmm13 movsd 7(%r12,%rdx,2), %xmm14 mulsd %xmm5, %xmm12 mulsd %xmm7, %xmm9 subsd %xmm11, %xmm13 movsd 31(%rbx), %xmm0 addsd %xmm6, %xmm11 movsd .LC5(%rip), %xmm6 subsd %xmm1, %xmm2 movsd (%r11), %xmm4 movapd %xmm14, %xmm10 xorpd %xmm0, %xmm6 addsd %xmm12, %xmm9 movsd 7(%r14,%r12), %xmm8
subsd %xmm3, %xmm10 movapd %xmm4, %xmm7 addsd %xmm14, %xmm3 movsd 23(%rbx), %xmm15 subsd %xmm2, %xmm7 movapd %xmm8, %xmm5 addsd %xmm4, %xmm2 movapd %xmm6, %xmm4 subsd %xmm9, %xmm5 movapd %xmm15, %xmm14 addsd %xmm8, %xmm9 mulsd %xmm10, %xmm4 movapd %xmm15, %xmm8 mulsd %xmm15, %xmm10 movapd %xmm0, %xmm12 mulsd %xmm11, %xmm15 mulsd %xmm3, %xmm0 movapd %xmm7, %xmm1 mulsd %xmm13, %xmm6 mulsd %xmm3, %xmm8 movapd %xmm9, %xmm3 mulsd %xmm11, %xmm12 subsd %xmm0, %xmm15 mulsd %xmm13, %xmm14 subsd %xmm10, %xmm6 movapd %xmm2, %xmm10 movapd %xmm5, %xmm0 addsd %xmm12, %xmm8 addsd %xmm15, %xmm10 subsd %xmm15, %xmm2 addsd %xmm14, %xmm4 addsd %xmm8, %xmm3 movsd %xmm10, (%r11) movq40(%rax), %r10 subsd %xmm8, %xmm9 addsd %xmm6, %xmm1 addsd %xmm4, %xmm0 movsd %xmm3, 7(%r14,%r10) movq-8(%rax), %r9 movq40(%rax), %rdx subsd %xmm6, %xmm7 subsd %xmm4, %xmm5 movsd %xmm2, 7(%rdx,%r9,2) movq-16(%rax), %r8 movq40(%rax), %r12 movsd %xmm9, 7(%r12,%r8,2) movq-24(%rax), %rbx movq40(%rax), %r11 movsd %xmm1, 7(%r11,%rbx,2) movq-32(%rax), %r14 movq40(%rax), %r10 movsd %xmm0, 7(%r10,%r14,2) movq-40(%rax), %r9 movq40(%rax), %rdx movsd %xmm7, 7(%rdx,%r9,2) movq-48(%rax), %r8 movq40(%rax), %r12 movsd %xmm5, 7(%r12,%r8,2) jg .L2752 Adding -fforward-propagate -fno-move-loop-invariants -fweb instead of -fforward-propagate -fno-move-loop-invariants -frename-registers, so the compile line is /pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I. -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fforward-propagate -fno-move-loop-invariants -fweb -DHAVE_CONFIG_H -D___PRIMAL -D___LIBRARY -D___GAMBCDIR=\/usr/local/Gambit-C/v4.1.2\ -D___SYS_TYPE_CPU=\x86_64
[Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #53 from lucier at math dot purdue dot edu 2009-05-06 03:43 --- I posted a possible fix to gcc-patches with the subject line

Possible fix for 30% performance regression in PR 33928

Here's the assembly for the main loop after the changes I proposed:

.L4230:
        movq    %r11, %rdi
        addq    8(%r10), %rdi
        movq    8(%r10), %rsi
        movq    8(%r10), %rdx
        movq    40(%r10), %rax
        leaq    4(%r11), %rbx
        addq    %rdi, %rsi
        leaq    4(%rdi), %r9
        movq    %rdi, -8(%r10)
        addq    %rsi, %rdx
        leaq    4(%rsi), %r8
        movq    %rsi, -24(%r10)
        leaq    4(%rdx), %rcx
        movq    %r9, -16(%r10)
        movq    %rdx, -40(%r10)
        movq    %r8, -32(%r10)
        addq    $7, %rax
        movq    %rcx, -48(%r10)
        movsd   (%rax,%rcx,2), %xmm12
        leaq    (%rbx,%rbx), %rcx
        movsd   (%rax,%rdx,2), %xmm3
        leaq    (%rax,%r11,2), %rdx
        addq    $8, %r11
        movsd   (%rax,%r8,2), %xmm14
        cmpq    %r11, %r13
        movsd   (%rax,%rsi,2), %xmm13
        movsd   (%rax,%r9,2), %xmm11
        movsd   (%rax,%rdi,2), %xmm10
        movsd   (%rax,%rcx), %xmm8
        movq    24(%r10), %rax
        movsd   (%rdx), %xmm7
        movsd   15(%rax), %xmm2
        movsd   7(%rax), %xmm1
        movapd  %xmm2, %xmm0
        movsd   31(%rax), %xmm9
        movapd  %xmm1, %xmm6
        mulsd   %xmm3, %xmm0
        movapd  %xmm1, %xmm4
        mulsd   %xmm12, %xmm6
        mulsd   %xmm3, %xmm4
        movapd  %xmm1, %xmm3
        mulsd   %xmm13, %xmm1
        mulsd   %xmm14, %xmm3
        addsd   %xmm0, %xmm6
        movapd  %xmm2, %xmm0
        movsd   23(%rax), %xmm5
        mulsd   %xmm12, %xmm0
        movapd  %xmm7, %xmm12
        subsd   %xmm0, %xmm4
        movapd  %xmm2, %xmm0
        mulsd   %xmm14, %xmm2
        movapd  %xmm8, %xmm14
        mulsd   %xmm13, %xmm0
        movapd  %xmm11, %xmm13
        addsd   %xmm6, %xmm11
        subsd   %xmm6, %xmm13
        subsd   %xmm2, %xmm1
        movapd  %xmm10, %xmm2
        addsd   %xmm0, %xmm3
        movapd  %xmm5, %xmm0
        subsd   %xmm4, %xmm2
        addsd   %xmm4, %xmm10
        subsd   %xmm1, %xmm12
        addsd   %xmm1, %xmm7
        movapd  %xmm9, %xmm1
        subsd   %xmm3, %xmm14
        mulsd   %xmm2, %xmm0
        xorpd   .LC5(%rip), %xmm1
        addsd   %xmm3, %xmm8
        movapd  %xmm1, %xmm3
        mulsd   %xmm2, %xmm1
        movapd  %xmm5, %xmm2
        mulsd   %xmm13, %xmm3
        mulsd   %xmm11, %xmm2
        addsd   %xmm0, %xmm3
        movapd  %xmm5, %xmm0
        mulsd   %xmm10, %xmm5
        mulsd   %xmm13, %xmm0
        subsd   %xmm0, %xmm1
        movapd  %xmm9, %xmm0
        mulsd   %xmm11, %xmm9
        mulsd   %xmm10, %xmm0
        subsd   %xmm9, %xmm5
        addsd   %xmm0, %xmm2
        movapd  %xmm7, %xmm0
        addsd   %xmm5, %xmm0
        subsd   %xmm5, %xmm7
        movsd   %xmm0, (%rdx)
        movapd  %xmm8, %xmm0
        movq    40(%r10), %rax
        subsd   %xmm2, %xmm8
        addsd   %xmm2, %xmm0
        movsd   %xmm0, 7(%rcx,%rax)
        movq    -8(%r10), %rdx
        movq    40(%r10), %rax
        movapd  %xmm12, %xmm0
        subsd   %xmm1, %xmm12
        movsd   %xmm7, 7(%rax,%rdx,2)
        movq    -16(%r10), %rdx
        movq    40(%r10), %rax
        addsd   %xmm1, %xmm0
        movsd   %xmm8, 7(%rax,%rdx,2)
        movq    -24(%r10), %rdx
        movq    40(%r10), %rax
        movsd   %xmm0, 7(%rax,%rdx,2)
        movapd  %xmm14, %xmm0
        movq    -32(%r10), %rdx
        movq    40(%r10), %rax
        subsd   %xmm3, %xmm14
        addsd   %xmm3, %xmm0
        movsd   %xmm0, 7(%rax,%rdx,2)
        movq    -40(%r10), %rdx
        movq    40(%r10), %rax
        movsd   %xmm12, 7(%rax,%rdx,2)
        movq    -48(%r10), %rdx
        movq    40(%r10), %rax
        movsd   %xmm14, 7(%rax,%rdx,2)
        jg      .L4230
        movq    %rbx, %r13
.L4228:

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #54 from lucier at math dot purdue dot edu 2009-05-06 03:50 --- Created an attachment (id=17805) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17805&action=view) svn diff of cse.c to fix the performance regression This partially reverts r118475 and adds code to call find_best_address for MEMs in fold_rtx. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug regression/39914] 96% performance regression in floating point code; part of the problem started 2009/03/12-13
--- Comment #3 from lucier at math dot purdue dot edu 2009-04-27 15:07 --- Subject: Re: 96% performance regression in floating point code; part of the problem started 2009/03/12-13
On Sun, 2009-04-26 at 18:43 +, ubizjak at gmail dot com wrote:
> --- Comment #1 from ubizjak at gmail dot com 2009-04-26 18:43 ---
> There are a couple of possible candidates in this range:
> URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144812
> Log: 2009-03-12 Vladimir Makarov vmaka...@redhat.com
> PR debug/39432
> * ira-int.h (struct allocno): Fix comment for calls_crossed_num.
> * ira-conflicts.c (ira_build_conflicts): Prohibit call used registers for allocnos created from user-defined variables.
The problem exists in gcc version 4.4.0 20090312 (experimental) [trunk revision 144812] (GCC) So perhaps it's this checkin. Brad -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914
[Bug regression/39914] 96% performance regression in floating point code; part of the problem started 2009/03/12-13
--- Comment #4 from lucier at math dot purdue dot edu 2009-04-27 15:11 --- Subject: Re: 96% performance regression in floating point code; part of the problem started 2009/03/12-13
On Mon, 2009-04-27 at 08:16 +, ubizjak at gmail dot com wrote:
> --- Comment #2 from ubizjak at gmail dot com 2009-04-27 08:16 ---
> (In reply to comment #0)
> > (same .i file, same instructions for reproducing, same compiler options, same everything)
> I guess that this is direct.i compiled with -O1?
Yes, the compile flags are -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp
> It is not clear from your report, if -O1 flag is problematic, -O2 code looks good to me.
Yes, the -O2 code looks good to me, too. I've used the above list of options (starting with -O1) on this code instead of -O2 because the above list (a) has generally given faster performance, and (b) has required much less compile time and memory to compile the C code generated by the Gambit Scheme-C compiler. I have not yet seen any evidence that -O2 generates better code (overall) than that set of options above. Brad -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914
[Bug regression/39914] 96% performance regression in floating point code; part of the problem started 2009/03/12-13
--- Comment #6 from lucier at math dot purdue dot edu 2009-04-27 15:32 --- Subject: Re: 96% performance regression in floating point code; part of the problem started 2009/03/12-13 On Mon, 2009-04-27 at 15:26 +, pinskia at gcc dot gnu dot org wrote: This is by design -O1 is way slower than -O2 now. I have seen no general discussion that -O1 should be destroyed as a useful compilation option. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914
[Bug regression/39914] 96% performance regression in floating point code; part of the problem started 2009/03/12-13
--- Comment #7 from lucier at math dot purdue dot edu 2009-04-27 15:35 --- Subject: Re: 96% performance regression in floating point code; part of the problem started 2009/03/12-13 On Mon, 2009-04-27 at 15:32 +, lucier at math dot purdue dot edu wrote: On Mon, 2009-04-27 at 15:26 +, pinskia at gcc dot gnu dot org wrote: This is by design -O1 is way slower than -O2 now. I have seen no general discussion that -O1 should be destroyed as a useful compilation option. Perhaps I should also point out that code generated by -O2 is not generally much faster than before, so if you believe that -O1 is much slower than -O2 now by design, it is only by making code generated by -O1 much slower. BTW, this code runs in 108 ms when compiled with gcc-4.2.4 with the given options (including -O1). Brad -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914
[Bug regression/39914] 96% performance regression in floating point code; part of the problem started 2009/03/12-13
--- Comment #8 from lucier at math dot purdue dot edu 2009-04-27 16:29 --- I hadn't noticed before that Andrew had marked it as RESOLVED INVALID. I'm reopening it, as I believe that resolving it as INVALID should require a more general discussion than a one-line dismissal of the bug. Brad
--
lucier at math dot purdue dot edu changed:
           What    |Removed                     |Added
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914
[Bug regression/39914] [4.4/4.5 Regression] 96% performance regression in floating point code; part of the problem started 2009/03/12-13
--- Comment #11 from lucier at math dot purdue dot edu 2009-04-27 20:37 --- As far as I can tell, the patch proposed by Uros restores the performance of code generated by gcc version 4.4.0 20090312 (experimental) [trunk revision 144812] (GCC) In particular, the assembly code for the main loop is identical for code generated by gcc version 4.4.0 20090312 (experimental) [trunk revision 144801] (GCC) and by gcc version 4.4.0 20090312 (experimental) [trunk revision 144812] (GCC) after his patch. Thanks for getting to this so quickly. Brad -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914
[Bug regression/39914] [4.4/4.5 Regression] 96% performance regression in floating point code; part of the problem started 2009/03/12-13
--- Comment #12 from lucier at math dot purdue dot edu 2009-04-28 01:39 --- I tried to build and check with this patch, but I got stopped with: /tmp/lucier/gcc/objdirs/mainline/./prev-gcc/xgcc -B/tmp/lucier/gcc/objdirs/mainline/./prev-gcc/ -B/pkgs/gcc-mainline/x86_64-unknown-linux-gnu/bin/ -c -g -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wcast-qual -Wold-style-definition -Wc++-compat -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -I. -Ibuild -I../../../mainline/gcc -I../../../mainline/gcc/build -I../../../mainline/gcc/../include -I../../../mainline/gcc/../libcpp/include -I/tmp/lucier/gcc/objdirs/mainline/./gmp -I/tmp/lucier/gcc/mainline/gmp -I/tmp/lucier/gcc/objdirs/mainline/./mpfr -I/tmp/lucier/gcc/mainline/mpfr -I../../../mainline/gcc/../libdecnumber -I../../../mainline/gcc/../libdecnumber/bid -I../libdecnumber-o build/vec.o ../../../mainline/gcc/vec.c cc1: warnings being treated as errors ../../../mainline/gcc/vec.c: In function vec_descriptor: ../../../mainline/gcc/vec.c:116: error: enum conversion when passing argument 3 of htab_find_slot is invalid in C++ ../../../mainline/gcc/../include/hashtab.h:172: note: expected enum insert_option but argument is of type int make[3]: *** [build/vec.o] Error 1 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914
[Bug regression/39914] New: 96% performance regression in floating point code; part of the problem started 2009/03/12-13
60% performance regression, the rest is accounted for by http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928 Brad -- Summary: 96% performance regression in floating point code; part of the problem started 2009/03/12-13 Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: regression AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: lucier at math dot purdue dot edu GCC build triplet: x86_64-unknown-linux-gnu GCC host triplet: x86_64-unknown-linux-gnu GCC target triplet: x86_64-unknown-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914
[Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #52 from lucier at math dot purdue dot edu 2009-04-26 18:27 --- I narrowed down the new performance regression to code added some time around March 12, 2009, so I changed back the subject line of this PR to reflect the performance regression caused only by the code added 2006-11-03 and added a new PR http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914 to reflect the effects of the March, 2009, code.
--
lucier at math dot purdue dot edu changed:
           What    |Removed                     |Added
            Summary|[4.3/4.4/4.5 Regression] 79%|[4.3/4.4/4.5 Regression] 30%
                   |performance slowdown in     |performance slowdown in
                   |floating-point code         |floating-point code caused
                   |partially caused by r118475 |by r118475
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 79% performance slowdown in floating-point code partially caused by r118475
--- Comment #49 from lucier at math dot purdue dot edu 2009-04-23 15:58 --- With 4.4.0 and with mainline this code now runs in 280 ms instead of in 156 ms with 4.2.4. Since 280/156 = 1.794871794871795 I changed the subject line (the slowdown is now not completely caused by r118475). I guess I'll post the assembly code generated by 4.4.0 in the next attachment.
Timings (best of three runs) for the last (time (direct-fft-recursive-4 a table)) from
gsi/gsi -e '(define a (time (expt 3 1000)))(define b (time (* a a)))'
With gcc-4.1.2: 188 ms cpu time (188 user, 0 system)
With gcc-4.2.4: 156 ms cpu time (152 user, 4 system)
With gcc-4.3.3: 180 ms cpu time (180 user, 0 system)
With gcc-4.4.0: 280 ms cpu time (280 user, 0 system)
With 4.5.0 20090423 (experimental) [trunk revision 146634]: 280 ms cpu time (280 user, 0 system)
--
lucier at math dot purdue dot edu changed:
           What    |Removed                     |Added
            Summary|[4.3/4.4/4.5 Regression] 30%|[4.3/4.4/4.5 Regression] 79%
                   |performance slowdown in     |performance slowdown in
                   |floating-point code caused  |floating-point code
                   |by r118475                  |partially caused by r118475
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
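The slowdown figure in the subject line follows directly from the timings in comment #49. As a minimal sketch (the helper name `slowdown_percent` is made up for illustration; the numbers are copied from the comment), the percentages relative to gcc-4.2.4 can be recomputed like this:

```python
# CPU times in ms, copied from the timings listed in comment #49.
timings_ms = {
    "gcc-4.1.2": 188,
    "gcc-4.2.4": 156,
    "gcc-4.3.3": 180,
    "gcc-4.4.0": 280,
    "4.5.0 20090423 (trunk r146634)": 280,
}


def slowdown_percent(time_ms, baseline_ms):
    """Return how much slower time_ms is than baseline_ms, in percent."""
    return (time_ms / baseline_ms - 1.0) * 100.0


# gcc-4.2.4 is the fastest compiler in the list, so use it as the baseline.
baseline = timings_ms["gcc-4.2.4"]
for compiler, time_ms in timings_ms.items():
    pct = slowdown_percent(time_ms, baseline)
    print(f"{compiler}: {time_ms} ms ({pct:+.1f}% vs gcc-4.2.4)")
```

The 280 ms result gives a ratio of 280/156, which is where the 79% in the revised summary comes from.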
[Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 79% performance slowdown in floating-point code partially caused by r118475
--- Comment #50 from lucier at math dot purdue dot edu 2009-04-23 16:00 --- Created an attachment (id=17685) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17685&action=view) direct.s generated by 4.4.0 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 79% performance slowdown in floating-point code partially caused by r118475
--- Comment #51 from lucier at math dot purdue dot edu 2009-04-23 16:03 --- Forgot to mention, the main loop starts at .L2947. This is on model name : Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz Brad -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug middle-end/39301] ICE in register_overhead, at bitmap.c:115
--- Comment #5 from lucier at math dot purdue dot edu 2009-03-31 12:38 --- You have --disable-bootstrap, so my guess is that cc1 is a 32-bit binary if that's what your system compiler builds by default. By bootstrapping you get a 64-bit binary (the first cc1 built in the bootstrap is 32-bit, but the second and third are 64-bit). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39301
[Bug middle-end/39301] ICE in register_overhead, at bitmap.c:115
--- Comment #3 from lucier at math dot purdue dot edu 2009-03-27 15:12 --- I'm still seeing it with:
[luc...@descartes ~]$ /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: powerpc64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-gather-detailed-mem-stats --with-cpu=default64
Thread model: posix
gcc version 4.4.0 20090327 (experimental) [trunk revision 145100] (GCC)
as
[luc...@descartes compiler.i-test]$ /pkgs/gcc-mainline/libexec/gcc/powerpc64-unknown-linux-gnu/4.4.0/cc1 -I../include -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common compiler.i
btowc wctob mbrlen {GC 5325k - 3526k} {GC 5325k - 4483k} code_size ___H__20_compiler_2e_o1 {GC 201152k - 113339k} ___init_proc 20_compiler_2e_o1
Analyzing compilation unit {GC 181409k - 135700k}
Performing interprocedural optimizations visibility early_local_cleanups {GC 237979k - 236431k} summary generate inline static-var pure-const
Assembling functions: code_size ___init_proc 20_compiler_2e_o1 ___H__20_compiler_2e_o1 {GC 349493k - 288659k} {GC 406233k - 272085k}
compiler.c: In function ___H__20_compiler_2e_o1:
compiler.c:322876: internal compiler error: in register_overhead, at bitmap.c:115
Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions.
I have to admit I didn't see it with an x86-64 compiler; perhaps the ppc64 port is more complicated and requires more bitmaps. I suspect, given the error message, that you built a 32-bit compiler and ran out of memory space before you hit this problem. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39301
[Bug c/39301] New: ICE in register_overhead, at bitmap.c:115
With this compiler: [luc...@descartes gambc-v4_4_1-devel]$ /pkgs/gcc-mainline/bin/gcc -v Using built-in specs. Target: powerpc64-unknown-linux-gnu Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-gather-detailed-mem-stats --with-cpu=default64 Thread model: posix gcc version 4.4.0 20090224 (experimental) [trunk revision 144414] (GCC) with compiler.i found at http://www.math.purdue.edu/~lucier/bugzilla/8 and this command line: [luc...@descartes gambc-v4_4_1-devel]$ gdb /pkgs/gcc-mainline/libexec/gcc/powerpc64-unknown-linux-gnu/4.4.0/cc1 (gdb) run -I../include -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common compiler.i one gets an ICE Starting program: /pkgs/gcc-mainline/libexec/gcc/powerpc64-unknown-linux-gnu/4.4.0/cc1 -I../include -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common compiler.i btowc wctob mbrlen {GC 5504k - 3345k} {GC 5325k - 4387k} code_size ___H__20_compiler_2e_o1 {GC 202396k - 113348k} ___init_proc 20_compiler_2e_o1 Analyzing compilation unit {GC 182571k - 135708k}Performing interprocedural optimizations visibility early_local_cleanups {GC 237987k - 236439k} summary generate inline static-var pure-constAssembling functions: code_size ___init_proc 20_compiler_2e_o1 ___H__20_compiler_2e_o1 {GC 349654k - 288661k} {GC 406235k - 272087k} compiler.c: In function ___H__20_compiler_2e_o1: compiler.c:322876: internal compiler error: in register_overhead, at bitmap.c:115 I'm sorry the test case is enormous, but it runs in about a GB of RAM. I also haven't been able to figure out how to use gdb properly in this mixed ppc32/ppc64 environment. 
-- Summary: ICE in register_overhead, at bitmap.c:115 Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: lucier at math dot purdue dot edu GCC build triplet: powerpc64-unknown-linux-gnu GCC host triplet: powerpc64-unknown-linux-gnu GCC target triplet: powerpc64-unknown-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39301
[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines
--- Comment #104 from lucier at math dot purdue dot edu 2009-02-21 18:56 --- Subject: Re: [4.3/4.4 Regression] Inordinate compile times on large routines Cool, that leaves me with DFS = ??? SCC = ? Conflict ? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854
[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines
--- Comment #98 from lucier at math dot purdue dot edu 2009-02-20 19:52 --- Thank you, that indeed fixes the LICM problem. Based on some comments for this PR and for PR 39157 I thought that a similar patch might apply to PRE. So with
euler-14% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-gather-detailed-mem-stats
Thread model: posix
gcc version 4.4.0 20090220 (experimental) [trunk revision 144328] (GCC)
I ran this command
/pkgs/gcc-mainline/bin/gcc -v -c -O2 -fmem-report -ftime-report compiler.i -save-temps ! report-compiler
where compiler.i is found at http://www.math.purdue.edu/~lucier/bugzilla/8/ and I killed the job after it required 17GB of RAM. This job compiles just fine with
euler-15% /pkgs/gcc-4.1.2/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/pkgs/gcc-4.1.2
Thread model: posix
gcc version 4.1.2
in about 1.5 GB of RAM. To derive some statistics I ran
/pkgs/gcc-mainline/bin/gcc -v -c -O2 -fmem-report -ftime-report _num.i -save-temps ! report-num
where the smaller file _num.i is also found at http://www.math.purdue.edu/~lucier/bugzilla/8/ I'll attach report-num to this PR. The highlights are
 PRE            :  23.28 (24%) usr   0.01 ( 0%) sys  23.51 (24%) wall     681 kB ( 0%) ggc
 integrated RA  :  12.70 (13%) usr   0.00 ( 0%) sys  12.83 (13%) wall    3709 kB ( 2%) ggc
 TOTAL          :  95.93             2.73            99.72             227422 kB
and that's about it, nothing else above 5%. There are also accurate memory statistics, as I've added a patch to my local sources so that memory statistics don't overflow 32-bit counters. I think the -O1 and -O2 limits for LICM are quite reasonable; would it be possible to limit PRE similarly so that one could compile compiler.i with -O2 in a reasonable amount of memory? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854
[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines
--- Comment #99 from lucier at math dot purdue dot edu 2009-02-20 19:54 --- Created an attachment (id=17336) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17336&action=view) Memory and CPU statistics when compiling _num.i with -O2 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854
[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines
--- Comment #100 from lucier at math dot purdue dot edu 2009-02-20 19:56 --- The large memory requirements for LICM at -O1 and -O2 are still a regression for the 4.2 and 4.3 branches. Jakub's patch is short and elegant; do you think it would be a good idea to backport it to the other open branches? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854
[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines
--- Comment #93 from lucier at math dot purdue dot edu 2009-02-14 21:58 --- Subject: Re: [4.3/4.4 Regression] Inordinate compile times on large routines I instrumented the compiler and looked how many nodes were in each loop processed by LICM for the Gambit runtime and compiler. For generated code, except for the loop that contained the entire function, the greatest number of nodes was 30. (Because computed gotos are used in the code that checks for heap and stack overflows after allocations and for waiting interrupts, it's hard to go long in Scheme code without hitting the big loop.) For hand-written code, the greatest number of nodes in a loop was 123. When bootstrapping gcc with --enable-languages=c, the largest number of nodes in a loop was 803, and there were 12 loops detected that had over 500 nodes. 548 loops had 100 nodes or greater. (This is a bootstrap, so some files were compiled twice with the instrumented compiler.) So perhaps an -O1 default for LICM of 100 nodes is reasonable, or perhaps one might up it to 1000 just to catch everything reasonable. Brad -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854
[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines
--- Comment #86 from lucier at math dot purdue dot edu 2009-02-13 15:40 --- Subject: Re: [4.3/4.4 Regression] Inordinate compile times on large routines
It's unfortunate that the discussion from 39157 will be somewhat hard to find now that that bug is closed. Steven wrote in a comment for 39157:
> It's not like there will not be any loop invariant code motion (LICM) at all anymore if the RTL LICM pass is disabled. There is an LICM pass on GIMPLE, and there is also PRE for GIMPLE (and lazy code motion for RTL but I think it disables itself for your test case). The RTL LICM pass mostly cleans up after expand, i.e. moves things that are not exposed in GIMPLE. This is mostly just address calculations.
The loop in _num.i that I mentioned in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39157#c19 is the loop in PR 33928 that is no longer fully optimized after Paolo (and you, I guess, your name is on the patch) added PRE and disabled some optimizations in CSE, and what is no longer optimized in that loop are address calculations. I don't know whether those address calculations fall under LICM; the only point I'm trying to make right now is that address calculations are no longer optimized as much as they were before http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=118475 and address calculations are an important class of calculations to optimize. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854
[Bug tree-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #45 from lucier at math dot purdue dot edu 2009-02-13 16:09 --- Subject: Re: [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475 On Fri, 2009-02-13 at 16:05 +, bonzini at gnu dot org wrote: --- Comment #44 from bonzini at gnu dot org 2009-02-13 16:05 --- A simplified (local, noncascading) fwprop not using UD chains would not be hard to do... Basically, at -O1 use FOR_EACH_BB/FOR_EACH_BB_INSN instead of walking the uses, keep a (regno, insn) map of pseudos (cleared at the beginning of every basic block), and use that info instead of UD chains in use_killed_between... As noted in comment 42, enabling FWPROP on this test case does not fix the performance problem. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
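The local propagation scheme described in the quoted comment #44 (walk each basic block and its instructions in order, keep a map from register number to defining instruction, and clear that map at the start of every block) can be illustrated with a toy model. The following is only a sketch under stated assumptions: `Insn`, `propagate_block`, and `propagate_function` are hypothetical names, the three-field instruction is not GCC's RTL, only plain register copies are propagated, and none of this is taken from fwprop.c.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Insn:
    """Toy instruction (hypothetical, not GCC RTL): dest = op(srcs)."""
    dest: str               # pseudo register written
    op: str                 # "copy", "add", ...
    srcs: Tuple[str, ...]   # pseudo registers read


def propagate_block(block: List[Insn]) -> List[Insn]:
    """Forward-propagate register copies inside one basic block."""
    # Map from register to the copy insn that last defined it.
    # It starts empty, so nothing is carried over from other blocks.
    last_copy: Dict[str, Insn] = {}
    rewritten: List[Insn] = []
    for insn in block:
        # Replace each use of a copied register by the copy's source.
        new_srcs = tuple(
            last_copy[src].srcs[0] if src in last_copy else src
            for src in insn.srcs
        )
        rewritten.append(Insn(insn.dest, insn.op, new_srcs))
        # Writing insn.dest kills every recorded copy that defines it
        # or reads it, so stale values are never propagated.
        last_copy = {
            reg: copy
            for reg, copy in last_copy.items()
            if reg != insn.dest and copy.srcs[0] != insn.dest
        }
        # Record the original (not rewritten) copy: non-cascading.
        if insn.op == "copy" and insn.srcs[0] != insn.dest:
            last_copy[insn.dest] = insn
    return rewritten


def propagate_function(blocks: List[List[Insn]]) -> List[List[Insn]]:
    """Apply the purely local propagation to each block independently."""
    return [propagate_block(block) for block in blocks]


blocks = [
    [
        Insn("r1", "copy", ("r0",)),
        Insn("r2", "add", ("r1", "r3")),   # r1 can be replaced by r0
        Insn("r0", "copy", ("r4",)),       # r0 is overwritten here
        Insn("r5", "add", ("r1", "r1")),   # so r1 must stay as it is
    ],
    [Insn("r6", "add", ("r1", "r1"))],     # new block: map starts empty
]
result = propagate_function(blocks)
```

Dropping map entries when a register is overwritten plays the role that a check such as use_killed_between has in the description; the trade-off of staying local is that no information flows between blocks. As comment #42 is said to note, enabling forward propagation did not fix the regression in this PR, so the sketch only illustrates the proposal, not a fix.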
[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines
--- Comment #90 from lucier at math dot purdue dot edu 2009-02-13 17:37 --- Subject: Re: [4.3/4.4 Regression] Inordinate compile times on large routines
On Fri, 2009-02-13 at 16:54 +, bonzini at gnu dot org wrote:
> --- Comment #87 from bonzini at gnu dot org 2009-02-13 16:54 ---
> The problem is that -O1 was never meant to give very fast code.
I'm not looking for very fast code, I'm looking for code that doesn't get 30% slower from one SVN revision number to the next.
> You are using it only because our throttling of expensive passes is insufficient.
I am using -O1 because code of this type compiled with -O2 runs significantly more slowly than code of this type compiled with -O1. I have never used -O2 on this type of code.
> Fixing that has two sides, as done in PR39157's discussion: 1) disabling more passes at -O1, 2) establishing some parameters to throttle down passes at -O2.
I don't see that (1) and (2) form the main strategy to fix that; it seems that understanding the existing optimizations that are being disabled in preference for new ones is a good start. And generally ensuring that -O1 code doesn't get significantly slower while compile times get significantly higher. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854
[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines
--- Comment #91 from lucier at math dot purdue dot edu 2009-02-13 17:43 --- Subject: Re: [4.3/4.4 Regression] Inordinate compile times on large routines On Fri, 2009-02-13 at 17:37 +, lucier at math dot purdue dot edu wrote: --- Comment #90 from lucier at math dot purdue dot edu 2009-02-13 17:37 --- Subject: Re: [4.3/4.4 Regression] Inordinate compile times on large routines On Fri, 2009-02-13 at 16:54 +, bonzini at gnu dot org wrote: --- Comment #87 from bonzini at gnu dot org 2009-02-13 16:54 --- The problem is that -O1 was never meant to give very fast code. I'm not looking for very fast code, I'm looking for code that doesn't get 30% slower from one SVN revision number to the next. Sorry, this comment refers to PR 33928, not this PR. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854
[Bug middle-end/39157] Code that compiles fine in 1GB of memory with 4.1.2 requires 20GB in 4.2.* and higher
--- Comment #15 from lucier at math dot purdue dot edu 2009-02-12 16:35 --- Some comments (a lot went on while I was sleeping):
1. Yes, this is similar to the test case of PR26854, but the C code generator has changed significantly since that test case was filed. I don't know if the changes in the code generator really affect what's happening here, however.
2. I'm trying to get a moderately sized test case that will compile in about 3GB of RAM, as Steven requested. (The test case from PR26854 takes at least 7GB of RAM to compile on ppc64.)
3. If the amount of memory and cpu time required by the test case at -O1 doesn't increase significantly when loop-invariant motion is performed on loops of size up to 10,000, then it would be good if the parameter at -O1 could be 10,000 instead of 100 (or at least larger than 100), as is suggested in the most recent patch.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39157
[Bug middle-end/39157] Code that compiles fine in 1GB of memory with 4.1.2 requires 20GB in 4.2.* and higher
--- Comment #18 from lucier at math dot purdue dot edu 2009-02-12 19:54 --- There is now a file slatex.i at http://www.math.purdue.edu/~lucier/bugzilla/8/ that compiles in about 650MB of memory with gcc-4.2.3 on x86-64 with the same options; I don't know if that will help Steven. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39157
[Bug middle-end/39157] Code that compiles fine in 1GB of memory with 4.1.2 requires 20GB in 4.2.* and higher
--- Comment #19 from lucier at math dot purdue dot edu 2009-02-12 20:51 --- Subject: Re: Code that compiles fine in 1GB of memory with 4.1.2 requires 20GB in 4.2.* and higher
On Thu, 2009-02-12 at 16:52 +, rguenth at gcc dot gnu dot org wrote:
> --- Comment #16 from rguenth at gcc dot gnu dot org 2009-02-12 16:52 ---
> Actually for PR26854 it is just one loop that is detected, covering all of the function (with approx. 56000 basic blocks and one basic-block that has edges to all other basic blocks in the loop).
Richard: I'm wondering if you could look at a smaller file that's generated in a somewhat different way. At http://www.math.purdue.edu/~lucier/bugzilla/8/ there's a file _num.i.gz that I think should have smaller (nested) loops than the entire file, for example, from the label ___L189__23__23_bignum_2e__2a_: at line 50031 to just before label ___L190__23__23_bignum_2e__2a_: at line 50105. Moving loop invariants out of this loop might help if it is detected as a loop, but I don't know how to check whether it is. Perhaps you might check and report whether this small loop is treated as a loop or whether, again, the entire function is the only loop detected. Brad -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39157
[Bug bootstrap/39173] New: PR37739 (bootstrap failure) applies to 4.3.3
PR 37739 applies to 4.3.3, as does the fix (applied by hand to my sources). I'm running make check right now with the patched sources. -- Summary: PR37739 (bootstrap failure) applies to 4.3.3 Product: gcc Version: 4.3.3 Status: UNCONFIRMED Severity: normal Priority: P3 Component: bootstrap AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: lucier at math dot purdue dot edu GCC build triplet: powerpc64-unknown-linux-gnu GCC host triplet: powerpc64-unknown-linux-gnu GCC target triplet: powerpc64-unknown-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39173
[Bug bootstrap/39173] PR37739 (bootstrap failure) applies to 4.3.3
--- Comment #1 from lucier at math dot purdue dot edu 2009-02-12 22:45 --- The test suite has finished (I only built the C compiler), and results are at http://gcc.gnu.org/ml/gcc-testresults/2009-02/msg01220.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39173
[Bug bootstrap/37739] [4.4 Regression] bootstrap broken with core gcc gcc-4.2.x
--- Comment #12 from lucier at math dot purdue dot edu 2009-02-11 18:13 --- I just got the same error with 140 12:54 ../../gcc-4.3.3/configure --prefix=/pkgs/gcc-4.3.3 --enable-languages=c 141 12:54 make -j 4 bootstrap build.log trying to build gcc-4.3.3 with [luc...@descartes gcc-4.3.3]$ gcc -v Using built-in specs. Target: ppc64-redhat-linux Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --enable-secureplt --with-long-double-128 --build=ppc64-redhat-linux --target=ppc64-redhat-linux --with-cpu=default32 Thread model: posix gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) So, if it was fixed on mainline, it wasn't fixed on the branch. Should I just reopen this against 4.3.3, or should I file a new bug report for 4.3.3 and refer back to this one? -- lucier at math dot purdue dot edu changed: What|Removed |Added CC||lucier at math dot purdue ||dot edu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37739
[Bug middle-end/39157] New: Code that compiles fine in 1GB of memory with 4.1.2 requires 20GB in 4.2.* and higher
With this compiler [luc...@descartes gambit]$ gcc -v Using built-in specs. Target: powerpc64-unknown-linux-gnu Configured with: ../../gcc-4.3.3/configure --prefix=/pkgs/gcc-4.3.3 --enable-languages=c --with-cpu=default64 Thread model: posix gcc version 4.3.3 (GCC) with the file compiler.i found here: http://www.math.purdue.edu/~lucier/bugzilla/8/ attempting to compile with these options: gcc -m64 -mcpu=970 -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -rdynamic -shared can't compile in 8GB of RAM. With this compiler: euler-77% /pkgs/gcc-4.2.3/bin/gcc -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../../gcc-4.2.3/configure --prefix=/pkgs/gcc-4.2.3 --enable-checking=release --with-gmp=/pkgs/gmp-4.2.2 --with-mpfr=/pkgs/gmp-4.2.2 Thread model: posix gcc version 4.2.3 and these options: gcc -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -rdynamic -shared it can't compile in 20GB of RAM. (That machine has only 16GB of RAM, so I killed the compile when it hit 20GB of physical+virtual memory.) It compiles just fine in about 1GB of RAM with euler-76% gcc -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../configure --prefix=/pkgs/gcc-4.1.2 Thread model: posix gcc version 4.1.2 compiler.i is the output from the Gambit Scheme-C compiler; the source Scheme program is from a standard benchmark suite for Scheme compilers. So I found this by trying to change the code generator for Gambit and running the benchmark suite on x86_64. I don't know how this can be fixed. Basically, the entire middle-end infrastructure since 4.1.* is telling people like me with computer-generated code like this to just go away (to put it very politely).
On Mac OS X 10.5.*, Apple bundles their version of 4.0.1, which compiles this just fine; on Red Hat 5.2, they bundle their version of 4.1.2 (I think, my RH5.2 box is down at the moment), which compiles this just fine; but on Ubuntu 8.10 or Fedora 10 you can't compile this because they bundle newer compilers. (I guess I'll see if I can install 4.1.* on both of these.) As a stopgap measure, perhaps someone can tell me what optimization level to use. As you can see, I use -O1 and a few others (mainly -fschedule-insns2). gcc 4.1.* and earlier compiled something like this just fine, but -O1 must mean something different now. -- Summary: Code that compiles fine in 1GB of memory with 4.1.2 requires 20GB in 4.2.* and higher Product: gcc Version: 4.3.3 Status: UNCONFIRMED Severity: major Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: lucier at math dot purdue dot edu GCC build triplet: x86_64-unknown-linux-gnu GCC host triplet: x86_64-unknown-linux-gnu GCC target triplet: x86_64-unknown-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39157
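A side note for anyone reproducing a report like this one: instead of watching top and killing the compile by hand, the compiler can be started under an address-space limit so that it fails on its own once it passes the cap. What follows is only a minimal Python sketch, assuming a Linux host where RLIMIT_AS is enforced; the helper name run_with_memory_cap and the cap values are illustrative and do not come from the report.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(cmd, max_bytes):
    """Run cmd with its address space capped at max_bytes; return its exit status.

    The limit is inherited by any processes cmd starts (for gcc, that
    includes cc1), so the compile fails by itself instead of swapping.
    """

    def limit_address_space():
        # Runs in the child just before exec. Only the soft limit is
        # lowered; the hard limit inherited from the parent is kept.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = max_bytes
        if hard != resource.RLIM_INFINITY:
            soft = min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    completed = subprocess.run(cmd, preexec_fn=limit_address_space)
    return completed.returncode
```

A call such as run_with_memory_cap(["gcc", "-O1", "-S", "compiler.i"], 8 * 1024**3) (options shortened here) would return a non-zero status if the compile cannot stay under 8 GiB. The cap applies to virtual address space, not resident memory, so it is only a rough stand-in for the physical+virtual figure quoted in the report.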
[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines
--- Comment #81 from lucier at math dot purdue dot edu 2009-02-04 17:27 --- Created an attachment (id=17243) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17243&action=view) Memory and CPU statistics for 2009/02/04 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854
[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines
--- Comment #82 from lucier at math dot purdue dot edu 2009-02-04 17:28 --- I still have the bitmap.c patch from http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01270.html in my tree so I don't get meaningless statistics for bitmaps. (Kenny installed in the trunk something like the patch above for alloc-pool.c.) There are more bitmaps allocated than on 2008-09-26 (13GB instead of 12GB). 3GB was allocated in alloc-pool. Execution time was worse, 228.17 user seconds versus 168 seconds. I didn't watch top to estimate the maximum memory usage. This is with euler-8% /pkgs/gcc-mainline/bin/gcc -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../../mainline/configure --enable-checking=release --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-gather-detailed-mem-stats Thread model: posix gcc version 4.4.0 20090204 (experimental) [trunk revision 143922] (GCC) Brad -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854
[Bug bootstrap/26814] Bootstrapping with a non default ABI (-m64 on ppc-darwin or on ppc-linux with a compiler defaulting to 32 and now defaulting to 64)
--- Comment #19 from lucier at math dot purdue dot edu 2008-12-29 01:30 ---
Maybe you could offer a few more details; I just tried

% cat ../../mainline/build-and-check-gcc-64-32
#!/bin/tcsh
/bin/rm -rf *; ../../mainline/configure CC='/usr/bin/gcc-4.0 -mcpu=970 -m64' --build=powerpc64-apple-darwin9.6.0 --host=powerpc64-apple-darwin9.6.0 --target=powerpc-apple-darwin9.6.0 --with-gmp-include=/sw/include/ --with-gmp-lib=/sw/lib/ppc64 --with-mpfr-include=/sw/include/ --with-mpfr-lib=/sw/lib/ppc64 --prefix=/pkgs/gcc-4.4.0-64-32 --with-libiconv-prefix=/usr --with-system-zlib; make -j 4 BOOT_LDFLAGS='-Wl,-search_paths_first' build.log (make install) (make -k -j 8 check RUNTESTFLAGS=--target_board 'unix{-mcpu=970/-m64}' check.log ; make mail-report.log)

(make bootstrap isn't even available) and ended up with

checking for powerpc-apple-darwin9.6.0-gcc... /Users/lucier/programs/gcc/objdirs/mainline/./gcc/xgcc -B/Users/lucier/programs/gcc/objdirs/mainline/./gcc/ -B/pkgs/gcc-4.4.0-64-32/powerpc-apple-darwin9.6.0/bin/ -B/pkgs/gcc-4.4.0-64-32/powerpc-apple-darwin9.6.0/lib/ -isystem /pkgs/gcc-4.4.0-64-32/powerpc-apple-darwin9.6.0/include -isystem /pkgs/gcc-4.4.0-64-32/powerpc-apple-darwin9.6.0/sys-include
checking for suffix of object files... configure: error: in `/Users/lucier/programs/gcc/objdirs/mainline/powerpc-apple-darwin9.6.0/libgcc':
configure: error: cannot compute suffix of object files: cannot compile
See `config.log' for more details.

while config.log gives

configure:2611: /Users/lucier/programs/gcc/objdirs/mainline/./gcc/xgcc -B/Users/lucier/programs/gcc/objdirs/mainline/./gcc/ -B/pkgs/gcc-4.4.0-64-32/powerpc-apple-darwin9.6.0/bin/ -B/pkgs/gcc-4.4.0-64-32/powerpc-apple-darwin9.6.0/lib/ -isystem /pkgs/gcc-4.4.0-64-32/powerpc-apple-darwin9.6.0/include -isystem /pkgs/gcc-4.4.0-64-32/powerpc-apple-darwin9.6.0/sys-include -c -g -O2 conftest.c >&5
/Users/lucier/programs/gcc/objdirs/mainline/./gcc/as: line 76: exec: : not found
configure:2614: $? = 1
configure: failed program was:
| /* confdefs.h. */
|
| #define PACKAGE_NAME "GNU C Runtime Library"
| #define PACKAGE_TARNAME "libgcc"
| #define PACKAGE_VERSION "1.0"
| #define PACKAGE_STRING "GNU C Runtime Library 1.0"
| #define PACKAGE_BUGREPORT ""
| /* end confdefs.h. */
|
| int
| main ()
| {
|
|   ;
|   return 0;
| }
configure:2627: error: in `/Users/lucier/programs/gcc/objdirs/mainline/powerpc-apple-darwin9.6.0/libgcc':
configure:2630: error: cannot compute suffix of object files: cannot compile
See `config.log' for more details.

It appears to be looking for a special as.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26814
[Bug bootstrap/26814] Bootstrapping with a non default ABI (-m64 on ppc-darwin or on ppc-linux with a compiler defaulting to 32 and now defaulting to 64)
--- Comment #21 from lucier at math dot purdue dot edu 2008-12-29 03:06 --- Thanks for your comments. So, to get back to basics, how do I build a compiler on darwin that has a 64-bit gcc/cc1/etc., but compiles to 32-bit binaries by default? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26814
[Bug tree-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #42 from lucier at math dot purdue dot edu 2008-12-07 19:39 --- Just a comment that -fforward-propagate isn't enabled at -O1 (the main optimization option in the test) while the cse code it replaces was enabled at -O1. This is presumably why adding -fno-forward-propagate to the command line in the test a year ago didn't affect the generated code. Adding -fno-forward-propagate to the command line of the test case with revision r118475 of gcc changes the generated code, but doesn't improve the problem code in the main loop. Updated the title to report the performance hit on Intel(R) Xeon(R) CPU X5460 @ 3.16GHz as reported by /proc/cpuinfo -- lucier at math dot purdue dot edu changed: What|Removed |Added Summary|[4.3/4.4 Regression] 22%|[4.3/4.4 Regression] 30% |performance slowdown from |performance slowdown in |4.2.2 to 4.3/4.4.0 in |floating-point code caused |floating-point code |by r118475 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
--- Comment #39 from lucier at math dot purdue dot edu 2008-12-06 16:37 ---
I may have narrowed down the problem a bit. With this compiler (revision 118491):

pythagoras-277% /tmp/lucier/install/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release --prefix=/tmp/lucier/install --enable-languages=c
Thread model: posix
gcc version 4.3.0 20061105 (experimental)

one gets (on a faster machine than previous reports)

(time (direct-fft-recursive-4 a table))
    133 ms real time
    140 ms cpu time (140 user, 0 system)
    no collections
    64 bytes allocated
    no minor faults
    no major faults

With this compiler (revision 118474):

pythagoras-24% /tmp/lucier/install/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release --prefix=/tmp/lucier/install --enable-languages=c
Thread model: posix
gcc version 4.3.0 20061104 (experimental)

one gets

(time (direct-fft-recursive-4 a table))
    116 ms real time
    108 ms cpu time (108 user, 0 system)
    no collections
    64 bytes allocated
    no minor faults
    no major faults

and you see the typical problem with assembly code from direct.i with the later compiler. Paolo may have been right about fwprop; this patch was installed that day:

Author: bonzini
Date: Sat Nov 4 08:36:45 2006
New Revision: 118475
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=118475
Log:
2006-11-03 Paolo Bonzini [EMAIL PROTECTED]
           Steven Bosscher [EMAIL PROTECTED]

* fwprop.c: New file.
* Makefile.in: Add fwprop.o.
* tree-pass.h (pass_rtl_fwprop, pass_rtl_fwprop_with_addr): New.
* passes.c (init_optimization_passes): Schedule forward propagation.
* rtlanal.c (loc_mentioned_in_p): Support NULL value of the second parameter.
* timevar.def (TV_FWPROP): New.
* common.opt (-fforward-propagate): New.
* opts.c (decode_options): Enable forward propagation at -O2.
* gcse.c (one_cprop_pass): Do not run local cprop unless touching jumps.
* cse.c (fold_rtx_subreg, fold_rtx_mem, fold_rtx_mem_1, find_best_addr, canon_for_address, table_size): Remove. (new_basic_block, insert, remove_from_table): Remove references to table_size. (fold_rtx): Process SUBREGs and MEMs with equiv_constant, make simplification loop more straightforward by not calling fold_rtx recursively. (equiv_constant): Move here a small part of fold_rtx_subreg, do not call fold_rtx. Call avoid_constant_pool_reference to process MEMs.
* recog.c (canonicalize_change_group): New.
* recog.h (canonicalize_change_group): New.
* doc/invoke.texi (Optimization Options): Document fwprop.
* doc/passes.texi (RTL passes): Document fwprop.

Added: trunk/gcc/fwprop.c
Modified: trunk/gcc/ChangeLog trunk/gcc/Makefile.in trunk/gcc/common.opt trunk/gcc/cse.c trunk/gcc/doc/invoke.texi trunk/gcc/doc/passes.texi trunk/gcc/gcse.c trunk/gcc/opts.c trunk/gcc/passes.c trunk/gcc/recog.c trunk/gcc/recog.h trunk/gcc/rtlanal.c trunk/gcc/timevar.def trunk/gcc/tree-pass.h
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
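The size of the regression between the two revisions follows directly from the timings quoted in this comment (108 ms vs. 140 ms cpu time, 116 ms vs. 133 ms real time). A small Python sketch of the arithmetic; the function name slowdown_percent is made up for illustration:

```python
def slowdown_percent(baseline_ms, regressed_ms):
    """Relative slowdown of regressed_ms over baseline_ms, in percent."""
    return 100.0 * (regressed_ms - baseline_ms) / baseline_ms


# Timings from the comment: revision 118474 (before) vs. 118491 (after).
cpu = slowdown_percent(108, 140)
real = slowdown_percent(116, 133)

print(f"cpu time:  {cpu:.1f}% slower")   # cpu time:  29.6% slower
print(f"real time: {real:.1f}% slower")  # real time: 14.7% slower
```

These are single runs on one machine, so the figures are indicative rather than a careful measurement.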
[Bug target/37878] [4.4 regression] PPC64 ldu command generated with invalid offset
--- Comment #14 from lucier at math dot purdue dot edu 2008-10-30 00:02 --- Thank you, this fixes the original bug. I took the liberty of closing this bug report. Thanks again. Brad -- lucier at math dot purdue dot edu changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37878
[Bug bootstrap/37639] Bootstrap fails with may be used uninitialized warning in c-parser.c
--- Comment #3 from lucier at math dot purdue dot edu 2008-10-30 00:19 --- You're right, it was fixed by Revision 141193 - (view) (download) - [select for diffs] Modified Fri Oct 17 14:50:07 2008 UTC (12 days, 9 hours ago) by krebbel File length: 238566 byte(s) Diff to previous 140914 (colored) 2008-10-17 Andreas Krebbel [EMAIL PROTECTED] * c-parser.c (c_parser_binary_expression): Silence the uninitialized variable warning emitted for binary_loc. I hadn't noticed because I started adding --disable-werror to my configuration files. Closing as fixed. -- lucier at math dot purdue dot edu changed: What|Removed |Added Status|WAITING |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37639
[Bug target/37878] [4.4 regression] PPC64 ldu command generated with invalid offset
--- Comment #9 from lucier at math dot purdue dot edu 2008-10-23 19:20 ---
I bootstrapped and regtested the suggested patch. There was one fewer FAIL in the gcc tests:

FAIL: gcc.c-torture/execute/nestfunc-6.c execution, -O0

and one more failure in the libgomp tests:

FAIL: libgomp.fortran/crayptr2.f90 -O3 -fomit-frame-pointer -funroll-loops execution test

However, it's not clear to me whether the output of gdb implies that this is a problem with the compiled code (the command lines are taken from the log file):

[descartes:powerpc64-apple-darwin9.5.0/libgomp/testsuite] lucier% /Users/lucier/programs/gcc/objdirs/mainline/gcc/xgcc -B/Users/lucier/programs/gcc/objdirs/mainline/gcc/ /Users/lucier/programs/gcc/mainline/libgomp/testsuite/libgomp.fortran/crayptr2.f90 -B/Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/./libgomp/ -I/Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/./libgomp -I/Users/lucier/programs/gcc/mainline/libgomp/testsuite/.. -shared-libgcc -fmessage-length=0 -fopenmp -O3 -fomit-frame-pointer -funroll-loops -fopenmp -fcray-pointer -static-libgcc -L/Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/./libgomp/.libs -lgomp -L/Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/./libgomp/../libgfortran/.libs -lgfortranbegin -lgfortran -lm -mcpu=970 -m64 -o ./crayptr2.exe

[descartes:powerpc64-apple-darwin9.5.0/libgomp/testsuite] lucier% env LD_LIBRARY_PATH=.:/Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/./libgomp/.libs:/Users/lucier/programs/gcc/objdirs/mainline/gcc:/Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/./libgomp/../libgfortran/.libs:.:/Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/./libgomp/.libs:/Users/lucier/programs/gcc/objdirs/mainline/gcc:/Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/./libgomp/../libgfortran/.libs gdb ./crayptr2.exe
GNU gdb 6.3.50-20050815 (Apple version gdb-962) (Sat Jul 26 08:17:57 UTC 2008)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "powerpc-apple-darwin"...Reading symbols for shared libraries done
(gdb) run
Starting program: /Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/libgomp/testsuite/crayptr2.exe
warning: posix_spawn failed, trying execvp, error: 86
Reading symbols for shared libraries +++.. done
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x
0x00011678 in MAIN__.omp_fn.0 ()
(gdb) where
#0 0x00011678 in MAIN__.omp_fn.0 ()
#1 0x0001187c in MAIN__ ()
#2 0x000118e4 in main (argc=1, argv=<value temporarily unavailable, due to optimizations>) at ../../../../mainline/libgfortran/fmain.c:21

It is completely reproducible, however.

Brad
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37878