[Bug bootstrap/37632] Darwin bootstrap failure, ld: bl out of range

2010-04-20 Thread lucier at math dot purdue dot edu


--- Comment #11 from lucier at math dot purdue dot edu  2010-04-21 01:17 
---
Thank you for your way to build a 64-bit gcc, it has now worked for me using
Apple's gcc-4.0.1 as you say, so I'll close this bug as WORKSFORME.

Brad


-- 

lucier at math dot purdue dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |WORKSFORME


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37632



[Bug bootstrap/37632] Darwin bootstrap failure, ld: bl out of range

2010-04-12 Thread lucier at math dot purdue dot edu


--- Comment #10 from lucier at math dot purdue dot edu  2010-04-12 13:17 
---
Subject: Re:  Darwin bootstrap failure, ld: bl out of
 range

On Sun, 2010-04-11 at 10:29 +, iains at gcc dot gnu dot org wrote:

> 2. As a matter of curiosity - do you see a big improvement in performance from
> building gcc 64bit?
>
>    I normally build ppc-apple-darwin9 - since this is quite capable of
> generating m64 code should I have an app that requires it.

I build a 64-bit gcc so that I can compile codes that require gcc to use
more than 4GB of memory.

It will take me a day or two before I can get back to your other
comments.

Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37632



[Bug bootstrap/37632] Darwin bootstrap failure, ld: bl out of range

2010-04-10 Thread lucier at math dot purdue dot edu


--- Comment #4 from lucier at math dot purdue dot edu  2010-04-10 20:43 
---
I can't get it to bootstrap with the following:

[monster-mac:~/programs/gcc/gcc-4_4-branch] lucier% cat build-gcc 
#!/bin/tcsh
/bin/rm -rf *; ../../gcc-4_4-branch/configure CC='/pkgs/gcc-4.3.2-64/bin/gcc
-mcpu=970 -m64' --build=powerpc64-apple-darwin9.8.0
--host=powerpc64-apple-darwin9.8.0 --target=powerpc64-apple-darwin9.8.0
--prefix=/pkgs/gcc-4.4.4-64 --with-libiconv-prefix=/usr  --with-system-zlib;
make bootstrap BOOT_LDFLAGS='-Wl,-search_paths_first' >& build.log && (make
install) && (make -k -j 8 check RUNTESTFLAGS="--target_board
'unix{-mcpu=970/-m64}'" >& check.log ; make mail-report.log)

The error is

checking for flex... flex
checking lex output file root... configure: error: cannot find output from
flex; giving up
make[2]: *** [configure-stage1-gmp] Error 1
make[1]: *** [stage1-bubble] Error 2
make: *** [bootstrap] Error 2

And I get the same error if I use your configure line.

So I can't reproduce this working with 

[monster-mac:~/programs/gcc/gcc-4_4-branch] lucier% head LAST_UPDATED gcc/BASE-VER
==> LAST_UPDATED <==
Sat Apr 10 16:26:49 EDT 2010
Sat Apr 10 20:26:49 UTC 2010 (revision 158195)

==> gcc/BASE-VER <==
4.4.4

and with in-source gmp, mpfr, and mpc directories.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37632



[Bug bootstrap/37632] Darwin bootstrap failure, ld: bl out of range

2010-04-10 Thread lucier at math dot purdue dot edu


--- Comment #6 from lucier at math dot purdue dot edu  2010-04-10 21:18 
---
I wrote

> And I get the same error if I use your configure line.

which means using gcc-4.0.1; I used *exactly* your configure line.

Did you have the gmp and mpfr sources in the gcc-4_4-branch source directory?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37632



[Bug tree-optimization/26854] [4.3/4.4/4.5 Regression] Inordinate compile times on large routines

2010-03-27 Thread lucier at math dot purdue dot edu


--- Comment #117 from lucier at math dot purdue dot edu  2010-03-27 16:38 
---
Subject: Re:  [4.3/4.4/4.5 Regression] Inordinate compile times on large
routines


On Mar 27, 2010, at 7:14 AM, rguenth at gcc dot gnu dot org wrote:

> I wonder if the parsing numbers are accurate as the initial report has
> like 9s parsing while the current ones are 200s.  Can you explain that
> difference?  (like, were you testing different source?)

Yes, different source (compiler.i instead of all.i), different
(faster) machine.  Perhaps gathering the detailed memory stats affects
the parser time.

Here are times for the original source file all.i using the same  
machine and compiler as in the immediately previous report for  
compiler.i:

 df live&initialized regs:  45.00 ( 8%) usr   0.00 ( 0%) sys  45.04 ( 8%) wall       0 kB ( 0%) ggc
 parser                  :  19.60 ( 3%) usr   1.22 ( 7%) sys  21.25 ( 4%) wall   70217 kB ( 2%) ggc
 scheduling              : 301.86 (52%) usr   0.00 ( 0%) sys 301.87 (51%) wall    8739 kB ( 0%) ggc
 TOTAL                   : 579.88            17.55           597.65            3393985 kB

Glancing at top, the maximum reported memory usage was  13GB.  I'll  
attach the detailed results for all.i next
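For readers who want to pull figures out of reports like this one, here is a minimal Python sketch that parses a single -ftime-report row. It assumes each row has been joined onto one line in the layout `name : usr (p%) usr  sys (p%) sys  wall (p%) wall  N kB (p%) ggc`; the regular expression and the `parse_row` helper are illustrative names, not part of GCC.

```python
import re

# Assumed layout of one -ftime-report row, joined onto a single line:
#   name : usr (p%) usr  sys (p%) sys  wall (p%) wall  N kB (p%) ggc
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>[\d.]+)\s*\(\s*\d+%\)\s*usr\s*"
    r"(?P<sys>[\d.]+)\s*\(\s*\d+%\)\s*sys\s*"
    r"(?P<wall>[\d.]+)\s*\(\s*\d+%\)\s*wall\s*"
    r"(?P<ggc_kb>\d+)\s*kB\s*\(\s*\d+%\)\s*ggc\s*$"
)

def parse_row(line):
    """Return (name, usr, sys, wall, ggc_kb), or None if the line is not a pass row."""
    m = ROW.match(line)
    if m is None:
        return None
    return (m["name"], float(m["usr"]), float(m["sys"]),
            float(m["wall"]), int(m["ggc_kb"]))

row = parse_row(" scheduling: 301.86 (52%) usr   0.00 ( 0%) sys "
                "301.87 (51%) wall    8739 kB ( 0%) ggc")
share = row[1] / 579.88  # TOTAL usr seconds from the report above
print(f"{row[0]}: {share:.0%} of user time")  # prints: scheduling: 52% of user time
```

The TOTAL row has no percentages, so this sketch deliberately returns None for it; a separate pattern would be needed to read it.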

> As is the testcase(s) are an interesting source of information - maybe we
> should gather those up on a page in the wiki just in case we end up closing
> this bug at some point (I suggest not to at the moment, the parsing times
> look odd and 20GB memory use doesn't sound reasonable).  Did you ever
> test other compilers and see how they perform with respect to memory usage
> and compile time?

No, none that were not a gcc derivative.

Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854



[Bug tree-optimization/26854] [4.3/4.4/4.5 Regression] Inordinate compile times on large routines

2010-03-27 Thread lucier at math dot purdue dot edu


--- Comment #118 from lucier at math dot purdue dot edu  2010-03-27 16:44 
---
Created an attachment (id=20224)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20224&action=view)
time/memory report compiling all.i with -O3

These are the detailed time and memory statistics reported when compiling all.i
with -O3 -fschedule-insns on x86-64.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854



[Bug tree-optimization/26854] [4.3/4.4/4.5 Regression] Inordinate compile times on large routines

2010-03-26 Thread lucier at math dot purdue dot edu


--- Comment #113 from lucier at math dot purdue dot edu  2010-03-27 04:27 
---
Created an attachment (id=20220)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20220&action=view)
time/mem report compiling compiler.i

This is the time and detailed memory report for 20100302 compiling compiler.i
above with main optimization options -O1 -fschedule-insns2 (precise command
line and configuration options are given at the top of the file).

With these optimization levels cpu time and memory don't look too bad to me. 
The main routines are

 parser            : 320.93 (59%) usr   1.40 (27%) sys 322.62 (59%) wall  103143 kB (15%) ggc
 tree CFG cleanup  :  73.43 (14%) usr   0.01 ( 0%) sys  73.46 (13%) wall    1388 kB ( 0%) ggc

Nothing else is above 3%.

I'm building today's gcc on an X86-64 RHEL5 machine with more memory to test
with -O3 -fschedule-insns, as this set of options now gives about 20% speedup
on some of my codes of this type.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854



[Bug tree-optimization/26854] [4.3/4.4/4.5 Regression] Inordinate compile times on large routines

2010-03-26 Thread lucier at math dot purdue dot edu


--- Comment #114 from lucier at math dot purdue dot edu  2010-03-27 04:59 
---
Created an attachment (id=20221)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20221&action=view)
time/mem report compiling compiler.i

This is the time and detailed memory report for compiling compiler.i with
today's gcc and optimization level -O3 -fschedule-insns.  Again, the detailed
configuration information and command line are contained at the beginning of
the file.

Except for taking  20GB of RAM, this doesn't look too bad, either.  The passes
taking the most time are:

 parser            : 222.18 (21%) usr   2.95 (11%) sys 225.37 (21%) wall  103148 kB (11%) ggc
 tree CFG cleanup  :  63.67 ( 6%) usr   0.00 ( 0%) sys  63.60 ( 6%) wall    2467 kB ( 0%) ggc
 scheduling        : 394.04 (37%) usr   0.00 ( 0%) sys 394.04 (36%) wall    5824 kB ( 1%) ggc
 TOTAL             :1056.69            26.47          1083.41             916872 kB


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854



[Bug tree-optimization/26854] [4.3/4.4/4.5 Regression] Inordinate compile times on large routines

2010-03-26 Thread lucier at math dot purdue dot edu


--- Comment #115 from lucier at math dot purdue dot edu  2010-03-27 05:20 
---
Created an attachment (id=20222)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20222&action=view)
time/mem report compiling compiler.i with -O1

Here is the time and memory report with -O1 -fschedule-insns2 on the same
machine as the -O3 -fschedule-insns report.

The biggest times are:

 parser            : 224.89 (54%) usr   2.61 (24%) sys 226.97 (53%) wall  103148 kB (15%) ggc
 tree CFG cleanup  :  60.61 (15%) usr   0.00 ( 0%) sys  60.58 (14%) wall    1388 kB ( 0%) ggc
 reload            :  19.17 ( 5%) usr   0.00 ( 0%) sys  19.17 ( 5%) wall    4694 kB ( 1%) ggc
 TOTAL             : 413.29            10.95           424.28             709657 kB


-- 

lucier at math dot purdue dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #20220|0                           |1
        is obsolete|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854



[Bug bootstrap/42002] Bootstrap failure: ld doesn't find 64-bit libelf on Fedora 11

2009-11-11 Thread lucier at math dot purdue dot edu


--- Comment #2 from lucier at math dot purdue dot edu  2009-11-11 13:52 
---
Thanks a lot for the explanation!

I'm looking through the list of packages on Fedora with elfutils in the title;
there is no elfutils-libelf-devel.ppc64, and the only ppc64 packages I can find
are

elfutils-devel-0.142-1.fc11 (ppc64)

with file list

/usr/include/dwarf.h
/usr/include/elfutils
/usr/include/elfutils/elf-knowledge.h
/usr/include/elfutils/libasm.h
/usr/include/elfutils/libdw.h
/usr/include/elfutils/libdwfl.h
/usr/include/elfutils/libebl.h
/usr/include/elfutils/version.h
/usr/lib64/libasm.so
/usr/lib64/libdw.so
/usr/lib64/libebl.a

and

elfutils-libelf-0.142-1.fc11 (ppc64)

with file list

/usr/lib64/libelf-0.142.so
/usr/lib64/libelf.so.1

So I put in the link from libelf.so to libelf.so.1 by hand and the bootstrap is
proceeding.
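The linker resolves `-lelf` by searching for a file named `libelf.so` (or `libelf.a`), so a directory that holds only `libelf.so.1` does not satisfy it. A minimal Python sketch of the workaround follows, run against a scratch directory instead of `/usr/lib64`. The helper name `ensure_dev_symlink` is hypothetical and the file names are taken from the listing above.

```python
import os
import tempfile

def ensure_dev_symlink(libdir, soname, linkname):
    """Create libdir/linkname -> soname unless linkname already exists.

    Returns True if a new symlink was created.
    """
    link = os.path.join(libdir, linkname)
    if os.path.lexists(link):
        return False
    os.symlink(soname, link)  # relative link, resolved inside libdir
    return True

# Demonstration in a temporary directory; no system files are touched.
with tempfile.TemporaryDirectory() as scratch:
    open(os.path.join(scratch, "libelf.so.1"), "w").close()
    created = ensure_dev_symlink(scratch, "libelf.so.1", "libelf.so")
    target = os.readlink(os.path.join(scratch, "libelf.so"))
    created_again = ensure_dev_symlink(scratch, "libelf.so.1", "libelf.so")
```

On a real system the unversioned link normally comes from the distribution's development package, so a hand-made link may be overwritten or orphaned by later package updates.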

Should I file a bug report with Fedora?  I was told Fedora 12 won't support
ppc64, so maybe there's no point.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42002



[Bug bootstrap/42002] New: Bootstrap failure: ld doesn't find 64-bit libelf on Fedora 11

2009-11-10 Thread lucier at math dot purdue dot edu
I configured today's mainline with

../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c,c++
--enable-stage1-languages=c,c++ --with-cpu=default64 --enable-checking=release

and bootstrap fails with

/home/lucier/programs/gcc/objdirs/mainline/./prev-gcc/xgcc
-B/home/lucier/programs/gcc/objdirs/mainline/./prev-gcc/
-B/pkgs/gcc-mainline/powerpc64-unknown-linux-gnu/bin/
-B/pkgs/gcc-mainline/powerpc64-unknown-linux-gnu/bin/
-B/pkgs/gcc-mainline/powerpc64-unknown-linux-gnu/lib/ -isystem
/pkgs/gcc-mainline/powerpc64-unknown-linux-gnu/include -isystem
/pkgs/gcc-mainline/powerpc64-unknown-linux-gnu/sys-include -g -O2 -gtoggle
-DIN_GCC   -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes
-Wmissing-prototypes -Wmissing-format-attribute -pedantic -Wno-long-long
-Wno-variadic-macros -Wno-overlength-strings -Werror -Wold-style-definition
-Wc++-compat   -DHAVE_CONFIG_H  -o cc1-dummy c-lang.o stub-objc.o attribs.o
c-errors.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o c-aux-info.o
c-common.o c-opts.o c-format.o c-semantics.o c-ppoutput.o c-cppbuiltin.o
c-objc-common.o c-dump.o c-pch.o c-parser.o rs6000-c.o c-gimplify.o
tree-mudflap.o c-pretty-print.o c-omp.o \
  dummy-checksum.o main.o  libbackend.a ../libcpp/libcpp.a
../libdecnumber/libdecnumber.a ../libcpp/libcpp.a   ../libiberty/libiberty.a
../libdecnumber/libdecnumber.a   
-L/home/lucier/programs/gcc/objdirs/mainline/./gmp/.libs
-L/home/lucier/programs/gcc/objdirs/mainline/./gmp/_libs
-L/home/lucier/programs/gcc/objdirs/mainline/./mpfr/.libs
-L/home/lucier/programs/gcc/objdirs/mainline/./mpfr/_libs -lmpfr -lgmp
-rdynamic -ldl  -L../zlib -lz -lelf
/usr/bin/ld: skipping incompatible /usr/lib/libelf.so when searching for -lelf
/usr/bin/ld: cannot find -lelf
collect2: ld returned 1 exit status

The object files are 64-bit:

[luc...@lambda-head mainline]$ file gcc/rs6000-c.o
gcc/rs6000-c.o: ELF 64-bit MSB relocatable, 64-bit PowerPC or cisco 7500,
version 1 (SYSV), not stripped

and a 64-bit libelf is installed:

[luc...@lambda-head mainline]$ file /usr/lib64/libelf*
/usr/lib64/libelf-0.142.so: ELF 64-bit MSB shared object, 64-bit PowerPC or
cisco 7500, version 1 (SYSV), dynamically linked, stripped
/usr/lib64/libelf.so.1: symbolic link to `libelf-0.142.so'

but I don't know why it isn't being found.


-- 
   Summary: Bootstrap failure: ld doesn't find 64-bit libelf on
Fedora 11
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: lucier at math dot purdue dot edu
 GCC build triplet: powerpc64-unknown-linux-gnu/
  GCC host triplet: powerpc64-unknown-linux-gnu
GCC target triplet: powerpc64-unknown-linux-gnu/


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42002



[Bug bootstrap/40968] [4.5 Regression] ICE when compiling O2g.gch; problem with --enable-gather-detailed-mem-stats

2009-11-09 Thread lucier at math dot purdue dot edu


--- Comment #4 from lucier at math dot purdue dot edu  2009-11-10 00:28 
---
This is fixed, at least by the time of

gcc version 4.5.0 20091109 (experimental) [trunk revision 154037] (GCC) 


-- 

lucier at math dot purdue dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40968



[Bug rtl-optimization/41891] [4.5 Regression] ICE in move_loop_invariants

2009-11-01 Thread lucier at math dot purdue dot edu


--- Comment #3 from lucier at math dot purdue dot edu  2009-11-01 23:55 
---
This one works:

frying-pan:~/programs/gambc-v4_5_2-devel> /pkgs/gcc-mainline/bin/gcc
-march=core2 -msse4 -save-temps -Wno-unused -O1 -fno-math-errno
-fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv
-fomit-frame-pointer -fPIC -fno-common -mieee-fp -c -o _io.o _io.i
frying-pan:~/programs/gambc-v4_5_2-devel> /pkgs/gcc-mainline/bin/gcc -v
   
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-mainline/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline
--enable-languages=c --enable-checking=release
Thread model: posix
gcc version 4.5.0 20091014 (experimental) [trunk revision 152748] (GCC) 

This one fails:

frying-pan:~/programs/gambc-v4_5_2-devel> /pkgs/gcc-mainline/bin/gcc
-march=core2 -msse4 -save-temps -Wno-unused -O1 -fno-math-errno
-fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv
-fomit-frame-pointer -fPIC -fno-common -mieee-fp -c -o _io.o _io.i
_io.i: In function ‘___H__23__23_read_2d_six_2d_datum_2d_or_2d_eof’:
_io.i:15174:1: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
frying-pan:~/programs/gambc-v4_5_2-devel> /pkgs/gcc-mainline/bin/gcc -v
   
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-mainline/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline
--enable-languages=c --enable-checking=release
Thread model: posix
gcc version 4.5.0 20091015 (experimental) [trunk revision 152797] (GCC) 
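With a passing build at revision 152748 and a failing one at 152797, the offending change can be narrowed down by bisection. The following Python sketch shows the search itself. The `is_bad` callback stands in for "build that revision and compile _io.i", and the culprit number used in the demonstration is invented purely to exercise the code.

```python
def first_bad_revision(good, bad, is_bad):
    """Binary search for the first failing revision in (good, bad].

    Assumes `good` passes, `bad` fails, and that every revision after
    the first failing one also fails.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad

# Hypothetical predicate: pretend revision 152770 introduced the ICE.
probes = []

def fake_is_bad(rev):
    probes.append(rev)
    return rev >= 152770

culprit = first_bad_revision(152748, 152797, fake_is_bad)
print(culprit, "found after", len(probes), "builds")
```

A gap of 49 revisions needs at most six builds this way, which matters when each probe is a full compiler build.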


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41891



[Bug c/41891] New: ICE in move_loop_invariants

2009-10-31 Thread lucier at math dot purdue dot edu
With this compiler

frying-pan:~/programs/gambc-v4_5_2-devel> /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-mainline/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline
--enable-checking=release --enable-languages=c
Thread model: posix
gcc version 4.5.0 20091031 (experimental) [trunk revision 153773] (GCC) 

I get an ICE:

frying-pan:~/programs/gambc-v4_5_2-devel> /pkgs/gcc-mainline/bin/gcc
-march=core2 -msse4 -save-temps -Wno-unused -O1 -fno-math-errno
-fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv
-fomit-frame-pointer -fPIC -fno-common -mieee-fp -c -o _io.o _io.i
_io.i: In function ‘___H__23__23_read_2d_six_2d_datum_2d_or_2d_eof’:
_io.i:15174:1: internal compiler error: Segmentation fault

In gdb I get

frying-pan:~/programs/gambc-v4_5_2-devel> gdb
/pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/cc1 
GNU gdb (GDB) 7.0-ubuntu
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from
/pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/cc1...done.
(gdb) run -march=core2 -msse4 -Wno-unused -O1 -fno-math-errno -fschedule-insns2
-fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC
-fno-common -mieee-fp _io.i
Starting program:
/pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/cc1 -march=core2
-msse4 -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math
-fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp
_io.i
 btowc wctob mbrlen __signbitf __signbit __signbitl
___H__23__23_read_2d_six_2d_datum_2d_or_2d_eof
Analyzing compilation unit
Performing interprocedural optimizations
 <visibility> <early_local_cleanups> <whole-program> <inline> <static-var>
<pure-const>Assembling functions:
 ___H__23__23_read_2d_six_2d_datum_2d_or_2d_eof
Program received signal SIGSEGV, Segmentation fault.
bitmap_clear (head=0x78) at ../../../mainline/gcc/bitmap.c:297
297   if (head->first)
(gdb) where
#0  bitmap_clear (head=0x78) at ../../../mainline/gcc/bitmap.c:297
#1  0x00622c78 in free_loop_data () at
../../../mainline/gcc/loop-invariant.c:1568
#2  move_loop_invariants () at ../../../mainline/gcc/loop-invariant.c:1906
#3  0x006206d7 in rtl_move_loop_invariants () at
../../../mainline/gcc/loop-init.c:254
#4  0x006544f0 in execute_one_pass (pass=0xf8fc60) at
../../../mainline/gcc/passes.c:1518
#5  0x00654705 in execute_pass_list (pass=0xf8fc60) at
../../../mainline/gcc/passes.c:1567
#6  0x00654717 in execute_pass_list (pass=0xf8fb40) at
../../../mainline/gcc/passes.c:1568
#7  0x00654717 in execute_pass_list (pass=0x1010d60) at
../../../mainline/gcc/passes.c:1568
#8  0x007263dc in tree_rest_of_compilation (fndecl=0x7713fe00) at
../../../mainline/gcc/tree-optimize.c:392
#9  0x00851b7c in cgraph_expand_function (node=0x7713fd00) at
../../../mainline/gcc/cgraphunit.c:1160
#10 0x00853485 in cgraph_expand_all_functions () at
../../../mainline/gcc/cgraphunit.c:1219
#11 cgraph_optimize () at ../../../mainline/gcc/cgraphunit.c:1465
#12 0x0085383f in cgraph_finalize_compilation_unit () at
../../../mainline/gcc/cgraphunit.c:1089
#13 0x0048e45b in c_write_global_declarations () at
../../../mainline/gcc/c-decl.c:9489
#14 0x006e98ac in compile_file (argc=15, argv=0x7fffe5d8) at
../../../mainline/gcc/toplev.c:1061
#15 do_compile (argc=15, argv=0x7fffe5d8) at
../../../mainline/gcc/toplev.c:2408
#16 toplev_main (argc=15, argv=0x7fffe5d8) at
../../../mainline/gcc/toplev.c:2450
#17 0x773d8abd in __libc_start_main () from /lib/libc.so.6
#18 0x0047af09 in _start () at ../sysdeps/x86_64/elf/start.S:113
(gdb) print head
$1 = (bitmap) 0x78

I'll add the (unfortunately very long) input file as an attachment.

Brad


-- 
   Summary: ICE in move_loop_invariants
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: lucier at math dot purdue dot edu
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41891



[Bug c/41891] ICE in move_loop_invariants

2009-10-31 Thread lucier at math dot purdue dot edu


--- Comment #1 from lucier at math dot purdue dot edu  2009-10-31 16:56 
---
Created an attachment (id=18942)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18942&action=view)
test case

This is the test case.

BTW, this works in 4.4.1.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41891



[Bug middle-end/41891] ICE in move_loop_invariants

2009-10-31 Thread lucier at math dot purdue dot edu


--- Comment #2 from lucier at math dot purdue dot edu  2009-10-31 17:32 
---
There is no ICE with

heine:~/Desktop> /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-mainline/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release
--prefix=/pkgs/gcc-mainline --enable-languages=c --disable-multilib
Thread model: posix
gcc version 4.5.0 20091005 (experimental) [trunk revision 152459] (GCC) 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41891



[Bug bootstrap/40968] [4.5 Regression] ICE when compiling O2g.gch; problem with --enable-gather-detailed-mem-stats

2009-10-05 Thread lucier at math dot purdue dot edu


--- Comment #3 from lucier at math dot purdue dot edu  2009-10-06 00:51 
---
Now I'm getting comparison errors with

[trunk revision 152459]

and the same configuration:

Comparing stages 2 and 3
warning: gcc/cc1plus-checksum.o differs
warning: gcc/cc1-checksum.o differs
Bootstrap comparison failure!
x86_64-unknown-linux-gnu/libstdc++-v3/src/basic_file.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/future.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/basic_file.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/future.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/pool_allocator.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/debug.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/mt_allocator.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/locale_init.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/atomic.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/system_error.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs/locale.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/pool_allocator.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/debug.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/mt_allocator.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/locale_init.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/atomic.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/system_error.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/src/locale.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/eh_alloc.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/vec.o differs
x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/eh_globals.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/basic_file.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/future.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/basic_file.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/future.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/pool_allocator.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/debug.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/mt_allocator.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/locale_init.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/atomic.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/system_error.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs/locale.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/pool_allocator.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/debug.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/mt_allocator.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/locale_init.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/atomic.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/system_error.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/src/locale.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/libsupc++/guard.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/libsupc++/eh_alloc.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/libsupc++/vec.o differs
x86_64-unknown-linux-gnu/32/libstdc++-v3/libsupc++/eh_globals.o differs


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40968



[Bug target/41531] -O1 -fschedule-insns swscale error

2009-10-01 Thread lucier at math dot purdue dot edu


--- Comment #3 from lucier at math dot purdue dot edu  2009-10-01 13:19 
---
This is not the same problem as 24319.  Vlad thinks he fixed 24319, and indeed
the problem in this bug report from 4.4 is gone.  The reported problem in 4.5
is different.

Don't turn 24319 into a grab bag of any problem that arises when using
-fschedule-insns.

And, again, I can't reopen this bug.

Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41531



[Bug target/41176] ICE in reload_cse_simplify_operands at postreload.c:396

2009-10-01 Thread lucier at math dot purdue dot edu


--- Comment #5 from lucier at math dot purdue dot edu  2009-10-01 19:43 
---
No ICE with 4.3.3, either, but there is an ICE with

Target: ppc64-redhat-linux
gcc version 4.4.1 20090725 (Red Hat 4.4.1-2) (GCC) 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41176



[Bug rtl-optimization/24319] [4.3/4.4/4.5 regression] amd64 register spill error with -fschedule-insns

2009-09-03 Thread lucier at math dot purdue dot edu


--- Comment #23 from lucier at math dot purdue dot edu  2009-09-03 18:04 
---
The gprof output on the _num.i example, with and without -fschedule-insns is at

http://www.math.purdue.edu/~lucier/bugzilla/11/gprof.out-fschedule-insns.gz
http://www.math.purdue.edu/~lucier/bugzilla/11/gprof.out-fnoschedule-insns.gz


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24319



[Bug rtl-optimization/24319] [4.3/4.4/4.5 regression] amd64 register spill error with -fschedule-insns

2009-09-02 Thread lucier at math dot purdue dot edu


--- Comment #20 from lucier at math dot purdue dot edu  2009-09-02 16:52 
---
Vlad:

Thank you for your reply.

The times I reported are for -fschedule-insns without -fsched-pressure.

The times with the addition of -fsched-pressure are not much greater than
with -fschedule-insns by itself:

With -fschedule-insns

 scheduling        :  22.89 (41%) usr   0.02 ( 2%) sys  22.93 (40%) wall    2125 kB ( 1%) ggc
 integrated RA     :   9.15 (16%) usr   0.06 ( 6%) sys   9.21 (16%) wall    5488 kB ( 3%) ggc
 scheduling 2      :   0.60 ( 1%) usr   0.00 ( 0%) sys   0.62 ( 1%) wall     422 kB ( 0%) ggc
 TOTAL             :  55.67             0.93            56.66             180793 kB

with -fschedule-insns -fsched-pressure

 scheduling        :  23.31 (42%) usr   0.02 ( 2%) sys  23.36 (41%) wall    2125 kB ( 1%) ggc
 integrated RA     :   9.18 (16%) usr   0.04 ( 4%) sys   9.22 (16%) wall    5517 kB ( 3%) ggc
 scheduling 2      :   0.58 ( 1%) usr   0.01 ( 1%) sys   0.58 ( 1%) wall     251 kB ( 0%) ggc
 TOTAL             :  55.77             1.00            56.89             179606 kB

and with neither -fschedule-insns nor -fsched-pressure:

 integrated RA     :   6.40 (21%) usr   0.05 ( 5%) sys   6.41 (21%) wall    5087 kB ( 3%) ggc
 scheduling 2      :   0.58 ( 2%) usr   0.01 ( 1%) sys   0.60 ( 2%) wall     244 kB ( 0%) ggc
 TOTAL             :  29.84             0.98            30.83             176587 kB

So pre-register-allocation instruction scheduling, even without the new
register-pressure-aware algorithm, takes quite a bit of time.
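To make the comparison concrete, this short Python sketch works out where the extra time goes, using only the user-time figures quoted in the three reports above. The variable names are illustrative.

```python
# usr seconds copied from the three -ftime-report runs above
total_plain = 29.84      # neither -fschedule-insns nor -fsched-pressure
total_sched = 55.67      # -fschedule-insns
total_pressure = 55.77   # -fschedule-insns -fsched-pressure
sched_pass = 22.89       # "scheduling" row with -fschedule-insns
ra_plain, ra_sched = 6.40, 9.15  # "integrated RA" rows

extra = total_sched - total_plain               # user time added by the flag
slowdown = total_sched / total_plain            # overall compile-time factor
in_sched_pass = sched_pass / extra              # share of the extra spent in the pass itself
ra_extra = ra_sched - ra_plain                  # knock-on cost in register allocation
pressure_cost = total_pressure - total_sched    # added by -fsched-pressure

print(f"extra {extra:.2f}s, factor {slowdown:.2f}x, "
      f"{in_sched_pass:.0%} of it in the scheduling pass")
print(f"RA grows by {ra_extra:.2f}s; -fsched-pressure adds {pressure_cost:.2f}s")
```

On these numbers the first scheduling pass accounts for most of the roughly 26 extra seconds, register allocation for about 3 more, and the pressure-aware variant for almost nothing.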

I'll try to build a profiled gcc, and then if I find something I'll put it in a
new PR.

Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24319



[Bug rtl-optimization/24319] [4.3/4.4/4.5 regression] amd64 register spill error with -fschedule-insns

2009-09-02 Thread lucier at math dot purdue dot edu


--- Comment #22 from lucier at math dot purdue dot edu  2009-09-02 17:24 
---
The output of gprof on this example is at

http://www.math.purdue.edu/~lucier/bugzilla/11/gprof.out.gz

Everything that takes more than a second is

Each sample counts as 0.01 seconds.
  %   cumulative   self                 self     total
 time   seconds   seconds       calls  s/call   s/call  name
 10.73      4.45     4.45       15565    0.00     0.00  pop_scope
  7.28      7.47     3.02   314259938    0.00     0.00  free_list
  7.04     10.39     2.92        5575    0.00     0.00  dfs_enumerate_from
  5.62     12.72     2.33   314988148    0.00     0.00  alloc_INSN_LIST
  5.28     14.91     2.19        5292    0.00     0.00  get_loop_exit_edges
  5.14     17.04     2.13   331244515    0.00     0.00  bitmap_set_bit
  3.28     18.40     1.36      135329    0.00     0.00  sched_analyze_insn
  3.09     19.68     1.28       29650    0.00     0.00  free_deps
  2.75     20.82     1.14    21773210    0.00     0.00  bitmap_bit_p
  2.35     21.80     0.98    14093247    0.00     0.00  dominated_by_p
  1.99     22.62     0.83     5357385    0.00     0.00  bitmap_ior_into
  1.88     23.40     0.78         199    0.00     0.00  inverted_post_order_compute
  1.57     24.05     0.65         342    0.00     0.01  df_worklist_dataflow
  1.37     24.62     0.57    51278357    0.00     0.00  decl_jump_unsafe
  1.35     25.18     0.56    26181017    0.00     0.00  flow_bb_inside_loop_p
  1.13     25.65     0.47         201    0.00     0.00  post_order_compute

Nothing immediate jumps out at me.
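One way to quantify "nothing jumps out" is to add up the percentage column of the flat profile. The Python sketch below uses the (% time, self seconds, name) values from the rows listed above; the list literal is transcribed by hand and the variable names are illustrative.

```python
# (% time, self seconds, function) transcribed from the flat profile above
rows = [
    (10.73, 4.45, "pop_scope"), (7.28, 3.02, "free_list"),
    (7.04, 2.92, "dfs_enumerate_from"), (5.62, 2.33, "alloc_INSN_LIST"),
    (5.28, 2.19, "get_loop_exit_edges"), (5.14, 2.13, "bitmap_set_bit"),
    (3.28, 1.36, "sched_analyze_insn"), (3.09, 1.28, "free_deps"),
    (2.75, 1.14, "bitmap_bit_p"), (2.35, 0.98, "dominated_by_p"),
    (1.99, 0.83, "bitmap_ior_into"), (1.88, 0.78, "inverted_post_order_compute"),
    (1.57, 0.65, "df_worklist_dataflow"), (1.37, 0.57, "decl_jump_unsafe"),
    (1.35, 0.56, "flow_bb_inside_loop_p"), (1.13, 0.47, "post_order_compute"),
]

listed_percent = sum(pct for pct, _, _ in rows)   # share covered by the listed rows
listed_seconds = sum(sec for _, sec, _ in rows)   # self time covered by the listed rows
top_name = max(rows, key=lambda r: r[0])[2]       # single most expensive function

print(f"{len(rows)} functions cover {listed_percent:.2f}% "
      f"({listed_seconds:.2f}s); largest is {top_name}")
```

The sixteen listed functions together cover a little under 62% of the samples and the largest single entry is under 11%, so the time is spread fairly evenly instead of sitting in one obvious hot spot.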

Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24319



[Bug target/41176] ICE in reload_cse_simplify_operands at postreload.c:396

2009-09-02 Thread lucier at math dot purdue dot edu


--- Comment #2 from lucier at math dot purdue dot edu  2009-09-03 02:37 
---
I thought Vlad's scheduling/register allocation patch here

http://gcc.gnu.org/ml/gcc-patches/2009-09/msg3.html

which solves PR24319, might fix this problem, but it does not.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41176



[Bug rtl-optimization/24319] [4.3/4.4/4.5 regression] amd64 register spill error with -fschedule-insns

2009-09-01 Thread lucier at math dot purdue dot edu


--- Comment #18 from lucier at math dot purdue dot edu  2009-09-02 02:54 
---
Vlad:

The patch works great in my tests so far, thanks.

After installing your patch on today's trunk so that -fschedule-insns actually
works, I find it is quite expensive on large files.

For example, with today's trunk with your patches applied, for the file 

http://www.math.purdue.edu/~lucier/bugzilla/8/_num.i.gz

and the options

/pkgs/gcc-mainline-schedule/bin/gcc -Wno-unused -O1 -fno-math-errno
-fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv
-fomit-frame-pointer -fPIC -fno-common -mieee-fp -ftime-report -c _num.i

total CPU time on my x86-64 box is

 TOTAL             :  29.60             0.92            30.54             176587 kB

while with -fschedule-insns it is

 scheduling        :  23.03 (42%) usr   0.02 ( 2%) sys  23.07 (41%) wall    2125 kB ( 1%) ggc
 TOTAL             :  55.47             1.03            56.57             180793 kB

I don't know whether you can make it go faster now, or whether that's
unreasonable and I should just wait and file another PR.

Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24319



[Bug rtl-optimization/24319] [4.3/4.4/4.5 regression] amd64 register spill error with -fschedule-insns

2009-08-28 Thread lucier at math dot purdue dot edu


--- Comment #16 from lucier at math dot purdue dot edu  2009-08-28 16:54 
---
Re: Comment 7:

> Since end users will gain little benefit from being able to run the sched1
> pass on x86 code, I don't think this is a serious problem.

PR33928 (comments 108 and 111) gives an example where -fschedule-insns on x86-64
gives a 14% speedup on some direct and inverse FFT codes, certainly not a
trivial difference.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24319



[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-08-27 Thread lucier at math dot purdue dot edu


--- Comment #111 from lucier at math dot purdue dot edu  2009-08-27 17:02 
---
I can compile gambit 4.1.2 with -fschedule-insns except for the function noted
in PR41164.

On

model name  : Intel(R) Core(TM)2 Quad  CPU   Q8200  @ 2.33GHz

with

gcc version 4.5.0 20090803 (experimental) [trunk revision 150373] (GCC) 

the times with -fschedule-insns are

(time (direct-fft-recursive-4 a table))
144 ms cpu time (144 user, 0 system)
(time (inverse-fft-recursive-4 a table))
136 ms cpu time (136 user, 0 system)

and the times without -fschedule-insns are

(time (direct-fft-recursive-4 a table))
168 ms cpu time (168 user, 0 system)
(time (inverse-fft-recursive-4 a table))
172 ms cpu time (172 user, 0 system)

That's a pretty big improvement.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug target/41176] New: ICE in reload_cse_simplify_operands at postreload.c:396

2009-08-26 Thread lucier at math dot purdue dot edu
with this compiler:

[luc...@lambda-head lib]$ /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: powerpc64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline
--enable-languages=c,c++ --enable-stage1-languages=c,c++ --with-cpu=default64
Thread model: posix
gcc version 4.5.0 20090825 (experimental) [trunk revision 151108] (GCC) 

and this command line

 /pkgs/gcc-mainline/libexec/gcc/powerpc64-unknown-linux-gnu/4.5.0/cc1
-fpreprocessed thread.i -quiet -mcpu=970 -m64  -O1 -Wno-unused -version
-fschedule-insns -fno-math-errno -fschedule-insns2 -fno-trapping-math
-fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common

I get the following error:

thread.i: In function ‘___H_make_2d_thread’:
thread.i:719:1: error: insn does not satisfy its constraints:
(insn 625 411 219 26 thread.i:625 (set (reg:DF 19 19)
(mem:DF (plus:DI (reg:DI 22 22 [orig:197 D.3836 ] [197])
(const_int 23 [0x17])) [0 S8 A64])) 357 {*movdf_hardfloat64}
(nil))
thread.i:719:1: internal compiler error: in reload_cse_simplify_operands, at
postreload.c:396
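
One possible reading of the failing insn, offered as an assumption rather than
a diagnosis from the report: hard register 19 is a general-purpose register, so
the 8-byte load would be a 64-bit integer load, and on PowerPC the DS-form
encoding used by `ld` requires a displacement that is a multiple of 4.  The
displacement here is 23.  The helper below is hypothetical and only checks that
arithmetic condition.

```python
def is_valid_ds_form_offset(offset: int) -> bool:
    """True if offset fits a signed 16-bit field and is a multiple of 4.

    This mirrors the DS-form displacement rule for PowerPC `ld`/`std`;
    it is an illustration, not code taken from GCC.
    """
    return -32768 <= offset <= 32767 and offset % 4 == 0


print(is_valid_ds_form_offset(23))  # displacement in insn 625
print(is_valid_ds_form_offset(24))  # nearest larger aligned displacement
```

If this reading is right, the address would have to be reloaded into a register
before the load could be emitted; whether that is what went wrong here is not
established by the report.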

I apologize in advance for the size of the test case, which I will post next.


-- 
   Summary: ICE in reload_cse_simplify_operands at postreload.c:396
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: lucier at math dot purdue dot edu
 GCC build triplet: powerpc64-unknown-linux-gnu
  GCC host triplet: powerpc64-unknown-linux-gnu
GCC target triplet: powerpc64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41176



[Bug target/41176] ICE in reload_cse_simplify_operands at postreload.c:396

2009-08-26 Thread lucier at math dot purdue dot edu


--- Comment #1 from lucier at math dot purdue dot edu  2009-08-27 00:14 
---
Created an attachment (id=18431)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18431&action=view)
preprocessed source file

I'm not having much luck cutting this down more, sorry.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41176



[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-08-26 Thread lucier at math dot purdue dot edu


--- Comment #108 from lucier at math dot purdue dot edu  2009-08-27 01:18 
---
direct.c contains a direct FFT; I've compiled the direct and inverse FFTs and
ran them on arrays with 2^23 double-precision complex elements, using

heine:~/programs/gcc/objdirs/bench-mainline-on-fft> /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release
--prefix=/pkgs/gcc-mainline --enable-languages=c,c++
-enable-stage1-languages=c,c++
Thread model: posix
gcc version 4.5.0 20090803 (experimental) [trunk revision 150373] (GCC) 

The compile options were

/pkgs/gcc-mainline/bin/gcc -save-temps -c -Wno-unused -O1 -fno-math-errno
-fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv
-fomit-frame-pointer -fPIC -fno-common -mieee-fp -rdynamic -shared
-fschedule-insns

and the same without -fschedule-insns.

The runtime for direct+inverse FFT with instruction scheduling was 1.264
seconds and the time for direct+inverse FFT without -fschedule-insns was 1.444
seconds, which is a 14% speedup for that one compiler option.  This is on a
2.33GHz Core 2 quad machine.

I'll attach the inner loops of direct.c with and without -fschedule-insns.

I haven't been able to compile the complete Gambit runtime with
-fschedule-insns on either x86-64 or ppc64; I've filed PR41164 and PR41176 for
those two different failures.
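
The 14% figure for the FFT timings above is a speedup (old time divided by new
time); the same two numbers give a smaller percentage when expressed as run
time saved.  A minimal sketch of both calculations, with function names chosen
here for illustration:

```python
def speedup_percent(baseline_s: float, improved_s: float) -> float:
    """How much faster the improved run is, relative to its own time."""
    return (baseline_s / improved_s - 1.0) * 100.0


def time_saved_percent(baseline_s: float, improved_s: float) -> float:
    """How much of the baseline run time was removed."""
    return (baseline_s - improved_s) / baseline_s * 100.0


without_sched1 = 1.444  # seconds, direct+inverse FFT without -fschedule-insns
with_sched1 = 1.264     # seconds, direct+inverse FFT with -fschedule-insns

print(f"speedup:    {speedup_percent(without_sched1, with_sched1):.1f}%")
print(f"time saved: {time_saved_percent(without_sched1, with_sched1):.1f}%")
```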


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-08-26 Thread lucier at math dot purdue dot edu


--- Comment #109 from lucier at math dot purdue dot edu  2009-08-27 01:22 
---
Created an attachment (id=18432)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18432&action=view)
inner loop of direct.c with -fschedule-insns


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-08-26 Thread lucier at math dot purdue dot edu


--- Comment #110 from lucier at math dot purdue dot edu  2009-08-27 01:22 
---
Created an attachment (id=18433)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18433&action=view)
inner loop of direct.c without -fschedule-insns


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug rtl-optimization/41164] New: Unable to find spill register

2009-08-25 Thread lucier at math dot purdue dot edu
With this compiler:

heine:~/programs/gambc-v4_5_1-devel> /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release
--prefix=/pkgs/gcc-mainline --enable-languages=c,c++
-enable-stage1-languages=c,c++
Thread model: posix
gcc version 4.5.0 20090803 (experimental) [trunk revision 150373] (GCC) 

with this command:

heine:~/programs/gambc-v4_5_1-devel/lib> /pkgs/gcc-mainline/bin/gcc
-fschedule-insns -Wno-unused -O1 -fno-math-errno -fschedule-insns2
-fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC
-fno-common -mieee-fp -save-temps -c os_test.i


fails with this error:

os_base.c: In function ‘___os_err_code_to_string’:
os_base.c:1247:1: error: unable to find a register to spill in class ‘DREG’
os_base.c:1247:1: error: this is the insn:
(insn 264 280 266 41 os_test.i:158 (parallel [
(set (reg:SI 37 r8 [133])
(truncate:SI (lshiftrt:DI (mult:DI (sign_extend:DI (reg:SI 2 cx
[130]))
(sign_extend:DI (reg:SI 6 bp [185])))
(const_int 32 [0x20]))))
(clobber (scratch:SI))
(clobber (reg:CC 17 flags))
]) 347 {*smulsi3_highpart_insn} (expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_EQUAL (truncate:SI (lshiftrt:DI (mult:DI (sign_extend:DI
(reg:SI 2 cx [130]))
(const_int 1717986919 [0x66666667]))
(const_int 32 [0x20])))
(nil))))
os_base.c:1247: confused by earlier errors, bailing out
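
For readers unfamiliar with the insn above: the REG_EQUAL note multiplies by
1717986919 (0x66666667) and keeps the high 32 bits, which is the usual shape of
a signed division by a small constant done without a divide instruction.  The
dump does not show the shift that follows, so the divisor is a guess; the
sketch below assumes division by 10 (a further shift by 2) and uses a
hypothetical function name.

```python
MAGIC = 0x66666667  # 1717986919, the constant in the REG_EQUAL note


def div10_via_highpart(n: int) -> int:
    """Divide a signed 32-bit value by 10, truncating toward zero."""
    high = (n * MAGIC) >> 32   # high part of the 64-bit signed product
    quotient = high >> 2       # arithmetic shift: total divide by 2**34
    if n < 0:
        quotient += 1          # correct floor to truncation for negatives
    return quotient


print(div10_via_highpart(12345), div10_via_highpart(-12345))
```

Python's `>>` on negative integers is an arithmetic (flooring) shift, which is
why the explicit correction for negative inputs is needed to match C's
truncating division.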

I'll add the .i file next.

What's interesting is that I get similar errors with

heine:~/programs/gambc-v4_5_1-devel/lib> gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.3.3-5ubuntu4'
--with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.3
--program-suffix=-4.3 --enable-clocale=gnu --enable-libstdcxx-debug
--enable-objc-gc --enable-mpfr --with-tune=generic --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4) 

heine:~/programs/gambc-v4_5_1-devel/lib> /pkgs/gcc-4.4-branch/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../gcc-4.4-branch/configure --prefix=/pkgs/gcc-4.4-branch
--enable-languages=c --enable-checking=release --disable-multilib
Thread model: posix
gcc version 4.4.1 20090522 (prerelease) (GCC) 

and

heine:~/programs/gambc-v4_5_1-devel/lib> /pkgs/gcc-4.2.4/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../gcc-4.2.4/configure --prefix=/pkgs/gcc-4.2.4
--enable-languages=c --enable-checking=release --disable-multilib
Thread model: posix
gcc version 4.2.4


-- 
   Summary: Unable to find spill register
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: lucier at math dot purdue dot edu
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41164



[Bug rtl-optimization/41164] Unable to find spill register

2009-08-25 Thread lucier at math dot purdue dot edu


--- Comment #1 from lucier at math dot purdue dot edu  2009-08-25 14:57 
---
Created an attachment (id=18423)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18423&action=view)
test file that illustrates failure


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41164



[Bug libstdc++/40968] New: ICE including fenv.h when compiling O2g.gch

2009-08-04 Thread lucier at math dot purdue dot edu
with this compiler:

Mon Aug  3 16:57:15 UTC 2009 (revision 150373)

with this configure and build:

/bin/rm -rf *; ../../mainline/configure --enable-checking=release
--prefix=/pkgs/gcc-mainline-mem-stats --enable-languages=c,c++
--enable-gather-detailed-mem-stats -enable-stage1-languages=c,c++; make -j 6
bootstrap >& build.log

bootstrap fails with

/home/lucier/programs/gcc/objdirs/mainline/./gcc/xgcc -shared-libgcc
-B/home/lucier/programs/gcc/objdirs/mainline/./gcc -nostdinc++
-L/home/lucier/programs/gcc/objdirs/mainline/x86_64-unknown-linux-gnu/32/libstdc++-v3/src
-L/home/lucier/programs/gcc/objdirs/mainline/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs
-B/pkgs/gcc-mainline-mem-stats/x86_64-unknown-linux-gnu/bin/
-B/pkgs/gcc-mainline-mem-stats/x86_64-unknown-linux-gnu/lib/ -isystem
/pkgs/gcc-mainline-mem-stats/x86_64-unknown-linux-gnu/include -isystem
/pkgs/gcc-mainline-mem-stats/x86_64-unknown-linux-gnu/sys-include  -m32 -x
c++-header -D_GNU_SOURCE  -m32
-I/home/lucier/programs/gcc/objdirs/mainline/x86_64-unknown-linux-gnu/32/libstdc++-v3/include/x86_64-unknown-linux-gnu
-I/home/lucier/programs/gcc/objdirs/mainline/x86_64-unknown-linux-gnu/32/libstdc++-v3/include
-I/home/lucier/programs/gcc/mainline/libstdc++-v3/libsupc++ -O2 -g
/home/lucier/programs/gcc/mainline/libstdc++-v3/include/precompiled/stdtr1c++.h
-o x86_64-unknown-linux-gnu/bits/stdtr1c++.h.gch/O2g.gch
In file included from
/home/lucier/programs/gcc/objdirs/mainline/x86_64-unknown-linux-gnu/32/libstdc++-v3/include/tr1/cfenv:36:0,
 from
/home/lucier/programs/gcc/mainline/libstdc++-v3/include/precompiled/stdtr1c++.h:33:
/home/lucier/programs/gcc/objdirs/mainline/x86_64-unknown-linux-gnu/32/libstdc++-v3/include/fenv.h:32:9:
internal compiler error: Segmentation fault

I'm sorry, but I don't really know how to go further in diagnosing this.


-- 
   Summary: ICE including fenv.h when compiling O2g.gch
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: lucier at math dot purdue dot edu
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40968



[Bug libstdc++/40968] ICE when compiling O2g.gch; problem with --enable-gather-detailed-mem-stats

2009-08-04 Thread lucier at math dot purdue dot edu


--- Comment #1 from lucier at math dot purdue dot edu  2009-08-04 23:15 
---
bootstrap completes without --enable-gather-detailed-mem-stats


-- 

lucier at math dot purdue dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|ICE including fenv.h when   |ICE when compiling O2g.gch;
                   |compiling O2g.gch           |problem with --enable-
                   |                            |gather-detailed-mem-stats


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40968



[Bug bootstrap/40950] New: Bootstrap fails with in-tree gmp and without system C++ compiler

2009-08-03 Thread lucier at math dot purdue dot edu
With this build script

#!/bin/tcsh
/bin/rm -rf *; ../../mainline/configure --enable-checking=release
--prefix=/pkgs/gcc-mainline-mem-stats --enable-languages=c
--enable-gather-detailed-mem-stats ; make -j 6 bootstrap >& build.log

on this OS:

heine:~/programs/gcc/objdirs/mainline> uname -a
Linux heine.math.purdue.edu 2.6.28-14-generic #47-Ubuntu SMP Sat Jul 25
01:19:55 UTC 2009 x86_64 GNU/Linux

with gmp 4.2.4 and mpfr-2.3.2 added to the mainline tree with revision

Mon Aug  3 12:57:15 EDT 2009
Mon Aug  3 16:57:15 UTC 2009 (revision 150373)

bootstrap fails when configuring gmp with the stage1 compiler with the message

checking how to run the C++ preprocessor... /lib/cpp
configure: error: C++ preprocessor /lib/cpp fails sanity check
See `config.log' for more details.
make[2]: *** [configure-stage2-gmp] Error 1
make[2]: Leaving directory `/home/lucier/programs/gcc/objdirs/mainline'
make[1]: *** [stage2-bubble] Error 2
make[1]: Leaving directory `/home/lucier/programs/gcc/objdirs/mainline'
make: *** [bootstrap] Error 2

I'll attach build.log and gmp/config.log.


-- 
   Summary: Bootstrap fails with in-tree gmp and without system C++
compiler
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: lucier at math dot purdue dot edu
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40950



[Bug bootstrap/40950] Bootstrap fails with in-tree gmp and without system C++ compiler

2009-08-03 Thread lucier at math dot purdue dot edu


--- Comment #1 from lucier at math dot purdue dot edu  2009-08-03 17:15 
---
Created an attachment (id=18291)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18291&action=view)
Build log of failed bootstrap


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40950



[Bug bootstrap/40950] Bootstrap fails with in-tree gmp and without system C++ compiler

2009-08-03 Thread lucier at math dot purdue dot edu


--- Comment #2 from lucier at math dot purdue dot edu  2009-08-03 17:16 
---
Created an attachment (id=18292)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18292&action=view)
log of failed gmp configuration


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40950



[Bug bootstrap/40950] Bootstrap fails with in-tree gmp and without system C++ compiler

2009-08-03 Thread lucier at math dot purdue dot edu


--- Comment #3 from lucier at math dot purdue dot edu  2009-08-03 17:17 
---
Created an attachment (id=18293)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18293&action=view)
build log with right content type


-- 

lucier at math dot purdue dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #18291|0                           |1
        is obsolete|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40950



[Bug bootstrap/37739] [4.4 Regression] bootstrap broken with core gcc gcc-4.2.x

2009-07-02 Thread lucier at math dot purdue dot edu


--- Comment #16 from lucier at math dot purdue dot edu  2009-07-02 16:35 
---
OK, so we've had several reliable reports that this bug still exists, but I'm
not high enough in the GCC bugzilla hierarchy to reopen this bug (I just
tried); perhaps Andreas or Jakub would like to do so.  (Jakub, I've added your
e-mail as a CC to this bug, sorry if that isn't appropriate.)


-- 

lucier at math dot purdue dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37739



[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-16 Thread lucier at math dot purdue dot edu


--- Comment #106 from lucier at math dot purdue dot edu  2009-06-16 07:24 
---
This machine has 4ms ticks, so we're getting down to a few ticks difference
with a benchmark of this size.  It's 156ms with 4.2.4, 168ms with 4.5.0, and
164 ms when -frename-registers is added to the command line.

It's not just scheduling, there are more memory accesses with 4.5.0.

With a problem roughly 10 times as large, the times are

4.2.4:  2912ms
4.5.0:  3204ms
4.5.0:  3120ms (adding -frename-registers)

So there's still a 7% difference from 4.2.4 with -frename-registers (10%
without it).
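
To make the point about timer resolution concrete, here is a small sketch that
converts the reported times into 4 ms ticks and shows how large one tick is
relative to each benchmark.  The dictionary layout and function name are
illustrative only; the numbers are the ones quoted above.

```python
TICK_MS = 4  # timer granularity stated in the comment above

small = {"4.2.4": 156, "4.5.0": 168, "4.5.0 -frename-registers": 164}
large = {"4.2.4": 2912, "4.5.0": 3204, "4.5.0 -frename-registers": 3120}


def report(times_ms):
    """Return (name, ticks, slowdown %, size of one tick in %) per entry."""
    base = times_ms["4.2.4"]
    rows = []
    for name, ms in times_ms.items():
        ticks = ms // TICK_MS
        slowdown_pct = (ms / base - 1.0) * 100.0
        tick_pct = TICK_MS / base * 100.0
        rows.append((name, ticks, slowdown_pct, tick_pct))
    return rows


for rows in (report(small), report(large)):
    for name, ticks, slowdown_pct, tick_pct in rows:
        print(f"{name:26s} {ticks:4d} ticks  {slowdown_pct:+5.1f}%  "
              f"(one tick = {tick_pct:.2f}%)")
```

On the small benchmark a single tick is about 2.6% of the run, so differences
of one to three ticks are hard to interpret; on the larger problem one tick is
roughly 0.14%, which is why the larger run gives the more trustworthy ratio.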


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-15 Thread lucier at math dot purdue dot edu


--- Comment #98 from lucier at math dot purdue dot edu  2009-06-15 16:11 
---
I don't quite understand how you would like me to configure and run the test.

First, I've applied your patches to speed up computing DF to my tree; do you
want them included in the test, or should I use a pristine mainline?

Second, when configuring mainline, should I include, or not include

1.  --enable-gather-detailed-mem-stats
2.  --enable-checking=release

After that, I think you just want to run two compiles with and without
-ftime-report, is that right?  (Nothing about -fmem-report.)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug rtl-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-15 Thread lucier at math dot purdue dot edu


--- Comment #102 from lucier at math dot purdue dot edu  2009-06-15 19:57 
---
Subject: Re:  [4.3/4.4/4.5 Regression] 30%
 performance slowdown in floating-point code caused by  r118475

On Mon, 2009-06-15 at 16:20 +0000, paolo dot bonzini at gmail dot com
wrote:

> Yes, and the output of -ftime-report is not needed.  Just the time
> ./cc1 ... output for the two.  Thanks!

The two commands:

time /pkgs/gcc-mainline/bin/gcc -O1 -fno-math-errno -fschedule-insns2
-fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC
-fno-common -mieee-fp -c compiler.i 
261.424u 1.184s 4:22.76 99.9%   0+0k 0+28456io 0pf+0w
time /pkgs/gcc-mainline/bin/gcc -O1 -fno-math-errno -fschedule-insns2
-fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC
-fno-common -mieee-fp -c compiler.i -ftime-report 
263.424u 4.900s 4:28.68 99.8%   0+0k 0+28480io 0pf+0w
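
The two timing lines above can be compared mechanically.  The sketch below
assumes they are in tcsh's default `time` format (user seconds, system seconds,
elapsed time, CPU percentage, then memory and I/O fields) and parses only the
first three fields; the regular expression and function names are written for
this example.

```python
import re

# user, system, elapsed ([[h:]m:]s) and CPU% fields of a tcsh `time` line
TIME_RE = re.compile(
    r"(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+(?P<elapsed>[\d:.]+)\s+(?P<cpu>[\d.]+)%"
)


def elapsed_seconds(text):
    """Convert 'm:ss.ss' or 'h:mm:ss' style text into seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60.0 + float(part)
    return seconds


def parse_time_line(line):
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError(f"not a tcsh time line: {line!r}")
    return {
        "user": float(match["user"]),
        "sys": float(match["sys"]),
        "elapsed": elapsed_seconds(match["elapsed"]),
    }


plain = parse_time_line("261.424u 1.184s 4:22.76 99.9%   0+0k 0+28456io 0pf+0w")
report = parse_time_line("263.424u 4.900s 4:28.68 99.8%   0+0k 0+28480io 0pf+0w")

cpu_plain = plain["user"] + plain["sys"]
cpu_report = report["user"] + report["sys"]
overhead = cpu_report - cpu_plain
print(f"CPU time without -ftime-report: {cpu_plain:.3f} s")
print(f"CPU time with -ftime-report:    {cpu_report:.3f} s")
print(f"difference: {overhead:.3f} s ({overhead / cpu_plain:.1%})")
```

On these two runs the difference is a few seconds out of more than four
minutes, most of it system time.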


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-15 Thread lucier at math dot purdue dot edu


--- Comment #103 from lucier at math dot purdue dot edu  2009-06-15 20:21 
---
Regarding comment #101 ...

With

heine:~/programs/gcc/objdirs/gsc-fft-tests/gambc-v4_1_2>
/pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline
--enable-languages=c --disable-multilib --enable-checking=release
Thread model: posix
gcc version 4.5.0 20090608 (experimental) [trunk revision 148276] (GCC) 

(and including Paolo's patch to speed up DF), the routine in direct.c takes

168 ms cpu time (168 user, 0 system)

As reported here

http://www.math.purdue.edu/~lucier/bugzilla/9/

with gcc-4.2.4, this routine takes 156 ms on the same machine.

Comment #9 gives the code that 4.2.4 generates at the start of the main loop; 
the start of the main loop with the version of 4.5.0 I gave above is:

.L2938:
movq    %rcx, %rdx
addq    8(%rax), %rdx
leaq    4(%rcx), %rbx
movq    %rdx, -8(%rax)
leaq    4(%rdx), %rdi
addq    8(%rax), %rdx
movq    %rdi, -16(%rax)
movq    %rdx, -24(%rax)
leaq    4(%rdx), %rdi
addq    8(%rax), %rdx
movq    %rdi, -32(%rax)
movq    %rdx, -40(%rax)
leaq    4(%rdx), %rdi
movq    40(%rax), %rdx
movq    %rdi, -48(%rax)
movsd   7(%rdx,%rdi,2), %xmm7
movq    -40(%rax), %rdi
leaq    7(%rdx,%rcx,2), %r8
addq    $8, %rcx
movsd   (%r8), %xmm4
cmpq    %rcx, %r13
movsd   7(%rdx,%rdi,2), %xmm10
movq    -32(%rax), %rdi
movsd   7(%rdx,%rdi,2), %xmm5
movq    -24(%rax), %rdi
movsd   7(%rdx,%rdi,2), %xmm6
movq    -16(%rax), %rdi
movsd   7(%rdx,%rdi,2), %xmm13
movq    -8(%rax), %rdi
movsd   7(%rdx,%rdi,2), %xmm11
leaq    (%rbx,%rbx), %rdi
movsd   7(%rdi,%rdx), %xmm9
movq    24(%rax), %rdx
movapd  %xmm11, %xmm14
movsd   15(%rdx), %xmm1
movsd   7(%rdx), %xmm2
movapd  %xmm1, %xmm8
movsd   31(%rdx), %xmm3
movapd  %xmm2, %xmm12
mulsd   %xmm10, %xmm8
mulsd   %xmm7, %xmm12
mulsd   %xmm2, %xmm10
mulsd   %xmm1, %xmm7
movsd   23(%rdx), %xmm0
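
A small sketch for comparing loop listings like the one above: it counts
instructions and classifies memory operands as loads or stores in AT&T-syntax
assembly.  The sample is only the first eight instructions of the loop; the
function name and the classification rule (a parenthesised last operand is a
store, any other parenthesised operand is a load, lea is ignored) are
simplifying assumptions, not part of any GCC tool.

```python
import re

# Split operands on commas that are not inside parentheses.
OPERAND_SPLIT = re.compile(r",\s*(?![^()]*\))")

LISTING = """\
.L2938:
movq    %rcx, %rdx
addq    8(%rax), %rdx
leaq    4(%rcx), %rbx
movq    %rdx, -8(%rax)
leaq    4(%rdx), %rdi
addq    8(%rax), %rdx
movq    %rdi, -16(%rax)
movq    %rdx, -24(%rax)
"""


def classify(listing):
    counts = {"instructions": 0, "loads": 0, "stores": 0}
    for raw in listing.splitlines():
        line = raw.strip()
        if not line or line.endswith(":") or line.startswith("."):
            continue  # skip blank lines, labels and assembler directives
        counts["instructions"] += 1
        parts = line.split(None, 1)
        mnemonic = parts[0]
        operand_text = parts[1] if len(parts) > 1 else ""
        operands = [op.strip() for op in OPERAND_SPLIT.split(operand_text) if op.strip()]
        if mnemonic.startswith("lea") or not operands:
            continue  # lea only computes an address; it does not touch memory
        if "(" in operands[-1]:
            counts["stores"] += 1   # destination (last operand) is memory
        elif any("(" in op for op in operands[:-1]):
            counts["loads"] += 1    # a source operand is memory
    return counts


print(classify(LISTING))
```

The rule miscounts instructions such as `cmp` with a memory operand in last
position (a read, not a write), so the totals are only a rough guide when
comparing the 4.2.4 and 4.5.0 loops.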

So, to my mind, this is still a 4.5 regression, as there is still a slow-down
and the code is still much less optimized by 4.5.0 than by 4.2.4. 168/156 ~
1.08, so if you want to change the Summary of this bug to 8% regression, or
some other things, that's fine, but I've changed this PR back to being a 4.5
regression.

I was not really thrilled when Richard marked PR 39157 as a duplicate of this
PR.  To my mind, there are three more or less independent things---run time of
Gambit-generated code, compile time of the code, and the space required to
compile the code.  This PR is about run time; PR 39157 was about space needed
by the compiler; PR 26854 is about compile time.  They seem to have all been
mushed together.


-- 

lucier at math dot purdue dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|4.5.0                       |
            Summary|[4.3/4.4 Regression] 30%    |[4.3/4.4/4.5 Regression] 30%
                   |performance slowdown in     |performance slowdown in
                   |floating-point code caused  |floating-point code caused
                   |by  r118475                 |by  r118475


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-14 Thread lucier at math dot purdue dot edu


--- Comment #95 from lucier at math dot purdue dot edu  2009-06-14 14:59 
---
The test case is compiler.i.gz


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-14 Thread lucier at math dot purdue dot edu


--- Comment #96 from lucier at math dot purdue dot edu  2009-06-14 15:02 
---
Sorry, the gcc options are in comment 87 (the -fforward-propagate is now
redundant), and without Paolo's recently proposed patch it requires about 9GB
of memory to compile.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-08 Thread lucier at math dot purdue dot edu


--- Comment #91 from lucier at math dot purdue dot edu  2009-06-08 18:19 
---
Created an attachment (id=17968)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17968&action=view)
time and memory report for compiler.i after Paolo's patch

The patch cut the total bitmaps used compiling compiler.i from  60GB to 3GB;
maximum memory (just from top) was 1631MB.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug middle-end/39301] ICE in register_overhead, at bitmap.c:115

2009-05-16 Thread lucier at math dot purdue dot edu


--- Comment #13 from lucier at math dot purdue dot edu  2009-05-16 14:37 
---
Subject: Re:  ICE in register_overhead, at bitmap.c:115


On May 13, 2009, at 9:32 PM, bje at gcc dot gnu dot org wrote:

> The test case does not run in a GB of RAM on my x86-64 system.  It
> sends the system deep into swap until the out-of-memory manager kicks in.

Ah, now that -fforward-propagate has been added to -O1 on mainline it  
takes a bit over 8GB of RAM to run instead of a GB.

Sorry.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39301



[Bug middle-end/39301] ICE in register_overhead, at bitmap.c:115

2009-05-16 Thread lucier at math dot purdue dot edu


--- Comment #15 from lucier at math dot purdue dot edu  2009-05-17 01:09 
---
Fixed by

http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147624


-- 

lucier at math dot purdue dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39301



[Bug middle-end/39301] ICE in register_overhead, at bitmap.c:115

2009-05-15 Thread lucier at math dot purdue dot edu


--- Comment #8 from lucier at math dot purdue dot edu  2009-05-15 21:55 
---
Created an attachment (id=17876)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17876&action=view)
patch to use HOST_WIDEST_INT for bitmap statistics

Here's a hack to use HOST_WIDEST_INT for bitmap statistics.  I'll attach the
report from the compiler.i test case.  If you think the report is useful,
perhaps you can use this as a starting point for a real patch and I'll
bootstrap and test it.
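
Why a wider integer type would matter for these statistics can be sketched as
follows.  This assumes, without confirmation from the report, that the bitmap
counters were 32-bit signed integers; the 60 GB total is the figure reported
for compiler.i elsewhere in this thread, and `wrap_int32` is a helper written
for this example.

```python
def wrap_int32(value: int) -> int:
    """Interpret value modulo 2**32 as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


GB = 10**9
wide_total = 0     # what a 64-bit counter would hold
narrow_total = 0   # what a 32-bit signed counter would hold

for _ in range(60):  # account for 60 GB of bitmap allocations, 1 GB at a time
    wide_total += GB
    narrow_total = wrap_int32(narrow_total + GB)

print(f"64-bit counter: {wide_total}")
print(f"32-bit counter: {narrow_total}")
```

Under that assumption the narrow counter wraps many times and ends up negative,
which is the kind of state a consistency check in the statistics code could
plausibly trip over.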

Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39301



[Bug middle-end/39301] ICE in register_overhead, at bitmap.c:115

2009-05-15 Thread lucier at math dot purdue dot edu


--- Comment #9 from lucier at math dot purdue dot edu  2009-05-15 21:57 
---
Created an attachment (id=17877)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17877&action=view)
memory and time report for compiler.i test case

Here's the output for the test case.  See if you like it.

I used the following configure command and compiler version:

pythagoras-147% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /tmp/lucier/gcc/mainline/configure --enable-checking=release
--prefix=/pkgs/gcc-mainline --enable-languages=c
--enable-gather-detailed-mem-stats --disable-bootstrap
Thread model: posix
gcc version 4.5.0 20090515 (experimental) [trunk revision 147594] (GCC) 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39301



[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-15 Thread lucier at math dot purdue dot edu


--- Comment #85 from lucier at math dot purdue dot edu  2009-05-16 00:20 
---
Created an attachment (id=17878)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17878&action=view)
Large test file for testing time and memory usage

This is the file compiler.i used in the previous tests.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug middle-end/39301] ICE in register_overhead, at bitmap.c:115

2009-05-08 Thread lucier at math dot purdue dot edu


--- Comment #6 from lucier at math dot purdue dot edu  2009-05-08 20:27 
---
Just for more information, I now hit this on x86_64-unknown-linux-gnu with the
compiler

pythagoras-32% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /tmp/lucier/gcc/mainline/configure --enable-checking=release
--prefix=/pkgs/gcc-mainline --enable-languages=c
--enable-gather-detailed-mem-stats
Thread model: posix
gcc version 4.5.0 20090508 (experimental) [trunk revision 147288] (GCC) 

on the compiler.i test case with

/pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I.  -Wall -W -Wno-unused
-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing
-fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -frename-registers
-fno-move-loop-invariants -fforward-propagate -DHAVE_CONFIG_H -D___PRIMAL
-D___LIBRARY -c compiler.i -ftime-report -fmem-report >&
rename-no-move-loop-invariants-forward-propagate-report-new


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39301



[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-07 Thread lucier at math dot purdue dot edu


--- Comment #71 from lucier at math dot purdue dot edu  2009-05-07 16:02 
---
Created an attachment (id=17820)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17820&action=view)
time for 31957, with rename-registers


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-07 Thread lucier at math dot purdue dot edu


--- Comment #75 from lucier at math dot purdue dot edu  2009-05-07 16:31 
---
Subject: Re:  [4.3/4.4/4.5 Regression] 30% performance slowdown in
floating-point code caused by  r118475


On May 7, 2009, at 12:21 PM, bonzini at gnu dot org wrote:

> --- Comment #74 from bonzini at gnu dot org  2009-05-07 16:21
> ---
> Ok.  One step at a time. :-)  To recap, here is the situation:
>
> - that scheduling is necessary now and not in 4.2.x, probably is
> just a matter of luck

If you mean -fschedule-insns2, it has always been part of the options  
list.

> - at least we have a set of options providing good performance on this
> testcase, and guidance towards better tuning of the various
> problematic optimizations

OK, but -fforward-propagate is not viable in general for these  
machine-generated codes.


Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-06 Thread lucier at math dot purdue dot edu


--- Comment #63 from lucier at math dot purdue dot edu  2009-05-06 19:57 
---
Was the patch in comment 55 meant for me to bootstrap and test with today's
mainline?  It crashes at the gcc_assert at

/* Subroutine of canon_reg.  Pass *XLOC through canon_reg, and validate
   the result if necessary.  INSN is as for canon_reg.  */

static void
validate_canon_reg (rtx *xloc, rtx insn)
{
  if (*xloc)
    {
      rtx new_rtx = canon_reg (*xloc, insn);

      /* If replacing pseudo with hard reg or vice versa, ensure the
         insn remains valid.  Likewise if the insn has MATCH_DUPs.  */
      gcc_assert (insn && new_rtx);
      validate_change (insn, xloc, new_rtx, 1);
    }
}

when building libgcc:

/tmp/lucier/gcc/objdirs/mainline/./gcc/xgcc
-B/tmp/lucier/gcc/objdirs/mainline/./gcc/
-B/pkgs/gcc-mainline/x86_64-unknown-linux-gnu/bin/
-B/pkgs/gcc-mainline/x86_64-unknown-linux-gnu/lib/ -isystem
/pkgs/gcc-mainline/x86_64-unknown-linux-gnu/include -isystem
/pkgs/gcc-mainline/x86_64-unknown-linux-gnu/sys-include -g -O2 -m32 -O2  -g -O2
-DIN_GCC   -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes
-Wcast-qual -Wold-style-definition  -isystem ./include  -fPIC -g
-DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED   -I. -I.
-I../../.././gcc -I../../../../../mainline/libgcc
-I../../../../../mainline/libgcc/. -I../../../../../mainline/libgcc/../gcc
-I../../../../../mainline/libgcc/../include
-I../../../../../mainline/libgcc/config/libbid -DENABLE_DECIMAL_BID_FORMAT
-DHAVE_CC_TLS -DUSE_TLS -o _moddi3.o -MT _moddi3.o -MD -MP -MF _moddi3.dep
-DL_moddi3 -c ../../../../../mainline/libgcc/../gcc/libgcc2.c \
  -fexceptions -fnon-call-exceptions -fvisibility=hidden -DHIDE_EXPORTS
../../../../../mainline/libgcc/../gcc/libgcc2.c: In function â:
../../../../../mainline/libgcc/../gcc/libgcc2.c:1121: internal compiler error:
in validate_canon_reg, at cse.c:2730


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-06 Thread lucier at math dot purdue dot edu


--- Comment #64 from lucier at math dot purdue dot edu  2009-05-06 20:43 
---
In answer to comment 60, here's the command line where I added
-fforward-propagate -fno-move-loop-invariants:

/pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I. -Wall -W -Wno-unused
-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing
-fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fforward-propagate
-fno-move-loop-invariants -DHAVE_CONFIG_H -D___PRIMAL -D___LIBRARY
-D___GAMBCDIR="\"/usr/local/Gambit-C/v4.1.2\"" -D___SYS_TYPE_CPU="\"x86_64\""
-D___SYS_TYPE_VENDOR="\"unknown\"" -D___SYS_TYPE_OS="\"linux-gnu\"" -c _num.c

here's the compiler:

/pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /tmp/lucier/gcc/mainline/configure --enable-checking=release
--prefix=/pkgs/gcc-mainline --enable-languages=c
Thread model: posix
gcc version 4.5.0 20090506 (experimental) [trunk revision 147199] (GCC) 

and the runtime didn't change (substantially)

132 ms cpu time (132 user, 0 system)

and the loop looks pretty much just as bad (it's 117 instructions long, by my
count):

.L2752:
movq    %rcx, %rdx
addq    8(%rax), %rdx
leaq    4(%rcx), %rdi
movq    %rdx, -8(%rax)
leaq    4(%rdx), %rbx
addq    8(%rax), %rdx
movq    %rbx, -16(%rax)
movq    %rdx, -24(%rax)
leaq    4(%rdx), %rbx
addq    8(%rax), %rdx
movq    %rbx, -32(%rax)
movq    %rdx, -40(%rax)
leaq    4(%rdx), %rbx
movq    40(%rax), %rdx
movq    %rbx, -48(%rax)
movsd   7(%rdx,%rbx,2), %xmm9
movq    -40(%rax), %rbx
leaq    7(%rdx,%rcx,2), %r8
addq    $8, %rcx
movsd   (%r8), %xmm4
cmpq    %rcx, %r13
movsd   7(%rdx,%rbx,2), %xmm11
movq    -32(%rax), %rbx
movsd   7(%rdx,%rbx,2), %xmm5
movq    -24(%rax), %rbx
movsd   7(%rdx,%rbx,2), %xmm7
movq    -16(%rax), %rbx
movsd   7(%rdx,%rbx,2), %xmm14
movq    -8(%rax), %rbx
movsd   7(%rdx,%rbx,2), %xmm6
leaq    (%rdi,%rdi), %rbx
movsd   7(%rbx,%rdx), %xmm8
movq    24(%rax), %rdx
movapd  %xmm6, %xmm13
movsd   15(%rdx), %xmm1
movsd   7(%rdx), %xmm2
movapd  %xmm1, %xmm10
movsd   31(%rdx), %xmm3
movapd  %xmm2, %xmm12
mulsd   %xmm11, %xmm10
mulsd   %xmm9, %xmm12
mulsd   %xmm2, %xmm11
mulsd   %xmm1, %xmm9
movsd   23(%rdx), %xmm0
addsd   %xmm12, %xmm10
movapd  %xmm2, %xmm12
mulsd   %xmm7, %xmm2
subsd   %xmm9, %xmm11
movapd  %xmm1, %xmm9
mulsd   %xmm5, %xmm12
mulsd   %xmm5, %xmm1
movapd  %xmm8, %xmm5
mulsd   %xmm7, %xmm9
movapd  %xmm4, %xmm7
subsd   %xmm11, %xmm13
addsd   %xmm6, %xmm11
movsd   .LC5(%rip), %xmm6
subsd   %xmm1, %xmm2
movapd  %xmm0, %xmm1
addsd   %xmm12, %xmm9
movapd  %xmm14, %xmm12
xorpd   %xmm3, %xmm6
subsd   %xmm10, %xmm12
mulsd   %xmm13, %xmm1
subsd   %xmm2, %xmm7
addsd   %xmm4, %xmm2
movapd  %xmm6, %xmm4
addsd   %xmm14, %xmm10
mulsd   %xmm13, %xmm6
mulsd   %xmm12, %xmm4
subsd   %xmm9, %xmm5
mulsd   %xmm0, %xmm12
addsd   %xmm8, %xmm9
movapd  %xmm0, %xmm8
mulsd   %xmm11, %xmm0
addsd   %xmm1, %xmm4
movapd  %xmm3, %xmm1
mulsd   %xmm10, %xmm3
subsd   %xmm12, %xmm6
mulsd   %xmm11, %xmm1
mulsd   %xmm10, %xmm8
subsd   %xmm3, %xmm0
addsd   %xmm1, %xmm8
movapd  %xmm2, %xmm1
addsd   %xmm0, %xmm1
subsd   %xmm0, %xmm2
movapd  %xmm7, %xmm0
subsd   %xmm6, %xmm7
addsd   %xmm6, %xmm0
movsd   %xmm1, (%r8)
movapd  %xmm9, %xmm1
movq    40(%rax), %rdx
subsd   %xmm8, %xmm9
addsd   %xmm8, %xmm1
movsd   %xmm1, 7(%rbx,%rdx)
movq    -8(%rax), %rbx
movq    40(%rax), %rdx
movsd   %xmm2, 7(%rdx,%rbx,2)
movq    -16(%rax), %rbx
movq    40(%rax), %rdx
movsd   %xmm9, 7(%rdx,%rbx,2)
movq    -24(%rax), %rbx
movq    40(%rax), %rdx
movsd   %xmm0, 7(%rdx,%rbx,2)
movapd  %xmm5, %xmm0
movq    -32(%rax), %rbx
movq    40(%rax), %rdx
subsd   %xmm4, %xmm5
addsd   %xmm4, %xmm0
movsd   %xmm0, 7(%rdx,%rbx,2)
movq    -40(%rax), %rbx
movq    40(%rax), %rdx
movsd   %xmm7, 7(%rdx,%rbx,2)
movq    -48(%rax), %rbx
movq    40(%rax), %rdx
movsd   %xmm5, 7(%rdx,%rbx,2)
jg      .L2752
movq    %rdi, %r13
.L2751:
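
An instruction count like the one quoted above can be checked mechanically. The following is only a minimal sketch, assuming the listing has been saved as plain text; count_loop_instructions is a hypothetical helper written for illustration, not part of GCC or Gambit. It counts every instruction from the loop label up to and including the first jump back to that label.

```python
def count_loop_instructions(asm_text, label):
    """Count instructions from `label:` through the first jump back to it.

    Hypothetical helper for illustration only.
    """
    count = 0
    inside = False
    for raw in asm_text.splitlines():
        line = raw.strip()
        if not inside:
            # Wait until the loop label itself is seen.
            inside = (line == label + ":")
            continue
        # Skip blank lines, other labels, and assembler directives.
        if not line or line.startswith(".") or line.endswith(":"):
            continue
        count += 1
        parts = line.split()
        # A jump whose target is the loop label closes the loop.
        if parts[0].startswith("j") and parts[-1] == label:
            break
    return count


# Small made-up sample in the same shape as the listing above.
sample = """\
.L2752:
movq    %rcx, %rdx
addq    $8, %rcx
cmpq    %rcx, %r13
jg      .L2752
movq    %rdi, %r13
.L2751:
"""
print(count_loop_instructions(sample, ".L2752"))
```

Run on the full loop above with the label .L2752, this should reproduce the hand count of 117.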


-- 

lucier at math dot purdue dot edu changed:

   What|Removed |Added

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-06 Thread lucier at math dot purdue dot edu


--- Comment #66 from lucier at math dot purdue dot edu  2009-05-07 05:27 
---
Adding -frename-registers gives a significant speedup (sometimes as fast as
4.1.2 on this shared machine, i.e., it sometimes hits 108 ms instead of
132-140 ms). The command line with -fforward-propagate -fno-move-loop-invariants
-frename-registers is

/pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I. -Wall -W -Wno-unused
-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing
-fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fforward-propagate
-fno-move-loop-invariants -frename-registers -DHAVE_CONFIG_H -D___PRIMAL
-D___LIBRARY -D___GAMBCDIR=\/usr/local/Gambit-C/v4.1.2\
-D___SYS_TYPE_CPU=\x86_64\ -D___SYS_TYPE_VENDOR=\unknown\
-D___SYS_TYPE_OS=\linux-gnu\ -c _num.c

and the loop is

.L2752:
movq    %rcx, %r12
addq    8(%rax), %r12
leaq    4(%rcx), %rdi
movq    %r12, -8(%rax)
leaq    4(%r12), %r8
addq    8(%rax), %r12
movq    %r8, -16(%rax)
movq    -8(%rax), %r8
movq    -16(%rax), %rdx
movq    %r12, -24(%rax)
leaq    4(%r12), %rbx
addq    8(%rax), %r12
movq    -24(%rax), %r9
movq    %rbx, -32(%rax)
movq    24(%rax), %rbx
movq    -32(%rax), %r10
leaq    4(%r12), %r11
movq    %r12, -40(%rax)
movq    40(%rax), %r12
movq    -40(%rax), %r14
movq    %r11, -48(%rax)
movsd   15(%rbx), %xmm1
movsd   7(%rbx), %xmm2
movsd   7(%r12,%r11,2), %xmm9
movapd  %xmm1, %xmm3
movsd   7(%r12,%r14,2), %xmm11
leaq    7(%r12,%rcx,2), %r11
movapd  %xmm2, %xmm10
leaq    (%rdi,%rdi), %r14
mulsd   %xmm11, %xmm3
movapd  %xmm2, %xmm12
mulsd   %xmm9, %xmm10
addq    $8, %rcx
mulsd   %xmm1, %xmm9
cmpq    %rcx, %r13
mulsd   %xmm2, %xmm11
movsd   7(%r12,%r10,2), %xmm5
movsd   7(%r12,%r9,2), %xmm7
addsd   %xmm10, %xmm3
movsd   7(%r12,%r8,2), %xmm6
subsd   %xmm9, %xmm11
mulsd   %xmm7, %xmm2
movapd  %xmm1, %xmm9
mulsd   %xmm5, %xmm1
movapd  %xmm6, %xmm13
movsd   7(%r12,%rdx,2), %xmm14
mulsd   %xmm5, %xmm12
mulsd   %xmm7, %xmm9
subsd   %xmm11, %xmm13
movsd   31(%rbx), %xmm0
addsd   %xmm6, %xmm11
movsd   .LC5(%rip), %xmm6
subsd   %xmm1, %xmm2
movsd   (%r11), %xmm4
movapd  %xmm14, %xmm10
xorpd   %xmm0, %xmm6
addsd   %xmm12, %xmm9
movsd   7(%r14,%r12), %xmm8
subsd   %xmm3, %xmm10
movapd  %xmm4, %xmm7
addsd   %xmm14, %xmm3
movsd   23(%rbx), %xmm15
subsd   %xmm2, %xmm7
movapd  %xmm8, %xmm5
addsd   %xmm4, %xmm2
movapd  %xmm6, %xmm4
subsd   %xmm9, %xmm5
movapd  %xmm15, %xmm14
addsd   %xmm8, %xmm9
mulsd   %xmm10, %xmm4
movapd  %xmm15, %xmm8
mulsd   %xmm15, %xmm10
movapd  %xmm0, %xmm12
mulsd   %xmm11, %xmm15
mulsd   %xmm3, %xmm0
movapd  %xmm7, %xmm1
mulsd   %xmm13, %xmm6
mulsd   %xmm3, %xmm8
movapd  %xmm9, %xmm3
mulsd   %xmm11, %xmm12
subsd   %xmm0, %xmm15
mulsd   %xmm13, %xmm14
subsd   %xmm10, %xmm6
movapd  %xmm2, %xmm10
movapd  %xmm5, %xmm0
addsd   %xmm12, %xmm8
addsd   %xmm15, %xmm10
subsd   %xmm15, %xmm2
addsd   %xmm14, %xmm4
addsd   %xmm8, %xmm3
movsd   %xmm10, (%r11)
movq    40(%rax), %r10
subsd   %xmm8, %xmm9
addsd   %xmm6, %xmm1
addsd   %xmm4, %xmm0
movsd   %xmm3, 7(%r14,%r10)
movq    -8(%rax), %r9
movq    40(%rax), %rdx
subsd   %xmm6, %xmm7
subsd   %xmm4, %xmm5
movsd   %xmm2, 7(%rdx,%r9,2)
movq    -16(%rax), %r8
movq    40(%rax), %r12
movsd   %xmm9, 7(%r12,%r8,2)
movq    -24(%rax), %rbx
movq    40(%rax), %r11
movsd   %xmm1, 7(%r11,%rbx,2)
movq    -32(%rax), %r14
movq    40(%rax), %r10
movsd   %xmm0, 7(%r10,%r14,2)
movq    -40(%rax), %r9
movq    40(%rax), %rdx
movsd   %xmm7, 7(%rdx,%r9,2)
movq    -48(%rax), %r8
movq    40(%rax), %r12
movsd   %xmm5, 7(%r12,%r8,2)
jg      .L2752

Adding -fforward-propagate -fno-move-loop-invariants -fweb instead of
-fforward-propagate -fno-move-loop-invariants -frename-registers, so the
compile line is

/pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I. -Wall -W -Wno-unused
-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing
-fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fforward-propagate
-fno-move-loop-invariants -fweb -DHAVE_CONFIG_H -D___PRIMAL -D___LIBRARY
-D___GAMBCDIR=\/usr/local/Gambit-C/v4.1.2\ -D___SYS_TYPE_CPU=\x86_64

[Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-05 Thread lucier at math dot purdue dot edu


--- Comment #53 from lucier at math dot purdue dot edu  2009-05-06 03:43 
---
I posted a possible fix to gcc-patches with the subject line

Possible fix for 30% performance regression in PR 33928

Here's the assembly for the main loop after the changes I proposed:

.L4230:
movq    %r11, %rdi
addq    8(%r10), %rdi
movq    8(%r10), %rsi
movq    8(%r10), %rdx
movq    40(%r10), %rax
leaq    4(%r11), %rbx
addq    %rdi, %rsi
leaq    4(%rdi), %r9
movq    %rdi, -8(%r10)
addq    %rsi, %rdx
leaq    4(%rsi), %r8
movq    %rsi, -24(%r10)
leaq    4(%rdx), %rcx
movq    %r9, -16(%r10)
movq    %rdx, -40(%r10)
movq    %r8, -32(%r10)
addq    $7, %rax
movq    %rcx, -48(%r10)
movsd   (%rax,%rcx,2), %xmm12
leaq    (%rbx,%rbx), %rcx
movsd   (%rax,%rdx,2), %xmm3
leaq    (%rax,%r11,2), %rdx
addq    $8, %r11
movsd   (%rax,%r8,2), %xmm14
cmpq    %r11, %r13
movsd   (%rax,%rsi,2), %xmm13
movsd   (%rax,%r9,2), %xmm11
movsd   (%rax,%rdi,2), %xmm10
movsd   (%rax,%rcx), %xmm8
movq    24(%r10), %rax
movsd   (%rdx), %xmm7
movsd   15(%rax), %xmm2
movsd   7(%rax), %xmm1
movapd  %xmm2, %xmm0
movsd   31(%rax), %xmm9
movapd  %xmm1, %xmm6
mulsd   %xmm3, %xmm0
movapd  %xmm1, %xmm4
mulsd   %xmm12, %xmm6
mulsd   %xmm3, %xmm4
movapd  %xmm1, %xmm3
mulsd   %xmm13, %xmm1
mulsd   %xmm14, %xmm3
addsd   %xmm0, %xmm6
movapd  %xmm2, %xmm0
movsd   23(%rax), %xmm5
mulsd   %xmm12, %xmm0
movapd  %xmm7, %xmm12
subsd   %xmm0, %xmm4
movapd  %xmm2, %xmm0
mulsd   %xmm14, %xmm2
movapd  %xmm8, %xmm14
mulsd   %xmm13, %xmm0
movapd  %xmm11, %xmm13
addsd   %xmm6, %xmm11
subsd   %xmm6, %xmm13
subsd   %xmm2, %xmm1
movapd  %xmm10, %xmm2
addsd   %xmm0, %xmm3
movapd  %xmm5, %xmm0
subsd   %xmm4, %xmm2
addsd   %xmm4, %xmm10
subsd   %xmm1, %xmm12
addsd   %xmm1, %xmm7
movapd  %xmm9, %xmm1
subsd   %xmm3, %xmm14
mulsd   %xmm2, %xmm0
xorpd   .LC5(%rip), %xmm1
addsd   %xmm3, %xmm8
movapd  %xmm1, %xmm3
mulsd   %xmm2, %xmm1
movapd  %xmm5, %xmm2
mulsd   %xmm13, %xmm3
mulsd   %xmm11, %xmm2
addsd   %xmm0, %xmm3
movapd  %xmm5, %xmm0
mulsd   %xmm10, %xmm5
mulsd   %xmm13, %xmm0
subsd   %xmm0, %xmm1
movapd  %xmm9, %xmm0
mulsd   %xmm11, %xmm9
mulsd   %xmm10, %xmm0
subsd   %xmm9, %xmm5
addsd   %xmm0, %xmm2
movapd  %xmm7, %xmm0
addsd   %xmm5, %xmm0
subsd   %xmm5, %xmm7
movsd   %xmm0, (%rdx)
movapd  %xmm8, %xmm0
movq    40(%r10), %rax
subsd   %xmm2, %xmm8
addsd   %xmm2, %xmm0
movsd   %xmm0, 7(%rcx,%rax)
movq    -8(%r10), %rdx
movq    40(%r10), %rax
movapd  %xmm12, %xmm0
subsd   %xmm1, %xmm12
movsd   %xmm7, 7(%rax,%rdx,2)
movq    -16(%r10), %rdx
movq    40(%r10), %rax
addsd   %xmm1, %xmm0
movsd   %xmm8, 7(%rax,%rdx,2)
movq    -24(%r10), %rdx
movq    40(%r10), %rax
movsd   %xmm0, 7(%rax,%rdx,2)
movapd  %xmm14, %xmm0
movq    -32(%r10), %rdx
movq    40(%r10), %rax
subsd   %xmm3, %xmm14
addsd   %xmm3, %xmm0
movsd   %xmm0, 7(%rax,%rdx,2)
movq    -40(%r10), %rdx
movq    40(%r10), %rax
movsd   %xmm12, 7(%rax,%rdx,2)
movq    -48(%r10), %rdx
movq    40(%r10), %rax
movsd   %xmm14, 7(%rax,%rdx,2)
jg      .L4230
movq    %rbx, %r13
.L4228:


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-05 Thread lucier at math dot purdue dot edu


--- Comment #54 from lucier at math dot purdue dot edu  2009-05-06 03:50 
---
Created an attachment (id=17805)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17805&action=view)
svn diff of cse.c to fix the performance regression

This partially reverts r118475 and adds code to call find_best_address for MEMs
in fold_rtx.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug regression/39914] 96% performance regression in floating point code; part of the problem started 2009/03/12-13

2009-04-27 Thread lucier at math dot purdue dot edu


--- Comment #3 from lucier at math dot purdue dot edu  2009-04-27 15:07 
---
Subject: Re:  96% performance regression in floating
 point code; part of the problem started 2009/03/12-13

On Sun, 2009-04-26 at 18:43 +0000, ubizjak at gmail dot com wrote:
> 
> 
> --- Comment #1 from ubizjak at gmail dot com  2009-04-26 18:43 ---
> There are a couple of possible candidates in this range:
> 
> URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144812
> Log:
> 2009-03-12  Vladimir Makarov  vmaka...@redhat.com
> 
> PR debug/39432
> * ira-int.h (struct allocno): Fix comment for calls_crossed_num.
> * ira-conflicts.c (ira_build_conflicts): Prohibit call used
> registers for allocnos created from user-defined variables.

The problem exists in 

gcc version 4.4.0 20090312 (experimental) [trunk revision 144812] (GCC) 

So perhaps it's this checkin.

Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914



[Bug regression/39914] 96% performance regression in floating point code; part of the problem started 2009/03/12-13

2009-04-27 Thread lucier at math dot purdue dot edu


--- Comment #4 from lucier at math dot purdue dot edu  2009-04-27 15:11 
---
Subject: Re:  96% performance regression in floating
 point code; part of the problem started 2009/03/12-13

On Mon, 2009-04-27 at 08:16 +0000, ubizjak at gmail dot com wrote:
> 
> 
> --- Comment #2 from ubizjak at gmail dot com  2009-04-27 08:16 ---
> (In reply to comment #0)
> 
> > (same .i file, same instructions for reproducing, same compiler options,
> > same everything)
> 
> I guess that this is direct.i compiled with -O1?
> 

Yes, the compile flags are

-Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math
-fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp

> It is not clear from your report, if -O1 flag is problematic, -O2 code looks
> good to me.

Yes, the -O2 code looks good to me, too.

I've used the above list of options (starting with -O1) on this code
instead of -O2 because the above list (a) has generally given faster
performance, and (b) has required much less compile time and memory to
compile the C code generated by the Gambit Scheme-C compiler.  I have
not yet seen any evidence that -O2 generates better code (overall) than
that set of options above.

Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914



[Bug regression/39914] 96% performance regression in floating point code; part of the problem started 2009/03/12-13

2009-04-27 Thread lucier at math dot purdue dot edu


--- Comment #6 from lucier at math dot purdue dot edu  2009-04-27 15:32 
---
Subject: Re:  96% performance regression in floating
 point code; part of the problem started 2009/03/12-13

On Mon, 2009-04-27 at 15:26 +0000, pinskia at gcc dot gnu dot org wrote:

> This is by design -O1 is way slower than -O2 now.

I have seen no general discussion that -O1 should be destroyed as a
useful compilation option.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914



[Bug regression/39914] 96% performance regression in floating point code; part of the problem started 2009/03/12-13

2009-04-27 Thread lucier at math dot purdue dot edu


--- Comment #7 from lucier at math dot purdue dot edu  2009-04-27 15:35 
---
Subject: Re:  96% performance regression in floating
 point code; part of the problem started 2009/03/12-13

On Mon, 2009-04-27 at 15:32 +0000, lucier at math dot purdue dot edu
wrote:

> On Mon, 2009-04-27 at 15:26 +0000, pinskia at gcc dot gnu dot org wrote:
> 
> > This is by design -O1 is way slower than -O2 now.
> 
> I have seen no general discussion that -O1 should be destroyed as a
> useful compilation option.

Perhaps I should also point out that code generated by -O2 is not
generally much faster than before, so if you believe that -O1 is much
slower than -O2 now by design, it is only by making code generated by
-O1 much slower.

BTW, this code runs in 108 ms when compiled with gcc-4.2.4 with the
given options (including -O1).

Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914



[Bug regression/39914] 96% performance regression in floating point code; part of the problem started 2009/03/12-13

2009-04-27 Thread lucier at math dot purdue dot edu


--- Comment #8 from lucier at math dot purdue dot edu  2009-04-27 16:29 
---
I hadn't noticed before that Andrew had marked it as RESOLVED INVALID.

I'm reopening it, as I believe that resolving it as INVALID should require a
more general discussion than a one-line dismissal of the bug.

Brad


-- 

lucier at math dot purdue dot edu changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|INVALID |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914



[Bug regression/39914] [4.4/4.5 Regression] 96% performance regression in floating point code; part of the problem started 2009/03/12-13

2009-04-27 Thread lucier at math dot purdue dot edu


--- Comment #11 from lucier at math dot purdue dot edu  2009-04-27 20:37 
---
As far as I can tell, the patch proposed by Uros restores the performance of
code generated by

gcc version 4.4.0 20090312 (experimental) [trunk revision 144812] (GCC) 

In particular, the assembly code for the main loop is identical for code
generated by

gcc version 4.4.0 20090312 (experimental) [trunk revision 144801] (GCC) 

and by

gcc version 4.4.0 20090312 (experimental) [trunk revision 144812] (GCC) 

after his patch.

Thanks for getting to this so quickly.

Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914



[Bug regression/39914] [4.4/4.5 Regression] 96% performance regression in floating point code; part of the problem started 2009/03/12-13

2009-04-27 Thread lucier at math dot purdue dot edu


--- Comment #12 from lucier at math dot purdue dot edu  2009-04-28 01:39 
---
I tried to build and check with this patch, but I got stopped with:

/tmp/lucier/gcc/objdirs/mainline/./prev-gcc/xgcc
-B/tmp/lucier/gcc/objdirs/mainline/./prev-gcc/
-B/pkgs/gcc-mainline/x86_64-unknown-linux-gnu/bin/ -c  -g -O2 -DIN_GCC   -W
-Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wcast-qual
-Wold-style-definition -Wc++-compat -Wmissing-format-attribute -pedantic
-Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common
 -DHAVE_CONFIG_H -DGENERATOR_FILE -I. -Ibuild -I../../../mainline/gcc
-I../../../mainline/gcc/build -I../../../mainline/gcc/../include
-I../../../mainline/gcc/../libcpp/include
-I/tmp/lucier/gcc/objdirs/mainline/./gmp -I/tmp/lucier/gcc/mainline/gmp
-I/tmp/lucier/gcc/objdirs/mainline/./mpfr -I/tmp/lucier/gcc/mainline/mpfr 
-I../../../mainline/gcc/../libdecnumber
-I../../../mainline/gcc/../libdecnumber/bid -I../libdecnumber -o build/vec.o
../../../mainline/gcc/vec.c
cc1: warnings being treated as errors
../../../mainline/gcc/vec.c: In function ‘vec_descriptor’:
../../../mainline/gcc/vec.c:116: error: enum conversion when passing argument 3
of ‘htab_find_slot’ is invalid in C++
../../../mainline/gcc/../include/hashtab.h:172: note: expected ‘enum
insert_option’ but argument is of type ‘int’
make[3]: *** [build/vec.o] Error 1


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914



[Bug regression/39914] New: 96% performance regression in floating point code; part of the problem started 2009/03/12-13

2009-04-26 Thread lucier at math dot purdue dot edu
 60%
performance regression, the rest is accounted for by

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928

Brad


-- 
   Summary: 96% performance regression in floating point code; part
of the problem started 2009/03/12-13
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: regression
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: lucier at math dot purdue dot edu
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914



[Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-04-26 Thread lucier at math dot purdue dot edu


--- Comment #52 from lucier at math dot purdue dot edu  2009-04-26 18:27 
---
I narrowed down the new performance regression to code added some time around
March 12, 2009, so I changed back the subject line of this PR to reflect the
performance regression caused only by the code added 2006-11-03 and added a new
PR

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914

to reflect the effects of the March, 2009, code.


-- 

lucier at math dot purdue dot edu changed:

   What|Removed |Added

Summary|[4.3/4.4/4.5 Regression] 79%|[4.3/4.4/4.5 Regression] 30%
   |performance slowdown in |performance slowdown in
   |floating-point code |floating-point code caused
   |partially caused by  r118475|by  r118475


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 79% performance slowdown in floating-point code partially caused by r118475

2009-04-23 Thread lucier at math dot purdue dot edu


--- Comment #49 from lucier at math dot purdue dot edu  2009-04-23 15:58 
---
With 4.4.0 and with mainline this code now runs in 280 ms instead of in 156 ms
with 4.2.4.

Since 280/156 = 1.794871794871795 I changed the subject line (the slowdown is
now not completely caused by r118475).

I guess I'll post the assembly code generated by 4.4.0 in the next attachment.

Timings (best of three runs) for the last

(time (direct-fft-recursive-4 a table))

from

 gsi/gsi -e '(define a (time (expt 3 1000)))(define b (time (* a a)))'

With gcc-4.1.2:

188 ms cpu time (188 user, 0 system)

With gcc-4.2.4

156 ms cpu time (152 user, 4 system)

With gcc-4.3.3:

180 ms cpu time (180 user, 0 system)

With gcc-4.4.0

280 ms cpu time (280 user, 0 system)

With 4.5.0 20090423 (experimental) [trunk revision 146634]

280 ms cpu time (280 user, 0 system)
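
The 79% in the new subject line follows directly from these timings. A minimal sketch of the arithmetic, using only the CPU times listed above; slowdown_percent and the dictionary layout are illustrative names, and taking gcc-4.2.4 as the baseline simply mirrors the 280/156 ratio quoted earlier in this comment:

```python
def slowdown_percent(new_ms, old_ms):
    """Percentage by which new_ms exceeds old_ms."""
    return (new_ms / old_ms - 1.0) * 100.0


# CPU times in ms, copied from the runs listed above.
timings = {
    "gcc-4.1.2": 188,
    "gcc-4.2.4": 156,
    "gcc-4.3.3": 180,
    "gcc-4.4.0": 280,
    "4.5.0 20090423": 280,
}

baseline = timings["gcc-4.2.4"]
for version, ms in timings.items():
    pct = slowdown_percent(ms, baseline)
    print(f"{version:>15}: {ms} ms ({pct:+.1f}% vs gcc-4.2.4)")
```

Truncating the 4.4.0 figure to a whole number gives the 79% used in the subject line.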


-- 

lucier at math dot purdue dot edu changed:

   What|Removed |Added

Summary|[4.3/4.4/4.5 Regression] 30%|[4.3/4.4/4.5 Regression] 79%
   |performance slowdown in |performance slowdown in
   |floating-point code caused  |floating-point code
   |by  r118475 |partially caused by  r118475


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 79% performance slowdown in floating-point code partially caused by r118475

2009-04-23 Thread lucier at math dot purdue dot edu


--- Comment #50 from lucier at math dot purdue dot edu  2009-04-23 16:00 
---
Created an attachment (id=17685)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17685&action=view)
direct.s generated by 4.4.0


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 79% performance slowdown in floating-point code partially caused by r118475

2009-04-23 Thread lucier at math dot purdue dot edu


--- Comment #51 from lucier at math dot purdue dot edu  2009-04-23 16:03 
---
Forgot to mention, the main loop starts at .L2947.

This is on

model name  : Intel(R) Core(TM)2 Duo CPU E6550  @ 2.33GHz

Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug middle-end/39301] ICE in register_overhead, at bitmap.c:115

2009-03-31 Thread lucier at math dot purdue dot edu


--- Comment #5 from lucier at math dot purdue dot edu  2009-03-31 12:38 
---
You have --disable-bootstrap, so my guess is that cc1 is a 32-bit binary if
that's what your system compiler builds by default.  By bootstrapping you get a
64-bit binary (the first cc1 built in the bootstrap is 32-bit, but the second
and third are 64-bit).
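
Whether a given cc1 is a 32-bit or a 64-bit binary can be checked without running it. A minimal sketch, assuming a Linux ELF executable (this does not apply to Mach-O binaries on Darwin); elf_class and binary_bits are hypothetical helper names. The fifth byte of an ELF header (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects.

```python
def elf_class(header):
    """Return 32 or 64 given the first bytes of an ELF file."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = header[4]  # EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64
    if ei_class == 1:
        return 32
    if ei_class == 2:
        return 64
    raise ValueError("unknown ELF class")


def binary_bits(path):
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))
```

For example, binary_bits() could be pointed at the cc1 under /pkgs/gcc-mainline/libexec/gcc/ named elsewhere in this PR to see which kind of compiler a --disable-bootstrap build produced.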


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39301



[Bug middle-end/39301] ICE in register_overhead, at bitmap.c:115

2009-03-27 Thread lucier at math dot purdue dot edu


--- Comment #3 from lucier at math dot purdue dot edu  2009-03-27 15:12 
---
I'm still seeing it with:

[luc...@descartes ~]$ /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: powerpc64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline
--enable-languages=c --enable-gather-detailed-mem-stats --with-cpu=default64
Thread model: posix
gcc version 4.4.0 20090327 (experimental) [trunk revision 145100] (GCC) 

as

[luc...@descartes compiler.i-test]$
/pkgs/gcc-mainline/libexec/gcc/powerpc64-unknown-linux-gnu/4.4.0/cc1
-I../include -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -O1
-fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing
-fwrapv -fomit-frame-pointer -fPIC -fno-common compiler.i
 btowc wctob mbrlen {GC 5325k -> 3526k} {GC 5325k -> 4483k} code_size
___H__20_compiler_2e_o1 {GC 201152k -> 113339k} ___init_proc
20_compiler_2e_o1
Analyzing compilation unit
 {GC 181409k -> 135700k}Performing interprocedural optimizations
 visibility early_local_cleanups {GC 237979k -> 236431k} summary generate
inline static-var pure-constAssembling functions:
 code_size ___init_proc 20_compiler_2e_o1 ___H__20_compiler_2e_o1 {GC
349493k -> 288659k} {GC 406233k -> 272085k}
compiler.c: In function ‘___H__20_compiler_2e_o1’:
compiler.c:322876: internal compiler error: in register_overhead, at
bitmap.c:115
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

I have to admit I didn't see it with an x86-64 compiler; perhaps the ppc64 port
is more complicated and requires more bitmaps.

I suspect, given the error message, that you built a 32-bit compiler and ran
out of memory space before you hit this problem.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39301



[Bug c/39301] New: ICE in register_overhead, at bitmap.c:115

2009-02-25 Thread lucier at math dot purdue dot edu
With this compiler:

[luc...@descartes gambc-v4_4_1-devel]$ /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: powerpc64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline
--enable-languages=c --enable-gather-detailed-mem-stats --with-cpu=default64
Thread model: posix
gcc version 4.4.0 20090224 (experimental) [trunk revision 144414] (GCC) 

with compiler.i found at

http://www.math.purdue.edu/~lucier/bugzilla/8

and this command line:

[luc...@descartes gambc-v4_4_1-devel]$ gdb
/pkgs/gcc-mainline/libexec/gcc/powerpc64-unknown-linux-gnu/4.4.0/cc1
(gdb) run  -I../include -Wall -W -Wno-unused -O1 -fno-math-errno
-fschedule-insns2 -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math
-fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common  compiler.i

one gets an ICE 

Starting program:
/pkgs/gcc-mainline/libexec/gcc/powerpc64-unknown-linux-gnu/4.4.0/cc1
-I../include -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -O1
-fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing
-fwrapv -fomit-frame-pointer -fPIC -fno-common  compiler.i
 btowc wctob mbrlen {GC 5504k -> 3345k} {GC 5325k -> 4387k} code_size
___H__20_compiler_2e_o1 {GC 202396k -> 113348k} ___init_proc
20_compiler_2e_o1
Analyzing compilation unit
 {GC 182571k -> 135708k}Performing interprocedural optimizations
 visibility early_local_cleanups {GC 237987k -> 236439k} summary generate
inline static-var pure-constAssembling functions:
 code_size ___init_proc 20_compiler_2e_o1 ___H__20_compiler_2e_o1 {GC
349654k -> 288661k} {GC 406235k -> 272087k}
compiler.c: In function ‘___H__20_compiler_2e_o1’:
compiler.c:322876: internal compiler error: in register_overhead, at
bitmap.c:115

I'm sorry the test case is enormous, but it runs in about a GB of RAM. I also
haven't been able to figure out how to use gdb properly in this mixed
ppc32/ppc64 environment.


-- 
   Summary: ICE in register_overhead, at bitmap.c:115
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: lucier at math dot purdue dot edu
 GCC build triplet: powerpc64-unknown-linux-gnu
  GCC host triplet: powerpc64-unknown-linux-gnu
GCC target triplet: powerpc64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39301



[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines

2009-02-21 Thread lucier at math dot purdue dot edu


--- Comment #104 from lucier at math dot purdue dot edu  2009-02-21 18:56 
---
Subject: Re:  [4.3/4.4 Regression] Inordinate
 compile times on large routines

Cool, that leaves me with

  DFS = ???
  SCC = ? Conflict ?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854



[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines

2009-02-20 Thread lucier at math dot purdue dot edu


--- Comment #98 from lucier at math dot purdue dot edu  2009-02-20 19:52 
---
Thank you, that indeed fixes the LICM problem.

Based on some comments for this PR and for PR 39157 I thought that a similar
patch might apply to PRE.  So with

euler-14% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release
--prefix=/pkgs/gcc-mainline --enable-languages=c
--enable-gather-detailed-mem-stats
Thread model: posix
gcc version 4.4.0 20090220 (experimental) [trunk revision 144328] (GCC) 

I ran this command

/pkgs/gcc-mainline/bin/gcc -v -c -O2 -fmem-report -ftime-report compiler.i
-save-temps >&! report-compiler

where compiler.i is found at

http://www.math.purdue.edu/~lucier/bugzilla/8/

and I killed the job after it required 17GB of RAM.  This job compiles just
fine with

euler-15% /pkgs/gcc-4.1.2/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/pkgs/gcc-4.1.2
Thread model: posix
gcc version 4.1.2

in about 1.5 GB of RAM.

To derive some statistics I ran

/pkgs/gcc-mainline/bin/gcc -v -c -O2 -fmem-report -ftime-report _num.i
-save-temps >&! report-num

where the smaller file _num.i is also found at

http://www.math.purdue.edu/~lucier/bugzilla/8/

I'll attach report-num to this PR.  The highlights are

 PRE           :  23.28 (24%) usr   0.01 ( 0%) sys  23.51 (24%) wall     681 kB ( 0%) ggc
 integrated RA :  12.70 (13%) usr   0.00 ( 0%) sys  12.83 (13%) wall    3709 kB ( 2%) ggc
 TOTAL         :  95.93             2.73            99.72             227422 kB

and that's about it, nothing else above 5%.  There are also accurate memory
statistics, as I've added a patch to my local sources so that memory statistics
don't overflow 32-bit counters.
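
To illustrate why 32-bit counters matter at this scale, here is a minimal sketch of an unsigned 32-bit byte counter wrapping around. It is not the local patch mentioned above, which is not shown in this PR; add_wrapping_u32 is a hypothetical name, and the 17 allocations of 1 GB each are chosen only to match the 17GB figure quoted earlier.

```python
MASK32 = 0xFFFFFFFF  # largest value an unsigned 32-bit counter can hold


def add_wrapping_u32(counter, nbytes):
    """Add nbytes to a counter that silently wraps at 2**32."""
    return (counter + nbytes) & MASK32


total_exact = 0  # what a wide enough counter would report
total_u32 = 0    # what a 32-bit counter would report
for _ in range(17):  # 17 allocations of 1 GB each
    total_exact += 1 << 30
    total_u32 = add_wrapping_u32(total_u32, 1 << 30)

print(total_exact // (1 << 30), "GB exact")
print(total_u32 // (1 << 30), "GB as seen by a 32-bit counter")
```

The wrapped counter reports 1 GB where 17 GB were actually allocated, which is why memory reports for jobs this large need wider counters.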

I think the -O1 and -O2 limits for LICM are quite reasonable; would it be
possible to limit PRE similarly so that one could compile compiler.i with -O2
in a reasonable amount of memory?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854



[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines

2009-02-20 Thread lucier at math dot purdue dot edu


--- Comment #99 from lucier at math dot purdue dot edu  2009-02-20 19:54 
---
Created an attachment (id=17336)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17336&action=view)
Memory and CPU statistics when compiling _num.i with -O2


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854



[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines

2009-02-20 Thread lucier at math dot purdue dot edu


--- Comment #100 from lucier at math dot purdue dot edu  2009-02-20 19:56 
---
The large memory requirements for LICM at -O1 and -O2 are still a regression for
the 4.2 and 4.3 branches.  Jakub's patch is short and elegant; do you think it
would be a good idea to backport it to the other open branches?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854



[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines

2009-02-14 Thread lucier at math dot purdue dot edu


--- Comment #93 from lucier at math dot purdue dot edu  2009-02-14 21:58 
---
Subject: Re:  [4.3/4.4 Regression] Inordinate compile times on large routines

I instrumented the compiler and looked how many nodes were in each  
loop processed by LICM for the Gambit runtime and compiler.

For generated code, except for the loop that contained the entire  
function, the greatest number of nodes was 30.  (Because computed  
gotos are used in the code that checks for heap and stack overflows  
after allocations and for waiting interrupts, it's hard to go long in  
Scheme code without hitting the big loop.)  For hand-written code,  
the greatest number of nodes in a loop was 123.

When bootstrapping gcc with --enable-languages=c, the largest number  
of nodes in a loop was 803, and there were 12 loops detected that had  
over 500 nodes.  548 loops had 100 nodes or greater. (This is a  
bootstrap, so some files were compiled twice with the instrumented  
compiler.)

So perhaps an -O1 default for LICM of 100 nodes is reasonable, or  
perhaps one might up it to 1000 just to catch everything reasonable.
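
The effect of the two candidate limits on the loops measured above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary holds the largest per-loop node counts reported in this comment (excluding the loop that contains the entire function), and covered_by is a hypothetical helper, not a GCC parameter or function.

```python
# Largest number of nodes seen in any single loop, per code base,
# taken from the measurements above.
observed_maxima = {
    "Gambit generated code": 30,
    "Gambit hand-written code": 123,
    "gcc bootstrap": 803,
}


def covered_by(limit):
    """Names of code bases whose largest loop fits within `limit` nodes."""
    return [name for name, nodes in observed_maxima.items() if nodes <= limit]


for limit in (100, 1000):
    print(limit, covered_by(limit))
```

With a limit of 100 only the generated Gambit code is fully covered; the 123-node hand-written loop and the larger gcc loops would be skipped. A limit of 1000 covers every loop measured here.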

Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854



[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines

2009-02-13 Thread lucier at math dot purdue dot edu


--- Comment #86 from lucier at math dot purdue dot edu  2009-02-13 15:40 
---
Subject: Re:  [4.3/4.4 Regression] Inordinate
 compile times on large routines

It's unfortunate that the discussion from 39157 will be somewhat hard to
find now that that bug is closed.

Steven wrote in a comment for 39157:

It's not like there will not be any loop invariant code motion
(LICM) at all anymore if the RTL LICM pass is disabled.  There
is an LICM pass on GIMPLE, and there is also PRE for GIMPLE (and
lazy code motion for RTL but I think it disables itself for your
test case).

The RTL LICM pass mostly cleans up after expand, i.e. moves
things that are not exposed in GIMPLE. This is mostly just
address calculations.


The loop in _num.i that I mentioned in

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39157#c19

is the loop in PR 33928 that is no longer fully optimized after Paolo
(and you, I guess, your name is on the patch) added PRE and disabled
some optimizations in CSE, and what is no longer optimized in that loop
are address calculations.  I don't know whether those address
calculations fall under LICM, the only point I'm trying to make right
now is that address calculations are no longer optimized as much as they
were before 

http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=118475

and address calculations are an important class of calculations to
optimize.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854



[Bug tree-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-02-13 Thread lucier at math dot purdue dot edu


--- Comment #45 from lucier at math dot purdue dot edu  2009-02-13 16:09 
---
Subject: Re:  [4.3/4.4 Regression] 30%
 performance slowdown in floating-point code caused by  r118475

On Fri, 2009-02-13 at 16:05 +, bonzini at gnu dot org wrote:
> --- Comment #44 from bonzini at gnu dot org  2009-02-13 16:05 ---
> A simplified (local, noncascading) fwprop not using UD chains would not be
> hard to do...  Basically, at -O1 use FOR_EACH_BB/FOR_EACH_BB_INSN instead of
> walking the uses, keep a (regno, insn) map of pseudos (cleared at the
> beginning of every basic block), and use that info instead of UD chains in
> use_killed_between...

As noted in comment 42, enabling FWPROP on this test case does not fix
the performance problem.
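
For readers who have not looked at fwprop, the local scheme described in the quoted comment can be modelled roughly as follows. This is a toy sketch, not GCC code: the `(dest, op, srcs)` tuples stand in for RTL insns, `local_fwprop` is a hypothetical name, only plain copies (`"mov"`) are propagated, and the check on the copy source is a crude stand-in for `use_killed_between`.

```python
def local_fwprop(blocks):
    """Propagate register copies into later uses within each basic block.

    `blocks` is a list of basic blocks; each block is a list of
    (dest, op, srcs) tuples. Propagation is local (the map is reset per
    block) and noncascading (only the original insns are consulted).
    """
    result = []
    for block in blocks:
        last_def = {}  # regno -> index of its latest definition in this block
        new_block = []
        for idx, (dest, op, srcs) in enumerate(block):
            new_srcs = []
            for src in srcs:
                def_idx = last_def.get(src)
                if def_idx is not None and block[def_idx][1] == "mov":
                    copied = block[def_idx][2][0]
                    # Only substitute if the copy source has not been
                    # redefined between the copy and this use.
                    if last_def.get(copied, -1) < def_idx:
                        src = copied
                new_srcs.append(src)
            new_block.append((dest, op, tuple(new_srcs)))
            last_def[dest] = idx
        result.append(new_block)
    return result


block = [
    ("b", "mov", ("a",)),
    ("c", "add", ("b", "b")),   # both uses of b can become a
    ("a", "mov", ("x",)),       # a is redefined here
    ("d", "add", ("b", "a")),   # b must stay; a can become x
]
for insn in local_fwprop([block])[0]:
    print(insn)
```

The point of the map is that one linear walk per block replaces the UD chains, which is what makes the approach cheap enough for -O1.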


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines

2009-02-13 Thread lucier at math dot purdue dot edu


--- Comment #90 from lucier at math dot purdue dot edu  2009-02-13 17:37 
---
Subject: Re:  [4.3/4.4 Regression] Inordinate
 compile times on large routines

On Fri, 2009-02-13 at 16:54 +, bonzini at gnu dot org wrote:
 
 
> --- Comment #87 from bonzini at gnu dot org  2009-02-13 16:54 ---
>
> The problem is that -O1 was never meant to give very fast code.

I'm not looking for very fast code, I'm looking for code that doesn't
get  30% slower from one SVN revision number to the next.

> You are using it only because our throttling of expensive passes is
> insufficient.

I am using -O1 because code of this type compiled with -O2 runs
significantly more slowly than code of this type compiled with -O1. I
have never used -O2 on this type of code.

> Fixing that has two sides, as done in PR39157's
> discussion: 1) disabling more passes at -O1, 2) establishing some
> parameters to throttle down passes at -O2.

I don't see that (1) and (2) form the main strategy to fix that, it
seems that understanding the existing optimizations that are being
disabled in preference for new ones is a good start.  And generally
ensuring that -O1 code doesn't get significantly slower while compile
times get significantly higher.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854



[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines

2009-02-13 Thread lucier at math dot purdue dot edu


--- Comment #91 from lucier at math dot purdue dot edu  2009-02-13 17:43 
---
Subject: Re:  [4.3/4.4 Regression] Inordinate
 compile times on large routines

On Fri, 2009-02-13 at 17:37 +, lucier at math dot purdue dot edu
wrote:
> --- Comment #90 from lucier at math dot purdue dot edu  2009-02-13 17:37
> ---
> Subject: Re:  [4.3/4.4 Regression] Inordinate
>  compile times on large routines
>
> On Fri, 2009-02-13 at 16:54 +, bonzini at gnu dot org wrote:
> >
> > --- Comment #87 from bonzini at gnu dot org  2009-02-13 16:54 ---
> >
> > The problem is that -O1 was never meant to give very fast code.
>
> I'm not looking for very fast code, I'm looking for code that doesn't
> get  30% slower from one SVN revision number to the next.

Sorry, this comment refers to PR 33928, not this PR.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854



[Bug middle-end/39157] Code that compiles fine in 1GB of memory with 4.1.2 requires 20GB in 4.2.* and higher

2009-02-12 Thread lucier at math dot purdue dot edu


--- Comment #15 from lucier at math dot purdue dot edu  2009-02-12 16:35 
---
Some comments (a lot went on while I was sleeping):

1.  Yes, this is similar to the test case of PR26854, but the C code generator
has changed significantly since that test case was filed.  I don't know if the
changes in the code generator really affect what's happening here, however.

2.  I'm trying to get a moderately sized test case that will compile in about
3GB of RAM, as Steven requested.  (The test case from PR26854 takes at least
7GB of RAM to compile on ppc64.)

3.  If the amount of memory and cpu time required by the test case at -O1
doesn't increase significantly when loop-invariant motion is performed on loops
of size up to 10,000, then it would be good if the parameter at -O1 could be
10,000 instead of 100 (or at least larger than 100), as is suggested in the
most recent patch.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39157



[Bug middle-end/39157] Code that compiles fine in 1GB of memory with 4.1.2 requires 20GB in 4.2.* and higher

2009-02-12 Thread lucier at math dot purdue dot edu


--- Comment #18 from lucier at math dot purdue dot edu  2009-02-12 19:54 
---
There is now a file slatex.i at

http://www.math.purdue.edu/~lucier/bugzilla/8/

that compiles in about 650MB of memory with gcc-4.2.3 on x86-64 with the same
options; I don't know if that will help Steven.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39157



[Bug middle-end/39157] Code that compiles fine in 1GB of memory with 4.1.2 requires 20GB in 4.2.* and higher

2009-02-12 Thread lucier at math dot purdue dot edu


--- Comment #19 from lucier at math dot purdue dot edu  2009-02-12 20:51 
---
Subject: Re:  Code that compiles fine in 1GB of
 memory with 4.1.2 requires  20GB in 4.2.* and higher

On Thu, 2009-02-12 at 16:52 +, rguenth at gcc dot gnu dot org wrote:
> --- Comment #16 from rguenth at gcc dot gnu dot org  2009-02-12 16:52
> ---
> Actually for PR26854 it is just one loop that is detected, covering all of
> the function (with approx. 56000 basic blocks and one basic-block that has
> edges to all other basic blocks in the loop).

Richard:

I'm wondering if you could look at a smaller file that's generated in a
somewhat different way.

At

http://www.math.purdue.edu/~lucier/bugzilla/8/

there's a file _num.i.gz that I think should have smaller (nested) loops
than the entire file, for example, from the label

___L189__23__23_bignum_2e__2a_:

at line 50031 to just before label

___L190__23__23_bignum_2e__2a_:

at line 50105.

Moving loop invariants out of this loop might help if it is detected as a
loop, but I don't know how to check whether it is.

Perhaps you might check and report whether this small loop is treated as
a loop or whether, again, the entire function is the only loop
detected.
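
In case it helps to isolate that region, here is a small sketch that finds the line span between two labels in a preprocessed file. It says nothing about whether GCC treats the span as a loop; `label_span` is a hypothetical helper, and the short in-memory input below is a placeholder rather than an excerpt from _num.i.

```python
def label_span(lines, start_label, end_label):
    """Return (first, last) 1-based line numbers running from the line
    holding start_label to the line just before end_label, or None if
    either label is missing."""
    first = last = None
    for number, text in enumerate(lines, start=1):
        stripped = text.strip()
        if first is None and stripped.startswith(start_label + ":"):
            first = number
        elif first is not None and stripped.startswith(end_label + ":"):
            last = number - 1
            break
    if first is None or last is None:
        return None
    return first, last


# Placeholder input; the real file would be read with gzip.open(..., "rt").
demo = ["int x;", "___L1_demo:", "  x = 1;", "  goto ___L1_demo;", "___L2_demo:", "  return;"]
print(label_span(demo, "___L1_demo", "___L2_demo"))
```

Run over the lines of _num.i with the two bignum labels named above, it should report the same span as the line numbers quoted in this comment, assuming the file has not changed since.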

Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39157



[Bug bootstrap/39173] New: PR37739 (bootstrap failure) applies to 4.3.3

2009-02-12 Thread lucier at math dot purdue dot edu
PR 37739 applies to 4.3.3, as does the fix (applied by hand to my sources).

I'm running make check right now with the patched sources.


-- 
   Summary: PR37739 (bootstrap failure) applies to 4.3.3
   Product: gcc
   Version: 4.3.3
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: lucier at math dot purdue dot edu
 GCC build triplet: powerpc64-unknown-linux-gnu
  GCC host triplet: powerpc64-unknown-linux-gnu
GCC target triplet: powerpc64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39173



[Bug bootstrap/39173] PR37739 (bootstrap failure) applies to 4.3.3

2009-02-12 Thread lucier at math dot purdue dot edu


--- Comment #1 from lucier at math dot purdue dot edu  2009-02-12 22:45 
---
The test suite has finished (I only built the C compiler), and results are at

http://gcc.gnu.org/ml/gcc-testresults/2009-02/msg01220.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39173



[Bug bootstrap/37739] [4.4 Regression] bootstrap broken with core gcc gcc-4.2.x

2009-02-11 Thread lucier at math dot purdue dot edu


--- Comment #12 from lucier at math dot purdue dot edu  2009-02-11 18:13 
---
I just got the same error with

   140  12:54   ../../gcc-4.3.3/configure --prefix=/pkgs/gcc-4.3.3
--enable-languages=c
   141  12:54   make -j 4 bootstrap   build.log 

trying to build gcc-4.3.3 with

[luc...@descartes gcc-4.3.3]$ gcc -v
Using built-in specs.
Target: ppc64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla
--enable-bootstrap --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk
--disable-dssi --enable-plugin
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
--enable-libgcj-multifile --enable-java-maintainer-mode
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib
--enable-secureplt --with-long-double-128 --build=ppc64-redhat-linux
--target=ppc64-redhat-linux --with-cpu=default32
Thread model: posix
gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) 

So, if it was fixed on mainline, it wasn't fixed on the branch.

Should I just reopen this against 4.3.3, or should I file a new bug report for
4.3.3 and refer back to this one?


-- 

lucier at math dot purdue dot edu changed:

   What|Removed |Added

 CC||lucier at math dot purdue
   ||dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37739



[Bug middle-end/39157] New: Code that compiles fine in 1GB of memory with 4.1.2 requires 20GB in 4.2.* and higher

2009-02-11 Thread lucier at math dot purdue dot edu
With this compiler

[luc...@descartes gambit]$ gcc -v
Using built-in specs.
Target: powerpc64-unknown-linux-gnu
Configured with: ../../gcc-4.3.3/configure --prefix=/pkgs/gcc-4.3.3
--enable-languages=c --with-cpu=default64
Thread model: posix
gcc version 4.3.3 (GCC) 

with the file compiler.i found here:

http://www.math.purdue.edu/~lucier/bugzilla/8/

attempting to compile with these options:

gcc -m64 -mcpu=970 -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2
-fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC
-fno-common -rdynamic -shared

can't compile in 8GB of RAM.  With this compiler:

euler-77% /pkgs/gcc-4.2.3/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../gcc-4.2.3/configure --prefix=/pkgs/gcc-4.2.3
--enable-checking=release --with-gmp=/pkgs/gmp-4.2.2
--with-mpfr=/pkgs/gmp-4.2.2
Thread model: posix
gcc version 4.2.3

and these options:

gcc -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2
-fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC
-fno-common -mieee-fp -rdynamic -shared

it can't compile in 20GB of RAM.  (That machine has only 16GB of RAM, so I
killed the compile when it hit 20GB of physical+virtual memory.)
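
Rather than watching the process and killing it by hand, a compile like this can be run under an address-space cap so that it fails on its own once it reaches the limit. The following is a minimal sketch assuming a POSIX system and Python 3; `run_with_memory_cap` and `effective_cap` are hypothetical helper names, and RLIMIT_AS limits virtual address space, which only approximates the physical+virtual figure mentioned above.

```python
import resource
import subprocess

GIB = 1024 ** 3


def effective_cap(wanted_bytes, hard_limit):
    """Clamp the requested cap to the existing hard limit, if there is one."""
    if hard_limit == resource.RLIM_INFINITY:
        return wanted_bytes
    return min(wanted_bytes, hard_limit)


def run_with_memory_cap(argv, cap_gib):
    """Run argv with its address space limited to about cap_gib GiB.

    Returns the exit status. A process that reaches the cap sees its
    allocations fail instead of pushing the machine into swap.
    """
    def limit_child():
        # Runs in the child between fork and exec (POSIX only).
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        cap = effective_cap(int(cap_gib * GIB), hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, hard))

    return subprocess.run(argv, preexec_fn=limit_child).returncode
```

Calling it with the gcc command line from this report and a cap of 20 would reproduce the manual cutoff described above; the exact argument list is left to the reader.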

It compiles just fine in about 1GB of RAM with 

euler-76% gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/pkgs/gcc-4.1.2
Thread model: posix
gcc version 4.1.2

compiler.i is the output from the Gambit Scheme-C compiler; the source scheme
program is from a standard benchmark suite for Scheme compilers.  So I found
this by trying to change the code generator for Gambit and running the
benchmark suite on x86_64.

I don't know how this can be fixed.  Basically, the entire middle-end
infrastructure since 4.1.* is telling people like me with computer-generated
code like this to just go away (to put it very politely).  On Mac OS X 10.5.*,
Apple bundles their version of 4.0.1, which compiles this just fine; on Red Hat
5.2, they bundle their version of 4.1.2 (I think, my RH5.2 box is down at the
moment), which compiles this just fine; but on Ubuntu 8.10 or Fedora 10 you
can't compile this because they bundle newer compilers.  (I guess I'll see if I
can install 4.1.* on both of these.)

As a stopgap measure, perhaps someone can tell me what optimization level to
use.  As you can see, I use -O1 and a few others (mainly -fschedule-insns2). 
gcc 4.1.* and earlier compiled something like this just fine, but -O1 must mean
something different now.


-- 
   Summary: Code that compiles fine in 1GB of memory with 4.1.2
requires  20GB in 4.2.* and higher
   Product: gcc
   Version: 4.3.3
Status: UNCONFIRMED
  Severity: major
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: lucier at math dot purdue dot edu
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39157



[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines

2009-02-04 Thread lucier at math dot purdue dot edu


--- Comment #81 from lucier at math dot purdue dot edu  2009-02-04 17:27 
---
Created an attachment (id=17243)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17243&action=view)
Memory and CPU statistics for 2009/02/04


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854



[Bug tree-optimization/26854] [4.3/4.4 Regression] Inordinate compile times on large routines

2009-02-04 Thread lucier at math dot purdue dot edu


--- Comment #82 from lucier at math dot purdue dot edu  2009-02-04 17:28 
---
I still have the bitmap.c patch from

http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01270.html

in my tree so I don't get meaningless statistics for bitmaps.  (Kenny installed
in the trunk something like the patch above for alloc-pool.c.)

There are more bitmaps allocated than on 2008-09-26 (13GB instead of 12GB).

3GB was allocated in alloc-pool.

Execution time was worse, 228.17 user seconds versus 168 seconds.

I didn't watch top to estimate the maximum memory usage.
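
For scale, the relative changes implied by the figures above can be worked out as follows. This is plain arithmetic on the numbers quoted in this comment; `percent_change` is just a hypothetical helper name.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Bitmap allocation: 12GB on 2008-09-26 versus 13GB now.
print(f"bitmap allocation: {percent_change(12, 13):+.1f}%")
# User time: 168 seconds before versus 228.17 seconds now.
print(f"user time:         {percent_change(168, 228.17):+.1f}%")
```

That is roughly an 8% increase in bitmap allocation and a 36% increase in user time.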

This is with

euler-8% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release
--prefix=/pkgs/gcc-mainline --enable-languages=c
--enable-gather-detailed-mem-stats
Thread model: posix
gcc version 4.4.0 20090204 (experimental) [trunk revision 143922] (GCC) 

Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854



[Bug bootstrap/26814] Bootstrapping with a non default ABI (-m64 on ppc-darwin or on ppc-linux with a compiler defaulting to 32 and now defaulting to 64)

2008-12-28 Thread lucier at math dot purdue dot edu


--- Comment #19 from lucier at math dot purdue dot edu  2008-12-29 01:30 
---
Maybe you could offer a few more details; I just tried

% cat ../../mainline/build-and-check-gcc-64-32
#!/bin/tcsh
/bin/rm -rf *; ../../mainline/configure CC='/usr/bin/gcc-4.0 -mcpu=970 -m64'
--build=powerpc64-apple-darwin9.6.0 --host=powerpc64-apple-darwin9.6.0
--target=powerpc-apple-darwin9.6.0 --with-gmp-include=/sw/include/
--with-gmp-lib=/sw/lib/ppc64 --with-mpfr-include=/sw/include/
--with-mpfr-lib=/sw/lib/ppc64 --prefix=/pkgs/gcc-4.4.0-64-32
--with-libiconv-prefix=/usr  --with-system-zlib; make -j 4
BOOT_LDFLAGS='-Wl,-search_paths_first'  build.log  (make install)  (make
-k -j 8 check RUNTESTFLAGS=--target_board 'unix{-mcpu=970/-m64}'  
check.log ; make mail-report.log)

(make bootstrap isn't even available) and ended up with

checking for powerpc-apple-darwin9.6.0-gcc...
/Users/lucier/programs/gcc/objdirs/mainline/./gcc/xgcc
-B/Users/lucier/programs/gcc/objdirs/mainline/./gcc/
-B/pkgs/gcc-4.4.0-64-32/powerpc-apple-darwin9.6.0/bin/
-B/pkgs/gcc-4.4.0-64-32/powerpc-apple-darwin9.6.0/lib/ -isystem
/pkgs/gcc-4.4.0-64-32/powerpc-apple-darwin9.6.0/include -isystem
/pkgs/gcc-4.4.0-64-32/powerpc-apple-darwin9.6.0/sys-include
checking for suffix of object files... configure: error: in
`/Users/lucier/programs/gcc/objdirs/mainline/powerpc-apple-darwin9.6.0/libgcc':
configure: error: cannot compute suffix of object files: cannot compile
See `config.log' for more details.

while config.log gives

configure:2611: /Users/lucier/programs/gcc/objdirs/mainline/./gcc/xgcc
-B/Users/lucier/programs/gcc/objdirs/mainline/./gcc/
-B/pkgs/gcc-4.4.0-64-32/powerpc-apple-darwin9.6.0/bin/
-B/pkgs/gcc-4.4.0-64-32/powerpc-apple-darwin9.6.0/lib/ -isystem
/pkgs/gcc-4.4.0-64-32/powerpc-apple-darwin9.6.0/include -isystem
/pkgs/gcc-4.4.0-64-32/powerpc-apple-darwin9.6.0/sys-include -c -g -O2   
conftest.c 5
/Users/lucier/programs/gcc/objdirs/mainline/./gcc/as: line 76: exec: : not
found
configure:2614: $? = 1
configure: failed program was:
| /* confdefs.h.  */
|
| #define PACKAGE_NAME GNU C Runtime Library
| #define PACKAGE_TARNAME libgcc
| #define PACKAGE_VERSION 1.0
| #define PACKAGE_STRING GNU C Runtime Library 1.0
| #define PACKAGE_BUGREPORT 
| /* end confdefs.h.  */
|
| int
| main ()
| {
|
|   ;
|   return 0;
| }
configure:2627: error: in
`/Users/lucier/programs/gcc/objdirs/mainline/powerpc-apple-darwin9.6.0/libgcc':
configure:2630: error: cannot compute suffix of object files: cannot compile
See `config.log' for more details.

It appears to be looking for a special as.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26814



[Bug bootstrap/26814] Bootstrapping with a non default ABI (-m64 on ppc-darwin or on ppc-linux with a compiler defaulting to 32 and now defaulting to 64)

2008-12-28 Thread lucier at math dot purdue dot edu


--- Comment #21 from lucier at math dot purdue dot edu  2008-12-29 03:06 
---
Thanks for your comments.

So, to get back to basics, how do I build a compiler on darwin that has a
64-bit gcc/cc1/etc., but compiles to 32-bit binaries by default?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26814



[Bug tree-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475

2008-12-07 Thread lucier at math dot purdue dot edu


--- Comment #42 from lucier at math dot purdue dot edu  2008-12-07 19:39 
---
Just a comment that -fforward-propagate isn't enabled at -O1 (the main
optimization option in the test) while the cse code it replaces was enabled at
-O1.  This is presumably why adding -fno-forward-propagate to the command line
in the test a year ago didn't affect the generated code.

Adding -fno-forward-propagate to the command line of the test case with
revision r118475 of gcc changes the generated code, but doesn't improve the
problem code in the main loop.
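
One way to confirm which flags a given -O level turns on is to ask the compiler itself with `-Q --help=optimizers`, which recent GCC releases support (4.3 and later, as far as I know; an older experimental snapshot may not). The sketch below parses that report; the helper names are hypothetical, and the `[enabled]`/`[disabled]` column format is assumed from the output I have seen, so treat the parser as illustrative.

```python
import subprocess


def parse_optimizer_report(text):
    """Map each -f option in `gcc -Q --help=optimizers` output to True/False."""
    flags = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("-f"):
            if parts[1] == "[enabled]":
                flags[parts[0]] = True
            elif parts[1] == "[disabled]":
                flags[parts[0]] = False
    return flags


def optimizer_flags(level, gcc="gcc"):
    """Ask a gcc binary which optimizer flags a given -O level enables."""
    out = subprocess.run([gcc, level, "-Q", "--help=optimizers"],
                         capture_output=True, text=True, check=True).stdout
    return parse_optimizer_report(out)
```

With a suitable compiler, `optimizer_flags("-O1").get("-fforward-propagate")` and the same call for "-O2" would show the difference discussed above.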

Updated the title to report the performance hit on

Intel(R) Xeon(R) CPU   X5460  @ 3.16GHz

as reported by /proc/cpuinfo


-- 

lucier at math dot purdue dot edu changed:

   What|Removed |Added

Summary|[4.3/4.4 Regression] 22%|[4.3/4.4 Regression] 30%
   |performance slowdown from   |performance slowdown in
   |4.2.2 to 4.3/4.4.0 in   |floating-point code caused
   |floating-point code |by  r118475


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code

2008-12-06 Thread lucier at math dot purdue dot edu


--- Comment #39 from lucier at math dot purdue dot edu  2008-12-06 16:37 
---
I may have narrowed down the problem a bit.

With this compiler (revision 118491):

pythagoras-277% /tmp/lucier/install/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release
--prefix=/tmp/lucier/install --enable-languages=c
Thread model: posix
gcc version 4.3.0 20061105 (experimental)

one gets (on a faster machine than previous reports)

(time (direct-fft-recursive-4 a table))
133 ms real time
140 ms cpu time (140 user, 0 system)
no collections
64 bytes allocated
no minor faults
no major faults

With this compiler (revision 118474):

pythagoras-24% /tmp/lucier/install/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release
--prefix=/tmp/lucier/install --enable-languages=c
Thread model: posix
gcc version 4.3.0 20061104 (experimental)

one gets

(time (direct-fft-recursive-4 a table))
116 ms real time
108 ms cpu time (108 user, 0 system)
no collections
64 bytes allocated
no minor faults
no major faults

and you see the typical problem with assembly code from direct.i with the later
compiler.
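
The two timing reports can be compared mechanically; the sketch below pulls the cpu time out of each report and computes the slowdown. The report text is copied from this comment, while `cpu_ms` is a hypothetical helper and the regular expression assumes the report format shown above.

```python
import re


def cpu_ms(report):
    """Extract the 'N ms cpu time' figure from a (time ...) report."""
    match = re.search(r"(\d+) ms cpu time", report)
    if match is None:
        raise ValueError("no cpu time line found")
    return int(match.group(1))


later = """133 ms real time
140 ms cpu time (140 user, 0 system)"""    # revision 118491
earlier = """116 ms real time
108 ms cpu time (108 user, 0 system)"""    # revision 118474

slowdown = (cpu_ms(later) - cpu_ms(earlier)) / cpu_ms(earlier) * 100
print(f"cpu time slowdown: {slowdown:.1f}%")
```

On these figures the cpu time goes up by a little under 30%.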

Paolo may have been right about fwprop, this patch was installed that day:

Author: bonzini
Date: Sat Nov  4 08:36:45 2006
New Revision: 118475

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=118475
Log:
2006-11-03  Paolo Bonzini  [EMAIL PROTECTED]
Steven Bosscher  [EMAIL PROTECTED]

* fwprop.c: New file.
* Makefile.in: Add fwprop.o.
* tree-pass.h (pass_rtl_fwprop, pass_rtl_fwprop_with_addr): New.
* passes.c (init_optimization_passes): Schedule forward propagation.
* rtlanal.c (loc_mentioned_in_p): Support NULL value of the second
parameter.
* timevar.def (TV_FWPROP): New.
* common.opt (-fforward-propagate): New.
* opts.c (decode_options): Enable forward propagation at -O2.
* gcse.c (one_cprop_pass): Do not run local cprop unless touching
jumps.
* cse.c (fold_rtx_subreg, fold_rtx_mem, fold_rtx_mem_1, find_best_addr,
canon_for_address, table_size): Remove.
(new_basic_block, insert, remove_from_table): Remove references to
table_size.
(fold_rtx): Process SUBREGs and MEMs with equiv_constant, make
simplification loop more straightforward by not calling fold_rtx
recursively.
(equiv_constant): Move here a small part of fold_rtx_subreg,
do not call fold_rtx.  Call avoid_constant_pool_reference
to process MEMs.
* recog.c (canonicalize_change_group): New.
* recog.h (canonicalize_change_group): New.

* doc/invoke.texi (Optimization Options): Document fwprop.
* doc/passes.texi (RTL passes): Document fwprop.


Added:
trunk/gcc/fwprop.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/Makefile.in
trunk/gcc/common.opt
trunk/gcc/cse.c
trunk/gcc/doc/invoke.texi
trunk/gcc/doc/passes.texi
trunk/gcc/gcse.c
trunk/gcc/opts.c
trunk/gcc/passes.c
trunk/gcc/recog.c
trunk/gcc/recog.h
trunk/gcc/rtlanal.c
trunk/gcc/timevar.def
trunk/gcc/tree-pass.h


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928



[Bug target/37878] [4.4 regression] PPC64 ldu command generated with invalid offset

2008-10-29 Thread lucier at math dot purdue dot edu


--- Comment #14 from lucier at math dot purdue dot edu  2008-10-30 00:02 
---
Thank you, this fixes the original bug.

I took the liberty of closing this bug report.

Thanks again.

Brad


-- 

lucier at math dot purdue dot edu changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37878



[Bug bootstrap/37639] Bootstrap fails with may be used uninitialized warning in c-parser.c

2008-10-29 Thread lucier at math dot purdue dot edu


--- Comment #3 from lucier at math dot purdue dot edu  2008-10-30 00:19 
---
You're right, it was fixed by 

Revision 141193 - (view) (download) - [select for diffs]
Modified Fri Oct 17 14:50:07 2008 UTC (12 days, 9 hours ago) by krebbel
File length: 238566 byte(s)
Diff to previous 140914 (colored)

2008-10-17  Andreas Krebbel  [EMAIL PROTECTED]

* c-parser.c (c_parser_binary_expression): Silence the
uninitialized variable warning emitted for binary_loc.


I hadn't noticed because I started adding --disable-werror to my configuration
files.

Closing as fixed.


-- 

lucier at math dot purdue dot edu changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37639



[Bug target/37878] [4.4 regression] PPC64 ldu command generated with invalid offset

2008-10-23 Thread lucier at math dot purdue dot edu


--- Comment #9 from lucier at math dot purdue dot edu  2008-10-23 19:20 
---
I bootstrapped and regtested the suggested patch.  There was one fewer FAIL in
the gcc tests:

FAIL: gcc.c-torture/execute/nestfunc-6.c execution,  -O0 

and one more failure in the libgomp tests:

FAIL: libgomp.fortran/crayptr2.f90  -O3 -fomit-frame-pointer -funroll-loops 
execution test
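
A before/after comparison like this can be automated by diffing the FAIL lines of the two test summaries. The following is a minimal sketch: `fail_lines` and `compare_runs` are hypothetical helpers, the two inputs are reduced to the FAIL lines quoted above plus a made-up PASS line, and it assumes each result sits on a single line starting with "FAIL:".

```python
def fail_lines(summary_text):
    """Collect the FAIL: lines of a test summary as a set."""
    return {line.strip() for line in summary_text.splitlines()
            if line.startswith("FAIL:")}


def compare_runs(before_text, after_text):
    """Return (fixed, new) sets of FAIL lines between two test runs."""
    before, after = fail_lines(before_text), fail_lines(after_text)
    return before - after, after - before


unpatched = "FAIL: gcc.c-torture/execute/nestfunc-6.c execution,  -O0\nPASS: placeholder.c\n"
patched = ("FAIL: libgomp.fortran/crayptr2.f90  -O3 -fomit-frame-pointer "
           "-funroll-loops execution test\nPASS: placeholder.c\n")

fixed, new = compare_runs(unpatched, patched)
print("fixed:", sorted(fixed))
print("new:  ", sorted(new))
```

Pointing it at the real summary files from the two runs would list every test whose status changed, not just the two noted here.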

However, it's not clear to me whether the output of gdb implies that this is
a problem with the compiled code (the command lines are taken from the log
file):

[descartes:powerpc64-apple-darwin9.5.0/libgomp/testsuite] lucier%
/Users/lucier/programs/gcc/objdirs/mainline/gcc/xgcc
-B/Users/lucier/programs/gcc/objdirs/mainline/gcc/
/Users/lucier/programs/gcc/mainline/libgomp/testsuite/libgomp.fortran/crayptr2.f90

-B/Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/./libgomp/
-I/Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/./libgomp
-I/Users/lucier/programs/gcc/mainline/libgomp/testsuite/.. -shared-libgcc
-fmessage-length=0 -fopenmp  -O3 -fomit-frame-pointer -funroll-loops  -fopenmp
-fcray-pointer -static-libgcc  
-L/Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/./libgomp/.libs
-lgomp
-L/Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/./libgomp/../libgfortran/.libs
-lgfortranbegin -lgfortran -lm   -mcpu=970 -m64 -o ./crayptr2.exe
[descartes:powerpc64-apple-darwin9.5.0/libgomp/testsuite] lucier% env
LD_LIBRARY_PATH=.:/Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/./libgomp/.libs:/Users/lucier/programs/gcc/objdirs/mainline/gcc:/Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/./libgomp/../libgfortran/.libs:.:/Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/./libgomp/.libs:/Users/lucier/programs/gcc/objdirs/mainline/gcc:/Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/./libgomp/../libgfortran/.libs
gdb ./crayptr2.exe
GNU gdb 6.3.50-20050815 (Apple version gdb-962) (Sat Jul 26 08:17:57 UTC 2008)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as powerpc-apple-darwin...Reading symbols for shared
libraries  done

(gdb) run
Starting program:
/Users/lucier/programs/gcc/objdirs/mainline/powerpc64-apple-darwin9.5.0/libgomp/testsuite/crayptr2.exe
 
warning: posix_spawn failed, trying execvp, error: 86
Reading symbols for shared libraries +++.. done

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x
0x00011678 in MAIN__.omp_fn.0 ()
(gdb) where
#0  0x00011678 in MAIN__.omp_fn.0 ()
#1  0x0001187c in MAIN__ ()
#2  0x000118e4 in main (argc=1, argv=value temporarily unavailable,
due to optimizations) at ../../../../mainline/libgfortran/fmain.c:21

It is completely reproducible, however.

Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37878


