[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #47 from Joey dot ye at intel dot com 2009-03-12 06:51 --- (In reply to comment #46) Created an attachment (id=17444) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17444&action=view) gcc.target/i386/stackalign/longlong-2.c for -mnostackalign on darwin10 /sw/src/fink.build/gcc44-4.3.999-20090311/darwin_objdir/gcc/xgcc -B/sw/src/fink.build/gcc44-4.3.999-20090311/darwin_objdir/gcc/ /sw/src/fink.build/gcc44-4.3.999-20090311/gcc-4.4-20090311/gcc/testsuite/gcc.target/i386/stackalign/longlong-2.c -mstackrealign -O2 -mpreferred-stack-boundary=2 -S -m32 -o longlong-2.s That's because MacOS requires the stack to be aligned to 16 bytes when making a call and ignores -mpreferred-stack-boundary=2. These cases should be skipped for MacOS. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #35 from Joey dot ye at intel dot com 2009-03-04 01:41 --- (In reply to comment #32) I don't see the reason for optimize_function_for_size_p (cfun), care to back up with benchmarks that forcing dynamic realignment for long long variables with -mpreferred-stack-boundary=2 improves performance rather than slows things down (because of the dynamic realignment)? The optimize_function_for_size_p check is there to avoid the prologue/epilogue code size increase when -Os is used, which is what Jakub originally complained about. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug middle-end/39315] Unaligned move used on aligned stack variable
--- Comment #3 from Joey dot ye at intel dot com 2009-02-27 02:53 --- (In reply to comment #2) Created an attachment (id=17368) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17368&action=view) A patch Does this patch make sense? It works fine. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39315
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #31 from Joey dot ye at intel dot com 2009-02-23 03:15 ---
How about this patch?
1. Only reduce DI mode when -Os
2. Ignore TYPE_USER_ALIGN, so that stack realign happens for case in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137#c28, which IMHO is acceptable.

Index: config/i386/i386.c
===
--- config/i386/i386.c (revision 5221)
+++ config/i386/i386.c (working copy)
@@ -19607,6 +19607,13 @@
 ix86_local_alignment (tree type, enum machine_mode mode,
                       unsigned int align)
 {
+  /* We don't want to align DImode to 64bit for compilation with
+     -mpreferred-stack-boundary=2 to not enforce dynamic stack alignment
+     prologue.  */
+  if (mode == DImode && !TARGET_64BIT && ix86_preferred_stack_boundary < 64
+      && optimize_function_for_size_p (cfun))
+    align = 32;
+
   /* If TYPE is NULL, we are allocating a stack slot for caller-save
      register in MODE.  We will return the largest alignment of XF
      and DF.  */
@@ -19616,6 +19623,12 @@
         align = GET_MODE_ALIGNMENT (DFmode);
       return align;
     }
+  if (!TARGET_64BIT
+      && optimize_function_for_size_p (cfun)
+      && align == 64
+      && ix86_preferred_stack_boundary < 64
+      && (mode == DImode || (type && TYPE_MODE (type) == DImode)))
+    align = 32;
 
   /* x86-64 ABI requires arrays greater than 16 bytes to be aligned
      to 16byte boundary.  */
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #20 from Joey dot ye at intel dot com 2009-02-17 09:18 --- (In reply to comment #19) Just for the record, here is an unsuccessful attempt to avoid stack realignment just because of DImode for -m32 or because of DFmode at -m32 -Os. This patch unfortunately caused a handful of regressions, like 20020220-1.c. Is it OK to enable this patch with a new option? Not aligning a mode (DImode) to its natural boundary by default is confusing. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39146] Unnecessary stack alignment
--- Comment #12 from Joey dot ye at intel dot com 2009-02-16 08:49 --- Created an attachment (id=17305) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17305&action=view) New patch attached. Testing finished: no regressions with emx_avx_sim. Waiting to check in to 4.5.
--
Joey dot ye at intel dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #17283|0                           |1
        is obsolete|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39146
[Bug target/39146] Unnecessary stack alignment
--- Comment #10 from Joey dot ye at intel dot com 2009-02-12 15:20 --- (In reply to comment #8) We still have push and mov. I guess it may be the best we can do. But please run full 32 and 64bit testsuite with your patch as well as under emx-avx-sim. full 32/64 bit test pass with no regression {-m32, -m32 -mstackrealign -mpreferred-stack-boundary=4, -m64}. Haven't tested emx-avx-sim test yet. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39146
[Bug target/39146] Unnecessary stack alignment
--- Comment #5 from Joey dot ye at intel dot com 2009-02-12 01:45 ---
Stack realign is finalized by

  stack_realign = (incoming_stack_boundary
                   < (current_function_is_leaf
                      ? crtl->max_used_stack_slot_alignment
                      : crtl->stack_alignment_needed));

Since bar is a leaf function, it checks max_used_stack_slot_alignment. According to its definition, max_used_stack_slot_alignment is /* The largest alignment of slot allocated on the stack. */. Parameter x isn't allocated on the local stack, so max_used_stack_slot_alignment shouldn't be set to 256 bits. In locate_and_pad_parm,

  if (crtl->max_used_stack_slot_alignment < crtl->stack_alignment_needed)
    crtl->max_used_stack_slot_alignment = crtl->stack_alignment_needed;

sets max_used_stack_slot_alignment to 256 bits, which it seems shouldn't happen all the time.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39146
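A minimal sketch of the realign decision described in this comment, written as standalone C rather than GCC internals. The struct and function names are hypothetical stand-ins for the crtl fields mentioned above, and all alignments are in bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the crtl fields named in the comment.
   All values are alignments in bits.  */
struct frame_info
{
  unsigned incoming_stack_boundary;
  unsigned max_used_stack_slot_alignment;
  unsigned stack_alignment_needed;
  bool is_leaf;
};

/* A leaf function only looks at the slots it really allocated;
   a non-leaf function looks at the full alignment requirement.  */
static bool
needs_stack_realign (const struct frame_info *f)
{
  unsigned required = f->is_leaf ? f->max_used_stack_slot_alignment
                                 : f->stack_alignment_needed;
  assert (required != 0);
  return f->incoming_stack_boundary < required;
}
```

With a 128-bit incoming boundary, a leaf function whose largest allocated slot is 128 bits needs no realignment under this model, even if stack_alignment_needed is 256. Once max_used_stack_slot_alignment is raised to 256, as locate_and_pad_parm does, the same function is realigned.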
[Bug target/39146] Unnecessary stack alignment
--- Comment #7 from Joey dot ye at intel dot com 2009-02-12 02:26 --- Created an attachment (id=17283) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17283&action=view) A patch to fix this problem. Impact on other tests is unknown; testing is under way. HJ, can you also help verify and test this patch? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39146
[Bug target/39148] -Os increase code size when stack is aligned
--- Comment #6 from Joey dot ye at intel dot com 2009-02-12 02:33 --- (In reply to comment #5) If ACCUMULATE_OUTGOING_ARGS is off, ECX will be used for stack alignment and it may lead to code size increase due to register spill since ia32 has very few registers. The code size increase caused by stack realignment comes mainly from the larger prologue. ECX is only used as a hard register in the prologue/epilogue, and the impact on the function body is low. If ACCUMULATE_OUTGOING_ARGS does increase code size, then for big functions the benefit of !ACCUMULATE_OUTGOING_ARGS will offset the prologue/epilogue increase. So simply enabling ACCUMULATE_OUTGOING_ARGS for stack realignment isn't the best option for all cases either. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39148
[Bug target/39146] Unnecessary stack alignment
--- Comment #9 from Joey dot ye at intel dot com 2009-02-12 02:40 --- (In reply to comment #8) We still have push and mov. I guess it may be the best we can do. I believe so too. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39146
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #10 from Joey dot ye at intel dot com 2009-02-11 01:03 --- (In reply to comment #9) Created an attachment (id=17279) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17279&action=view) A patch to add a new -malign-double= option This patch looks OK to me. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39146] Unnecessary stack alignment
--- Comment #1 from Joey dot ye at intel dot com 2009-02-10 05:35 --- The argument needs 32-byte alignment, and there is no way to guarantee that it won't be spilled. That's why the stack adjustment is there. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39146
[Bug target/39082] union with long double doesn't follow x86-64 psABI
--- Comment #1 from Joey dot ye at intel dot com 2009-02-04 02:17 --- GCC doesn't follow x86-64 psABI on this case. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39082
[Bug target/38952] [4.4 Regression] EH does not work.
--- Comment #20 from Joey dot ye at intel dot com 2009-01-26 11:49 --- (In reply to comment #10) This is caused by stack alignment change, revision 138335. Joey and Xuepeng will look into it after holiday, Feb. 1. This must be the stack alignment change. It looks like we didn't handle stack unwinding on Cygwin correctly. Dave, compared with the EH mechanism on Linux, what is different about SjLj EH on Cygwin? An answer to this question might help solve the problem sooner. Thanks - Joey -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38952
[Bug target/38899] pessimizes function without SSE intrinsics
--- Comment #2 from Joey dot ye at intel dot com 2009-01-21 02:40 ---
The following case isn't vectorized with -O3 on x86_64 either, although the arrays are aligned:

#include <stdio.h>

float __attribute__((aligned(16))) in1[] = { 1.2, 3.5, 1.7, 2.8 };
float __attribute__((aligned(16))) in2[] = { -0.7, 2.6, 3.3, -4.0 };
float __attribute__((aligned(16))) out[4];

void __attribute__((noinline))
mul()
{
  int i;
  for (i = 0; i < 4; i++)
    out[i] = in1[i] * in2[i];
}

int main(void)
{
  mul();
  printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);
  return 0;
}
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38899
[Bug tree-optimization/38785] huge performance regression on EEMBC bitmnp01
--- Comment #7 from Joey dot ye at intel dot com 2009-01-14 10:08 --- (In reply to comment #5) Joern, re. comment #4, Richi refers to my patch to enable PRE at -Os, see [1]. An extension to this patch that we tested on x86 machines, is to disable PRE for scalar integer registers, via SMALL_REGISTER_CLASSES. I changed SMALL_REGISTER_CLASSES into a target hook for this purpose, see [2]. You could play with this, see if you can use this to cure your problem... [1] http://gcc.gnu.org/ml/gcc-patches/2008-12/msg00199.html [2] http://gcc.gnu.org/ml/gcc-patches/2008-12/msg00590.html Reproduced on x86. But I fail to build with patch [2] on x86_64, anything wrong? ../../src/gcc/target-def.h:476:1: error: unterminated #ifndef ../../src/gcc/c-common.c:8197: error: 'TARGETCM_INITIALIZER' undeclared here (not in a function) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38785
[Bug target/38736] [4.4 Regression] -mavx can change the ABI via BIGGEST_ALIGNMENT
--- Comment #5 from Joey dot ye at intel dot com 2009-01-07 02:45 --- More places with BIGGEST_ALIGN: $ grep -r (aligned) .|grep attribute|grep -v testsuite|grep -v texi ./libstdc++-v3/libsupc++/eh_alloc.cc:typedef char one_buffer[EMERGENCY_OBJ_SIZE] __attribute__((aligned)); ./libjava/exception.cc: char end[0] __attribute__((aligned)); ./libjava/exception.cc:__attribute__((aligned)); ./gcc/unwind-sjlj.c: jmp_buf jbuf __attribute__((aligned)); -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38736
[Bug target/33604] [4.3/4.4 Regression] Revision 119502 causes significantly slower results with 4.3/4.4 compared to 4.2
--- Comment #45 from Joey dot ye at intel dot com 2008-12-30 01:49 --- (In reply to comment #44) Does anyone have new numbers?
Fixed on both i386/x86_64:

x86_64:
  4.4 (trunk 142847): 5.4s
  4.3.2 release:      5.4s
  4.2.4 release:      5.4s
i386:
  4.4 (trunk 142847): 2.7s
  4.3.2 release:      2.8s
  4.2.4 release:      2.7s
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33604
[Bug rtl-optimization/37397] [4.4 Regression] IRA performance impact on SPEC CPU 2K/2006
--- Comment #6 from Joey dot ye at intel dot com 2008-12-30 02:50 --- (In reply to comment #4) Revision 141860 caused 30% slowdown on 454.calculix in SPEC CPU 2006 with -O2 -ffast-math on Linux/Intel64. This regression has been fixed in some revision between 142187 and 142212. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37397
[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code
--- Comment #12 from Joey dot ye at intel dot com 2008-12-10 03:01 --- Fixed at trunk 142631 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948
[Bug rtl-optimization/38280] [4.4 regression] Revision 142207 breaks 416.gamess/481.wrf in SPEC CPU 2006
--- Comment #8 from Joey dot ye at intel dot com 2008-12-01 02:18 --- Yes. It fixes 416/481 on 32 bits and 481 on 64 bits. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38280
[Bug rtl-optimization/38280] [4.4 regression] Revision 142207 breaks 416.gamess/481.wrf in SPEC CPU 2006
--- Comment #6 from Joey dot ye at intel dot com 2008-11-28 15:11 --- Patch at http://gcc.gnu.org/ml/gcc-patches/2008-11/msg01428.html fixed this regression. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38280
[Bug rtl-optimization/38280] [4.4 regression] Revision 142207 breaks 416.gamess/481.wrf in SPEC CPU 2006
--- Comment #4 from Joey dot ye at intel dot com 2008-11-28 03:39 --- 142250 doesn't fix this regression. 416.gamess and 481.wrf still fail. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38280
[Bug target/38201] -mfma/-mavx and -msse5/-msse4a don't work together
--- Comment #8 from Joey dot ye at intel dot com 2008-11-21 12:00 --- In short, set A={-mavx, -mfma}, set B={-m3dnow, -m3dnowa, -msse4a, -msse5}. Any combination that takes options from both sets should be prohibited. Please add more options to these sets in case I missed any. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38201
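The rule proposed in this comment can be sketched as a bitmask check. This is only an illustration: the flag names below are hypothetical and are not GCC's real option masks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit flags, one per ISA option named in the comment.  */
enum isa_flag
{
  ISA_AVX    = 1 << 0,
  ISA_FMA    = 1 << 1,
  ISA_3DNOW  = 1 << 2,
  ISA_3DNOWA = 1 << 3,
  ISA_SSE4A  = 1 << 4,
  ISA_SSE5   = 1 << 5
};

#define ISA_SET_A (ISA_AVX | ISA_FMA)
#define ISA_SET_B (ISA_3DNOW | ISA_3DNOWA | ISA_SSE4A | ISA_SSE5)

/* Return true when FLAGS takes at least one option from each set,
   which is the combination the comment wants to prohibit.  */
static bool
isa_flags_conflict (unsigned flags)
{
  assert ((flags & ~(unsigned) (ISA_SET_A | ISA_SET_B)) == 0);
  return (flags & ISA_SET_A) != 0 && (flags & ISA_SET_B) != 0;
}
```

Options from one set alone never conflict under this check, so adding a new option only means adding one bit to the right set.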
[Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
--- Comment #23 from Joey dot ye at intel dot com 2008-10-28 01:19 --- (In reply to comment #22) Created an attachment (id=16571) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16571&action=view) A patch to re-enable regmove After applying this patch to re-enable regmove, I got [EMAIL PROTECTED] gcc]$ ./xgcc -B./ -O2 -mtune=core2 /tmp/foo.c -o noira -fno-ira -m32 HJ, is your foo.c the case attached in comment #18? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
[Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
--- Comment #18 from Joey dot ye at intel dot com 2008-10-24 08:36 --- Created an attachment (id=16536) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16536&action=view) Reduced performance case from cpu2006/454.calculix

50% regression with IRA core2 on trunk revision 140514 and 141335

$ gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --disable-bootstrap --enable-languages=c,c++,fortran --enable-checking=assert
Thread model: posix
gcc version 4.4.0 20081024 (experimental) [trunk revision 141335] (GCC)
$ gcc -m32 -O2 -mssse3 -mfpmath=sse 36.c
$ time -p ./a.out
real 7.97
$ gcc -m32 -O2 -mssse3 -mfpmath=sse -mtune=core2 -o core2.exe 36.c
$ time -p ./core2.exe
real 12.27
$ gcc -m32 -O2 -mssse3 -mfpmath=sse -mtune=core2 -fno-ira -o no-ira.exe 36.c
$ time -p ./no-ira.exe
real 8.03
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
[Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
--- Comment #21 from Joey dot ye at intel dot com 2008-10-25 04:14 ---
To me the scheduler is irrelevant here. GCC has no core2 pipeline description, so the instruction scheduling doesn't look optimized. But for an OOO processor like core2, IMHO scheduling shouldn't make that much difference. Also, core2 + no-ira doesn't hurt, which means core2 scheduling is not the root cause. Instead, the old code uses different registers for loading, but the IRA code always uses xmm7 as the load target. Two questions need to be answered:
1. Why do the instructions from core2+ira run slower than those from ira?
2. Why does core2+ira generate code so different from non-core2?

Scheduler dump for core2:
;; insn code bb dep prio cost reservation
;; -- --- ---
;; 108 47 4 0 0 0 nothing : 70 109 43
;; 43 102 4 1 0 0 nothing : 70 51 117 114 67 109
;; 109 47 4 2 0 0 nothing : 70 44
;; 44 102 4 1 0 0 nothing : 70 57 55 59 67
;; 45 102 4 0 0 0 nothing : 70 65 67 112 110
;; 46 102 4 0 0 0 nothing : 70 55 49 67 61
;; 110 102 4 1 0 0 nothing : 70 65 61
;; 61 720 4 2 0 0 nothing : 70 55 62
;; 62 720 4 1 0 0 nothing : 70 47 111
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
[Bug target/37364] [4.4 Regression] IRA generates inefficient code due to missing regmove pass
--- Comment #17 from Joey dot ye at intel dot com 2008-10-23 08:42 ---
CPU2006/454.calculix has about 10% regression with IRA + core2 + fpmath=sse on Core2 ix86:

              IRA    IRA_core2  NO_IRA_core2
454.calculix  1.00   0.90       1.01

Revision: trunk 140514
Options in detail:
IRA= -m32 -O2 -mssse3 -mfpmath=sse
IRA_core2= $IRA -mtune=core2
NO_IRA_core2= $IRA_core2 -fno-ira
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37364
[Bug rtl-optimization/37571] New: Performance regression with -mtune=core2
On Core2 ix86 machine following case (reduced from cpu2000.mcf) runs 50% slower if compiled with trunk -mtune=core2 -O2 unsigned int g_i,g_j; unsigned int g_a=1,g_b; void __attribute__((noinline)) foo() { do { if (g_a g_i) {g_i++;} else {g_j++;} } while (g_b--); } int main() { int i; for (i=0; i4; i++) { g_b=0x7fff; foo(); } return 0; } -- Summary: Performance regression with -mtune=core2 Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: Joey dot ye at intel dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37571
[Bug rtl-optimization/37571] Performance regression with -mtune=core2
--- Comment #1 from Joey dot ye at intel dot com 2008-09-18 16:01 --- Root cause is that the instruction length of fused jcc is set to 16, which prevents the block from being merged and copied. For some reason Core2 runs poorly with an unmerged branch block under certain circumstances. The following patch fixes it: Index: i386.md === --- i386.md (revision 3923) +++ i386.md (working copy) @@ -421,6 +421,9 @@ ] (const_int 1))) +(define_attr length_jcc_fuse + (const_int 0)) + ;; The (bounding maximum) length of an instruction in bytes. ;; ??? fistp and frndint are in fact fldcw/{fistp,frndint}/fldcw sequences. ;; Later we may want to split them and compute proper length as for @@ -442,7 +445,8 @@ (plus (attr prefix_rep) (plus (attr prefix_data16) (plus (attr length_immediate) -(attr length_address))) +(plus (attr length_address) + (attr length_jcc_fuse ;; The `memory' attribute is `none' if no memory is referenced, `load' or ;; `store' if there is a simple memory reference therein, or `unknown' @@ -645,7 +649,7 @@ (include k6.md) (include athlon.md) (include geode.md) -;;(include core2.md) +(include core2.md) ;; Operand and operator predicates and constraints @@ -14033,7 +14037,8 @@ return test{imodesuffix}\t%2, %2\n\t %+j%E1\t%l0\t ASM_COMMENT_START fused; } - [(set_attr type multi) + [(set_attr type icmp) + (set_attr length_jcc_fuse 2) (set_attr mode MODE)]) (define_insn *jcc_fused_2 @@ -14048,7 +14053,8 @@ return test{imodesuffix}\t%2, %2\n\t %+j%e1\t%l0\t ASM_COMMENT_START fused; } - [(set_attr type multi) + [(set_attr type icmp) + (set_attr length_jcc_fuse 2) (set_attr mode MODE)]) (define_insn *jcc_fused_3 @@ -14066,7 +14072,8 @@ return cmp{imodesuffix}\t{%3, %2|%2, %3}\n\t %+j%E1\t%l0\t ASM_COMMENT_START fused; } - [(set_attr type multi) + [(set_attr type icmp) + (set_attr length_jcc_fuse 2) (set_attr mode MODE)]) (define_insn *jcc_fused_4 @@ -14084,7 +14091,8 @@ return cmp{imodesuffix}\t{%3, %2|%2, %3}\n\t %+j%e1\t%l0\t ASM_COMMENT_START fused; } - [(set_attr type multi) + 
[(set_attr type icmp) + (set_attr length_jcc_fuse 2) (set_attr mode MODE)]) ;; In general it is not safe to assume too much about CCmode registers, -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37571
[Bug middle-end/37243] [4.4 Regression] Revision 139590 caused many regressions
--- Comment #11 from Joey dot ye at intel dot com 2008-08-28 06:14 --- (In reply to comment #4) We got Running 416.gamess ref base lnx32-gcc default 416.gamess: copy #0 non-zero return code (rc=0, signal=11) 416.gamess: copy #0 non-zero return code (rc=0, signal=11) 416.gamess: copy #0 non-zero return code (rc=0, signal=11) We will try to find a small testcase. Small case available: $ cat case.f SUBROUTINE SCHMD(V,M,N,LDV) IMPLICIT DOUBLE PRECISION(A-H,O-Z) LOGICAL GOPARR,DSKWRK,MASWRK DIMENSION V(LDV,N) COMMON /IOFILE/ IR,IW,IP,IS,IPK,IDAF,NAV,IODA(400) COMMON /PAR / ME,MASTER,NPROC,IBTYP,IPTIM,GOPARR,DSKWRK,MASWRK PARAMETER (ZERO=0.0D+00, ONE=1.0D+00, TOL=1.0D-10) IF (M .EQ. 0) GO TO 180 DO 160 I = 1,M DUMI = ZERO DO 100 K = 1,N 100 DUMI = DUMI+V(K,I)*V(K,I) DUMI = ONE/ SQRT(DUMI) DO 120 K = 1,N 120 V(K,I) = V(K,I)*DUMI IF (I .EQ. M) GO TO 160 I1 = I+1 DO 140 J = I1,M DUM = -DDOT(N,V(1,J),1,V(1,I),1) CALL DAXPY(N,DUM,V(1,I),1,V(1,J),1) 140 CONTINUE 160 CONTINUE IF (M .EQ. N) RETURN 180 CONTINUE I = M J = 0 200 I0 = I I = I+1 IF (I .GT. N) RETURN 220 J = J+1 IF (J .GT. N) GO TO 320 DO 240 K = 1,N 240 V(K,I) = ZERO CALL DAXPY(N,DUM,V(1,II),1,V(1,I),1) 260 CONTINUE DUMI = ZERO DO 280 K = 1,N 280 DUMI = DUMI+V(K,I)*V(K,I) IF ( ABS(DUMI) .LT. TOL) GO TO 220 DO 300 K = 1,N 300 V(K,I) = V(K,I)*DUMI GO TO 200 320 END program main DOUBLE PRECISION V DIMENSION V(18, 18) common // v call schmd(V, 1, 18, 18) end subroutine DAXPY end FUNCTION DDOT () DOUBLE PRECISION DDOT DDOT = 1 end $ gfortran -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../src/configure --disable-bootstrap --enable-languages=c,c++,fortran --enable-checking=assert Thread model: posix gcc version 4.4.0 20080826 (experimental) [trunk revision 139590] (GCC) $ gfortran -O2 -o case.exe case.f -m32 $ ./case.exe Segmentation fault $ gfortran -v Using built-in specs. 
Target: x86_64-unknown-linux-gnu Configured with: ../src/configure --disable-bootstrap --enable-languages=c,c++,fortran --enable-checking=assert Thread model: posix gcc version 4.4.0 20080826 (experimental) [trunk revision 139589] (GCC) $ gfortran -O2 -o case.exe case.f -m32 $ ./case.exe $ -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37243
[Bug middle-end/37243] [4.4 Regression] Revision 139590 caused many regressions
--- Comment #7 from Joey dot ye at intel dot com 2008-08-27 08:07 --- Created an attachment (id=16155) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16155&action=view) Test case from 2006.434.zeusmp

I failed to extract a smaller case, but hopefully this one is helpful. Compile with gfortran -c -O2 -DSPEC_CPU_LP64 tranx1.f -S -fdump-rtl-all -g. It is miscompiled in revision 139590. In the IRA dump file, I believe the following suspicious RTL is the cause of the segfault:

(insn 886 885 893 35 tranx1.f:570 (set (reg:DI 0 ax [orig:123 D.3215 ] [123]) (mem/c:DI (plus:DI (reg/f:DI 7 sp) (const_int -104 [0xff98])) [68 D.3215+0 S8 A64])) 89 {*movdi_1_rex64} (nil))
(insn 893 886 896 35 tranx1.f:570 (set (mem/c:DI (plus:DI (reg/f:DI 7 sp) (const_int -104 [0xff98])) [68 ivtmp.160+0 S8 A64]) (reg/f:DI 3 bx [orig:159 ivtmp.160 ] [159])) 89 {*movdi_1_rex64} (nil))

D.3215 and ivtmp.160 share the spill slot (%rsp-104), even though their live ranges overlap.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37243
[Bug middle-end/37243] [4.4 Regression] Revision 139590 caused many regressions
--- Comment #8 from Joey dot ye at intel dot com 2008-08-27 08:11 --- GDB output: (gdb) b tranx1_ Breakpoint 1 at 0x43a670 (gdb) r Breakpoint 1, 0x0043a670 in tranx1_ () (gdb) b *0x43accd Breakpoint 2 at 0x43accd (gdb) b *0x43acf4 Breakpoint 3 at 0x43acf4 (gdb) b *0x43ad2f Breakpoint 4 at 0x43ad2f (gdb) c Breakpoint 2, 0x0043accd in tranx1_ () (gdb) x 0x43accd 0x43accd tranx1_+1629:mov0xff98(%rsp),%rcx (gdb) c Breakpoint 3, 0x0043acf4 in tranx1_ () (gdb) x 0x43acf4 0x43acf4 tranx1_+1668:lea0x160603e8(,%rcx,8),%rbx (gdb) i r rcx rcx0x5 5 (gdb) c Breakpoint 4, 0x0043ad2f in tranx1_ () (gdb) x 0x43ad2f 0x43ad2f tranx1_+1727:mov%rbx,0xff98(%rsp) // RTL #893 Suspicious (gdb) i r rbx rbx0x16060410 369493008 (gdb) c Breakpoint 2, 0x0043accd in tranx1_ () (gdb) x 0x43accd 0x43accd tranx1_+1629:mov0xff98(%rsp),%rcx (gdb) c Breakpoint 3, 0x0043acf4 in tranx1_ () (gdb) x 0x43acf4 0x43acf4 tranx1_+1668:lea0x160603e8(,%rcx,8),%rbx (gdb) i r rcx rcx0x16060410 369493008 (gdb) c Breakpoint 4, 0x0043ad2f in tranx1_ () (gdb) x 0x43ad2f 0x43ad2f tranx1_+1727:mov%rbx,0xff98(%rsp) (gdb) i r rbx rbx0xc6362468 3325437032 (gdb) c Program received signal SIGSEGV, Segmentation fault. 0x0043ad65 in tranx1_ () (gdb) x 0x43ad65 0x43ad65 tranx1_+1781:subsd (%r14),%xmm0 (gdb) i r r14 r140xc6362468 3325437032 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37243
[Bug target/37158] Wrong insn for _mm_comieq_sd
--- Comment #1 from Joey dot ye at intel dot com 2008-08-19 08:19 ---
Check out this code in i386.c:

/* Figure out whether to use ordered or unordered fp comparisons.
   Return the appropriate mode to use.  */

enum machine_mode
ix86_fp_compare_mode (enum rtx_code code ATTRIBUTE_UNUSED)
{
  /* ??? In order to make all comparisons reversible, we do all comparisons
     non-trapping when compiling for IEEE.  Once gcc is able to distinguish
     all forms trapping and nontrapping comparisons, we can make inequality
     comparisons trapping again, since it results in better code when using
     FCOM based compares.  */
  return TARGET_IEEE_FP ? CCFPUmode : CCFPmode;
}
--
Joey dot ye at intel dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary| Wrong insn for             |Wrong insn for
                   |_mm_comieq_sd               |_mm_comieq_sd

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37158
[Bug rtl-optimization/37124] New: ICE with attribute(option(no-mmx))
$ cat opt.c
extern void abort (void);

double
foo (int arg)
{
  if (arg != 116)
    abort();
  return arg + 1;
}

inline double
#if HAS_ATTR
__attribute__ ((__option__ ("no-mmx")))
#endif
bar (int arg)
{
  foo (arg);
  __builtin_return (__builtin_apply ((void (*) ()) foo,
                                     __builtin_apply_args (), 16));
}
$ gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --disable-bootstrap --enable-languages=c,c++,fortran --enable-checking=assert
Thread model: posix
gcc version 4.4.0 20080810 (experimental) [trunk revision 138935] (GCC)
$ gcc -c -m32 -O3 -mmmx opt.c -DHAS_ATTR=1
opt.c: In function 'bar':
opt.c:20: internal compiler error: Segmentation fault
Please submit a full bug report, with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
$ gcc -c -m32 -O3 -mmmx opt.c -DHAS_ATTR=0
$
--
           Summary: ICE with attribute(option(no-mmx))
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: Joey dot ye at intel dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37124
[Bug middle-end/36983] Trunk 138207 miscompiles 172.mgrid on x86-64
--- Comment #6 from Joey dot ye at intel dot com 2008-08-11 05:52 --- (In reply to comment #4) If you remove -ffast-math, does it miscompare? Passes without -ffast-math. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36983
[Bug middle-end/36983] Trunk 138207 miscompiles 172.mgrid on x86-64
--- Comment #3 from Joey dot ye at intel dot com 2008-08-07 07:55 ---
Although 138318 fixes the compiler ICE, it miscompiles 172.mgrid with -O3 -ffast-math on x86-64:

Running 172.mgrid ref base o3 default
*** Miscompare of mgrid.out, see /home/jye2/cpu2000/benchspec/CFP2000/172.mgrid/run/0003/mgrid.out.mis

No small case available yet.
--
Joey dot ye at intel dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|FIXED                       |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36983
[Bug middle-end/34921] Misalign stack variable referenced by nested function
--- Comment #9 from Joey dot ye at intel dot com 2008-08-06 08:05 ---
Fixed
--
Joey dot ye at intel dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34921
[Bug c++/37012] numerous stackalign related testsuite failures on i686-apple-darwin9
--- Comment #18 from Joey dot ye at intel dot com 2008-08-04 07:24 --- (In reply to comment #9) Joey, I think the problem is the usage of STACK_BOUNDARY / BITS_PER_UNIT for stack alignment. On MacOS, STACK_BOUNDARY is 128 on ia32. Shouldn't we use UNITS_PER_WORD in some cases? Please double check all usages of STACK_BOUNDARY / BITS_PER_UNIT in our stack alignment codes.
That's exactly what worried me about a 128-bit STACK_BOUNDARY. For example, the following code won't work on Darwin:

  int param_ptr_offset = (call_used_regs[REGNO (crtl->drap_reg)]
                          ? 0 : STACK_BOUNDARY / BITS_PER_UNIT);

UNITS_PER_WORD should be used instead. Working on the patch.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37012
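To make the arithmetic concrete, here is a small sketch of why the two expressions diverge. The constants are illustrative only (in GCC they are target macros), and the 32-bit STACK_BOUNDARY used for non-Darwin ia32 is an assumption, not something stated in this thread.

```c
#include <assert.h>

/* Illustrative values only; in GCC these are target macros.  */
enum { BITS_PER_UNIT = 8, UNITS_PER_WORD_IA32 = 4 };

/* Byte offset produced by STACK_BOUNDARY / BITS_PER_UNIT for a
   given STACK_BOUNDARY in bits.  */
static int
offset_from_stack_boundary (int stack_boundary_bits)
{
  assert (stack_boundary_bits % BITS_PER_UNIT == 0);
  return stack_boundary_bits / BITS_PER_UNIT;
}
```

With a 32-bit boundary the result is 4 bytes, the same as the word size, so the two expressions are interchangeable. With Darwin's 128-bit boundary the result is 16 bytes, which no longer matches the 4-byte slot occupied by a pushed word.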
[Bug target/37010] -mno-accumulate-outgoing-args doesn't work with stack alignment
--- Comment #6 from Joey dot ye at intel dot com 2008-08-04 08:28 --- (In reply to comment #3) Joey, when we compute frame layout, we don't count the duplicated return address pushed onto stack when DRAP is used. Also when we push return address, shouldn't we use -UNITS_PER_WORD, instead of -(STACK_BOUNDARY / BITS_PER_UNIT))? On MacOS, STACK_BOUNDARY is 128 on ia32. Does this patch make sense?

--- ./i386.c.drap 2008-08-03 09:50:05.0 -0700
+++ ./i386.c 2008-08-03 11:36:40.0 -0700
@@ -7291,6 +7291,10 @@ ix86_compute_frame_layout (struct ix86_f
   if (stack_realign_fp)
     offset = (offset + stack_alignment_needed -1) & -stack_alignment_needed;
 
+  /* Duplicated return address when DRAP is used.  */
+  if (crtl->drap_reg && crtl->stack_realign_needed)
+    offset += UNITS_PER_WORD;
+
   /* Register save area */
   offset += frame->nregs * UNITS_PER_WORD;
 
@@ -7692,8 +7696,7 @@ ix86_expand_prologue (void)
      expand_builtin_return_addr etc.  */
   x = crtl->drap_reg;
   x = gen_frame_mem (Pmode,
-                     plus_constant (x,
-                                    -(STACK_BOUNDARY / BITS_PER_UNIT)));
+                     plus_constant (x, -UNITS_PER_WORD));
   insn = emit_insn (gen_push (x));
   RTX_FRAME_RELATED_P (insn) = 1;
 }

I suspect this patch is incorrect.

  /* Skip return address and saved base pointer.  */
  offset = frame_pointer_needed ? UNITS_PER_WORD * 2 : UNITS_PER_WORD;

already counts the duplicated address in. I'm analyzing what makes this case fail.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37010
[Bug target/37010] -mno-accumulate-outgoing-args doesn't work with stack alignment
--- Comment #7 from Joey dot ye at intel dot com 2008-08-04 09:03 ---
This problem is associated with -mpreferred-stack-boundary=2, rather than with stack alignment. The following case already fails on trunk from before the stack branch was merged:

$ cat y1.c
/* PR middle-end/37010 */
/* { dg-do run { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */
/* { dg-options "-msse2" } */

typedef __PTRDIFF_TYPE__ ptrdiff_t;
extern void abort (void);

int __attribute__ ((noinline))
check (void *i, int align)
{
  if ((((ptrdiff_t) i) & (align - 1)) != 0)
    {
      abort ();
    }
  return 0;
}

typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));

void __attribute__ ((noinline))
foo (__m128 x, __m128 y ,__m128 z ,__m128 a, int size)
{
  check (&a, __alignof__(a));
}

int
main (void)
{
  __m128 x = { 1.0 };
  foo (x, x, x, x, 5);
  return 0;
}
$ gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --disable-bootstrap --enable-languages=c,c++,fortran --enable-checking=assert
Thread model: posix
gcc version 4.4.0 20080707 (experimental) [trunk revision 137572] (GCC)
$ gcc -o y1.exe y1.c -m32 -Os -msse2 -mpreferred-stack-boundary=2
$ ./y1.exe
Aborted
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37010
[Bug target/37010] -mno-accumulate-outgoing-args doesn't work with stack alignment
--- Comment #8 from Joey dot ye at intel dot com 2008-08-04 09:11 --- Root cause is that the outgoing parameter frame is aligned relative to the stack pointer. Namely, address_of_stack_param = SP + offset + fixed_padding. With -mpreferred-stack-boundary=2, SP is only guaranteed to be 4-byte aligned, so the outgoing frame cannot be aligned to 16 bytes without an additional 'and $-16, sp'. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37010
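A minimal sketch of the address computation in this comment, using plain integers in place of real stack pointers. The helper names are hypothetical; only the formula SP + offset + fixed_padding and the 'and $-16, sp' step come from the comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when ADDR is a multiple of ALIGN (a power of two).  */
static bool
is_aligned (uintptr_t addr, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (addr & (align - 1)) == 0;
}

/* address_of_stack_param = SP + offset + fixed_padding.  */
static uintptr_t
stack_param_address (uintptr_t sp, uintptr_t offset, uintptr_t fixed_padding)
{
  return sp + offset + fixed_padding;
}

/* Equivalent of 'and $-16, sp': round SP down to a 16-byte boundary.  */
static uintptr_t
realign_sp_16 (uintptr_t sp)
{
  return sp & ~(uintptr_t) 15;
}
```

For example, an SP of 0x1004 satisfies the 4-byte boundary, yet a parameter at offset 16 lands on 0x1014, which is not 16-byte aligned. After masking SP down to 0x1000 the same offset gives 0x1010.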
[Bug target/37010] -mno-accumulate-outgoing-args doesn't work with stack alignment
--- Comment #11 from Joey dot ye at intel dot com 2008-08-04 14:11 --- (In reply to comment #10) Did you mean we needed 2 additional 'and $-16, sp' insns to align the stack? I don't think so. Definitely not. Solution 1: Just ignore it; a __m128 parameter shouldn't be passed with -mpreferred-stack-boundary=2. Or Solution 2: Record the maximum alignment of all outgoing parameters, and set crtl->preferred_stack_boundary = max_parameter_alignment. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37010
[Bug target/37010] -mno-accumulate-outgoing-args doesn't work with stack alignment
--- Comment #15 from Joey dot ye at intel dot com 2008-08-05 01:01 --- (In reply to comment #12) I think the problem is in /* Set offset to aligned because the realigned frame starts from here. */ if (stack_realign_fp) offset = (offset + stack_alignment_needed -1) & -stack_alignment_needed; This code assumes that offset 0 is properly aligned to any alignment, which isn't true. It happens to work with -maccumulate-outgoing-args. I still believe #8 is the right reason. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37010
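The rounding idiom quoted above, and the assumption it relies on, can be shown in a short C sketch. The function names are hypothetical; the unsigned form below uses ~(align - 1), which selects the same bits as the & -align form for a power-of-two alignment.

```c
#include <stdint.h>

/* Round OFFSET up to the next multiple of ALIGN (a power of two).
   For powers of two this matches (offset + align - 1) & -align.  */
static uintptr_t
round_up (uintptr_t offset, uintptr_t align)
{
  return (offset + align - 1) & ~(align - 1);
}

/* Misalignment of BASE + OFFSET with respect to ALIGN; zero means the
   absolute address is aligned.  */
static uintptr_t
misalignment (uintptr_t base, uintptr_t offset, uintptr_t align)
{
  return (base + offset) & (align - 1);
}
```

Rounding the offset only aligns the absolute address when the base (offset 0) is itself aligned. With a base of 0x1004, an offset rounded up to 32 still leaves the address 4 bytes off a 16-byte boundary.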
[Bug middle-end/36983] New: Trunk 138207 miscompiles 172.mgrid on x86-64
$ cat mgrid.f SUBROUTINE RESID(U,V,R,N,A) INTEGER N REAL*8 U(N,N,N),V(N,N,N),R(N,N,N),A(0:3) INTEGER I3, I2, I1 DO 600 I3=2,N-1 DO 600 I2=2,N-1 DO 600 I1=2,N-1 600 R(I1,I2,I3)=V(I1,I2,I3) -A(0)*( U(I1, I2, I3 ) ) -A(1)*( U(I1-1,I2, I3 ) + U(I1+1,I2, I3 ) + U(I1, I2-1,I3 ) + U(I1, I2+1,I3 ) + U(I1, I2, I3-1) + U(I1, I2, I3+1) ) -A(2)*( U(I1-1,I2-1,I3 ) + U(I1+1,I2-1,I3 ) + U(I1-1,I2+1,I3 ) + U(I1+1,I2+1,I3 ) + U(I1, I2-1,I3-1) + U(I1, I2+1,I3-1) + U(I1, I2-1,I3+1) + U(I1, I2+1,I3+1) + U(I1-1,I2, I3-1) + U(I1-1,I2, I3+1) + U(I1+1,I2, I3-1) + U(I1+1,I2, I3+1) ) -A(3)*( U(I1-1,I2-1,I3-1) + U(I1+1,I2-1,I3-1) + U(I1-1,I2+1,I3-1) + U(I1+1,I2+1,I3-1) + U(I1-1,I2-1,I3+1) + U(I1+1,I2-1,I3+1) + U(I1-1,I2+1,I3+1) + U(I1+1,I2+1,I3+1) ) RETURN END $ gfortran -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../src/configure --prefix=/home/jye2/rrs/138207/usr --enable-languages=c,c++,fortran --disable-bootstrap Thread model: posix gcc version 4.4.0 20080728 (experimental) [trunk revision 138207] (GCC) $ gfortran -O3 -ffast-math mgrid.f -c mgrid.f: In function 'resid': mgrid.f:1: internal compiler error: Segmentation fault Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. $ gfortran -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../src/configure --prefix=/home/jye2/rrs/138206/usr --enable-languages=c,c++,fortran --disable-bootstrap Thread model: posix gcc version 4.4.0 20080728 (experimental) [trunk revision 138206] (GCC) $ gfortran -O3 -ffast-math mgrid.f -c $ -- Summary: Trunk 138207 miscompiles 172.mgrid on x86-64 Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: Joey dot ye at intel dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36983
[Bug middle-end/36983] Trunk 138207 miscompiles 172.mgrid on x86-64
--- Comment #2 from Joey dot ye at intel dot com 2008-07-31 10:50 --- Yes. I just noticed that the latest trunk passes. -- Joey dot ye at intel dot com changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36983
[Bug middle-end/36986] New: Trunk 138207 miscompiles 447.dealII
This bug is also caused by 138207, and latest trunk still fails (138353) $ g++ -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../src/configure --disable-bootstrap --enable-languages=c,c++,fortran --enable-checking=assert Thread model: posix gcc version 4.4.0 20080731 (experimental) [trunk revision 138353] (GCC) $ g++ -c -O3 -ffast-math dof_tools.i.cc dof_tools.cc: In static member function 'static void DoFTools::make_flux_sparsity_pattern(const DoFHandlerdim, SparsityPattern, const FullMatrixdouble, const FullMatrixdouble) [with int dim = 3, SparsityPattern = CompressedBlockSparsityPattern]': dof_tools.cc:485: internal compiler error: in gimple_cond_get_ops_from_tree, at gimple.c:493 Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. -- Summary: Trunk 138207 miscompiles 447.dealII Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: Joey dot ye at intel dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36986
[Bug middle-end/36986] Trunk 138207 miscompiles 447.dealII
--- Comment #1 from Joey dot ye at intel dot com 2008-07-31 11:33 --- Created an attachment (id=15982) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15982&action=view) Preprocessed test case -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36986
[Bug tree-optimization/36835] Trunk 137774 miscompile cpu2006.473.astar
--- Comment #1 from Joey dot ye at intel dot com 2008-07-16 13:14 --- Fixed by revision 137859 -- Joey dot ye at intel dot com changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36835
[Bug tree-optimization/36835] New: Trunk 137774 miscompile cpu2006.473.astar
gcc -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../src/configure --enable-languages=c,c++ --disable-bootstrap Thread model: posix gcc version 4.4.0 20080714 (experimental) [trunk revision 137774] (GCC) Running 473.astar test base lnx32e default Error with '/home/jye2/cpu2006/bin/specinvoke -E -d /home/jye2/cpu2006/benchspec/CPU2006/473.astar/run/run_base_test_lnx32e. -c 1 -e compare.err -o compare.stdout -f compare.cmd': check file '/home/jye2/cpu2006/benchspec/CPU2006/473.astar/run/run_base_test_lnx32e./.err' *** Miscompare of lake.out, see /home/jye2/cpu2006/benchspec/CPU2006/473.astar/run/run_base_test_lnx32e./lake.out.mis Invalid run; unable to continue. If you wish to ignore errors please use '-I' or ignore_errors -- Summary: Trunk 137774 miscompile cpu2006.473.astar Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: Joey dot ye at intel dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36835
[Bug tree-optimization/36765] [4.4 Regression] Revision 137573 miscompiles 464.h264ref in SPEC CPU 2006
--- Comment #1 from Joey dot ye at intel dot com 2008-07-11 05:46 --- Created an attachment (id=15897) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15897&action=view) Small test case reduced from cpu2006.464.h264ref /home/jye2/work/bug-37665 gcc -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../src/configure --disable-bootstrap --enable-languages=c,c++,fortran --enable-checking=assert Thread model: posix gcc version 4.4.0 20080707 (experimental) [trunk revision 137573] (GCC) /home/jye2/work/bug-37665 make -B ./m.exe echo PASS gcc -c main.c -g gcc -O2 -ffast-math -g -c -o l5.o l5.c gcc -o m.exe main.o l5.o Bmin[0]=21 Aborted /home/jye2/work/bug-37665 gcc -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../src/configure --disable-bootstrap --enable-languages=c,c++,fortran --enable-checking=assert Thread model: posix gcc version 4.4.0 20080707 (experimental) [trunk revision 137572] (GCC) /home/jye2/work/bug-37665 make -B ./m.exe echo PASS gcc -c main.c -g gcc -O2 -ffast-math -g -c -o l5.o l5.c gcc -o m.exe main.o l5.o PASS -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36765
[Bug tree-optimization/36765] [4.4 Regression] Revision 137573 miscompiles 464.h264ref in SPEC CPU 2006
--- Comment #2 from Joey dot ye at intel dot com 2008-07-11 05:49 --- The effect of line 76, buffer_frame[0] = InitFullness;, is eliminated by the optimizer due to a bug in GCC. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36765
[Bug tree-optimization/36054] bad code generation with -ftree-vectorize
--- Comment #13 from Joey dot ye at intel dot com 2008-05-05 07:22 ---
It is helpful. The root cause is that memory allocated by new is only aligned to 8 bytes under i386. In your case, the object Environment is allocated by new and its constructor tries to use movdqa to initialize its members. The following small case shows the problem:

/* Compile with option -m32 -msse2
   Current behavior: runtime segmentation fault */
#include <stdio.h>
#include <emmintrin.h>

struct A {
public:
  __m128i m;
  void init() { m = _mm_setzero_si128(); }
};

int main()
{
  A * a = new A;
  printf("Address of A: %p\n", a);
  a->init();
  delete a;
  return 0;
}

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36054
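As a rough C analogue of the problem above (an assumption made for illustration: plain C allocation stands in for the C++ new in the test case), the sketch below checks a pointer's alignment and requests 16-byte alignment explicitly through C11 aligned_alloc. The helper names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* True if P is aligned to ALIGN bytes, where ALIGN is a power of two.  */
static int
is_aligned (const void *p, size_t align)
{
  return ((uintptr_t) p & (align - 1)) == 0;
}

/* Hypothetical helper: allocate at least SIZE bytes on a 16-byte
   boundary.  C11 aligned_alloc expects the size to be a multiple of
   the alignment, so SIZE is rounded up first.  Release with free().  */
static void *
alloc_aligned_16 (size_t size)
{
  size_t rounded = (size + 15) & ~(size_t) 15;
  return aligned_alloc (16, rounded);
}
```

A default allocator is only required to satisfy the platform's fundamental alignment, so code that uses aligned SSE moves on heap objects has to request the stricter alignment itself, as alloc_aligned_16 does here.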
[Bug tree-optimization/36054] bad code generation with -ftree-vectorize
--- Comment #14 from Joey dot ye at intel dot com 2008-05-05 07:29 --- HJ, AVX will have a similar problem on x86_64, where new only returns objects aligned to 16 bytes. A dynamically allocated __m256 is not guaranteed to sit on a 32-byte boundary. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36054
[Bug tree-optimization/36054] bad code generation with -ftree-vectorize
--- Comment #8 from Joey dot ye at intel dot com 2008-04-30 10:53 --- (In reply to comment #6) (In reply to comment #4) have you tried to compile with -march=core2 -mfpmath=sse -msse? Yes, I've compiled it as following: % g++ -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096 -Wnon-virtual-dtor -fPIC kernel_build.ii -m32 ? -m32 doesn't work. You have to use 4.3.0 release branch. Recent mainline change of ia32 intrinsic conflict with 4.3.0 header files. I'm using 4.3.0. Compilation passes but I still got link errors like: /tmp/ccfJXXcV.o:(.rodata._ZTVN9portaudio20MemFunCallbackStreamIN4nova16PortAudioBackendEEE[vtable for portaudio::MemFunCallbackStream<nova::PortAudioBackend>]+0x10): undefined reference to `portaudio::Stream::close()' /tmp/ccfJXXcV.o:(.rodata._ZTIN9portaudio20MemFunCallbackStreamIN4nova16PortAudioBackendEEE[typeinfo for portaudio::MemFunCallbackStream -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36054
[Bug tree-optimization/36054] bad code generation with -ftree-vectorize
--- Comment #9 from Joey dot ye at intel dot com 2008-04-30 10:56 --- (In reply to comment #8) -m32 doesn't work. You have to use 4.3.0 release branch. Recent mainline change Correction: -m32 is a must, but doesn't fix all. Options I'm using: g++ -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096 -Wnon-virtual-dtor -m32 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36054
[Bug tree-optimization/36054] bad code generation with -ftree-vectorize
--- Comment #11 from Joey dot ye at intel dot com 2008-05-01 04:31 --- Tim, Since it doesn't link, I can only check the .s file. There are a couple of constructors called Environment; which one is the problematic function? grep Environment kernel_build.s|grep glob ... .globl _ZN4nova11EnvironmentD1Ev .globl _ZN4nova11EnvironmentD2Ev .globl _ZN4nova11EnvironmentC1Ev -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36054
[Bug middle-end/36078] New: gfortran fails to build cpu2006/465.tonto
Start from trunk 134730, still fail by 134775: $ cat f2.f90 subroutine foo(func,p,eval) real(kind=kind(1.0d0)), dimension(3,0:4,0:4,0:4) :: p logical(kind=kind(.true.)), dimension(5,5,5) :: eval interface subroutine func(values,pt) real(kind=kind(1.0d0)), dimension(:), intent(out) :: values real(kind=kind(1.0d0)), dimension(:,:), intent(in) :: pt end subroutine end interface real(kind=kind(1.0d0)), dimension(125,3) :: pt integer(kind=kind(1)) :: n_pt n_pt = 1 pt(1:n_pt,:) = reshape( pack( transpose(reshape(p,(/3,125/))), spread(reshape(eval,(/125/)),dim=2,ncopies=3)), (/n_pt,3/)) end subroutine end $ gfortran -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../src/configure --disable-bootstrap --enable-languages=c,fortran Thread model: posix gcc version 4.4.0 20080427 (experimental) [trunk revision 134730] (GCC) $ gfortran -c -O2 f2.f90 f2.f90: In function 'foo': f2.f90:1: internal compiler error: in execute_todo, at passes.c:991 Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. -- Summary: gfortran fails to build cpu2006/465.tonto Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: Joey dot ye at intel dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36078
[Bug middle-end/36074] [4.4 Regression]: 447.dealII in SPEC CPU 2006 failed to compile
--- Comment #5 from Joey dot ye at intel dot com 2008-04-29 10:41 --- This may be related to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36078, where I do have a small test case. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36074
[Bug middle-end/34921] Misalign stack variable referenced by nested function
--- Comment #5 from Joey dot ye at intel dot com 2008-01-23 01:45 ---
(In reply to comment #2) I bet if you put jj in struct and don't have a nested function, this will be the same issue.

Not the same. In fact it passes if jj is not referenced by a nested function. The root cause is in tree-nested.c.

$ cat nested-3.c
#include <stdio.h>
#include <stdlib.h>

typedef int aligned __attribute__((aligned(16)));

int global;

void
check (int *i)
{
  *i = 20;
  if ((((int) i) & (__alignof__(aligned) - 1)) != 0)
    {
      printf("\nUnalign address (%d): %p!\n", __alignof__(aligned), i);
      abort ();
    }
}

void
foo (void)
{
  aligned jj;
  int j2;
  void bar ()
    {
      j2 = -20;
    }
  jj = 0;
  bar ();
  check (&jj);
}

int
main()
{
  foo ();
  return 0;
}

$ diff -p nested-2.c nested-3.c
*** nested-2.c 2008-01-22 14:24:39.0 +0800
--- nested-3.c 2008-01-23 09:38:47.0 +0800
*************** void
*** 19,27 ****
  foo (void)
  {
    aligned jj;
    void bar ()
      {
!       jj = -20;
      }
    jj = 0;
    bar ();
--- 19,28 ----
  foo (void)
  {
    aligned jj;
+   int j2;
    void bar ()
      {
!       j2 = -20;
      }
    jj = 0;
    bar ();
$ gcc -m32 -o nested-3.exe nested-3.c
$ ./nested-3.exe
$

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34921
[Bug c/34921] Misalign stack variable referenced by nested function
--- Comment #1 from Joey dot ye at intel dot com 2008-01-22 06:38 ---
This patch should fix it:

Index: gcc/tree-nested.c
===================================================================
--- gcc/tree-nested.c (revision 131342)
+++ gcc/tree-nested.c (working copy)
@@ -183,6 +183,10 @@
 
   TREE_CHAIN (field) = *p;
   *p = field;
+
+  /* Set correct alignment for frame struct type */
+  if (TYPE_ALIGN(type) < DECL_ALIGN (field))
+    TYPE_ALIGN(type) = DECL_ALIGN (field);
 }
 
 /* Build or return the RECORD_TYPE that describes the frame state that is

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34921
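The idea behind the patch above, that a record type must be at least as strictly aligned as its most strictly aligned field, can be sketched in standalone C. The struct and function names below are hypothetical stand-ins, not GCC's tree types or macros, and the alignment units are arbitrary.

```c
#include <stddef.h>

/* Hypothetical stand-in for the frame record type.  */
struct frame_type
{
  size_t align;
};

/* Hypothetical stand-in for a field declaration.  */
struct field_decl
{
  size_t align;
};

/* When a field is added to the frame, raise the frame's alignment to
   the field's alignment if the field is more strictly aligned.  The
   frame alignment is never lowered.  */
static void
add_field_to_frame (struct frame_type *type, const struct field_decl *field)
{
  if (type->align < field->align)
    type->align = field->align;
}
```

Without this propagation, a frame that holds a 16-byte-aligned variable such as jj could itself be placed on a weaker boundary, which is the misalignment the nested-2.c test case reports.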
[Bug c/34921] New: Misalign stack variable referenced by nested function
cat nested-2.c
#include <stdio.h>
#include <stdlib.h>

typedef int aligned __attribute__((aligned(16)));

int global;

void
check (int *i)
{
  *i = 20;
  if ((((int) i) & (__alignof__(aligned) - 1)) != 0)
    {
      printf("\nUnalign address (%d): %p!\n", __alignof__(aligned), i);
      abort ();
    }
}

void
foo (void)
{
  aligned jj;
  void bar ()
    {
      jj = -20;
    }
  jj = 0;
  bar ();
  check (&jj);
}

int
main()
{
  foo ();
  return 0;
}

gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /home/wlin5/gcc/src-daily/configure --enable-languages=c,c++,fortran --disable-bootstrap
Thread model: posix
gcc version 4.3.0 20080106 (experimental) [trunk revision 131347] (GCC)
gcc -m32 -o nested-2.exe nested-2.c
./nested-2.exe

Unalign address (16): 0xffa137dc!
Aborted

-- Summary: Misalign stack variable referenced by nested function Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: Joey dot ye at intel dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34921
[Bug tree-optimization/32921] [4.3 Regression] Revision 126326 causes 12% slowdown
--- Comment #28 from Joey dot ye at intel dot com 2007-10-23 02:23 --- Got similar result on x86_64, Core 2 improves 24% from 129469 to 129504. That's great. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921
[Bug libmudflap/33119] New: Missing mf-runtime.h after make -j2 install
mf-runtime.h is not installed by make -j2 install on the x86_64 target. From the log file, it is apparently installed at first and then removed when the rest of the gcc header files are installed. This may be caused by an incorrect dependency between mudflap and another target. -- Summary: Missing mf-runtime.h after make -j2 install Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: minor Priority: P3 Component: libmudflap AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: Joey dot ye at intel dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33119
[Bug libmudflap/33119] Missing mf-runtime.h after make -j2 install
--- Comment #2 from Joey dot ye at intel dot com 2007-08-20 08:53 --- (In reply to comment #1) Nobody does make install with -j. I guess so, that's why I set it to minor. But does that mean an error is expected with -j? My script had -j by accident and it cost me hours to identify the root cause. I doubt I'm the only lucky guy. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33119
[Bug rtl-optimization/32755] Seg fault when compile CPU2000 with -fsee
--- Comment #1 from Joey dot ye at intel dot com 2007-07-13 09:21 --- Created an attachment (id=13909) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13909&action=view) Reduced testcase. GCC crashes with gcc -O2 -fsee case-see.c -c. It fails on all recent 4.3 trunk revisions. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32755
[Bug rtl-optimization/32755] New: Seg fault when compile CPU2000 with -fsee
4.3 trunk fails to build any 2006 with -fsee on x86_64: gcc -c -o av.o -DSPEC_CPU -DNDEBUG -DPERL_CORE -O2 -fsee -DSPEC_CPU_LP64 -DSPEC_CPU_LINUX_X64 av.c av.c: In function 'Perl_av_reify': av.c:50: internal compiler error: Segmentation fault -- Summary: Seg fault when compile CPU2000 with -fsee Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: Joey dot ye at intel dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32755
[Bug rtl-optimization/32755] Seg fault when compile CPU2000 with -fsee
--- Comment #2 from Joey dot ye at intel dot com 2007-07-13 09:27 ---
The root cause looks to be at see.c line 1643:

  emit_insn_after (merged_ref, ref);
  delete_insn (ref);

where merged_ref and ref have the same INSN_UID. delete_insn clears the df information for that UID, leaving no df information for merged_ref. I tried inserting the following line and it works:

+ INSN_UID(merged_ref)=cfun->emit->x_cur_insn_uid++;

But it is apparently ugly. Can anyone share the right approach to replacing an insn with another one that has the same UID?

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32755
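The UID clash described above can be modelled with a toy side table keyed by UID. This is only an illustrative sketch: toy_insn, toy_emit, toy_delete and the table are hypothetical and do not reflect the real df or RTL data structures.

```c
#include <stdbool.h>

#define TOY_MAX_UID 16

/* Hypothetical per-UID side table, standing in for the df information.  */
static bool toy_df_info[TOY_MAX_UID];

/* Hypothetical instruction record carrying only a UID.  */
struct toy_insn
{
  int uid;
};

/* Emitting an insn registers side information under its UID.  */
static void
toy_emit (const struct toy_insn *insn)
{
  toy_df_info[insn->uid] = true;
}

/* Deleting an insn clears whatever is stored under its UID, even if
   another live insn shares that UID.  */
static void
toy_delete (const struct toy_insn *insn)
{
  toy_df_info[insn->uid] = false;
}

/* True if side information is still recorded for INSN's UID.  */
static bool
toy_has_df_info (const struct toy_insn *insn)
{
  return toy_df_info[insn->uid];
}
```

If the replacement shares the old insn's UID, deleting the old insn also wipes the replacement's entry. Giving the replacement a fresh UID, as the one-line workaround in the comment does, avoids the clash in this model.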
[Bug middle-end/32598] [4.3 Regression]: 27_io/basic_stringbuf/setbuf/wchar_t/4.cc needs more than 6GB memory to compile
--- Comment #4 from Joey dot ye at intel dot com 2007-07-04 01:17 --- Revision 126198 introduced the regression. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32598