[Bug rtl-optimization/55845] 454.calculix miscompares with -march=btver2 -O3 -ffastmath -fschedule-insns -mvzeroupper for test data run
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55845

--- Comment #3 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2013-01-04 06:40:44 UTC ---
The test also fails on corei7-avx. I built a simple reproducer.

#include <stdio.h>
#define N 100

double foo (int size, double y[], double x[])
{
  double sum = 0.0;
  int i;
  for (i = 0, sum = 0.; i < size; i++)
    sum += y[i] * x[i];
  return (sum);
}

int main ()
{
  double x[N];
  double y[N];
  double s;
  int i;
  for (i = 0; i < N; i++)
    {
      x[i] = i;
      y[i] = i;
    }
  s = foo (N, y, x);
  printf ("%s\n", s == 328350 ? "pass" : "fail");
}

$ gcc -mavx -g -static -o t -O3 -ffast-math -march=corei7 t.c -fno-inline && ./t
pass
$ gcc -fschedule-insns -mavx -g -static -o t -O3 -ffast-math -march=corei7 t.c -fno-inline && ./t
fail

The responsible phase is jump2. To switch the phase off I made this change:

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 5d142e9..be04c5d 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -3070,6 +3070,7 @@ struct rtl_opt_pass pass_jump =
 static unsigned int
 execute_jump2 (void)
 {
+  if (!getenv("NOJMP2"))
   cleanup_cfg (flag_crossjumping ? CLEANUP_CROSSJUMP : 0);
   return 0;
 }

$ env NOJMP2=1 gcc -fschedule-insns -mavx -g -static -o t -O3 -ffast-math -march=corei7 t.c -fno-inline && ./t
pass

I used this compiler:

Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure CFLAGS='-O0 -g3' CXXFLAGS='-O0 -g3' --prefix=/gnumnt/msticlxl16_users/vbyakovl/workspaces/gcc/install --enable-clocale=gnu --disable-bootstrap --with-system-zlib --enable-shared --with-demangler-in-ld --with-fpmath=sse --with-arch=corei7-avx --with-cpu=corei7-avx --enable-languages=c,c++,fortran,java,lto,objc --no-create --no-recursion
[Bug regression/54390] [AVX] FAIL: gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54390

Vladimir Yakovlev vbyakovl23 at gmail dot com changed:

           What    |Removed    |Added
                 CC|           |vbyakovl23 at gmail dot com

--- Comment #2 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-12-28 15:40:53 UTC ---
The compiler behaves differently on the test gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c with -mavx and -mno-avx. With -mno-avx the routine get_vectype_for_scalar_type (scalar_type) at tree-vect-data-refs.c:3265 returns NULL for the scalar_type "struct A", whereas with -mavx it returns "vector(2) __int128 unsigned". The test passes if the constant 16 at line 6 of the test is replaced by 32 or 64 (64 is better; otherwise we will hit the same problem with AVX2 in the future).

diff --git a/gcc/testsuite/gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c b/gcc/testsuite/gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c
index 90dcd84..68e0bf1 100644
--- a/gcc/testsuite/gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c
+++ b/gcc/testsuite/gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c
@@ -3,7 +3,7 @@
 typedef __complex__ float Value;
 typedef struct {
-  Value a[16 / sizeof (Value)];
+  Value a[64 / sizeof (Value)];
 } A;

 A sum(A a,A b)
[Bug lto/55660] New: ICE instead of some warning during lto build
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55660

Bug #: 55660
Summary: ICE instead of some warning during lto build
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com

LTO compilation fails at link time if the object files were created with -funsigned-char but the link is done without it. r174519 is the first commit where this appeared.

$ cat t_f.c
char n[3] = {'a','b','c'};
int foo(char *x)
{
  if (*x == 'y')
    return (int)*x;
  *x = 'y';
  return 0;
}

$ cat t_m.c
#include <stdio.h>
extern int foo (char*);
extern char n[3];
int main ()
{
  int i, m = 0;
  for (i = 0; i < 3; i++)
    m += foo(&n[i]);
  printf("%d\n", m);
}

$ gcc -c -m32 -O2 -flto t_f.c t_m.c -funsigned-char ; gcc -m32 -O2 -flto t_f.o t_m.o -o t0
In file included from :0:0:
t_m.c: In function 'main':
t_m.c:7:5: error: mismatching comparison operand types
 int main ()
     ^
char
unsigned char
if (_14 == 121)
t_m.c:7:5: internal compiler error: verify_gimple failed
0x98ab8b verify_gimple_in_cfg(function*)
        ../../gcc/gcc/tree-cfg.c:4728
0x8794d0 execute_function_todo
        ../../gcc/gcc/passes.c:1973
0x8787e9 do_per_function
        ../../gcc/gcc/passes.c:1705
0x8795f4 execute_todo
        ../../gcc/gcc/passes.c:2006
0x879a5e execute_one_ipa_transform_pass
        ../../gcc/gcc/passes.c:2183
0x879b42 execute_all_ipa_transforms()
        ../../gcc/gcc/passes.c:2213
0x5ae627 expand_function
        ../../gcc/gcc/cgraphunit.c:1615
0x5aeb00 expand_all_functions
        ../../gcc/gcc/cgraphunit.c:1726
0x5af58a compile()
        ../../gcc/gcc/cgraphunit.c:2024
0x512f39 lto_main()
        ../../gcc/gcc/lto/lto.c:3399
Please submit a full bug report, with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.
lto-wrapper: gcc returned 1 exit status
/bin/ld: lto-wrapper failed
collect2: error: ld returned 1 exit status

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/export/users/vbyakovl/workspaces/gcc/install-ref/libexec/gcc/x86_64-unknown-linux-gnu/4.8.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure CFLAGS='-O0 -g3' --prefix=/export/users/vbyakovl/workspaces/gcc/install-ref --disable-bootstrap --enable-languages=c,c++,fortran,lto CXXFLAGS='-O0 -g3'
Thread model: posix
gcc version 4.8.0 20121105 (experimental) (GCC)
[Bug target/54342] OImode is used for _m256 types when using unions in a function call.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54342

Vladimir Yakovlev vbyakovl23 at gmail dot com changed:

           What    |Removed    |Added
             Status|NEW        |RESOLVED
         Resolution|           |FIXED

--- Comment #14 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-11-22 07:06:12 UTC ---
The vzeroupper implementation is in trunk now. To fix the problem I used HJ's proposals, so the issue can be closed.
[Bug middle-end/54985] New: Dom optimization erroneous remove conditional goto.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54985

Bug #: 54985
Summary: Dom optimization erroneous remove conditional goto.
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com

The attached test case fails if compiled with -O1 or higher.

gcc -O0 -m32 q.c qm.c ; ./a.out && echo pass || echo FAIL
pass
gcc -g3 -O1 -m32 q.c qm.c ; ./a.out && echo pass || echo FAIL
FAIL

gcc -v
Using built-in specs.
COLLECT_GCC=/export/users/vbyakovl/workspaces/gcc/install-ref/bin/gcc
COLLECT_LTO_WRAPPER=/export/users/vbyakovl/workspaces/gcc/install-ref/libexec/gcc/x86_64-unknown-linux-gnu/4.8.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure CFLAGS='-O0 -g3' --prefix=/export/users/vbyakovl/workspaces/gcc/install-ref --disable-bootstrap --enable-languages=c,c++,fortran,lto CXXFLAGS='-O0 -g3' : (reconfigured) ../gcc/configure CFLAGS='-O0 -g3' --prefix=/export/users/vbyakovl/workspaces/gcc/install-ref --disable-bootstrap CXXFLAGS='-O0 -g3' --enable-languages=c,c++,fortran,lto --no-create --no-recursion : (reconfigured) ../gcc/configure CFLAGS='-O0 -g3' --prefix=/export/users/vbyakovl/workspaces/gcc/install-ref --disable-bootstrap CXXFLAGS='-O0 -g3' --enable-languages=c,c++,fortran,lto --no-create --no-recursion
Thread model: posix
gcc version 4.8.0 20121015 (experimental) (GCC)

The following routine is miscompiled:

int foo(ST *s, int c)
{
  int first = 1;
  int count = c;
  ST *item = s;
  int a = s->a;
  int x;

  while (count--)
    {
      x = item->a;
      if (first)
        first = 0;
      else if (x >= a)
        return 1;
      a = x;
      item++;
    }
  return 0;
}

The compiler sets up an equivalence between ‘x’ and ‘a’ (routine record_temporary_equivalences_from_phis() in tree-ssa-threadedge.c) and folds the comparison of x with a to true.
[Bug middle-end/54985] Dom optimization erroneous remove conditional goto.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54985 --- Comment #1 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-10-19 10:58:26 UTC --- Created attachment 28489 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28489 Test case
[Bug middle-end/54985] Dom optimization erroneous remove conditional goto.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54985 --- Comment #2 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-10-19 10:59:15 UTC --- Created attachment 28490 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28490 Main routine
[Bug tree-optimization/54901] [4.8 Regression] air.f90, aermod.f90, and mdbx.f90 are miscompiled with '-m64 -O3 -funroll-loops -fwhole-program' after revision 192213
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54901

Vladimir Yakovlev vbyakovl23 at gmail dot com changed:

           What    |Removed    |Added
                 CC|           |vbyakovl23 at gmail dot com

--- Comment #1 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-10-16 08:55:16 UTC ---
Dominique, could you attach the tests?
[Bug target/47440] Use LCM for vzeroupper insertion
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47440

--- Comment #4 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-08-23 19:15:58 UTC ---
As Uros recommended, I split the patch into two parts. The first, middle-end part is here:
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01590.html
[Bug target/47440] Use LCM for vzeroupper insertion
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47440

Vladimir Yakovlev vbyakovl23 at gmail dot com changed:

           What    |Removed    |Added
             Status|NEW        |ASSIGNED

--- Comment #3 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-08-22 15:25:54 UTC ---
I implemented vzeroupper insertion using the mode-switching technique:
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01429.html
[Bug rtl-optimization/54342] New: [4.8 Regression] Wrong mode of call argument
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54342

Bug #: 54342
Summary: [4.8 Regression] Wrong mode of call argument
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com

The argument of the call has mode OI rather than V8SF.

Test case

#include <immintrin.h>

typedef union un1
{
  __m256 x;
  float f;
} UN1;

UN1 u;
extern __m256 y;
extern int bar2(UN1);

int foo2 ()
{
  u.x = y;
  return bar2(u);
}

Dump after expand

(note 4 2 5 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 5 4 6 3 (set (reg/f:DI 62)
        (symbol_ref:DI ("u") <var_decl 0x7f312f270a00 u>)) er.c:13 -1
     (nil))
(insn 6 5 7 3 (set (reg:V8SF 63)
        (mem/c:V8SF (symbol_ref:DI ("y") [flags 0x40] <var_decl 0x7f312f270aa0 y>) [2 y+0 S32 A256])) er.c:13 -1
     (nil))
(insn 7 6 8 3 (set (mem/c:V8SF (reg/f:DI 62) [0 u.x+0 S32 A256])
        (reg:V8SF 63)) er.c:13 -1
     (nil))
(insn 8 7 9 3 (set (reg/f:DI 65)
        (symbol_ref:DI ("u") <var_decl 0x7f312f270a00 u>)) er.c:14 -1
     (nil))
(insn 9 8 10 3 (set (reg:OI 64)
        (mem/c:OI (reg/f:DI 65) [3 u+0 S32 A256])) er.c:14 -1
     (nil))
(insn 10 9 11 3 (set (reg:OI 21 xmm0)
        (reg:OI 64)) er.c:14 -1
     (nil))
(call_insn/j 11 10 12 3 (parallel [
            (set (reg:SI 0 ax)
                (call (mem:QI (symbol_ref:DI ("bar2") [flags 0x41] <function_decl 0x7f312f27b300 bar2>) [0 bar2 S1 A8])
                    (const_int 0 [0])))
            (unspec [
                    (const_int 1 [0x1])
                ] UNSPEC_CALL_NEEDS_VZEROUPPER)
        ]) er.c:14 -1
     (nil)
    (expr_list:REG_DEP_TRUE (use (reg:OI 21 xmm0))
        (nil)))

The following patch fixes that.

diff --git a/gcc/stor-layout.c b/gcc/stor-layout.c
index 53554a9..bb39e7f 100644
--- a/gcc/stor-layout.c
+++ b/gcc/stor-layout.c
@@ -1639,7 +1639,8 @@ compute_record_mode (tree type)
   /* If we only have one real field; use its mode if that mode's size
      matches the type's size.  This only applies to RECORD_TYPE.  This
      does not apply to unions.  */
-  if (TREE_CODE (type) == RECORD_TYPE && mode != VOIDmode
+  if ((TREE_CODE (type) == RECORD_TYPE || TREE_CODE (type) == UNION_TYPE)
+      && mode != VOIDmode
       && host_integerp (TYPE_SIZE (type), 1)
       && GET_MODE_BITSIZE (mode) == TREE_INT_CST_LOW (TYPE_SIZE (type)))
     SET_TYPE_MODE (type, mode);
[Bug rtl-optimization/54342] [4.8 Regression] Wrong mode of call argument
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54342 --- Comment #3 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-08-21 11:18:39 UTC --- I'm working on vzeroupper insertion and my implementation inserts vzeroupper before the call because VALID_AVX256_REG_MODE returns false.
[Bug middle-end/53616] [4.8 Regression] 416.gamess in SPEC CPU 2006 miscompiled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53616

--- Comment #15 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-07-24 15:36:16 UTC ---
416.gamess passes now.
[Bug middle-end/53616] [4.8 Regression] 416.gamess in SPEC CPU 2006 miscompiled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53616 --- Comment #10 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-07-23 12:48:59 UTC --- Created attachment 27858 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=27858 Reduced test case
[Bug middle-end/53616] [4.8 Regression] 416.gamess in SPEC CPU 2006 miscompiled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53616

Vladimir Yakovlev vbyakovl23 at gmail dot com changed:

           What    |Removed    |Added
                 CC|           |vbyakovl23 at gmail dot com

--- Comment #11 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-07-23 12:53:13 UTC ---
The miscompare in 416.gamess is caused by a wrong transformation of a loop in file grd2b.f, lines 113-121.

      DO 110 M=1,3
      P12(M,1)= C(M,IAT)
      P12(M,2)= C(M,JAT)
      P12(M,3)= P12(M,2)-P12(M,1)
      R12= R12+P12(M,3)*P12(M,3)
      P34(M,1)= C(M,KAT)
      P34(M,2)= C(M,LAT)
      P34(M,3)= P34(M,2)-P34(M,1)
  110 R34= R34+P34(M,3)*P34(M,3)

After the transformation we have

      P12(:,1) = C(:,IAT)
      P12(:,2) = C(:,jAT)
      DO 110 M=1,3
      P12(M,3)= P12(M,2)-P12(M,1)
      R12= R12+P12(M,3)*P12(M,3)
      P34(M,3)= P34(M,2)-P34(M,1)
  110 R34= R34+P34(M,3)*P34(M,3)
      P34(:,1) = C(:,KAT)
      P34(:,2) = C(:,LAT)

That is, the order of the statements in the loop has changed: P34(M,1) and P34(M,2) are now assigned after the loop that reads them. The correct transformation would be

      P12(:,1) = C(:,IAT)
      P12(:,2) = C(:,jAT)
      DO 110 M=1,3
      P12(M,3)= P12(M,2)-P12(M,1)
  110 R12= R12+P12(M,3)*P12(M,3)
      P34(:,1) = C(:,KAT)
      P34(:,2) = C(:,LAT)
      DO 111 M=1,3
      P34(M,3)= P34(M,2)-P34(M,1)
  111 R34= R34+P34(M,3)*P34(M,3)

I attached a reduced test case and dumps with and without the transformation. The command line to compile is

gfortran m.f t.f -O3

The result of the run differs from the result of code compiled at -O0.

I used this compiler:

Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --with-arch=corei7 --with-cpu=corei7 --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --enable-cloog-backend=isl --with-fpmath=sse --enable-languages=c,c++,fortran --enable-bootstrap=no
Thread model: posix
gcc version 4.8.0 20120606 (experimental) (GCC)
[Bug tree-optimization/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726

--- Comment #19 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-06-21 12:46:11 UTC ---
(In reply to comment #13)
(In reply to comment #10)
I've tried without static. Runtimes is still the same.

It doesn't match what I saw. On Atom D510:

/export/gnu/import/git/gcc-regression/master/188261/usr/bin/gcc -ansi -O3 -ffast-math -msse2 -mfpmath=sse -m32 -march=atom m.c test.c -o new
time ./new
./new 58.46s user 0.00s system 99% cpu 58.479 total

/export/gnu/import/git/gcc-regression/master/188259/usr/bin/gcc -ansi -O3 -ffast-math -msse2 -mfpmath=sse -m32 -march=atom m.c test.c -o old
time ./old
./old 58.38s user 0.00s system 99% cpu 58.490 total

I rechecked: without -static there is no regression on either Sandy Bridge or Atom.
[Bug c/53726] New: [4.8 Regression] aes test performance drop for eembc_2_0_peak_32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726

Bug #: 53726
Summary: [4.8 Regression] aes test performance drop for eembc_2_0_peak_32
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com

After the fix

r188261 | rguenth | 2012-06-06 13:45:27 +0400 (Wed, 06 Jun 2012) | 23 lines

2012-06-06 Richard Guenther rguent...@suse.de

    PR tree-optimization/53081
    * tree-data-ref.h (adjacent_store_dr_p): Rename to ...
    (adjacent_dr_p): ... this and make it work for reads, too.
    * tree-loop-distribution.c (enum partition_kind): Add PKIND_MEMCPY.
    (struct partition_s): Change main_stmt to main_dr, add secondary_dr member.
    (build_size_arg_loc): Change to date data-reference and not gimplify here.
    (build_addr_arg_loc): New function split out from ...
    (generate_memset_builtin): ... here. Use it and simplify.
    (generate_memcpy_builtin): New function.
    (generate_code_for_partition): Adjust.
    (classify_partition): Streamline pattern detection. Detect memcpy.
    (ldist_gen): Adjust.
    (tree_loop_distribution): Adjust seed statements for memcpy recognition.

    * gcc.dg/tree-ssa/ldist-20.c: New testcase.
    * gcc.dg/tree-ssa/loop-19.c: Add -fno-tree-loop-distribute-patterns.

there is a regression of 11% on Atom and 30% on Sandy Bridge. The fix led to memcpy not being recognized. A reduced test case and assembler output are attached.

Command line to reproduce

gcc -ansi -O3 -ffast-math -msse2 -mfpmath=sse -m32 -static -march=corei7 -mtune=corei7 test.c
[Bug c/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726 --- Comment #1 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-06-20 06:13:26 UTC --- Created attachment 27658 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=27658 Test case and assemblers
[Bug tree-optimization/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726

--- Comment #3 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-06-20 10:48:11 UTC ---
I added an executable test case.

Command line to compile

gcc -g -ansi -O3 -ffast-math -msse2 -mfpmath=sse -m32 -static -march=corei7 -mtune=corei7 test.c m.c

Run results

Wed Jun 20 14:39:05: /gnumnt/msticlxl25_users/vbyakovl/1020/test$ time ./test.corei7.bad.exe
real    0m6.317s
user    0m6.290s
sys     0m0.002s

Wed Jun 20 14:39:24: /gnumnt/msticlxl25_users/vbyakovl/1020/test$ time ./test.corei7.good.exe
real    0m4.815s
user    0m4.713s
sys     0m0.000s
[Bug tree-optimization/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726 --- Comment #4 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-06-20 10:50:28 UTC --- Created attachment 27664 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=27664 Executable test case
[Bug tree-optimization/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726

--- Comment #10 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-06-20 14:34:32 UTC ---
I've tried without -static. The runtimes are still the same.
[Bug c/52632] New: GCC compfail on O0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52632

Bug #: 52632
Summary: GCC compfail on O0
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com

Test case a.c gives the following compile failure when compiled with -O0:

gcc -O0 -c a.c
a.c: In function 'foo':
a.c:4:1: error: size of unnamed array is negative

At higher optimization levels it is OK.

gcc version 4.8.0 20120319 (experimental) (GCC)

The failure happens because at -O0 the front end replaces the builtin call with zero if it is not folded. See gcc/builtins.c, fold_builtin_1(), line 10270:

  switch (fcode)
    {
    case BUILT_IN_CONSTANT_P:
      {
        tree val = fold_builtin_constant_p (arg0);

        /* Gimplification will pull the CALL_EXPR for the builtin out of
           an if condition.  When not optimizing, we'll not CSE it back.
           To avoid link error types of regressions, return false now.  */
        if (!val && !optimize)
          val = integer_zero_node;

        return val;
      }

It may be fixed by a patch that disables the error message when not optimizing.

diff --git a/gcc/c-decl.c b/gcc/c-decl.c
index 160d393..1ba3f51 100644
--- a/gcc/c-decl.c
+++ b/gcc/c-decl.c
@@ -5345,7 +5345,7 @@ grokdeclarator (const struct c_declarator *declarator,
            if (TREE_CODE (size) == INTEGER_CST && size_maybe_const)
              {
                constant_expression_warning (size);
-               if (tree_int_cst_sgn (size) < 0)
+               if ((pedantic || optimize) && tree_int_cst_sgn (size) < 0)
                  {
                    if (name)
                      error_at (loc, "size of array %qE is negative", name);
[Bug c/52632] GCC compfail on O0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52632 --- Comment #2 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-03-20 10:03:47 UTC --- Created attachment 26929 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26929 Test case
[Bug middle-end/52580] [4.8 Regression] 171.swim performance drop on x86 – vectorization doesn’t happen anymore
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52580

--- Comment #6 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-03-15 12:53:50 UTC ---
I checked: the fix gives a 21% speedup of 171.swim on Sandy Bridge. Thanks.
[Bug fortran/52580] New: [4.8 Regression] 171.swim performance drop on x86 – vectorization doesn’t happen anymore
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52580

Bug #: 52580
Summary: [4.8 Regression] 171.swim performance drop on x86 – vectorization doesn’t happen anymore
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com

The regression can be seen on Sandy Bridge. Change set analysis points to this commit:

commit 95539e1deabbaa9dbc84b1d81ce6d0c8e7156a0f
Author: rguenth rguenth@138bc75d-0d04-0410-961f-82ee72b054a4
Date: Fri Mar 2 14:58:55 2012 +

    2012-03-02 Richard Guenther rguent...@suse.de

    PR tree-optimization/52406
    * tree-data-ref.h: Update documentation about DR_BASE_OBJECT.
    (struct indices): Add unconstrained_base member.
    (struct dr_alias): Remove unused vops member.
    (DR_UNCONSTRAINED_BASE): New define.
    * tree-data-ref.c (dr_analyze_indices): For COMPONENT_REFs add indices to allow their disambiguation. Make DR_BASE_OBJECT be an artificial access that covers the whole indexed object, or mark it with DR_UNCONSTRAINED_BASE if we cannot do so. Canonicalize plain decl base-objects to their MEM_REF variant.
    (dr_may_alias_p): When the base-object of either data reference has unknown size use only points-to information.
    (compute_affine_dependence): Make dumps easier to read and more verbose.
    * tree-vect-data-ref.c (vector_alignment_reachable_p): Use DR_REF when looking for packed references.
    (vect_supportable_dr_alignment): Likewise.

    * gcc.dg/torture/pr52406.c: New testcase.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@184789 138bc75d-0d04-0410-961f-82ee72b054a4

There are vectorizer problems: vectorization no longer happens for the hottest routines, calc2() and calc3().

Command line to reproduce

gfortran -g -static -m32 -S -O3 -funroll-loops -msse2 -mfpmath=sse -ffast-math -march=corei7 swim.f

gcc -v
Using built-in specs.
COLLECT_GCC=/gnumnt/msticlxl16_users/vbyakovl/workspaces/619/install-exp/bin/gcc
COLLECT_LTO_WRAPPER=/gnumnt/msticlxl16_users/vbyakovl/workspaces/619/install-exp/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.8.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --prefix=/export/users/vbyakovl/workspaces/619/install-exp --disable-bootstrap --enable-languages=c,c++,fortran CFLAGS=-g3
Thread model: posix
gcc version 4.8.0 20120312 (experimental) (GCC)
[Bug libstdc++/52241] Performance degradation of 447.dealII on corei7 at spec2006_base32.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52241

--- Comment #20 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-20 18:04:45 UTC ---
(In reply to comment #19)
> Nice, so we want Paolo's patch. Out of interest, what are the 447.deal numbers
> when comparing linking against old (pre-Benjamin's commit) libstdc++.a and
> current libstdc++.a with Paolo's patch (or libstdc++.so.6, i.e. without
> -static)?

Runspec numbers (as runspec prints them) for both static and dynamic linking:

Static
  Base:
    Old: 447.dealII  11440  324  35.3 *
    New: 447.dealII  11440  302  37.9 *
  Peak:
    Old: 447.dealII  11440  285  40.2 *
    New: 447.dealII  11440  268  42.6 *

Dynamic
  Base:
    Old: 447.dealII  11440  327  34.9 *
    New: 447.dealII  11440  327  35.0 *
  Peak:
    Old: 447.dealII  11440  287  39.9 S
    New: 447.dealII  11440  288  39.7 *

So there is no effect in the dynamic case. Is that right?
[Bug c++/52241] Performance degradation of 447.dealII on corei7 at spec2006_base32.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52241

--- Comment #18 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-20 05:37:32 UTC ---
I tested Paolo's patch and got a speedup on 447.dealII:

base: +7.36%
peak: +5.97%

I also looked at the dumps: the new routine 'local_Rb_tree_increment' is now inlined in both uses.
[Bug regression/52272] New: [4.7 regression] Performance regression of 410.bwaves on x86.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52272

Bug #: 52272
Summary: [4.7 regression] Performance regression of 410.bwaves on x86.
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: regression
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com

Created attachment 26671 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26671
Reduced testcase

Commit

2012-02-06 Richard Guenther rguent...@suse.de

    PR tree-optimization/50955
    * tree-ssa-loop-ivopts.c (get_computation_cost_at): Artificially raise cost of expressions that replace an address with an expression based on a different pointer.

causes a performance regression on 410.bwaves:

base: -2.33%
peak: -3.82%

I attached a reduced testcase and dumps from the compilers before and after the commit.

Command line to reproduce

gfortran -w -g -m32 -static -S t.s -O3 -funroll-loops -msse2 -mfpmath=sse -ffast-math -march=corei7 t.f
[Bug regression/52272] [4.7 regression] Performance regression of 410.bwaves on x86.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52272 --- Comment #1 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-16 08:16:27 UTC --- Created attachment 26672 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26672 Good case before commit
[Bug regression/52272] [4.7 regression] Performance regression of 410.bwaves on x86.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52272 --- Comment #2 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-16 08:17:53 UTC --- Created attachment 26673 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26673 Bad case after commit
[Bug c++/52241] Performance degradation of 447.dealII on corei7 at spec2006_base32.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52241 --- Comment #3 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-16 08:58:45 UTC --- Created attachment 26674 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26674 Dump of bad case (with -fPIC -DPIC)
[Bug c++/52241] Performance degradation of 447.dealII on corei7 at spec2006_base32.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52241 --- Comment #4 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-16 09:00:32 UTC --- Created attachment 26675 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26675 Dump of good case (without -fPIC -DPIC)
[Bug c++/52241] Performance degradation of 447.dealII on corei7 at spec2006_base32.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52241

--- Comment #5 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-16 09:05:49 UTC ---
(In reply to comment #2)
> I don't understand what you mean by inlining, since '_Rb_tree_node_base' is a
> *type* not a function.

This is a constructor.

> Anyway, I don't see how Benjamin's split could have caused inlining issues.

I don't know either, but compare the inline dumps with and without -fPIC -DPIC (attached). I used the command line from the log of the compiler build:

/export/users/vbyakovl/workspaces/581/build-bad/./gcc/xgcc -shared-libgcc -B/export/users/vbyakovl/workspaces/581/build-bad/./gcc -nostdinc++ -L/export/users/vbyakovl/workspaces/581/build-bad/x86_64-unknown-linux-gnu/libstdc++-v3/src -L/export/users/vbyakovl/workspaces/581/build-bad/x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs -B/export/users/vbyakovl/workspaces/581/install-bad/x86_64-unknown-linux-gnu/bin/ -B/export/users/vbyakovl/workspaces/581/install-bad/x86_64-unknown-linux-gnu/lib/ -isystem /export/users/vbyakovl/workspaces/581/install-bad/x86_64-unknown-linux-gnu/include -isystem /export/users/vbyakovl/workspaces/581/install-bad/x86_64-unknown-linux-gnu/sys-include -I/export/users/vbyakovl/workspaces/581/gcc/libstdc++-v3/../libgcc -I/export/users/vbyakovl/workspaces/581/build-bad/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu -I/export/users/vbyakovl/workspaces/581/build-bad/x86_64-unknown-linux-gnu/libstdc++-v3/include -I/export/users/vbyakovl/workspaces/581/gcc/libstdc++-v3/libsupc++ -fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -fdiagnostics-show-location=once -Wabi -ffunction-sections -fdata-sections -frandom-seed=tree.lo -g -O2 -D_GNU_SOURCE -c ../../../../../gcc/libstdc++-v3/src/c++98/tree.cc -fPIC -DPIC -o tree.o -fdump-ipa-inlin
[Bug tree-optimization/52272] [4.7 regression] Performance regression of 410.bwaves on x86.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52272 --- Comment #6 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-16 14:42:36 UTC --- I've checked. The patch fixes the regression. Thanks.
[Bug c++/52241] New: Performance degradation of 447.dealII on corei7 at spec2006_base32.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52241

Bug #: 52241
Summary: Performance degradation of 447.dealII on corei7 at spec2006_base32.
Classification: Unclassified
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com

Guilty commit

c1e8b3edf7b5038f070c7a9732e58d066081a636 is the first bad commit
commit c1e8b3edf7b5038f070c7a9732e58d066081a636
Author: bkoz bkoz@138bc75d-0d04-0410-961f-82ee72b054a4
Date: Mon Jan 23 23:12:01 2012 +

caused a performance degradation of 447.dealII (benchspec 2006). It happened because a library routine, '_Rb_tree_node_base' from libstdc++-v3/src/tree.cc, is no longer inlined:

const _Rb_tree_node_base*
_Rb_tree_increment(const _Rb_tree_node_base* __x) throw ()
{
  return _Rb_tree_increment(const_cast<_Rb_tree_node_base*>(__x));
}

I found that the degradation is caused by the absence of the -fPIC flag in the compilation command line. If I add the flag to the command line, the inlining happens.
[Bug bootstrap/49829] [4.7 Regression] --disable-static --enable-shared regression: cannot find -lstdc++
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49829

Vladimir Yakovlev vbyakovl23 at gmail dot com changed:

           What    |Removed    |Added
                 CC|           |vbyakovl23 at gmail dot com

--- Comment #25 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-13 09:49:09 UTC ---
We have a performance degradation of 447.dealII (benchspec 2006). It happened because a library routine from libstdc++-v3/src/tree.cc is not inlined:

const _Rb_tree_node_base*
_Rb_tree_increment(const _Rb_tree_node_base* __x) throw ()
{
  return _Rb_tree_increment(const_cast<_Rb_tree_node_base*>(__x));
}
[Bug bootstrap/49829] [4.7 Regression] --disable-static --enable-shared regression: cannot find -lstdc++
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49829

--- Comment #26 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-13 09:59:31 UTC ---
We have a performance degradation of 447.dealII (benchspec 2006). It happened because a library routine, '_Rb_tree_node_base' from libstdc++-v3/src/tree.cc, is not inlined:

const _Rb_tree_node_base*
_Rb_tree_increment(const _Rb_tree_node_base* __x) throw ()
{
  return _Rb_tree_increment(const_cast<_Rb_tree_node_base*>(__x));
}

I found that the degradation is caused by the absence of the -fPIC flag in the compilation command line. If I add the flag to the command line, the inlining happens.
[Bug c/50315] New: Regression on Atom after fix #49958
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50315

Bug #: 50315
Summary: Regression on Atom after fix #49958
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com

Created attachment 25215 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25215
Test case

After the fix for #49958, reassociation was lost, which caused a regression on Atom. I attached a test case (test.c) and dumps for the good (test.c.003t.original.g) and bad (test.c.003t.original.b) cases. The regression is in the expression at lines 16-47.
[Bug c/50315] Regression on Atom after fix #49958
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50315 --- Comment #1 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2011-09-07 09:32:11 UTC --- Created attachment 25216 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25216 Dump before fix
[Bug c/50315] Regression on Atom after fix #49958
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50315 --- Comment #2 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2011-09-07 09:33:20 UTC --- Created attachment 25217 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25217 Dump after fix
[Bug c/50195] New: Linking time error with -ffast-math -O0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50195

Bug #: 50195
Summary: Linking time error with -ffast-math -O0
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com

The following test fails to link if compiled with -ffast-math and -O0, but it compiles successfully with -ffast-math and -O2. There is also no problem if -lm is added.

$ cat t.c
#include <stdio.h>

float foo(float x)
{
  float y = 0;
  while (x > 0.0001)
    {
      y += x*x*x*x*x*x*x*x*x*x*x*x*x;
      x = x/2;
    }
  return y;
}

int main (int argc, char*argv[])
{
  float y = atoi(argv[1]);
  printf("%f\n", foo(y));
  return 0;
}

$ gcc -ffast-math -O0 t.c
/tmp/cccA1sUB.o: In function `foo':
t.c:(.text+0x2c): undefined reference to `powf'
collect2: error: ld returned 1 exit status
$ gcc -ffast-math -O2 t.c
$ ./a.out 5
1220852096.00

With -ffast-math the front end replaces x*x*...*x with __builtin_powf. Later, at -O2, this call is turned back into multiplications in the sincos phase. The failure at -O0 occurs because the sincos phase does not run at -O0. I think we must avoid doing this optimization in the front end and turn off -ffast-math if -O0 is used.

From Richard Guenther: No, I think we should avoid most of the builtin related folding at -O0.