Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix
Hi!

I created a test to reproduce the issue with cpu2006/454.calculix. See the attachment.
The file e_c3d.f contains a cut-down subroutine from calculix, and tr535.f is the main entry point of the test. You can use the go script as a reference for how I got these results. The find_stall.pl script finds the problematic instruction combinations.

The problem is that the new compiler generates a read instruction right after a write. See the dumps below.

This is the inner loop near line 42 as generated by the rev. 119759 compiler:

.L13:
.LBB22:
        .loc 1 42 0
        movapd  %xmm2, %xmm0
        leaq    (%rdx,%rbx), %rax
        .loc 1 38 0
        addl    $1, %edi
        addq    $24, %rdx
        .loc 1 42 0
        mulsd   72(%rcx), %xmm0
        .loc 1 38 0
        addq    $72, %rcx
        cmpl    $4, %edi
        .loc 1 42 0
        mulsd   %xmm3, %xmm0
        mulsd   -8(%rax,%r9,8), %xmm0
        mulsd   %xmm4, %xmm0
        addsd   %xmm0, %xmm1
        .loc 1 38 0
        jne     .L13

This is the code for line 42 as generated by the rev. 119760 compiler:

.L13:
.LBB23:
        .loc 1 42 0
        movsd   72(%rdx), %xmm0
        movq    80(%rsp), %rax
        addq    $72, %rdx
        mulsd   -8(%r9,%r15,8), %xmm0
        addq    %rdi, %rax
        addq    $24, %rdi
        .loc 1 38 0
        cmpq    $72, %rdi
        .loc 1 42 0
        mulsd   -8(%r11,%r14,8), %xmm0
        mulsd   -8(%rax,%r13,8), %xmm0
        movq    440(%rsp), %rax
        mulsd   (%rax), %xmm0
        addsd   (%rsi,%r10,8), %xmm0    -|
        movsd   %xmm0, (%rsi,%r10,8)    -+- problems
        .loc 1 38 0
        jne     .L13

My output is:

real    0m3.781s
user    0m3.776s
sys     0m0.004s

real    0m5.956s
user    0m5.948s
sys     0m0.004s

hey... we are going
hey... we are going
Line 31
        addsd   (%rsi,%r10,8), %xmm0
        movsd   %xmm0, (%rsi,%r10,8)
Line 42
        addsd   (%rsi,%r10,8), %xmm0
        movsd   %xmm0, (%rsi,%r10,8)

Feel free to ask if you have any problems reproducing this.

-Vladimir

--
* From: Grigory Zagorodnev grigory_zagorodnev at linux dot intel dot com
* To: gcc at gcc dot gnu dot org, dnovillo at redhat dot com
* Cc: H. J. Lu hjl at lucon dot org
* Date: Mon, 15 Jan 2007 17:59:31 +0300
* Subject: 27% regression of gcc 4.3 performance on cpu2k6/calculix

Hi!

A huge regression in gcc 4.3 performance has been detected on the cpu2006/454.calculix benchmark at the -O2 optimization level on x86_64-redhat-linux. The regression is caused by the mem-ssa merge of 12/12/2006 (revision 119760).
http://gcc.gnu.org/viewcvs?view=rev&revision=119760

PS: I'm trying to get a small reproducer

- Grigory

test_calculix.tar.bz2
Description: BZip2 compressed data
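For readers without the attachment, here is a rough sketch of the kind of check the find_stall.pl output above suggests: flag two adjacent instructions where the first reads a memory operand and the second writes the same operand. This is only an illustration written in Python. The real script is in the tarball and may work quite differently; the names `parse`, `is_memory` and `find_load_then_store` are made up for this sketch, and the input is assumed to be clean AT&T-syntax lines with any margin annotations (such as "-|") already removed.

```python
import re

# Split on commas that are not inside parentheses, so an AT&T memory
# operand such as "(%rsi,%r10,8)" stays in one piece.
OPERAND_SEP = re.compile(r",\s*(?![^()]*\))")


def parse(line):
    """Return (text, operands) for an instruction line, or None for blank
    lines, labels and assembler directives such as .loc."""
    text = line.strip()
    if not text or text.startswith(".") or text.endswith(":"):
        return None
    parts = text.split(None, 1)
    operands = OPERAND_SEP.split(parts[1].strip()) if len(parts) > 1 else []
    return text, operands


def is_memory(operand):
    """AT&T memory operands carry a parenthesised base/index expression."""
    return "(" in operand


def find_load_then_store(asm_lines):
    """Return pairs of adjacent instructions where the first reads a memory
    operand (its source) and the second writes that same operand (its
    destination). Labels and directives between them are ignored."""
    hits = []
    prev = None
    for line in asm_lines:
        insn = parse(line)
        if insn is None:
            continue
        if prev is not None:
            prev_ops, ops = prev[1], insn[1]
            if (len(prev_ops) == 2 and len(ops) == 2
                    and is_memory(prev_ops[0]) and prev_ops[0] == ops[1]):
                hits.append((prev[0], insn[0]))
        prev = insn
    return hits


if __name__ == "__main__":
    sample = [
        ".loc 1 42 0",
        "mulsd (%rax), %xmm0",
        "addsd (%rsi,%r10,8), %xmm0",
        "movsd %xmm0, (%rsi,%r10,8)",
        ".loc 1 38 0",
        "jne .L13",
    ]
    for first, second in find_load_then_store(sample):
        print(first)
        print(second)
```

Comparing operands as plain strings is a deliberate simplification: it catches the addsd/movsd pair from the rev. 119760 listing, but it would miss two different address expressions that happen to point at the same location.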
40% performance regression SPEC2006/leslie3d on gcc-4_2-branch
Hello, Daniel

It looks like your changeset listed below causes a ~40% performance regression on SPEC2006/leslie3d.
I will try to create a minimal test for this issue this week and will update you in any case.
Feel free to ask if you have any questions.

FYI: The hardware is a Core2Duo.

Compiler config:

Target: x86_64-redhat-linux
Configured with: ../src/configure --prefix=/home/vlad/sandbox/bin_search/117891/usr --program-suffix=-42 --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-languages=fortran --disable-multilib --with-system-zlib --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.2.0 20061019 (experimental)

r117891 | dberlin | 2006-10-20 03:05:53 +0400 (Fri, 20 Oct 2006) | 61 lines

2006-10-19  Daniel Berlin  [EMAIL PROTECTED]

        Fix PR tree-optimization/28778
        Fix PR tree-optimization/29156
        Fix PR tree-optimization/29415
        * tree.h (DECL_PTA_ARTIFICIAL): New macro.
        (tree_decl_with_vis): Add artificial_pta_var flag.
        * tree-ssa-alias.c (is_escape_site): Remove alias info argument,
        pushed into callers.
        * tree-ssa-structalias.c (nonlocal_for_type): New variable.
        (nonlocal_all): Ditto.
        (struct variable_info): Add directly_dereferenced member.
        (var_escaped_vars): New variable.
        (escaped_vars_tree): Ditto.
        (escaped_vars_id): Ditto.
        (nonlocal_vars_id): Ditto.
        (new_var_info): Set directly_dereferenced.
        (graph_size): New variable.
        (build_constraint_graph): Use graph_size.
        (solve_graph): Don't process constraints that cannot change the
        solution, don't try to propagate an empty solution to our
        successors.
        (process_constraint): Set directly_dereferenced.
        (could_have_pointers): New function.
        (get_constraint_for_component_ref): Don't process STRING_CST.
        (nonlocal_lookup): New function.
        (nonlocal_insert): Ditto.
        (create_nonlocal_var): Ditto.
        (get_nonlocal_id_for_type): Ditto.
        (get_constraint_for): Allow results vector to be empty in the case
        of string constants.
        Handle results of calls properly.
        (update_alias_info): Update alias info stats on number and type of
        calls.
        (find_func_aliases): Use could_have_pointers.
        (make_constraint_from_escaped): Renamed from
        make_constraint_to_anything, and changed to make constraints from
        escape variable.
        (make_constraint_to_escaped): New function.
        (find_global_initializers): Ditto.
        (create_variable_info_for): Make constraint from escaped to any
        global variable, and from any global variable to the set of
        escaped vars.
        (intra_create_variable_infos): Deal with escaped instead of
        pointing to anything.
        (set_uids_in_ptset): Do type pruning on directly dereferenced
        variables.
        (find_what_p_points_to): Adjust call to set_uids_with_ptset.
        (init_base_vars): Fix comment, and initialize escaped_vars.
        (need_to_solve): Removed.
        (find_escape_constraints): New function.
        (expand_nonlocal_solutions): Ditto.
        (compute_points_to_sets): Call find_escape_constraints and
        expand_nonlocal_solutions.
        (delete_points_to_sets): Don't fall off the end of the graph.
        (init_alias_heapvars): Initialize nonlocal_for_type and
        nonlocal_all.
        (delete_alias_heapvars): Free nonlocal_for_type and null out
        nonlocal_all.
        [

--
- Vladimir
Re: Massive SPEC failures on trunk
Hi, All

Try the attached minimal reproducer for the internal compiler error. See the go file for the command line and report.log for the error reported by the trunk compiler.

-Vladimir

On 3/5/07, Eric Botcazou [EMAIL PROTECTED] wrote:
> > I observe a massive compilation time regression for bootstrap on x86-64
> > here, in particular libjava now appears to take *ages* to build:
>
> I cannot reproduce today at the same revision:
>
> real    275m23.314s
> user    242m28.724s
> sys     12m18.249s
>
> Something went awry with kpowersave yesterday... sorry for the noise.
>
> --
> Eric Botcazou

cutted.tgz
Description: GNU Zip compressed data
Re: Massive SPEC failures on trunk
Hi, All

Sorry for my previous post; it went to the wrong place.
A minimal reproducer for cpu2006/h264ref is attached. Compile it with:

  gcc -O2 -c ./image.c

The compiler from trunk produces:

  image.c: In function 'UnifiedOneForthPix':
  image.c:35: internal compiler error: in set_value_range, at tree-vrp.c:267

-Vladimir

On 3/4/07, Grigory Zagorodnev [EMAIL PROTECTED] wrote:
> Grigory Zagorodnev wrote:
> > Trunk GCC shows massive (2 compile-time and 6 run-time) failures on SPEC
> > CPU2000 and CPU2006 at i386 and x86_64 on -O2 optimization level.
> > Regression introduced somewhere between revision 122487 and 122478.
>
> http://gcc.gnu.org/viewcvs?view=rev&revision=122487
> Almost all regressions are due to r122487
>   cpu2006: 403.gcc 416.gamess 434.zeusmp 464.h264ref 465.tonto
>   cpu2000: 178.galgel 186.crafty
>
> http://gcc.gnu.org/viewcvs?view=rev&revision=122484
> cpu2006/447.dealII is due to revision 122484.
>
> I'll bring more information and try to get minimal reproducers on Monday.
>
> - Grigory

typedef unsigned char byte;    //! byte type definition
#define max(a, b) (((a) > (b)) ? (a) : (b))
#define min(a, b) (((a) < (b)) ? (a) : (b))
typedef long long int64;
#define imgpel unsigned short
#define pel_t imgpel
#define IMG_PAD_SIZE 4    //! Number of pixels padded around the reference frame (=4)
#define MAX_LIST_SIZE 33
#define imgpel unsigned short

int **img4Y_tmp;    //! for quarter pel interpolation

typedef struct storable_picture
{
  int size_x, size_y;
  imgpel **imgY;    //! Y picture component
} StorablePicture;

const int ONE_FOURTH_TAP[3][2] =
{
  {20,20},
  {-5,-4},
  { 1, 0},
};

void UnifiedOneForthPix (StorablePicture *s)
{
  int i, j, jj, is;
  imgpel **imgY = s->imgY;

  for (i = -IMG_PAD_SIZE; i < s->size_x + IMG_PAD_SIZE; i++)
  {
    is = (ONE_FOURTH_TAP[0][0] * (imgY[jj][max (0, min (s->size_x - 1, i))] +
                                  imgY[jj][max (0, min (s->size_x - 1, i + 1))]) +
          ONE_FOURTH_TAP[1][0] * (imgY[jj][max (0, min (s->size_x - 1, i - 1))] +
                                  imgY[jj][max (0, min (s->size_x - 1, i + 2))]) +
          ONE_FOURTH_TAP[2][0] * (imgY[jj][max (0, min (s->size_x - 1, i - 2))] +
                                  imgY[jj][max (0, min (s->size_x - 1, i + 3))]));
    img4Y_tmp[j + IMG_PAD_SIZE][(i + IMG_PAD_SIZE) * 2 + 1] = is * 32;    // 1/2 pix pos
  }
}
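The quoted message narrows the failures down to a revision window (122478 to 122487). For anyone repeating that kind of search, the usual approach is a bisection over revision numbers. The sketch below is only an illustration in Python, not the procedure anyone in this thread actually used. `first_bad_revision` and the `is_bad` callback are hypothetical names; in practice `is_bad` would build the compiler at the given revision and run the failing test. It assumes the lower revision is known good, the upper one is known bad, and that every revision after the first bad one is bad as well.

```python
def first_bad_revision(good, bad, is_bad):
    """Bisect the revision range (good, bad] and return the first revision
    for which is_bad(rev) is true.

    Assumptions (not checked here): is_bad(good) is False, is_bad(bad) is
    True, and once a revision is bad all later revisions stay bad.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid      # the culprit is mid or something earlier
        else:
            good = mid     # mid is still fine, look later
    return bad


if __name__ == "__main__":
    # Stand-in predicate for illustration only: pretend the test starts
    # failing at r122487, as the quoted message reports for most benchmarks.
    culprit = first_bad_revision(122478, 122487, lambda rev: rev >= 122487)
    print("first bad revision: r%d" % culprit)
```

With nine candidate revisions this needs at most four builds instead of nine, which matters when each step is a full compiler bootstrap.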
Re: Massive SPEC failures on trunk
FYI: the bug has already been reported:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31037

-Vladimir