Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix

2007-02-07 Thread Vladimir Sysoev

Hi!
I created a test to reproduce the issue with cpu2006/454.calculix.
See attached. File e_c3d.f contains a cut-down subroutine from calculix;
tr535.f is the main entry point of the test. You can use the go script as a
reference for how I got these results. The find_stall.pl script finds the
problematic instruction combinations.

The problem is that the new compiler generates a read instruction right
after a write to the same memory location. See some dumps below.

This is the inner loop near line #42 generated by the rev. 119759 compiler:
.L13:
.LBB22:
.loc 1 42 0
movapd  %xmm2, %xmm0
leaq    (%rdx,%rbx), %rax
.loc 1 38 0
addl    $1, %edi
addq    $24, %rdx
.loc 1 42 0
mulsd   72(%rcx), %xmm0
.loc 1 38 0
addq    $72, %rcx
cmpl    $4, %edi
.loc 1 42 0
mulsd   %xmm3, %xmm0
mulsd   -8(%rax,%r9,8), %xmm0
mulsd   %xmm4, %xmm0
addsd   %xmm0, %xmm1
.loc 1 38 0
jne .L13

This is the code for line 42 generated by the rev. 119760 compiler:
.L13:
.LBB23:
.loc 1 42 0
movsd   72(%rdx), %xmm0
movq    80(%rsp), %rax
addq    $72, %rdx
mulsd   -8(%r9,%r15,8), %xmm0
addq    %rdi, %rax
addq    $24, %rdi
.loc 1 38 0
cmpq    $72, %rdi
.loc 1 42 0
mulsd   -8(%r11,%r14,8), %xmm0
mulsd   -8(%rax,%r13,8), %xmm0
movq    440(%rsp), %rax
mulsd   (%rax), %xmm0
addsd   (%rsi,%r10,8), %xmm0   -|
movsd   %xmm0, (%rsi,%r10,8)   -+- problems
.loc 1 38 0
jne .L13
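
To make the difference between the two dumps easier to see, here is a
minimal C sketch of the two loop shapes. It is only an illustration: the
function names are hypothetical, the loop body is not the calculix code,
and volatile is used purely as an assumption to force a load and a store
on every iteration, which mimics the addsd/movsd pair marked above.

```c
#include <stddef.h>

/* Hypothetical analogue of the r119759 loop shape: the running sum lives
   in a local scalar (a register) and is stored once after the loop. */
static void accumulate_in_register(double *s, const double *a,
                                   const double *b, size_t n)
{
    double sum = *s;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    *s = sum;
}

/* Hypothetical analogue of the r119760 loop shape: every iteration loads
   *s, adds to it and stores it back, so each load has to wait for the
   store made by the previous iteration. */
static void accumulate_in_memory(volatile double *s, const double *a,
                                 const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *s += a[i] * b[i];
}
```

Both functions compute the same value; only the placement of the memory
traffic differs, which is what the timings in this message are about.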



My output is:
real    0m3.781s
user    0m3.776s
sys     0m0.004s

real    0m5.956s
user    0m5.948s
sys     0m0.004s
hey... we are going
hey... we are going
Line 31
   addsd   (%rsi,%r10,8), %xmm0
   movsd   %xmm0, (%rsi,%r10,8)

Line 42
   addsd   (%rsi,%r10,8), %xmm0
   movsd   %xmm0, (%rsi,%r10,8)
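
For readers without the attachment, the kind of check reported above can
be sketched in C as follows. This is not find_stall.pl and makes no claim
about how that script works; the helper names are hypothetical, and the
comparison is purely textual (it assumes AT&T syntax with two operands
and treats two operands as the same address only if they are spelled
identically).

```c
#include <ctype.h>
#include <string.h>

/* Split "mnemonic src, dst" into src and dst. Commas inside parentheses
   belong to a memory operand and are skipped.
   Returns 1 on success, 0 if the line does not have two operands. */
static int split_operands(const char *line, char *src, char *dst, size_t cap)
{
    while (isspace((unsigned char)*line)) line++;            /* leading blanks */
    while (*line && !isspace((unsigned char)*line)) line++;  /* mnemonic */
    while (isspace((unsigned char)*line)) line++;

    int depth = 0;
    const char *comma = NULL;
    for (const char *p = line; *p; p++) {
        if (*p == '(') depth++;
        else if (*p == ')') depth--;
        else if (*p == ',' && depth == 0) { comma = p; break; }
    }
    if (!comma) return 0;

    size_t n = (size_t)(comma - line);
    while (n > 0 && isspace((unsigned char)line[n - 1])) n--;
    if (n == 0 || n >= cap) return 0;
    memcpy(src, line, n);
    src[n] = '\0';

    comma++;
    while (isspace((unsigned char)*comma)) comma++;
    n = strlen(comma);
    while (n > 0 && isspace((unsigned char)comma[n - 1])) n--;
    if (n == 0 || n >= cap) return 0;
    memcpy(dst, comma, n);
    dst[n] = '\0';
    return 1;
}

/* Returns 1 when `first` reads a memory operand into a register and
   `second` writes that same register back to the same memory operand. */
static int is_load_then_store(const char *first, const char *second)
{
    char s1[64], d1[64], s2[64], d2[64];
    if (!split_operands(first, s1, d1, sizeof s1)) return 0;
    if (!split_operands(second, s2, d2, sizeof s2)) return 0;
    return strchr(s1, '(') != NULL   /* first source is a memory operand */
        && strcmp(s1, d2) == 0       /* second stores to the same operand text */
        && strcmp(d1, s2) == 0;      /* through the same register */
}
```

Applied to consecutive lines of the assembler output, a check like this
would flag the addsd/movsd pair shown for lines 31 and 42.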

Feel free to ask if you have any problems reproducing this.

-Vladimir


--
   * From: Grigory Zagorodnev grigory_zagorodnev at linux dot intel dot com
   * To: gcc at gcc dot gnu dot org, dnovillo at redhat dot com
   * Cc: H. J. Lu hjl at lucon dot org
   * Date: Mon, 15 Jan 2007 17:59:31 +0300
   * Subject: 27% regression of gcc 4.3 performance on cpu2k6/calculix

Hi!
There is a huge regression of gcc 4.3 performance detected on
cpu2006/454.calculix benchmark at -O2 optimization level on
x86_64-redhat-linux.

The regression is caused by the mem-ssa merge of 12/12/2006 (revision 119760).
http://gcc.gnu.org/viewcvs?view=rev&revision=119760


PS: I'm trying to get a small reproducer
- Grigory


test_calculix.tar.bz2
Description: BZip2 compressed data


40% performance regression SPEC2006/leslie3d on gcc-4_2-branch

2007-02-17 Thread Vladimir Sysoev

Hello, Daniel

It looks like your changeset listed below causes a ~40% performance
regression on SPEC2006/leslie3d. I will try to create a minimal
test for this issue this week and will update you in any case.

Feel free to ask if you have any questions.

FYI:
Hardware is Core2Duo.

Compiler config
Target: x86_64-redhat-linux
Configured with: ../src/configure
--prefix=/home/vlad/sandbox/bin_search/117891/usr --program-suffix=-42
--enable-shared --enable-threads=posix --enable-__cxa_atexit
--enable-languages=fortran --disable-multilib --with-system-zlib
--host=x86_64-redhat-linux
Thread model: posix
gcc version 4.2.0 20061019 (experimental)



r117891 | dberlin | 2006-10-20 03:05:53 +0400 (Fri, 20 Oct 2006) | 61 lines

2006-10-19  Daniel Berlin  [EMAIL PROTECTED]

   Fix PR tree-optimization/28778
   Fix PR tree-optimization/29156
   Fix PR tree-optimization/29415
   * tree.h (DECL_PTA_ARTIFICIAL): New macro.
   (tree_decl_with_vis): Add artificial_pta_var flag.
   * tree-ssa-alias.c (is_escape_site): Remove alias info argument,
   pushed into callers.
   * tree-ssa-structalias.c (nonlocal_for_type): New variable.
   (nonlocal_all): Ditto.
   (struct variable_info): Add directly_dereferenced member.
   (var_escaped_vars): New variable.
   (escaped_vars_tree): Ditto.
   (escaped_vars_id): Ditto.
   (nonlocal_vars_id): Ditto.
   (new_var_info): Set directly_dereferenced.
   (graph_size): New variable
   (build_constraint_graph): Use graph_size.
   (solve_graph): Don't process constraints that cannot change the
   solution, don't try to propagate an empty solution to our
   successors.
   (process_constraint): Set directly_dereferenced.
   (could_have_pointers): New function.
   (get_constraint_for_component_ref): Don't process STRING_CST.
   (nonlocal_lookup): New function.
   (nonlocal_insert): Ditto.
   (create_nonlocal_var): Ditto.
   (get_nonlocal_id_for_type): Ditto.
   (get_constraint_for): Allow results vector to be empty in the case
   of string constants.
   Handle results of calls properly.
   (update_alias_info): Update alias info stats on number and type of
   calls.
   (find_func_aliases): Use could_have_pointers.
   (make_constraint_from_escaped): Renamed from
   make_constraint_to_anything, and changed to make constraints from
   escape variable.
   (make_constraint_to_escaped): New function.
   (find_global_initializers): Ditto.
   (create_variable_info_for): Make constraint from escaped to any
   global variable, and from any global variable to the set of
   escaped vars.
   (intra_create_variable_infos): Deal with escaped instead of
   pointing to anything.
   (set_uids_in_ptset): Do type pruning on directly dereferenced
   variables.
   (find_what_p_points_to): Adjust call to set_uids_with_ptset.
   (init_base_vars): Fix comment, and initialize escaped_vars.
   (need_to_solve): Removed.
   (find_escape_constraints): New function.
   (expand_nonlocal_solutions): Ditto.
   (compute_points_to_sets): Call find_escape_constraints and
   expand_nonlocal_solutions.
   (delete_points_to_sets): Don't fall off the end of the graph.
   (init_alias_heapvars): Initialize nonlocal_for_type and
   nonlocal_all.
   (delete_alias_heapvars): Free nonlocal_for_type and null out
   nonlocal_all.




--
- Vladimir


Re: Massive SPEC failures on trunk

2007-03-05 Thread Vladimir Sysoev

Hi, All

A minimal reproducer for the internal compiler error is attached.
See the go file for the command line and report.log for the error reported
by the trunk compiler.
-Vladimir


On 3/5/07, Eric Botcazou [EMAIL PROTECTED] wrote:

 I observe a massive compilation time regression for bootstrap on x86-64
 here, in particular libjava now appears to take *ages* to build:

I cannot reproduce today at the same revision:

real    275m23.314s
user    242m28.724s
sys     12m18.249s

Something went awry with kpowersave yesterday... sorry for the noise.

--
Eric Botcazou



cutted.tgz
Description: GNU Zip compressed data


Re: Massive SPEC failures on trunk

2007-03-05 Thread Vladimir Sysoev

Hi, All

Sorry for my previous post; it went to the wrong place.

A minimal reproducer for cpu2006/h264ref is attached.
To reproduce, use:
gcc -O2 -c ./image.c

Compiler from trunk produces:
image.c: In function 'UnifiedOneForthPix':
image.c:35: internal compiler error: in set_value_range, at tree-vrp.c:267

-Vladimir


On 3/4/07, Grigory Zagorodnev [EMAIL PROTECTED] wrote:

Grigory Zagorodnev wrote:
 Trunk GCC shows massive (2 compile-time and 6 run-time) failures on SPEC
 CPU2000 and CPU2006 at i386 and x86_64 on -O2 optimization level.
 Regression introduced somewhere between revision 122487 and 122478.

 http://gcc.gnu.org/viewcvs?view=rev&revision=122487
Almost all regressions are due to r122487
cpu2006: 403.gcc 416.gamess 434.zeusmp 464.h264ref 465.tonto
cpu2000: 178.galgel 186.crafty


 http://gcc.gnu.org/viewcvs?view=rev&revision=122484
cpu2006/447.dealII is due to revision 122484.

I'll bring more information and try to get minimal reproducers on Monday.

- Grigory

typedef unsigned char byte;//! byte type definition

#define max(a, b) (((a) > (b)) ? (a) : (b))
#define min(a, b) (((a) < (b)) ? (a) : (b))

typedef long long int64;

#define imgpel unsigned short
#define pel_t imgpel


#define IMG_PAD_SIZE 4   //! Number of pixels padded around the reference frame (=4)
#define MAX_LIST_SIZE 33
#define imgpel unsigned short

int **img4Y_tmp;  //! for quarter pel interpolation

typedef struct storable_picture
{
  int size_x, size_y;
  imgpel **   imgY;  //! Y picture component
} StorablePicture;


const int ONE_FOURTH_TAP[3][2] =
{
  {20,20},
  {-5,-4},
  { 1, 0},
};

void UnifiedOneForthPix (StorablePicture *s)
{
  int i, j, jj,is;
  imgpel  **imgY = s->imgY;

  for (i = -IMG_PAD_SIZE; i < s->size_x + IMG_PAD_SIZE; i++)
  {
    is =
      (ONE_FOURTH_TAP[0][0] *
       (imgY[jj][max (0, min (s->size_x - 1, i))] +
        imgY[jj][max (0, min (s->size_x - 1, i + 1))]) +
       ONE_FOURTH_TAP[1][0] *
       (imgY[jj][max (0, min (s->size_x - 1, i - 1))] +
        imgY[jj][max (0, min (s->size_x - 1, i + 2))]) +
       ONE_FOURTH_TAP[2][0] *
       (imgY[jj][max (0, min (s->size_x - 1, i - 2))] +
        imgY[jj][max (0, min (s->size_x - 1, i + 3))]));
    img4Y_tmp[j + IMG_PAD_SIZE][(i + IMG_PAD_SIZE) * 2 + 1] = is * 32;  // 1/2 pix pos
  }

}



Re: Massive SPEC failures on trunk

2007-03-06 Thread Vladimir Sysoev

FYI,
the bug has already been reported:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31037

-Vladimir