[Bug rtl-optimization/55845] 454.calculix miscompares with -march=btver2 -O3 -ffastmath -fschedule-insns -mvzeroupper for test data run

2013-01-03 Thread vbyakovl23 at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55845



--- Comment #3 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2013-01-04 
06:40:44 UTC ---

The test also fails with corei7-avx. I built a simple reproducer:



-

#include <stdio.h>

#define N 100

double
foo (int size, double y[], double x[])
{
  double sum = 0.0;
  int i;
  for (i = 0, sum = 0.; i < size; i++)
    sum += y[i] * x[i];
  return (sum);
}

int main ()
{
  double x[N];
  double y[N];
  double s;
  int i;

  for (i = 0; i < N; i++)
    {
      x[i] = i;
      y[i] = i;
    }

  s = foo (N, y, x);

  printf ("%s\n", s == 328350 ? "pass" : "fail");
}

--



$ gcc -mavx -g -static -o t -O3 -ffast-math -march=corei7 t.c -fno-inline && ./t
pass
$ gcc -fschedule-insns -mavx -g -static -o t -O3 -ffast-math -march=corei7 t.c -fno-inline && ./t
fail
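Every product and every partial sum in this reproducer is an integer far below 2^53. As a result, any summation order that -ffast-math reassociation may legitimately choose still yields exactly 328350 (the sum of i*i for i = 0..99). The exact comparison in main() is therefore a sound check, and "fail" points to wrong code rather than rounding.

The sketch below compares a few summation orders. The helper names are hypothetical, and the four-lane variant only approximates what a vectorized reduction might do:

```c
#include <assert.h>

/* Left-to-right accumulation, the order written in foo().  */
double dot_forward (const double *y, const double *x, int n)
{
  double sum = 0.0;
  for (int i = 0; i < n; i++)
    sum += y[i] * x[i];
  return sum;
}

/* Right-to-left accumulation: a different association order.  */
double dot_reverse (const double *y, const double *x, int n)
{
  double sum = 0.0;
  for (int i = n - 1; i >= 0; i--)
    sum += y[i] * x[i];
  return sum;
}

/* Four independent partial sums combined at the end, roughly the
   order a vectorized reduction may use under -ffast-math.  */
double dot_four_lanes (const double *y, const double *x, int n)
{
  double lane[4] = { 0.0, 0.0, 0.0, 0.0 };
  int i = 0;
  for (; i + 4 <= n; i += 4)
    for (int k = 0; k < 4; k++)
      lane[k] += y[i + k] * x[i + k];
  for (; i < n; i++)              /* scalar remainder */
    lane[0] += y[i] * x[i];
  return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}

/* Closed form of the sum of i*i for i = 0 .. n-1.  */
long sum_of_squares (long n)
{
  assert (n >= 0);
  return (n - 1) * n * (2 * n - 1) / 6;
}
```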



The responsible pass is jump2. To switch the pass off, I made the following change:



diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 5d142e9..be04c5d 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -3070,6 +3070,7 @@ struct rtl_opt_pass pass_jump =
 static unsigned int
 execute_jump2 (void)
 {
+  if (!getenv("NOJMP2"))
   cleanup_cfg (flag_crossjumping ? CLEANUP_CROSSJUMP : 0);
   return 0;
 }



env NOJMP2=1 gcc -fschedule-insns -mavx -g -static -o t -O3 -ffast-math -march=corei7 t.c -fno-inline && ./t
pass



I used the following compiler:

Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure CFLAGS='-O0 -g3' CXXFLAGS='-O0 -g3'
--prefix=/gnumnt/msticlxl16_users/vbyakovl/workspaces/gcc/install
--enable-clocale=gnu --disable-bootstrap --with-system-zlib --enable-shared
--with-demangler-in-ld --with-fpmath=sse --with-arch=corei7-avx
--with-cpu=corei7-avx --enable-languages=c,c++,fortran,java,lto,objc
--no-create --no-recursion


[Bug regression/54390] [AVX] FAIL: gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c

2012-12-28 Thread vbyakovl23 at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54390

Vladimir Yakovlev vbyakovl23 at gmail dot com changed:

   What|Removed |Added

 CC||vbyakovl23 at gmail dot com

--- Comment #2 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-12-28 
15:40:53 UTC ---
The compiler behaves differently on the test
gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c with -mavx and -mno-avx. With -mno-avx,
get_vectype_for_scalar_type (scalar_type) at tree-vect-data-refs.c:3265
returns NULL for scalar_type “struct A”, whereas with -mavx it returns “vector(2)
__int128 unsigned”.
The test passes if the constant 16 at line 6 of the test is replaced by 32 or 64
(64 is preferable; otherwise we will hit the same problem with AVX2 in the future).

diff --git a/gcc/testsuite/gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c b/gcc/testsuite/gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c
index 90dcd84..68e0bf1 100644
--- a/gcc/testsuite/gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c
+++ b/gcc/testsuite/gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c
@@ -3,7 +3,7 @@

 typedef __complex__ float Value;
 typedef struct {
-  Value a[16 / sizeof (Value)];
+  Value a[64 / sizeof (Value)];
 } A;

 A sum(A a,A b)
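Here Value is GCC's 8-byte __complex__ float. With the original declaration, struct A has two elements and occupies 16 bytes, which is one 128-bit SSE register. The proposed change gives it eight elements and 64 bytes, more than a 256-bit AVX register holds.

The small sketch below checks these sizes. A16 and A64 are hypothetical names for the two variants:

```c
#include <assert.h>
#include <stddef.h>

typedef __complex__ float Value;   /* same typedef as in the test */

/* struct A with the original and with the proposed array bound.  */
typedef struct { Value a[16 / sizeof (Value)]; } A16;
typedef struct { Value a[64 / sizeof (Value)]; } A64;

/* Number of Value elements in each variant.  */
size_t a16_elements (void)
{
  return sizeof (((A16 *) 0)->a) / sizeof (Value);
}

size_t a64_elements (void)
{
  return sizeof (((A64 *) 0)->a) / sizeof (Value);
}
```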


[Bug lto/55660] New: ICE instead of some warning during lto build

2012-12-11 Thread vbyakovl23 at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55660

 Bug #: 55660
   Summary: ICE instead of some warning during lto build
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com


LTO compilation fails at link time if the object files are compiled with
-funsigned-char but linked without it.
r174519 is the first revision in which the failure appears.


$ cat t_f.c 
char n[3] = {'a','b','c'};
int foo(char *x)
{
  if (*x == 'y')
    return (int)*x;
  *x = 'y';
  return 0;
}

$ cat t_m.c
#include <stdio.h>

extern int foo (char*);

extern char n[3];

int main ()
{
  int i, m = 0;
  for (i = 0; i < 3; i++)
    m += foo(&n[i]);

  printf("%d\n", m);
}

$ gcc -c -m32 -O2  -flto t_f.c t_m.c -funsigned-char ; gcc -m32  -O2  -flto
t_f.o t_m.o  -o t0  
In file included from :0:0:
t_m.c: In function ‘main’:
t_m.c:7:5: error: mismatching comparison operand types
 int main ()
 ^
char
unsigned char
if (_14 == 121)

t_m.c:7:5: internal compiler error: verify_gimple failed
0x98ab8b verify_gimple_in_cfg(function*)
../../gcc/gcc/tree-cfg.c:4728
0x8794d0 execute_function_todo
../../gcc/gcc/passes.c:1973
0x8787e9 do_per_function
../../gcc/gcc/passes.c:1705
0x8795f4 execute_todo
../../gcc/gcc/passes.c:2006
0x879a5e execute_one_ipa_transform_pass
../../gcc/gcc/passes.c:2183
0x879b42 execute_all_ipa_transforms()
../../gcc/gcc/passes.c:2213
0x5ae627 expand_function
../../gcc/gcc/cgraphunit.c:1615
0x5aeb00 expand_all_functions
../../gcc/gcc/cgraphunit.c:1726
0x5af58a compile()
../../gcc/gcc/cgraphunit.c:2024
0x512f39 lto_main()
../../gcc/gcc/lto/lto.c:3399
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.
lto-wrapper: gcc returned 1 exit status
/bin/ld: lto-wrapper failed
collect2: error: ld returned 1 exit status

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/export/users/vbyakovl/workspaces/gcc/install-ref/libexec/gcc/x86_64-unknown-linux-gnu/4.8.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure CFLAGS='-O0 -g3'
--prefix=/export/users/vbyakovl/workspaces/gcc/install-ref --disable-bootstrap
--enable-languages=c,c++,fortran,lto CXXFLAGS='-O0 -g3' 
Thread model: posix
gcc version 4.8.0 20121105 (experimental) (GCC)
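The GIMPLE verifier rejects a comparison between `char` and `unsigned char`. This happens because plain char has different signedness in the two compilation steps. The objects are built with -funsigned-char, while the link-time step runs without that flag and presumably uses the target default, which is signed on x86.

The small sketch below checks which signedness a given set of flags selects. The helper names are hypothetical:

```c
#include <assert.h>
#include <limits.h>

/* 1 if plain char is signed under the flags this file was compiled
   with (-fsigned-char / -funsigned-char on GCC), 0 otherwise.  */
int plain_char_is_signed (void)
{
  return CHAR_MIN < 0;
}

/* Reads a byte through plain char; values of 128 and above come back
   negative when plain char is signed (GCC converts modulo 256).  */
int byte_as_plain_char (unsigned char byte)
{
  char c = (char) byte;
  return c;
}
```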


[Bug target/54342] OImode is used for _m256 types when using unions in a function call.

2012-11-21 Thread vbyakovl23 at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54342



Vladimir Yakovlev vbyakovl23 at gmail dot com changed:



   What|Removed |Added



 Status|NEW |RESOLVED

 Resolution||FIXED



--- Comment #14 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-11-22 
07:06:12 UTC ---

The vzeroupper implementation is now in trunk. To fix the problem I used HJ's
proposals, so the issue can be closed.


[Bug middle-end/54985] New: Dom optimization erroneous remove conditional goto.

2012-10-19 Thread vbyakovl23 at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54985

 Bug #: 54985
   Summary: Dom optimization erroneous remove conditional goto.
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com


Attached test case fails if compiled with -O1 and higher.

gcc -O0 -m32  q.c qm.c  ; ./a.out && echo pass || echo FAIL 
pass
gcc -g3 -O1 -m32  q.c qm.c  ; ./a.out && echo pass || echo FAIL 
FAIL

gcc -v
Using built-in specs.
COLLECT_GCC=/export/users/vbyakovl/workspaces/gcc/install-ref/bin/gcc
COLLECT_LTO_WRAPPER=/export/users/vbyakovl/workspaces/gcc/install-ref/libexec/gcc/x86_64-unknown-linux-gnu/4.8.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure CFLAGS='-O0 -g3'
--prefix=/export/users/vbyakovl/workspaces/gcc/install-ref --disable-bootstrap
--enable-languages=c,c++,fortran,lto CXXFLAGS='-O0 -g3' : (reconfigured)
../gcc/configure CFLAGS='-O0 -g3'
--prefix=/export/users/vbyakovl/workspaces/gcc/install-ref --disable-bootstrap
CXXFLAGS='-O0 -g3' --enable-languages=c,c++,fortran,lto --no-create
--no-recursion : (reconfigured) ../gcc/configure CFLAGS='-O0 -g3'
--prefix=/export/users/vbyakovl/workspaces/gcc/install-ref --disable-bootstrap
CXXFLAGS='-O0 -g3' --enable-languages=c,c++,fortran,lto --no-create
--no-recursion
Thread model: posix
gcc version 4.8.0 20121015 (experimental) (GCC)

The following routine is miscompiled:

int foo(ST *s, int c)
{
  int first = 1;
  int count = c;
  ST *item = s;
  int a = s->a;
  int x;

  while (count--)
    {
      x = item->a;
      if (first)
        first = 0;
      else if (x >= a)
        return 1;
      a = x;
      item++;
    }
  return 0;
}

The compiler records an equivalence between ‘x’ and ‘a’ (in
record_temporary_equivalences_from_phis() in tree-ssa-threadedge.c) and folds
the comparison x >= a to true.
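The following is a self-contained execute check for this routine. It makes two assumptions: ST is a struct with a single int member a (the attachment is not reproduced here), and the garbled comparison is x >= a.

On the strictly decreasing input {2, 1}, foo must return 0. Folding x >= a to true would make it return 1.

```c
#include <assert.h>

/* Assumed definition of ST: a single int member 'a'.  */
typedef struct st { int a; } ST;

/* Returns 1 as soon as an element is not smaller than its predecessor,
   0 if the first c elements strictly decrease.  */
int foo (ST *s, int c)
{
  int first = 1;
  int count = c;
  ST *item = s;
  int a = s->a;
  int x;

  while (count--)
    {
      x = item->a;
      if (first)
        first = 0;
      else if (x >= a)
        return 1;
      a = x;
      item++;
    }
  return 0;
}
```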


[Bug middle-end/54985] Dom optimization erroneous remove conditional goto.

2012-10-19 Thread vbyakovl23 at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54985



--- Comment #1 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-10-19 
10:58:26 UTC ---

Created attachment 28489

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28489

Test case


[Bug middle-end/54985] Dom optimization erroneous remove conditional goto.

2012-10-19 Thread vbyakovl23 at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54985



--- Comment #2 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-10-19 
10:59:15 UTC ---

Created attachment 28490

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28490

Main routine


[Bug tree-optimization/54901] [4.8 Regression] air.f90, aermod.f90, and mdbx.f90 are miscompiled with '-m64 -O3 -funroll-loops -fwhole-program' after revision 192213

2012-10-16 Thread vbyakovl23 at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54901



Vladimir Yakovlev vbyakovl23 at gmail dot com changed:



   What|Removed |Added



 CC||vbyakovl23 at gmail dot com



--- Comment #1 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-10-16 
08:55:16 UTC ---

Dominique, could you attach the tests?


[Bug target/47440] Use LCM for vzeroupper insertion

2012-08-23 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47440

--- Comment #4 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-08-23 
19:15:58 UTC ---
As Uros recommended, I split the patch into two parts. The first, middle-end part
is here:
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01590.html


[Bug target/47440] Use LCM for vzeroupper insertion

2012-08-22 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47440

Vladimir Yakovlev vbyakovl23 at gmail dot com changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #3 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-08-22 
15:25:54 UTC ---
I implemented vzeroupper insertion using the mode-switching technique:
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01429.html


[Bug rtl-optimization/54342] New: [4.8 Regression] Wrong mode of call argument

2012-08-21 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54342

 Bug #: 54342
   Summary: [4.8 Regression] Wrong mode of call argument
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com


The call argument has mode OI rather than V8SF. Test case:

-
#include <immintrin.h>

typedef union un1
{
__m256 x;
float f;
} UN1;
UN1 u;
extern __m256 y;
extern int bar2(UN1);

int foo2 ()
{
u.x = y;
return bar2(u);
}
--

Dump after expand

(note 4 2 5 3 [bb 3] NOTE_INSN_BASIC_BLOCK)

(insn 5 4 6 3 (set (reg/f:DI 62)
        (symbol_ref:DI ("u") <var_decl 0x7f312f270a00 u>)) er.c:13 -1
     (nil))

(insn 6 5 7 3 (set (reg:V8SF 63)
        (mem/c:V8SF (symbol_ref:DI ("y") [flags 0x40] <var_decl 0x7f312f270aa0 y>) [2 y+0 S32 A256])) er.c:13 -1
     (nil))

(insn 7 6 8 3 (set (mem/c:V8SF (reg/f:DI 62) [0 u.x+0 S32 A256])
        (reg:V8SF 63)) er.c:13 -1
     (nil))

(insn 8 7 9 3 (set (reg/f:DI 65)
        (symbol_ref:DI ("u") <var_decl 0x7f312f270a00 u>)) er.c:14 -1
     (nil))

(insn 9 8 10 3 (set (reg:OI 64)
        (mem/c:OI (reg/f:DI 65) [3 u+0 S32 A256])) er.c:14 -1
     (nil))

(insn 10 9 11 3 (set (reg:OI 21 xmm0)
        (reg:OI 64)) er.c:14 -1
     (nil))

(call_insn/j 11 10 12 3 (parallel [
            (set (reg:SI 0 ax)
                (call (mem:QI (symbol_ref:DI ("bar2") [flags 0x41] <function_decl 0x7f312f27b300 bar2>) [0 bar2 S1 A8])
                    (const_int 0 [0])))
            (unspec [
                    (const_int 1 [0x1])
                ] UNSPEC_CALL_NEEDS_VZEROUPPER)
        ]) er.c:14 -1
     (nil)
    (expr_list:REG_DEP_TRUE (use (reg:OI 21 xmm0))
        (nil)))

The following patch fixes this.

diff --git a/gcc/stor-layout.c b/gcc/stor-layout.c
index 53554a9..bb39e7f 100644
--- a/gcc/stor-layout.c
+++ b/gcc/stor-layout.c
@@ -1639,7 +1639,8 @@ compute_record_mode (tree type)
   /* If we only have one real field; use its mode if that mode's size
      matches the type's size.  This only applies to RECORD_TYPE.  This
      does not apply to unions.  */
-  if (TREE_CODE (type) == RECORD_TYPE && mode != VOIDmode
+  if ((TREE_CODE (type) == RECORD_TYPE || TREE_CODE (type) == UNION_TYPE)
+      && mode != VOIDmode
       && host_integerp (TYPE_SIZE (type), 1)
       && GET_MODE_BITSIZE (mode) == TREE_INT_CST_LOW (TYPE_SIZE (type)))
     SET_TYPE_MODE (type, mode);
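The patch hinges on the final test in compute_record_mode(). A type may take the mode of a field that spans the whole type, but before the patch only RECORD_TYPEs could do so. Other types fell back to an integer mode of the same size, which is presumably how the 256-bit UN1 ends up in OImode.

The following is a deliberately simplified, hypothetical model of that decision. It is not GCC code, and every name in it is made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of the last step of
   compute_record_mode(); this is not GCC code.  */
enum type_kind { KIND_RECORD, KIND_UNION };

enum model_mode { M_SF, M_DF, M_V8SF, M_SI, M_DI, M_TI, M_OI, M_BLK };

struct field_desc
{
  enum model_mode mode;  /* machine mode of the field's type */
  unsigned bits;         /* size of the field in bits */
};

/* Integer mode of a given size, standing in for mode_for_size_tree.  */
static enum model_mode int_mode_for_bits (unsigned bits)
{
  switch (bits)
    {
    case 32:  return M_SI;
    case 64:  return M_DI;
    case 128: return M_TI;
    case 256: return M_OI;
    default:  return M_BLK;
    }
}

enum model_mode
pick_type_mode (enum type_kind kind, const struct field_desc *fields,
                size_t nfields, unsigned type_bits, int with_patch)
{
  int have_field_mode = 0;
  enum model_mode field_mode = M_BLK;

  /* Remember the mode of a field that covers the whole type.  */
  for (size_t i = 0; i < nfields; i++)
    if (fields[i].bits == type_bits)
      {
        field_mode = fields[i].mode;
        have_field_mode = 1;
      }

  /* Before the patch only records may reuse that mode; the patch
     extends this to unions.  */
  int kind_ok = kind == KIND_RECORD || (with_patch && kind == KIND_UNION);
  if (kind_ok && have_field_mode)
    return field_mode;

  /* Otherwise fall back to an integer mode of the same size.  */
  return int_mode_for_bits (type_bits);
}
```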


[Bug rtl-optimization/54342] [4.8 Regression] Wrong mode of call argument

2012-08-21 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54342

--- Comment #3 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-08-21 
11:18:39 UTC ---
I'm working on vzeroupper insertion and my implementation inserts vzeroupper
before the call because VALID_AVX256_REG_MODE returns false.


[Bug middle-end/53616] [4.8 Regression] 416.gamess in SPEC CPU 2006 miscompiled

2012-07-24 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53616

--- Comment #15 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-07-24 
15:36:16 UTC ---
416.gamess passes now.


[Bug middle-end/53616] [4.8 Regression] 416.gamess in SPEC CPU 2006 miscompiled

2012-07-23 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53616

--- Comment #10 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-07-23 
12:48:59 UTC ---
Created attachment 27858
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=27858
Reduced test case


[Bug middle-end/53616] [4.8 Regression] 416.gamess in SPEC CPU 2006 miscompiled

2012-07-23 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53616

Vladimir Yakovlev vbyakovl23 at gmail dot com changed:

   What|Removed |Added

 CC||vbyakovl23 at gmail dot com

--- Comment #11 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-07-23 
12:53:13 UTC ---
The miscompare in 416.gamess is caused by an incorrect transformation of a loop
in grd2b.f, lines 113-121.

      DO 110 M=1,3
         P12(M,1)= C(M,IAT)
         P12(M,2)= C(M,JAT)
         P12(M,3)= P12(M,2)-P12(M,1)
         R12= R12+P12(M,3)*P12(M,3)
         P34(M,1)= C(M,KAT)
         P34(M,2)= C(M,LAT)
         P34(M,3)= P34(M,2)-P34(M,1)
  110 R34= R34+P34(M,3)*P34(M,3)

After the transformation we have:

      P12(:,1) = C(:,IAT)
      P12(:,2) = C(:,JAT)
      DO 110 M=1,3
         P12(M,3)= P12(M,2)-P12(M,1)
         R12= R12+P12(M,3)*P12(M,3)
         P34(M,3)= P34(M,2)-P34(M,1)
  110 R34= R34+P34(M,3)*P34(M,3)
      P34(:,1) = C(:,KAT)
      P34(:,2) = C(:,LAT)

That is, the order of statements was changed: P34(:,1) and P34(:,2) are now
assigned only after the loop that reads them. The correct transformation would
be:

      P12(:,1) = C(:,IAT)
      P12(:,2) = C(:,JAT)
      DO 110 M=1,3
         P12(M,3)= P12(M,2)-P12(M,1)
  110 R12= R12+P12(M,3)*P12(M,3)
      P34(:,1) = C(:,KAT)
      P34(:,2) = C(:,LAT)
      DO 111 M=1,3
         P34(M,3)= P34(M,2)-P34(M,1)
  111 R34= R34+P34(M,3)*P34(M,3)
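The dependence broken by the wrong transformation can be reproduced in a small C sketch. This is a hypothetical rendering of the loop, not the gamess code. Its conventions are:

- Indices are 0-based.
- atom[0..3] stand for C(:,IAT), C(:,JAT), C(:,KAT) and C(:,LAT).
- p12[k][m] stands for P12(M,K+1), and likewise for p34.
- R12 and R34 are assumed to start at zero.

Starting from zeroed work arrays, the wrong order computes R34 from stale P34 values, while the correct split matches the original loop:

```c
#include <assert.h>

/* Work arrays: p12[k][m] stands for P12(M,K+1), likewise p34.  */
struct work { double p12[3][3], p34[3][3]; };
struct radii { double r12, r34; };

/* Statement order of the source loop.  */
struct radii loop_original (const double atom[4][3], struct work *w)
{
  struct radii r = { 0.0, 0.0 };
  for (int m = 0; m < 3; m++)
    {
      w->p12[0][m] = atom[0][m];
      w->p12[1][m] = atom[1][m];
      w->p12[2][m] = w->p12[1][m] - w->p12[0][m];
      r.r12 += w->p12[2][m] * w->p12[2][m];
      w->p34[0][m] = atom[2][m];
      w->p34[1][m] = atom[3][m];
      w->p34[2][m] = w->p34[1][m] - w->p34[0][m];
      r.r34 += w->p34[2][m] * w->p34[2][m];
    }
  return r;
}

/* The miscompiled order: P34(:,1) and P34(:,2) are stored only after
   the loop that already reads them.  */
struct radii loop_wrong (const double atom[4][3], struct work *w)
{
  struct radii r = { 0.0, 0.0 };
  for (int m = 0; m < 3; m++)
    {
      w->p12[0][m] = atom[0][m];
      w->p12[1][m] = atom[1][m];
    }
  for (int m = 0; m < 3; m++)
    {
      w->p12[2][m] = w->p12[1][m] - w->p12[0][m];
      r.r12 += w->p12[2][m] * w->p12[2][m];
      w->p34[2][m] = w->p34[1][m] - w->p34[0][m];  /* stale values */
      r.r34 += w->p34[2][m] * w->p34[2][m];
    }
  for (int m = 0; m < 3; m++)
    {
      w->p34[0][m] = atom[2][m];
      w->p34[1][m] = atom[3][m];
    }
  return r;
}

/* The correct distribution: each loop sees the stores it depends on.  */
struct radii loop_correct (const double atom[4][3], struct work *w)
{
  struct radii r = { 0.0, 0.0 };
  for (int m = 0; m < 3; m++)
    {
      w->p12[0][m] = atom[0][m];
      w->p12[1][m] = atom[1][m];
    }
  for (int m = 0; m < 3; m++)
    {
      w->p12[2][m] = w->p12[1][m] - w->p12[0][m];
      r.r12 += w->p12[2][m] * w->p12[2][m];
    }
  for (int m = 0; m < 3; m++)
    {
      w->p34[0][m] = atom[2][m];
      w->p34[1][m] = atom[3][m];
    }
  for (int m = 0; m < 3; m++)
    {
      w->p34[2][m] = w->p34[1][m] - w->p34[0][m];
      r.r34 += w->p34[2][m] * w->p34[2][m];
    }
  return r;
}
```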

I attached a reduced test case and dumps with and without the transformation.
Command line to compile:

gfortran   m.f t.f -O3

The result of the run differs from that of the code compiled at -O0.
I used the following compiler:

Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --with-arch=corei7 --with-cpu=corei7
--enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld
--enable-cloog-backend=isl --with-fpmath=sse --enable-languages=c,c++,fortran
--enable-bootstrap=no
Thread model: posix
gcc version 4.8.0 20120606 (experimental) (GCC)


[Bug tree-optimization/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32

2012-06-21 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726

--- Comment #19 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-06-21 
12:46:11 UTC ---
(In reply to comment #13)
> (In reply to comment #10)
> > I've tried without static. Runtimes is still the same.
> 
> It doesn't match what I saw.  On Atom D510:
> 
> /export/gnu/import/git/gcc-regression/master/188261/usr/bin/gcc -ansi -O3
> -ffast-math -msse2 -mfpmath=sse -m32   -march=atom m.c test.c -o new
> time ./new
> ./new  58.46s user 0.00s system 99% cpu 58.479 total
> /export/gnu/import/git/gcc-regression/master/188259/usr/bin/gcc -ansi -O3
> -ffast-math -msse2 -mfpmath=sse -m32   -march=atom m.c test.c -o old
> time ./old
> ./old  58.38s user 0.00s system 99% cpu 58.490 total

I rechecked: without -static there is no regression on either Sandy Bridge or Atom.


[Bug c/53726] New: [4.8 Regression] aes test performance drop for eembc_2_0_peak_32

2012-06-20 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726

 Bug #: 53726
   Summary: [4.8 Regression] aes test performance drop for
eembc_2_0_peak_32
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com


After this fix

r188261 | rguenth | 2012-06-06 13:45:27 +0400 (Wed, 06 Jun 2012) | 23 lines

2012-06-06  Richard Guenther  rguent...@suse.de

PR tree-optimization/53081
* tree-data-ref.h (adjacent_store_dr_p): Rename to ...
(adjacent_dr_p): ... this and make it work for reads, too.
* tree-loop-distribution.c (enum partition_kind): Add PKIND_MEMCPY.
(struct partition_s): Change main_stmt to main_dr, add
secondary_dr member.
(build_size_arg_loc): Change to date data-reference and not
gimplify here.
(build_addr_arg_loc): New function split out from ...
(generate_memset_builtin): ... here.  Use it and simplify.
(generate_memcpy_builtin): New function.
(generate_code_for_partition): Adjust.
(classify_partition): Streamline pattern detection.  Detect
memcpy.
(ldist_gen): Adjust.
(tree_loop_distribution): Adjust seed statements for memcpy
recognition.

* gcc.dg/tree-ssa/ldist-20.c: New testcase.
* gcc.dg/tree-ssa/loop-19.c: Add -fno-tree-loop-distribute-patterns.

there is a regression of 11% on Atom and 30% on Sandy Bridge. The fix leads to
memcpy not being recognized. A reduced test case and assembly files are
attached. Command line to reproduce:

gcc -ansi -O3 -ffast-math -msse2 -mfpmath=sse -m32 -static  -march=corei7
-mtune=corei7   test.c
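The ChangeLog above teaches loop distribution to classify partitions as memcpy (PKIND_MEMCPY). As a generic illustration, not the aes test case:

- copy_loop below is the kind of loop that such recognition targets.
- copy_call shows the library call it may be replaced with when source and destination do not overlap.

Both names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Element-wise copy: the loop shape memcpy recognition looks for.  */
void copy_loop (unsigned char *dst, const unsigned char *src, size_t n)
{
  assert (n == 0 || (dst != NULL && src != NULL));
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i];
}

/* The equivalent library call; only valid if the regions do not
   overlap, which a compiler must establish before substituting it.  */
void copy_call (unsigned char *dst, const unsigned char *src, size_t n)
{
  memcpy (dst, src, n);
}
```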


[Bug c/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32

2012-06-20 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726

--- Comment #1 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-06-20 
06:13:26 UTC ---
Created attachment 27658
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=27658
Test case and assemblers


[Bug tree-optimization/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32

2012-06-20 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726

--- Comment #3 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-06-20 
10:48:11 UTC ---
I added an executable test case. Command line to compile:

gcc -g -ansi -O3 -ffast-math -msse2 -mfpmath=sse -m32 -static  -march=corei7
-mtune=corei7   test.c m.c

Run results:

Wed Jun 20 14:39:05: /gnumnt/msticlxl25_users/vbyakovl/1020/test$ time ./test.corei7.bad.exe

real    0m6.317s
user    0m6.290s
sys     0m0.002s
Wed Jun 20 14:39:24: /gnumnt/msticlxl25_users/vbyakovl/1020/test$ time ./test.corei7.good.exe

real    0m4.815s
user    0m4.713s
sys     0m0.000s
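Expressed as ratios, these timings put the bad binary about 31% slower in real time and about 33% slower in user time. This is in line with the roughly 30% Sandy Bridge figure quoted in the original report:

```latex
\frac{6.317\ \mathrm{s}}{4.815\ \mathrm{s}} \approx 1.31 \quad \text{(real)}, \qquad
\frac{6.290\ \mathrm{s}}{4.713\ \mathrm{s}} \approx 1.33 \quad \text{(user)}
```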


[Bug tree-optimization/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32

2012-06-20 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726

--- Comment #4 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-06-20 
10:50:28 UTC ---
Created attachment 27664
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=27664
Executable test case


[Bug tree-optimization/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32

2012-06-20 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726

--- Comment #10 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-06-20 
14:34:32 UTC ---
I've tried without -static. The run times are still the same.


[Bug c/52632] New: GCC compfail on O0

2012-03-20 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52632

 Bug #: 52632
   Summary: GCC compfail on O0
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com


Test case a.c fails to compile at -O0:

gcc -O0 -c a.c
a.c: In function 'foo':
a.c:4:1: error: size of unnamed array is negative

It compiles fine at higher optimization levels.
gcc version 4.8.0 20120319 (experimental) (GCC)

The failure happens because at -O0 the front end replaces a call to
__builtin_constant_p with zero if the call cannot be folded. See gcc/builtins.c,
fold_builtin_1(), line 10270:
  switch (fcode)
    {
    case BUILT_IN_CONSTANT_P:
      {
        tree val = fold_builtin_constant_p (arg0);

        /* Gimplification will pull the CALL_EXPR for the builtin out of
           an if condition.  When not optimizing, we'll not CSE it back.
           To avoid link error types of regressions, return false now.  */
        if (!val && !optimize)
          val = integer_zero_node;

        return val;
      }

It may be fixed by a patch that disables the error message when not optimizing
(unless -pedantic is given):

diff --git a/gcc/c-decl.c b/gcc/c-decl.c
index 160d393..1ba3f51 100644
--- a/gcc/c-decl.c
+++ b/gcc/c-decl.c
@@ -5345,7 +5345,7 @@ grokdeclarator (const struct c_declarator *declarator,
                 if (TREE_CODE (size) == INTEGER_CST && size_maybe_const)
                   {
                     constant_expression_warning (size);
-                    if (tree_int_cst_sgn (size) < 0)
+                    if ((pedantic || optimize) && tree_int_cst_sgn (size) < 0)
                       {
                         if (name)
                           error_at (loc, "size of array %qE is negative", name);


[Bug c/52632] GCC compfail on O0

2012-03-20 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52632

--- Comment #2 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-03-20 
10:03:47 UTC ---
Created attachment 26929
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26929
Test case


[Bug middle-end/52580] [4.8 Regression] 171.swim performance drop on x86 – vectorization doesn’t happen anymore

2012-03-15 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52580

--- Comment #6 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-03-15 
12:53:50 UTC ---
I checked: the fix gives a 21% speedup of 171.swim on Sandy Bridge. Thanks.


[Bug fortran/52580] New: [4.8 Regression] 171.swim performance drop on x86 – vectorization doesn’t happen anymore

2012-03-13 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52580

 Bug #: 52580
   Summary: [4.8 Regression] 171.swim performance drop on x86 –
vectorization doesn’t happen anymore
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com


The regression can be seen on Sandy Bridge. Changeset analysis points to this commit:

commit 95539e1deabbaa9dbc84b1d81ce6d0c8e7156a0f
Author: rguenth rguenth@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Fri Mar 2 14:58:55 2012 +

2012-03-02  Richard Guenther  rguent...@suse.de

PR tree-optimization/52406
* tree-data-ref.h: Update documentation about DR_BASE_OBJECT.
(struct indices): Add unconstrained_base member.
(struct dr_alias): Remove unused vops member.
(DR_UNCONSTRAINED_BASE): New define.
* tree-data-ref.c (dr_analyze_indices): For COMPONENT_REFs
add indices to allow their disambiguation.  Make DR_BASE_OBJECT
be an artificial access that covers the whole indexed object,
or mark it with DR_UNCONSTRAINED_BASE if we cannot do so.  Canonicalize
plain decl base-objects to their MEM_REF variant.
(dr_may_alias_p): When the base-object of either data reference
has unknown size use only points-to information.
(compute_affine_dependence): Make dumps easier to read and
more verbose.
* tree-vect-data-ref.c (vector_alignment_reachable_p): Use
DR_REF when looking for packed references.
(vect_supportable_dr_alignment): Likewise.

* gcc.dg/torture/pr52406.c: New testcase.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@184789
138bc75d-0d04-0410-961f-82ee72b054a4

There are vectorizer problems: the hottest routines, calc2() and calc3(), are no
longer vectorized.

Command line to reproduce:
gfortran -g -static -m32 -S -O3 -funroll-loops -msse2 -mfpmath=sse -ffast-math
-march=corei7 swim.f

gcc -v
Using built-in specs.
COLLECT_GCC=/gnumnt/msticlxl16_users/vbyakovl/workspaces/619/install-exp/bin/gcc
COLLECT_LTO_WRAPPER=/gnumnt/msticlxl16_users/vbyakovl/workspaces/619/install-exp/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.8.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure
--prefix=/export/users/vbyakovl/workspaces/619/install-exp --disable-bootstrap
--enable-languages=c,c++,fortran CFLAGS=-g3 
Thread model: posix
gcc version 4.8.0 20120312 (experimental) (GCC)


[Bug libstdc++/52241] Performance degradation of 447.dealII on corei7 at spec2006_base32.

2012-02-20 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52241

--- Comment #20 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-20 
18:04:45 UTC ---
(In reply to comment #19)
> Nice, so we want Paolo's patch.  Out of interest, what are the 447.deal numbers
> when comparing linking against old (pre-Benjamin's commit) libstdc++.a and
> current libstdc++.a with Paolo's patch (or libstdc++.so.6, i.e. without
> -static)?

runspec numbers (as printed by runspec) for both static and dynamic linking:

Static
Base:
Old: 447.dealII  11440  324  35.3  *
New: 447.dealII  11440  302  37.9  *

Peak:
Old: 447.dealII  11440  285  40.2  *
New: 447.dealII  11440  268  42.6  *

Dynamic
Base:
Old: 447.dealII  11440  327  34.9  *
New: 447.dealII  11440  327  35.0  *

Peak:
Old: 447.dealII  11440  287  39.9  S
New: 447.dealII  11440  288  39.7  *

So there is no effect in the dynamic case. Is that right?
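Each runspec row lists the benchmark, the reference time (11440 s), the measured run time in seconds, and the ratio. The ratio is the reference time divided by the run time, for example 11440/324 ≈ 35.3.

Applied to the static ratios, the relative change is about 7.4% for base, (37.9 - 35.3)/35.3, and about 6.0% for peak, (42.6 - 40.2)/40.2. These match the +7.36% and +5.97% reported in comment #18.

A small sketch of that arithmetic follows. The helper names are hypothetical:

```c
#include <assert.h>

/* SPEC CPU ratio: reference time divided by measured run time.  */
double spec_ratio (double ref_seconds, double run_seconds)
{
  assert (run_seconds > 0.0);
  return ref_seconds / run_seconds;
}

/* Relative change between two ratios, in percent (positive = faster).  */
double percent_change (double old_ratio, double new_ratio)
{
  assert (old_ratio > 0.0);
  return (new_ratio - old_ratio) / old_ratio * 100.0;
}
```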


[Bug c++/52241] Performance degradation of 447.dealII on corei7 at spec2006_base32.

2012-02-19 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52241

--- Comment #18 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-20 
05:37:32 UTC ---
I tested Paolo's patch and got a speedup on 447.dealII:

base: +7.36%
peak: +5.97%

I also looked at the dumps: the new routine 'local_Rb_tree_increment' is now
inlined at both call sites.


[Bug regression/52272] New: [4.7 regression] Performance regresswion of 410.bwaves on x86.

2012-02-16 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52272

 Bug #: 52272
   Summary: [4.7 regression] Performance regresswion of 410.bwaves
on x86.
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: regression
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com


Created attachment 26671
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26671
Reduced testcase

Commit 
2012-02-06  Richard Guenther  rguent...@suse.de

PR tree-optimization/50955
* tree-ssa-loop-ivopts.c (get_computation_cost_at): Artificially
raise cost of expressions that replace an address with an
expression based on a different pointer.

causes a performance regression on 410.bwaves:
base: -2.33%
peak: -3.82%
I attached a reduced test case and compiler dumps from before and after the
commit. Command line to reproduce:
gfortran -w -g -m32  -static -S t.s -O3 -funroll-loops -msse2 -mfpmath=sse
-ffast-math -march=corei7 t.f


[Bug regression/52272] [4.7 regression] Performance regresswion of 410.bwaves on x86.

2012-02-16 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52272

--- Comment #1 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-16 
08:16:27 UTC ---
Created attachment 26672
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26672
Good case before commit


[Bug regression/52272] [4.7 regression] Performance regresswion of 410.bwaves on x86.

2012-02-16 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52272

--- Comment #2 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-16 
08:17:53 UTC ---
Created attachment 26673
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26673
Bad case after commit


[Bug c++/52241] Performance degradation of 447.dealII on corei7 at spec2006_base32.

2012-02-16 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52241

--- Comment #3 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-16 
08:58:45 UTC ---
Created attachment 26674
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26674
Dump of bad case (with -fPIC -DPIC)


[Bug c++/52241] Performance degradation of 447.dealII on corei7 at spec2006_base32.

2012-02-16 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52241

--- Comment #4 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-16 
09:00:32 UTC ---
Created attachment 26675
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26675
Dump of good case (without -fPIC -DPIC)


[Bug c++/52241] Performance degradation of 447.dealII on corei7 at spec2006_base32.

2012-02-16 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52241

--- Comment #5 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-16 
09:05:49 UTC ---
(In reply to comment #2)
> I don't understand what you mean by inlining, since '_Rb_tree_node_base' is a
> *type* not a function.

This is a constructor.

> Anyway, I don't see how Benjamin's split could have caused inlining issues.

I don't know either, but compare the inline dumps with and without -fPIC -DPIC
(attached).

I used the command line from the compiler build log:

/export/users/vbyakovl/workspaces/581/build-bad/./gcc/xgcc -shared-libgcc
-B/export/users/vbyakovl/workspaces/581/build-bad/./gcc -nostdinc++
-L/export/users/vbyakovl/workspaces/581/build-bad/x86_64-unknown-linux-gnu/libstdc++-v3/src
-L/export/users/vbyakovl/workspaces/581/build-bad/x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs
-B/export/users/vbyakovl/workspaces/581/install-bad/x86_64-unknown-linux-gnu/bin/
-B/export/users/vbyakovl/workspaces/581/install-bad/x86_64-unknown-linux-gnu/lib/
-isystem
/export/users/vbyakovl/workspaces/581/install-bad/x86_64-unknown-linux-gnu/include
-isystem
/export/users/vbyakovl/workspaces/581/install-bad/x86_64-unknown-linux-gnu/sys-include
-I/export/users/vbyakovl/workspaces/581/gcc/libstdc++-v3/../libgcc
-I/export/users/vbyakovl/workspaces/581/build-bad/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu
-I/export/users/vbyakovl/workspaces/581/build-bad/x86_64-unknown-linux-gnu/libstdc++-v3/include
-I/export/users/vbyakovl/workspaces/581/gcc/libstdc++-v3/libsupc++
-fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual
-fdiagnostics-show-location=once -Wabi -ffunction-sections -fdata-sections
-frandom-seed=tree.lo -g -O2 -D_GNU_SOURCE -c
../../../../../gcc/libstdc++-v3/src/c++98/tree.cc  -fPIC -DPIC -o tree.o
-fdump-ipa-inlin


[Bug tree-optimization/52272] [4.7 regression] Performance regresswion of 410.bwaves on x86.

2012-02-16 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52272

--- Comment #6 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-16 
14:42:36 UTC ---
I've checked. The patch fixes the regression. Thanks.


[Bug c++/52241] New: Performance degradation of 447.dealII on corei7 at spec2006_base32.

2012-02-14 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52241

 Bug #: 52241
   Summary: Performance degradation of 447.dealII on corei7 at
spec2006_base32.
Classification: Unclassified
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com


Guilty commit

c1e8b3edf7b5038f070c7a9732e58d066081a636 is the first bad commit
commit c1e8b3edf7b5038f070c7a9732e58d066081a636
Author: bkoz bkoz@138bc75d-0d04-0410-961f-82ee72b054a4
Date: Mon Jan 23 23:12:01 2012 +

The commit caused a performance degradation of 447.dealII (SPEC CPU2006). It
happened because inlining no longer occurs in the following library routine
from libstdc++-v3/src/tree.cc:

const _Rb_tree_node_base*  _Rb_tree_increment(const _Rb_tree_node_base* __x)
throw ()
{
  return _Rb_tree_increment(const_cast<_Rb_tree_node_base*>(__x));
}

I found that the degradation is caused by the absence of the -fPIC flag on the
compilation command line. If I add the flag to the command line, the inlining
happens.
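For readers unfamiliar with the routine, here is a minimal sketch of the pattern it follows, not the libstdc++ source: a non-const in-order-successor function plus a const overload that casts away const and forwards to it. NodeBase and tree_increment are hypothetical stand-ins; the real _Rb_tree_increment works on _Rb_tree_node_base and treats the tree's header node specially, which this sketch omits.

```cpp
#include <cassert>

// Hypothetical minimal node type (illustration only); the real
// _Rb_tree_node_base also stores a color field.
struct NodeBase {
    NodeBase* parent = nullptr;
    NodeBase* left = nullptr;
    NodeBase* right = nullptr;
};

// In-order successor: the leftmost node of the right subtree if there is
// one, otherwise the first ancestor whose left subtree contains x.
// Simplification: returns nullptr after the last node (no header sentinel).
NodeBase* tree_increment(NodeBase* x) noexcept {
    if (x->right != nullptr) {
        x = x->right;
        while (x->left != nullptr)
            x = x->left;
        return x;
    }
    NodeBase* y = x->parent;
    while (y != nullptr && x == y->right) {
        x = y;
        y = y->parent;
    }
    return y;
}

// Const overload forwarding to the non-const one, mirroring the tree.cc
// snippet above; this forwarding call is the one you would expect the
// compiler to inline.
const NodeBase* tree_increment(const NodeBase* x) noexcept {
    return tree_increment(const_cast<NodeBase*>(x));
}
```

Because the const overload does nothing but forward, its cost is almost entirely the extra call when that call is not inlined.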


[Bug bootstrap/49829] [4.7 Regression] --disable-static --enable-shared regression: cannot find -lstdc++

2012-02-13 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49829

Vladimir Yakovlev vbyakovl23 at gmail dot com changed:

   What|Removed |Added

 CC||vbyakovl23 at gmail dot com

--- Comment #25 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-13 09:49:09 UTC ---
We have a performance degradation of 447.dealII (SPEC CPU2006). It happened
because there is no inlining in a library routine from libstdc++-v3/src/tree.cc:

const _Rb_tree_node_base*  _Rb_tree_increment(const _Rb_tree_node_base* __x)
throw ()
{
  return _Rb_tree_increment(const_cast<_Rb_tree_node_base*>(__x));
}


[Bug bootstrap/49829] [4.7 Regression] --disable-static --enable-shared regression: cannot find -lstdc++

2012-02-13 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49829

--- Comment #26 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2012-02-13 09:59:31 UTC ---
We have a performance degradation of 447.dealII (SPEC CPU2006). It happened
because inlining no longer occurs in the following library routine from
libstdc++-v3/src/tree.cc:

const _Rb_tree_node_base*  _Rb_tree_increment(const _Rb_tree_node_base* __x)
throw ()
{
  return _Rb_tree_increment(const_cast<_Rb_tree_node_base*>(__x));
}

I found that the degradation is caused by the absence of the -fPIC flag on the
compilation command line. If I add the flag to the command line, the inlining
happens.


[Bug c/50315] New: Regression on Atom after fix #49958

2011-09-07 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50315

 Bug #: 50315
   Summary: Regression on Atom after fix #49958
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com


Created attachment 25215
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25215
Test case

After the fix for #49958, reassociation was lost, which caused a regression on Atom.

I attached a test case (test.c) and dumps for good (test.c.003t.original.g) and
bad (test.c.003t.original.b) cases. 

The regression is in the expression at lines 16-47.


[Bug c/50315] Regression on Atom after fix #49958

2011-09-07 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50315

--- Comment #1 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2011-09-07 09:32:11 UTC ---
Created attachment 25216
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25216
Dump before fix


[Bug c/50315] Regression on Atom after fix #49958

2011-09-07 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50315

--- Comment #2 from Vladimir Yakovlev vbyakovl23 at gmail dot com 2011-09-07 09:33:20 UTC ---
Created attachment 25217
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25217
Dump after fix


[Bug c/50195] New: Link-time error with -ffast-math -O0

2011-08-26 Thread vbyakovl23 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50195

 Bug #: 50195
   Summary: Link-time error with -ffast-math -O0
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vbyakov...@gmail.com


The following test fails to link when compiled with -ffast-math -O0, but it
builds successfully with -ffast-math -O2. There is also no problem if -lm is
added.

$ cat t.c
#include <stdio.h>
#include <stdlib.h>  /* for atoi */

float foo(float x)
{
  float y = 0;
  while (x > 0.0001) {
    y += x*x*x*x*x*x*x*x*x*x*x*x*x;
    x = x/2;
  }
  return y;
}

int main (int argc, char*argv[])
{
 float y = atoi(argv[1]);
 printf("%f\n", foo(y));
 return 0;
}


$ gcc  -ffast-math -O0   t.c
/tmp/cccA1sUB.o: In function `foo':
t.c:(.text+0x2c): undefined reference to `powf'
collect2: error: ld returned 1 exit status
$ gcc  -ffast-math -O2   t.c
$ ./a.out 5
1220852096.00


With -ffast-math, the front end (FE) replaced x*x*...*x with __builtin_powf.
With -O2, this call is later turned back into multiplications by the sincos
pass. The failure at -O0 occurs because the sincos pass does not run at -O0.
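The round trip described above can be illustrated with a small sketch. pow13_naive is the product from the test case; powi_by_squaring is a hypothetical helper showing one common way a constant integer power can be expanded back into multiplications (square-and-multiply). It is an illustration only, not the code GCC uses in the sincos pass. At -O0 no such expansion happens, so the powf call survives and needs libm at link time.

```cpp
#include <cassert>

// x raised to the 13th power by twelve successive multiplications,
// exactly as written in the test case.
float pow13_naive(float x) {
    return x*x*x*x*x*x*x*x*x*x*x*x*x;
}

// Square-and-multiply for a non-negative integer exponent (hypothetical
// helper). For n = 13 (binary 1101) it combines x, x^4 and x^8.
float powi_by_squaring(float x, unsigned n) {
    float result = 1.0f;
    float base = x;              // holds x, x^2, x^4, x^8, ...
    while (n != 0) {
        if (n & 1u)
            result *= base;      // include this power-of-two factor
        base *= base;
        n >>= 1;
    }
    return result;
}
```

With integer inputs small enough that every intermediate product is exactly representable in float, both functions give identical results. For general inputs the different multiplication order can round differently, which -ffast-math permits.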

I think we must avoid doing this optimization in the FE and turn off
-ffast-math when -O0 is used.

From Richard Guenther:
No, I think we should avoid most of the builtin related folding at -O0.