[Bug target/38824] [4.4 Regression] performance regression of sse code from 4.2/4.3

2009-02-10 Thread dwarak dot rajagopal at amd dot com


--- Comment #20 from dwarak dot rajagopal at amd dot com  2009-02-10 16:28 
---
Paulo,
(a)   movaps  (%rax, %rsi), %xmm0
  addps  %xmm0, %xmm1

(b)   movaps  %xmm0, %xmm1
  addps  (%rax, %rsi), %xmm1

Yes, case (a) is slightly better than case (b). It shouldn't matter much on
amdfam10 (Shanghai) processors, though.
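
For reference, here is a minimal C sketch of source that can lead to either of the two forms above. It assumes an x86 target with SSE and a 16-byte-aligned input; sum_ps is a hypothetical name and is not taken from the testcase in this bug. Whether the compiler emits a separate movaps load as in (a) or folds the load into addps as in (b) depends on the compiler version and tuning, so treat this as an illustration only.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Sum n floats (n a multiple of 4, p 16-byte aligned) four lanes at a time.
   Each _mm_load_ps/_mm_add_ps pair may be emitted either as a separate
   movaps load followed by addps, or as a single load-execute addps. */
static float sum_ps(const float *p, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(p + i));

    /* Horizontal sum of the four lanes. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Compiling this with gcc -O2 -S and reading the loop body shows which of the two forms was chosen for a given tuning.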


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824



[Bug target/38824] [4.4 Regression] performance regression of sse code from 4.2/4.3

2009-02-06 Thread dwarak dot rajagopal at amd dot com


--- Comment #13 from dwarak dot rajagopal at amd dot com  2009-02-06 22:35 
---

> The patch makes GCC generate a movaps load followed by addps.  On Core 2 it
> speeds up the testcase from 7s to 6.2s so I guess it works as expected.
>
> The same however does not reproduce on the AMD box and I am not sure if it is
> just coincidence here or if the core really prefers to split read-execute SSE
> operations (it is not recommended by the manual).

FYI, AMD (amdfam10) prefers load-execute instructions over separate load and
execute instructions.


-- 

dwarak dot rajagopal at amd dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dwarak dot rajagopal at amd
                   |                            |dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824



[Bug target/38201] -mfma/-mavx and -msse5/-msse4a don't work together

2008-11-20 Thread dwarak dot rajagopal at amd dot com


--- Comment #1 from dwarak dot rajagopal at amd dot com  2008-11-20 16:48 
---
1) -msse5 includes the -mfma switch (because FMA is part of the SSE5
instructions). So -msse5 -mfma is the same as just -msse5, though you can
have just -mfma (without -msse5).

2) -mavx -msse5 = Yes. This would not make sense since no machine can run
this.

- Dwarak


(In reply to comment #0)
> Both Intel FMA and AMD SSE5 support FMA. For -mfma, which enables
> Intel FMA and is a dummy at the moment, or -msse5, we will
> generate FMA instructions for
>
> double f;
>
> void
> foo (double x, double y, double z)
> {
>   f = x * y + z;
> }
>
> What FMA should -mfma -msse5 generate? Also currently, with
> -O2 -mavx -msse5, we generate
>
> foo:
>         fmaddsd %xmm2, %xmm1, %xmm0, %xmm0
>         vmovsd  %xmm0, f(%rip)
>         ret
>
> which won't run on any machines. For -mfma -msse5 and
> -mavx -msse5,
>
> 1. Should these combinations be allowed? If allowed,
> 2. Should the last option turn off the first one?
 


-- 

dwarak dot rajagopal at amd dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dwarak dot rajagopal at amd
                   |                            |dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38201



[Bug target/38201] -mfma/-mavx and -msse5/-msse4a don't work together

2008-11-20 Thread dwarak dot rajagopal at amd dot com


--- Comment #4 from dwarak dot rajagopal at amd dot com  2008-11-20 19:35 
---
Yes, you are right. -mfma -msse5 does not make sense. I mistook -mfma for
-mfused-madd, hence the confusion.

Hence these combinations (1 and 2) do not make sense.

Thanks,
Dwarak


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38201



[Bug target/38201] -mfma/-mavx and -msse5/-msse4a don't work together

2008-11-20 Thread dwarak dot rajagopal at amd dot com


--- Comment #6 from dwarak dot rajagopal at amd dot com  2008-11-20 19:49 
---

> Should we disallow such combinations?

Yes.
- Dwarak


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38201



[Bug middle-end/37851] New: [graphite] ICE in expand_scalar_variables_expr, at graphite.c:3617

2008-10-16 Thread dwarak dot rajagopal at amd dot com
gfortran -O2 -floop-block 939.f90 
939.f90: In function 'solvep':
939.f90:6: internal compiler error: in expand_scalar_variables_expr, at
graphite.c:3617
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

This was tested on the graphite branch. The reduced testcase from polyhedron
benchmark is attached.

- Dwarak


-- 
   Summary: [graphite] ICE in expand_scalar_variables_expr, at
graphite.c:3617
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dwarak dot rajagopal at amd dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37851



[Bug middle-end/37851] [graphite] ICE in expand_scalar_variables_expr, at graphite.c:3617

2008-10-16 Thread dwarak dot rajagopal at amd dot com


--- Comment #1 from dwarak dot rajagopal at amd dot com  2008-10-16 15:00 
---
Created an attachment (id=16509)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16509&action=view)
Testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37851



[Bug middle-end/37828] [graphite] in expand_scalar_variables_expr, at graphite.c:3421

2008-10-14 Thread dwarak dot rajagopal at amd dot com


--- Comment #1 from dwarak dot rajagopal at amd dot com  2008-10-14 15:29 
---
Created an attachment (id=16492)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16492&action=view)
Testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37828



[Bug middle-end/37828] New: [graphite] in expand_scalar_variables_expr, at graphite.c:3421

2008-10-14 Thread dwarak dot rajagopal at amd dot com
g++ -c -floop-block -O3 bug_rep.cpp 
bug_rep.cpp: In function ‘int sort_and_split(foo**, foo**, long int)’:
bug_rep.cpp:9: internal compiler error: in expand_scalar_variables_expr, at
graphite.c:3421
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

Testcase attached.

- Dwarak


-- 
   Summary: [graphite] in expand_scalar_variables_expr, at
graphite.c:3421
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dwarak dot rajagopal at amd dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37828



[Bug rtl-optimization/33482] New: Invalid operands for pshifts with -O1

2007-09-18 Thread dwarak dot rajagopal at amd dot com
Testcase (test1.c):

#include <emmintrin.h>
__m128i test_fn1(__m128i x)
{
  __m128i y;
  return _mm_srl_epi64(x,_mm_set_epi32(0,0,31,31));
}

gcc -O1 -c test1.c
/tmp/ccBc8BO7.s: Assembler messages:
/tmp/ccBc8BO7.s:7: Error: suffix or operands invalid for `psrlq'

gcc -O1 -S test1.c

test_fn1:
.LFB501:
psrlq   $133143986207, %xmm0
ret

As we can see, the operands are invalid for psrlq. Similar errors occur for
other packed-shift instructions such as psra*, psrl*, and psll*.
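
A note on where the bogus immediate comes from: 133143986207 is 0x1F0000001F, which is the low 64 bits of the vector built by _mm_set_epi32(0,0,31,31), whereas the immediate form of psrlq takes only an 8-bit count. The small C sketch below just checks that arithmetic; the helper names are made up for illustration and are not GCC code.

```c
#include <stdint.h>

/* Low 64 bits of a vector whose two lowest 32-bit elements are e0 (lowest)
   and e1, as in _mm_set_epi32(e3, e2, e1, e0). */
static uint64_t low_qword(uint32_t e1, uint32_t e0)
{
    return ((uint64_t)e1 << 32) | e0;
}

/* A shift count can be encoded as a psrlq immediate only if it fits in
   8 bits. */
static int fits_imm8(uint64_t count)
{
    return count <= 0xFF;
}
```

Here low_qword(31, 31) evaluates to 133143986207, which fits_imm8 rejects, matching the assembler error above.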

A patch to fix this issue follows; it basically uses the right operand
constraint for these insns in sse.md.

diff -purwN gcc-4.2.2-RC-20070909/gcc/config/i386/sse.md gcc-4.2.2-RC-20070909-fix/gcc/config/i386/sse.md
--- gcc-4.2.2-RC-20070909/gcc/config/i386/sse.md        2007-09-01 10:28:30.0 -0500
+++ gcc-4.2.2-RC-20070909-fix/gcc/config/i386/sse.md    2007-09-17 16:33:26.790117000 -0500
@@ -2724,7 +2724,7 @@
   [(set (match_operand:SSEMODE24 0 "register_operand" "=x")
        (ashiftrt:SSEMODE24
          (match_operand:SSEMODE24 1 "register_operand" "0")
-         (match_operand:TI 2 "nonmemory_operand" "xn")))]
+         (match_operand:TI 2 "nonmemory_operand" "xN")))]
   "TARGET_SSE2"
   "psra<ssevecsize>\t{%2, %0|%0, %2}"
   [(set_attr "type" "sseishft")
@@ -2734,7 +2734,7 @@
   [(set (match_operand:SSEMODE248 0 "register_operand" "=x")
        (lshiftrt:SSEMODE248
          (match_operand:SSEMODE248 1 "register_operand" "0")
-         (match_operand:TI 2 "nonmemory_operand" "xn")))]
+         (match_operand:TI 2 "nonmemory_operand" "xN")))]
   "TARGET_SSE2"
   "psrl<ssevecsize>\t{%2, %0|%0, %2}"
   [(set_attr "type" "sseishft")
@@ -2744,7 +2744,7 @@
   [(set (match_operand:SSEMODE248 0 "register_operand" "=x")
        (ashift:SSEMODE248
          (match_operand:SSEMODE248 1 "register_operand" "0")
-         (match_operand:TI 2 "nonmemory_operand" "xn")))]
+         (match_operand:TI 2 "nonmemory_operand" "xN")))]
   "TARGET_SSE2"
   "psll<ssevecsize>\t{%2, %0|%0, %2}"
   [(set_attr "type" "sseishft")

Is this ok?

- Dwarak


-- 
   Summary: Invalid operands for pshifts with -O1
   Product: gcc
   Version: 4.2.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dwarak dot rajagopal at amd dot com
 GCC build triplet: i686-unknown-linux-gnu
  GCC host triplet: i686-unknown-linux-gnu
GCC target triplet: i686-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33482



[Bug debug/32914] New: ICE with -g option

2007-07-27 Thread dwarak dot rajagopal at amd dot com
Testcase
test-ice.cpp

#include <iostream>
#include <emmintrin.h>

const __m128i tmp={0,0};

g++ -O3 -g -c -msse2 test-ice.cpp

I get the following error:
test-ice.cpp:5: internal compiler error: in rtl_for_decl_init, at
dwarf2out.c:10071
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

It compiles fine without the -g option. This issue is present in 4.3 mainline
as well.

I tracked this problem to this patch
(http://gcc.gnu.org/ml/gcc-patches/2006-03/msg01567.html). 

The following temporary patch fixes this issue; it basically reverts the line
that causes it.

--- dwarf2out.c.orig    2007-07-25 10:29:24.790178000 -0500
+++ dwarf2out.c 2007-07-25 10:21:41.378252000 -0500
@@ -10065,8 +10065,8 @@ rtl_for_decl_init (tree init, tree type)
      immediate RTL constant, expand it now.  We must be careful not to
      reference variables which won't be output.  */
 
-  else if (initializer_constant_valid_p (init, type)
-          && ! walk_tree (&init, reference_to_unused, NULL, NULL))
+  else if ((INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type))
+          && initializer_constant_valid_p (init, type))
     {
       rtl = expand_expr (init, NULL_RTX, VOIDmode, EXPAND_INITIALIZER);

Thanks,
- Dwarak


-- 
   Summary: ICE with -g option
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: major
  Priority: P3
 Component: debug
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dwarak dot rajagopal at amd dot com
 GCC build triplet: x86_64
  GCC host triplet: x86_64
GCC target triplet: x86_64


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32914



[Bug c/27313] New: Does not emit conditional moves for stores

2006-04-25 Thread dwarak dot rajagopal at amd dot com
int cmov(int* A ,int B ,int C ,int* D ,int* E ,int F ,int g) {
  int k,f;
  for (k = 1; k <= 1000; k++) {
    A[k] = B+C;
    g = D[k-1] + E[k-1];
    if (g > A[k])  A[k]=g;  /* This is not converted to cmov*/
    f += g;
  }
  return f;
}

In the above code, the if-then statement is not converted to a conditional
move. The conversion is rejected by the noce_mem_write_may_trap_or_fault_p ()
check in ifcvt.c, which thinks that the A[k] access may trap.
In fact, A[k] can never trap here because A[k] has already been written once
along the path from entry to the A[k] = g store. So it is safe to convert it
to a cmov. Though this may add two extra moves (mem to reg and vice versa), it
is still better to avoid the branch, especially when the data is unpredictable
as in the example above.

There is a typical case like this in the SPEC 2006 456.hmmer benchmark. Using
conditional moves will make the code faster by 13%-17%.
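
To make the requested transformation concrete, here is a hand-written C sketch of the loop in branchy and if-converted form. It assumes the comparison in the testcase is g > A[k], initializes the accumulator (which the testcase leaves uninitialized), takes the trip count as a parameter, and uses hypothetical function names. It illustrates the idea only and is not what ifcvt.c generates.

```c
/* Branchy form: the second store to A[k] is conditional. */
static int update_branchy(int *A, int B, int C, const int *D, const int *E, int n)
{
    int f = 0;
    for (int k = 1; k <= n; k++) {
        A[k] = B + C;
        int g = D[k - 1] + E[k - 1];
        if (g > A[k])
            A[k] = g;
        f += g;
    }
    return f;
}

/* If-converted form: A[k] was just written, so reading it back and storing
   unconditionally cannot introduce a new trap; the select can become a cmov. */
static int update_select(int *A, int B, int C, const int *D, const int *E, int n)
{
    int f = 0;
    for (int k = 1; k <= n; k++) {
        A[k] = B + C;
        int g = D[k - 1] + E[k - 1];
        int old = A[k];               /* extra load (mem to reg) */
        A[k] = (g > old) ? g : old;   /* unconditional store (reg to mem) */
        f += g;
    }
    return f;
}
```

Both functions leave the same values in A and return the same sum; the second simply trades the branch for one extra load and an unconditional store.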

-Dwarak


-- 
   Summary: Does not emit conditional moves for stores
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dwarak dot rajagopal at amd dot com
 GCC build triplet: x86_64
  GCC host triplet: x86_64
GCC target triplet: x86_64


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27313



[Bug middle-end/27313] Does not emit conditional moves for stores

2006-04-25 Thread dwarak dot rajagopal at amd dot com


--- Comment #3 from dwarak dot rajagopal at amd dot com  2006-04-25 19:07 
---
Yes, this is true. The example I posted was the simplest case where it fails.
Below is what might be a typical case where you have to do two stores.
int cmov(int* A ,int B ,int C ,int* D ,int* E ,int F ,int g) {
  int k,f;
  for (k = 1; k <= 1000; k++) {
    A[k] = B+C;
    D[k] = C; /* D[k] may alias with A[k] */
    g = D[k-1] + E[k-1];
    if (g > A[k])  A[k]=g;  /* This is not converted to cmov*/
    f += g;
  }
  return f;
}

In this case, you cannot reduce the number of stores (because D[k] may alias
with A[k]) but you still want the if-conversion to take place. I think it would
be good to have a mechanism in ifcvt to track whether a memory location has
already been written. I'm not sure how it can be done at this level though.
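
The aliasing point can be checked with a small C sketch. It assumes the comparison is g > A[k] and uses hypothetical names: two_stores mirrors the loop above (without the accumulator), and one_store applies the single-store rewrite suggested in comment #2. When D and A are the same array the two disagree, which is why the store cannot simply be replaced by a temporary here.

```c
/* Loop as posted: A[k] is stored, possibly overwritten through D, then
   conditionally stored again. */
static void two_stores(int *A, int B, int C, int *D, const int *E, int n)
{
    for (int k = 1; k <= n; k++) {
        A[k] = B + C;
        D[k] = C;                 /* may alias A[k] */
        int g = D[k - 1] + E[k - 1];
        if (g > A[k])
            A[k] = g;
    }
}

/* Single-store rewrite: equivalent only when D does not alias A. */
static void one_store(int *A, int B, int C, int *D, const int *E, int n)
{
    for (int k = 1; k <= n; k++) {
        int t = B + C;
        D[k] = C;
        int g = D[k - 1] + E[k - 1];
        if (g > t)
            t = g;
        A[k] = t;
    }
}
```

With separate arrays for A and D the two functions produce the same A; calling each with the same array passed as both A and D gives different contents.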

-Dwarak


(In reply to comment #2)
> The other way of getting this is to have the code converted so there is only
> one store instead of two:
>
> int cmov(int* A ,int B ,int C ,int* D ,int* E ,int F ,int g) {
>   int k,f;
>   for (k = 1; k <= 1000; k++) {
>     int t = B+C;
>     g = D[k-1] + E[k-1];
>     if (g > t)  t=g;  /* This is not converted to cmov*/
>     A[k] = t;
>     f += g;
>   }
>   return f;
> }
> Which is most likely better anyway, as for one it is smaller.
 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27313



[Bug fortran/20244] internal compiler error: in fold_convert, at fold-const.c:2003

2005-11-17 Thread dwarak dot rajagopal at amd dot com


--- Comment #12 from dwarak dot rajagopal at amd dot com  2005-11-17 17:30 
---
(In reply to comment #9)
> (In reply to comment #8)
> > I got the same ICE with one of the SPEC2006 candidate benchmarks on
> > x86_64-linux-gnu.
>
> Was this before or after my fix for PR 18157 went in?  Because this and that
> bug had the same ICE but are really different bugs.

Tried with gcc version 4.0.1 20050630 (prerelease) (without the patch) and the
current head (with the patch). I see the same ICE both before and after your
patch in wrf (SPEC 2006).

- Dwarak


-- 

dwarak dot rajagopal at amd dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dwarak dot rajagopal at amd
                   |                            |dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20244