[Bug target/38824] [4.4 Regression] performance regression of sse code from 4.2/4.3

2009-02-10 Thread dwarak dot rajagopal at amd dot com


--- Comment #20 from dwarak dot rajagopal at amd dot com  2009-02-10 16:28 
---
Paulo,
(a)   movaps  (%rax, %rsi), %xmm0
      addps   %xmm0, %xmm1

(b)   movaps  %xmm0, %xmm1
      addps   (%rax, %rsi), %xmm1

Yes, case (a) is slightly better than case (b). It shouldn't matter much on
amdfam10 (Shanghai) processors, though.
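
For illustration only (this is not the testcase from the bug), the two shapes can be written at source level with the GCC/Clang vector_size extension. The type name v4sf and both function names are hypothetical. The two functions are semantically identical, so whether the compiler emits a separate load as in case (a) or folds the memory operand into addps as in case (b) is an instruction-selection decision, which is what this bug is about.

```c
/* Sketch only: relies on the GCC/Clang vector_size extension.
   v4sf, add_split and add_folded are hypothetical names. */
typedef float v4sf __attribute__((vector_size(16)));

/* Written as a separate load followed by an add, like case (a). */
v4sf add_split(const v4sf *p, v4sf acc)
{
    v4sf tmp = *p;      /* candidate for: movaps (mem), %xmm0 */
    return acc + tmp;   /* candidate for: addps %xmm0, %xmm1  */
}

/* Written with the memory operand inside the add, like case (b). */
v4sf add_folded(const v4sf *p, v4sf acc)
{
    return acc + *p;    /* candidate for: addps (mem), %xmm1 */
}
```

Comparing the output of "gcc -O2 -S" for both functions on a given -mtune setting should show which form the compiler picks for that target.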


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824



[Bug target/38824] [4.4 Regression] performance regression of sse code from 4.2/4.3

2009-02-06 Thread dwarak dot rajagopal at amd dot com


--- Comment #13 from dwarak dot rajagopal at amd dot com  2009-02-06 22:35 
---

> The patch makes GCC generate a movaps load followed by addps.  On Core 2 it
> speeds up the testcase from 7s to 6.2s, so I guess it works as expected.
>
> The same, however, does not reproduce on the AMD box, and I am not sure whether
> it is just coincidence here or whether the core really prefers to split
> read-execute SSE operations (it is not recommended by the manual).

fyi, AMD (amdfam10) prefers load-execute rather than having separate load and
execute instructions. 


-- 

dwarak dot rajagopal at amd dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dwarak dot rajagopal at amd
                   |                            |dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824



[Bug target/38201] -mfma/-mavx and -msse5/-msse4a don't work together

2008-11-20 Thread dwarak dot rajagopal at amd dot com


--- Comment #6 from dwarak dot rajagopal at amd dot com  2008-11-20 19:49 
---

> Should we disallow such combinations?
> 
Yes.
- Dwarak


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38201



[Bug target/38201] -mfma/-mavx and -msse5/-msse4a don't work together

2008-11-20 Thread dwarak dot rajagopal at amd dot com


--- Comment #4 from dwarak dot rajagopal at amd dot com  2008-11-20 19:35 
---
Yes, you are right. "-mfma -msse5" does not make sense. I mistook -mfma for
-mfused-madd, hence the confusion.

So these combinations (1 and 2) do not make sense.

Thanks,
Dwarak


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38201



[Bug target/38201] -mfma/-mavx and -msse5/-msse4a don't work together

2008-11-20 Thread dwarak dot rajagopal at amd dot com


--- Comment #1 from dwarak dot rajagopal at amd dot com  2008-11-20 16:48 
---
1) -msse5 includes the -mfma switch (because FMA is part of the SSE5
instructions). So having "-msse5 -mfma" is the same as having just "-msse5",
though you can have just -mfma (without -msse5).

2) "-mavx -msse5" => Yes. This would not make sense, since no machine can run
this.

- Dwarak


(In reply to comment #0)
> Both Intel FMA and AMD SSE5 support FMA. For -mfma, which enables
> Intel FMA and is a dummy at the moment, or -msse5, we will
> generate FMA instructions for
> 
> double f;
> 
> void
> foo (double x, double y, double z)
> {
>   f = x * y + z;
> }
> 
> What FMA should "-mfma -msse5" generate? Also currently, with
> "-O2 -mavx -msse5", we generate
> 
> foo:
> fmaddsd %xmm2, %xmm1, %xmm0, %xmm0
> vmovsd  %xmm0, f(%rip)
> ret
> 
> which won't run on any machines. For "-mfma -msse5" and
> "-mavx -msse5",
> 
> 1. Should these combinations be allowed? If allowed,
> 2. Should the last option turn off the first one?
> 


-- 

dwarak dot rajagopal at amd dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dwarak dot rajagopal at amd
                   |                            |dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38201



[Bug middle-end/37851] [graphite] ICE in expand_scalar_variables_expr, at graphite.c:3617

2008-10-16 Thread dwarak dot rajagopal at amd dot com


--- Comment #1 from dwarak dot rajagopal at amd dot com  2008-10-16 15:00 
---
Created an attachment (id=16509)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16509&action=view)
Testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37851



[Bug middle-end/37851] New: [graphite] ICE in expand_scalar_variables_expr, at graphite.c:3617

2008-10-16 Thread dwarak dot rajagopal at amd dot com
gfortran -O2 -floop-block 939.f90 
939.f90: In function 'solvep':
939.f90:6: internal compiler error: in expand_scalar_variables_expr, at
graphite.c:3617
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

This was tested on the graphite branch. The reduced testcase from polyhedron
benchmark is attached.

- Dwarak


-- 
   Summary: [graphite] ICE in expand_scalar_variables_expr, at
graphite.c:3617
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dwarak dot rajagopal at amd dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37851



[Bug middle-end/37828] [graphite] in expand_scalar_variables_expr, at graphite.c:3421

2008-10-14 Thread dwarak dot rajagopal at amd dot com


--- Comment #1 from dwarak dot rajagopal at amd dot com  2008-10-14 15:29 
---
Created an attachment (id=16492)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16492&action=view)
Testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37828



[Bug middle-end/37828] New: [graphite] in expand_scalar_variables_expr, at graphite.c:3421

2008-10-14 Thread dwarak dot rajagopal at amd dot com
g++ -c -floop-block -O3 bug_rep.cpp 
bug_rep.cpp: In function ‘int sort_and_split(foo**, foo**&, long int)’:
bug_rep.cpp:9: internal compiler error: in expand_scalar_variables_expr, at
graphite.c:3421
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

Testcase attached.

- Dwarak


-- 
   Summary: [graphite] in expand_scalar_variables_expr, at
graphite.c:3421
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dwarak dot rajagopal at amd dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37828



[Bug rtl-optimization/33482] New: Invalid operands for pshifts with -O1

2007-09-18 Thread dwarak dot rajagopal at amd dot com
Testcase (test1.c):

#include <emmintrin.h>
__m128i test_fn1(__m128i x)
{
  __m128i y;
  return _mm_srl_epi64(x,_mm_set_epi32(0,0,31,31));
}

gcc -O1 -c test1.c
/tmp/ccBc8BO7.s: Assembler messages:
/tmp/ccBc8BO7.s:7: Error: suffix or operands invalid for `psrlq'

gcc -O1 -S test1.c

test_fn1:
.LFB501:
psrlq   $133143986207, %xmm0
ret

As we can see, the immediate operand is invalid for psrlq. Similar errors occur
for the other pshift instructions such as psra*, psrl*, and psll*.
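
The odd immediate can be traced by hand. The sketch below is plain scalar C, not the intrinsics themselves, and the helper names are hypothetical. It models where the value comes from: the shift count of _mm_srl_epi64 is the low 64 bits of the count vector, and _mm_set_epi32(0,0,31,31) places 31 in each of the two low 32-bit elements. The immediate form of psrlq only encodes an 8-bit count, so the folded constant cannot be emitted as is.

```c
#include <stdint.h>

/* Low 64 bits of _mm_set_epi32(e3, e2, e1, e0):
   e1 is the upper half, e0 the lower half. */
uint64_t low_qword(uint32_t e1, uint32_t e0)
{
    return ((uint64_t)e1 << 32) | e0;
}

/* The immediate form of psrlq takes an 8-bit count. */
int fits_imm8(uint64_t count)
{
    return count <= 0xFF;
}

/* Scalar model of one 64-bit lane of psrlq:
   a count above 63 produces zero. */
uint64_t psrlq_lane(uint64_t value, uint64_t count)
{
    return count > 63 ? 0 : value >> count;
}
```

Here low_qword(31, 31) is 31 * 2^32 + 31 = 133143986207, which is exactly the immediate in the assembly above, and fits_imm8 rejects it.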

A patch to fix this issue follows; it basically uses the right operand
constraint ("xN" instead of "xn") for these insns in sse.md.

diff -purwN gcc-4.2.2-RC-20070909/gcc/config/i386/sse.md gcc-4.2.2-RC-20070909-fix/gcc/config/i386/sse.md
--- gcc-4.2.2-RC-20070909/gcc/config/i386/sse.md  2007-09-01 10:28:30.0 -0500
+++ gcc-4.2.2-RC-20070909-fix/gcc/config/i386/sse.md  2007-09-17 16:33:26.790117000 -0500
@@ -2724,7 +2724,7 @@
   [(set (match_operand:SSEMODE24 0 "register_operand" "=x")
(ashiftrt:SSEMODE24
  (match_operand:SSEMODE24 1 "register_operand" "0")
- (match_operand:TI 2 "nonmemory_operand" "xn")))]
+ (match_operand:TI 2 "nonmemory_operand" "xN")))]
   "TARGET_SSE2"
   "psra\t{%2, %0|%0, %2}"
   [(set_attr "type" "sseishft")
@@ -2734,7 +2734,7 @@
   [(set (match_operand:SSEMODE248 0 "register_operand" "=x")
(lshiftrt:SSEMODE248
  (match_operand:SSEMODE248 1 "register_operand" "0")
- (match_operand:TI 2 "nonmemory_operand" "xn")))]
+ (match_operand:TI 2 "nonmemory_operand" "xN")))]
   "TARGET_SSE2"
   "psrl\t{%2, %0|%0, %2}"
   [(set_attr "type" "sseishft")
@@ -2744,7 +2744,7 @@
   [(set (match_operand:SSEMODE248 0 "register_operand" "=x")
(ashift:SSEMODE248
  (match_operand:SSEMODE248 1 "register_operand" "0")
- (match_operand:TI 2 "nonmemory_operand" "xn")))]
+ (match_operand:TI 2 "nonmemory_operand" "xN")))]
   "TARGET_SSE2"
   "psll\t{%2, %0|%0, %2}"
   [(set_attr "type" "sseishft")

Is this ok?

- Dwarak


-- 
   Summary: Invalid operands for pshifts with -O1
   Product: gcc
   Version: 4.2.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dwarak dot rajagopal at amd dot com
 GCC build triplet: i686-unknown-linux-gnu
  GCC host triplet: i686-unknown-linux-gnu
GCC target triplet: i686-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33482



[Bug debug/32914] New: ICE with -g option

2007-07-27 Thread dwarak dot rajagopal at amd dot com
Testcase
"test-ice.cpp"

#include 
#include 

const __m128i tmp={0,0};

g++ -O3 -g -c -msse2 test-ice.cpp

I get the following error:
test-ice.cpp:5: internal compiler error: in rtl_for_decl_init, at
dwarf2out.c:10071
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

It compiles fine without the "-g" option. This issue is present in 4.3 mainline
as well.

I tracked this problem to this patch
(http://gcc.gnu.org/ml/gcc-patches/2006-03/msg01567.html). 

The following temporary patch fixes this issue; it basically reverts the line
that causes it.

--- dwarf2out.c.orig  2007-07-25 10:29:24.790178000 -0500
+++ dwarf2out.c 2007-07-25 10:21:41.378252000 -0500
@@ -10065,8 +10065,8 @@ rtl_for_decl_init (tree init, tree type)
  immediate RTL constant, expand it now.  We must be careful not to
  reference variables which won't be output.  */

-  else if (initializer_constant_valid_p (init, type)
-  && ! walk_tree (&init, reference_to_unused,NULL,NULL)
+else if ((INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type))
+&& initializer_constant_valid_p (init, type))
 {
   rtl = expand_expr (init, NULL_RTX, VOIDmode, EXPAND_INITIALIZER);

Thanks,
- Dwarak


-- 
   Summary: ICE with -g option
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: major
  Priority: P3
 Component: debug
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dwarak dot rajagopal at amd dot com
 GCC build triplet: x86_64
  GCC host triplet: x86_64
GCC target triplet: x86_64


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32914



[Bug middle-end/27313] Does not emit conditional moves for stores

2006-04-25 Thread dwarak dot rajagopal at amd dot com


--- Comment #3 from dwarak dot rajagopal at amd dot com  2006-04-25 19:07 
---
Yes, this is true. The example I posted was the simplest case where it fails.
Below is a more typical case, where you have to do two stores.
int cmov(int* A ,int B ,int C ,int* D ,int* E ,int F ,int g) {
  int k,f;
  for (k = 1; k <= 1000; k++) {
    A[k] = B+C;
    D[k] = C; /* D[k] may alias with A[k] */
    g = D[k-1] + E[k-1];
    if (g > A[k])  A[k]=g;  /* This is not converted to cmov*/
    f += g;
  }
  return f;
}

In this case, you cannot reduce the number of stores (because D[k] may alias
with A[k]), but you still want the if-conversion to take place. I think it would
be good to have a mechanism in ifcvt to track whether a memory location has
already been written. I'm not sure how it can be done at this level, though.
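
To make the aliasing point concrete, here is a small sketch of one loop iteration in both forms. The function names are hypothetical and the single-store variant only follows the style of the rewrite suggested in comment #2. When D and A point to different arrays the two agree; when they point to the same array, the store to D[k] changes A[k], so keeping B+C in a temporary gives a different result.

```c
/* One iteration of the loop above, with both stores kept. */
int body_two_stores(int *A, int *D, const int *E, int B, int C, int k)
{
    A[k] = B + C;
    D[k] = C;                     /* overwrites A[k] if D aliases A */
    int g = D[k - 1] + E[k - 1];
    if (g > A[k]) A[k] = g;
    return A[k];
}

/* Same iteration with the first store replaced by a temporary. */
int body_one_store(int *A, int *D, const int *E, int B, int C, int k)
{
    int t = B + C;
    D[k] = C;
    int g = D[k - 1] + E[k - 1];
    if (g > t) t = g;
    A[k] = t;                     /* only store to A[k] */
    return A[k];
}
```

With A and D both pointing at {0, 0}, E = {1, 0}, B = 5, C = 2 and k = 1, the first form leaves A[1] at 2 while the second leaves it at 7.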

-Dwarak


(In reply to comment #2)
> The other way of getting this is to have the code converted so there is only
> one store instead of two:
> 
> int cmov(int* A ,int B ,int C ,int* D ,int* E ,int F ,int g) {
>   int k,f;
>   for (k = 1; k <= 1000; k++) {
> int t = B+C;
> g = D[k-1] + E[k-1];
> if (g > t)  t=g;  /* This is not converted to cmov*/
> A[k] = t;
> f += g;
>   }
>   return f;
> }
> Which is most likely better anyways as one it is smaller.
> 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27313



[Bug c/27313] New: Does not emit conditional moves for stores

2006-04-25 Thread dwarak dot rajagopal at amd dot com
int cmov(int* A ,int B ,int C ,int* D ,int* E ,int F ,int g) {
  int k,f;
  for (k = 1; k <= 1000; k++) {
    A[k] = B+C;
    g = D[k-1] + E[k-1];
    if (g > A[k])  A[k]=g;  /* This is not converted to cmov*/
    f += g;
  }
  return f;
}

In the above code, the if-then statement is not converted to a conditional move.
It fails the "noce_mem_write_may_trap_or_fault_p ()" check in "ifcvt.c", because
ifcvt thinks there is a chance that the A[k] access will trap.
In this case, however, A[k] will never trap, because A[k] has already been
written once along the path from entry to the "A[k] = g" store. So it is safe to
convert it to a cmov. Though there might be two extra moves (mem to reg and
vice versa), it is still better to avoid the branch, especially if the data is
unpredictable, as in the example above.
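
As a sketch of the requested transformation written by hand (not what ifcvt produces), the conditional store can be turned into a load, a select and an unconditional store. The function names and the parameter n are hypothetical, and f is initialised to 0 here, which the original testcase does not do. Whether the ternary actually becomes a cmov is still up to the compiler and target.

```c
/* Original shape: conditional store guarded by a branch. */
int cmov_branchy(int *A, int B, int C, const int *D, const int *E, int n)
{
    int f = 0;
    for (int k = 1; k <= n; k++) {
        A[k] = B + C;
        int g = D[k - 1] + E[k - 1];
        if (g > A[k]) A[k] = g;
        f += g;
    }
    return f;
}

/* Hand if-converted shape: the store to A[k] is unconditional. */
int cmov_by_hand(int *A, int B, int C, const int *D, const int *E, int n)
{
    int f = 0;
    for (int k = 1; k <= n; k++) {
        A[k] = B + C;                   /* first store: A[k] is writable */
        int g = D[k - 1] + E[k - 1];
        int cur = A[k];                 /* extra move: mem to reg */
        int sel = (g > cur) ? g : cur;  /* candidate for cmov */
        A[k] = sel;                     /* extra move: reg to mem */
        f += g;
    }
    return f;
}
```

The second store is safe for the reason given above: the same location was already written earlier in the iteration, so it cannot newly trap.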

There is a typical case like this in the SPEC 2006 456.hmmer benchmark. Using
conditional moves makes the code faster by 13%-17%.

-Dwarak


-- 
   Summary: Does not emit conditional moves for stores
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dwarak dot rajagopal at amd dot com
 GCC build triplet: x86_64
  GCC host triplet: x86_64
GCC target triplet: x86_64


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27313



[Bug fortran/20244] internal compiler error: in fold_convert, at fold-const.c:2003

2005-11-17 Thread dwarak dot rajagopal at amd dot com


--- Comment #12 from dwarak dot rajagopal at amd dot com  2005-11-17 17:30 
---
(In reply to comment #9)
> (In reply to comment #8)
> > I got the same ICE with one of the SPEC2006 candidate benchmarks on
> > x86_64-linux-gnu.
> 
> Was this before or after my fix for PR 18157 went in?  Because this and that
> bug had the same ICE but are really different bugs.
> 
Tried with gcc version 4.0.1 20050630 (prerelease) (without the patch) and the
current head (with the patch). I see the same ICE both before and after your
patch in "wrf" (SPEC 2006).

- Dwarak


-- 

dwarak dot rajagopal at amd dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dwarak dot rajagopal at amd
                   |                            |dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20244