[Bug target/38824] [4.4 Regression] performance regression of sse code from 4.2/4.3
--- Comment #20 from dwarak dot rajagopal at amd dot com 2009-02-10 16:28 ---
Paulo,

(a) movaps (%rax, %rsi), %xmm0
    addps  %xmm0, %xmm1

(b) movaps %xmm0, %xmm1
    addps  (%rax, %rsi), %xmm1

Yes, case (a) is slightly better than case (b). It shouldn't matter much though in amdfam10 (shanghai) processors.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824
[Bug target/38824] [4.4 Regression] performance regression of sse code from 4.2/4.3
--- Comment #13 from dwarak dot rajagopal at amd dot com 2009-02-06 22:35 ---
> The patch makes GCC to generate movaps load followed by addps. On Core 2 it
> speeds up the testcase from 7s to 6.2s so I guess it works as expected.
>
> The same however does not reproduce on AMD box and I am not sure if it is just
> coincidence here or if really core preffer to split read-execute SSE
> operations
> (it is not recommended by the manual).

fyi, AMD (amdfam10) prefers load-execute rather than having separate load and execute instructions.

--
dwarak dot rajagopal at amd dot com changed:

 What |Removed |Added
   CC |        |dwarak dot rajagopal at amd dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824
[Bug target/38201] -mfma/-mavx and -msse5/-msse4a don't work together
--- Comment #6 from dwarak dot rajagopal at amd dot com 2008-11-20 19:49 ---
> Should we disallow such combinations?
>

Yes.

- Dwarak

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38201
[Bug target/38201] -mfma/-mavx and -msse5/-msse4a don't work together
--- Comment #4 from dwarak dot rajagopal at amd dot com 2008-11-20 19:35 ---
Yes, you are right. "-mfma -msse5" does not make sense. I mistook -mfma for -mfused-madd, hence the confusion. So these combinations (1 and 2) do not make sense.

Thanks,
Dwarak

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38201
[Bug target/38201] -mfma/-mavx and -msse5/-msse4a don't work together
--- Comment #1 from dwarak dot rajagopal at amd dot com 2008-11-20 16:48 ---
1) -msse5 includes the -mfma switch (because fma is a part of the sse5 instructions). So having "-msse5 -mfma" is the same as having just "-msse5", though you can have just -mfma (without -msse5).

2) "-mavx -msse5" => Yes. This would not make sense since no machine can run this.

- Dwarak

(In reply to comment #0)
> Both Intel FMA and AMD SSE5 support FMA. For -mfma, which enables
> Intel FMA and is a dummy at the moment, or -msse5, we will
> generate FMA instructions for
>
> double f;
>
> void
> foo (double x, double y, double z)
> {
>   f = x * y + z;
> }
>
> What FMA should "-mfma -msse5" generate? Also currently, with
> "-O2 -mavx -msse5", we generate
>
> foo:
>         fmaddsd %xmm2, %xmm1, %xmm0, %xmm0
>         vmovsd  %xmm0, f(%rip)
>         ret
>
> which won't run on any machines. For "-mfma -msse5" and
> "-mavx -msse5",
>
> 1. Should these combinations be allowed? If allowed,
> 2. Should the last option turn off the first one?
>

--
dwarak dot rajagopal at amd dot com changed:

 What |Removed |Added
   CC |        |dwarak dot rajagopal at amd dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38201
[Bug middle-end/37851] [graphite] ICE in expand_scalar_variables_expr, at graphite.c:3617
--- Comment #1 from dwarak dot rajagopal at amd dot com 2008-10-16 15:00 --- Created an attachment (id=16509) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16509&action=view) Testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37851
[Bug middle-end/37851] New: [graphite] ICE in expand_scalar_variables_expr, at graphite.c:3617
gfortran -O2 -floop-block 939.f90

939.f90: In function 'solvep':
939.f90:6: internal compiler error: in expand_scalar_variables_expr, at graphite.c:3617
Please submit a full bug report, with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

This was tested on the graphite branch. The reduced testcase from the polyhedron benchmark is attached.

- Dwarak

--
Summary: [graphite] ICE in expand_scalar_variables_expr, at graphite.c:3617
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dwarak dot rajagopal at amd dot com
GCC build triplet: x86_64-unknown-linux-gnu
GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37851
[Bug middle-end/37828] [graphite] in expand_scalar_variables_expr, at graphite.c:3421
--- Comment #1 from dwarak dot rajagopal at amd dot com 2008-10-14 15:29 --- Created an attachment (id=16492) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16492&action=view) Testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37828
[Bug middle-end/37828] New: [graphite] in expand_scalar_variables_expr, at graphite.c:3421
g++ -c -floop-block -O3 bug_rep.cpp

bug_rep.cpp: In function int sort_and_split(foo**, foo**&, long int):
bug_rep.cpp:9: internal compiler error: in expand_scalar_variables_expr, at graphite.c:3421
Please submit a full bug report, with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

Testcase attached.

- Dwarak

--
Summary: [graphite] in expand_scalar_variables_expr, at graphite.c:3421
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dwarak dot rajagopal at amd dot com
GCC build triplet: x86_64-unknown-linux-gnu
GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37828
[Bug rtl-optimization/33482] New: Invalid operands for pshifts with -O1
Testcase (test1.c):

#include
__m128i test_fn1(__m128i x)
{
  __m128i y;
  return _mm_srl_epi64(x,_mm_set_epi32(0,0,31,31));
}

gcc -O1 -c test1.c
/tmp/ccBc8BO7.s: Assembler messages:
/tmp/ccBc8BO7.s:7: Error: suffix or operands invalid for `psrlq'

gcc -O1 -S test1.c

test_fn1:
.LFB501:
        psrlq   $133143986207, %xmm0
        ret

As we can see, the immediate operand is invalid for psrlq. Similar errors occur for the other pshift instructions such as psra*, psrl*, and psll*. A patch to fix this issue is as follows, basically using the right operand constraint for these insns in sse.md.

diff -purwN gcc-4.2.2-RC-20070909/gcc/config/i386/sse.md gcc-4.2.2-RC-20070909-fix/gcc/config/i386/sse.md
--- gcc-4.2.2-RC-20070909/gcc/config/i386/sse.md 2007-09-01 10:28:30.0 -0500
+++ gcc-4.2.2-RC-20070909-fix/gcc/config/i386/sse.md 2007-09-17 16:33:26.790117000 -0500
@@ -2724,7 +2724,7 @@
   [(set (match_operand:SSEMODE24 0 "register_operand" "=x")
         (ashiftrt:SSEMODE24
           (match_operand:SSEMODE24 1 "register_operand" "0")
-          (match_operand:TI 2 "nonmemory_operand" "xn")))]
+          (match_operand:TI 2 "nonmemory_operand" "xN")))]
   "TARGET_SSE2"
   "psra\t{%2, %0|%0, %2}"
   [(set_attr "type" "sseishft")
@@ -2734,7 +2734,7 @@
   [(set (match_operand:SSEMODE248 0 "register_operand" "=x")
         (lshiftrt:SSEMODE248
           (match_operand:SSEMODE248 1 "register_operand" "0")
-          (match_operand:TI 2 "nonmemory_operand" "xn")))]
+          (match_operand:TI 2 "nonmemory_operand" "xN")))]
   "TARGET_SSE2"
   "psrl\t{%2, %0|%0, %2}"
   [(set_attr "type" "sseishft")
@@ -2744,7 +2744,7 @@
   [(set (match_operand:SSEMODE248 0 "register_operand" "=x")
         (ashift:SSEMODE248
           (match_operand:SSEMODE248 1 "register_operand" "0")
-          (match_operand:TI 2 "nonmemory_operand" "xn")))]
+          (match_operand:TI 2 "nonmemory_operand" "xN")))]
   "TARGET_SSE2"
   "psll\t{%2, %0|%0, %2}"
   [(set_attr "type" "sseishft")

Is this ok?

- Dwarak

--
Summary: Invalid operands for pshifts with -O1
Product: gcc
Version: 4.2.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dwarak dot rajagopal at amd dot com
GCC build triplet: i686-unknown-linux-gnu
GCC host triplet: i686-unknown-linux-gnu
GCC target triplet: i686-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33482
[Bug debug/32914] New: ICE with -g option
Testcase "test-ice.cpp":

#include
#include
const __m128i tmp={0,0};

g++ -O3 -g -c -msse2 test-ice.cpp

I get the following error:

test-ice.cpp:5: internal compiler error: in rtl_for_decl_init, at dwarf2out.c:10071
Please submit a full bug report, with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

It compiles fine without the "-g" option. This issue is present in 4.3 mainline as well. I tracked this problem to this patch (http://gcc.gnu.org/ml/gcc-patches/2006-03/msg01567.html). The following temporary patch fixes the issue; it basically reverts the line which causes it.

--- dwarf2out.c.orig 2007-07-25 10:29:24.790178000 -0500
+++ dwarf2out.c 2007-07-25 10:21:41.378252000 -0500
@@ -10065,8 +10065,8 @@ rtl_for_decl_init (tree init, tree type)
      immediate RTL constant, expand it now. We must be careful not to
      reference variables which won't be output. */
-  else if (initializer_constant_valid_p (init, type)
-           && ! walk_tree (&init, reference_to_unused,NULL,NULL)
+else if ((INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type))
+&& initializer_constant_valid_p (init, type))
     {
       rtl = expand_expr (init, NULL_RTX, VOIDmode, EXPAND_INITIALIZER);

Thanks,
- Dwarak

--
Summary: ICE with -g option
Product: gcc
Version: 4.2.0
Status: UNCONFIRMED
Severity: major
Priority: P3
Component: debug
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dwarak dot rajagopal at amd dot com
GCC build triplet: x86_64
GCC host triplet: x86_64
GCC target triplet: x86_64

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32914
[Bug middle-end/27313] Does not emit conditional moves for stores
--- Comment #3 from dwarak dot rajagopal at amd dot com 2006-04-25 19:07 ---
Yes, this is true. The example I posted was the simplest case where it fails. Below might be a typical case where you have to do two stores.

int cmov(int* A ,int B ,int C ,int* D ,int* E ,int F ,int g) {
  int k,f;
  for (k = 1; k <= 1000; k++) {
    A[k] = B+C;
    D[k] = C; /* D[k] may alias with A[k] */
    g = D[k-1] + E[k-1];
    if (g > A[k]) A[k]=g; /* This is not converted to cmov*/
    f += g;
  }
  return f;
}

In this case, you cannot reduce the number of stores (because D[k] may alias with A[k]) but you still want the if-conversion to take place. I think it would be good to have a mechanism in ifcvt to track whether a memory location has already been written. I'm not sure how it can be done at this level though.

-Dwarak

(In reply to comment #2)
> The other way of getting this is to have the code converted so there is only
> one store instead of two:
>
> int cmov(int* A ,int B ,int C ,int* D ,int* E ,int F ,int g) {
>   int k,f;
>   for (k = 1; k <= 1000; k++) {
>     int t = B+C;
>     g = D[k-1] + E[k-1];
>     if (g > t) t=g; /* This is not converted to cmov*/
>     A[k] = t;
>     f += g;
>   }
>   return f;
> }
> Which is most likely better anyways as one it is smaller.
>

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27313
[Bug c/27313] New: Does not emit conditional moves for stores
int cmov(int* A ,int B ,int C ,int* D ,int* E ,int F ,int g) {
  int k,f;
  for (k = 1; k <= 1000; k++) {
    A[k] = B+C;
    g = D[k-1] + E[k-1];
    if (g > A[k]) A[k]=g; /* This is not converted to cmov*/
    f += g;
  }
  return f;
}

In the above code, the if-then statement is not converted to a conditional move. It fails the "noce_mem_write_may_trap_or_fault_p ()" condition in "ifcvt.c", because ifcvt thinks there is a chance for the A[k] access to trap. In fact, A[k] will never trap in this case, because A[k] has already been written once along the path from Entry to the "A[k] = g" statement. So it is safe to convert it to a cmov. Though there might be two extra move statements (mem to reg and vice versa), it is still better to avoid the branch, especially if it depends on unpredictable data as in the example above. There is a typical case like this in the SPEC 2006 456.hmmer benchmark. Using conditional moves will make the code faster by 13%-17%.

-Dwarak

--
Summary: Does not emit conditional moves for stores
Product: gcc
Version: 4.2.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dwarak dot rajagopal at amd dot com
GCC build triplet: x86_64
GCC host triplet: x86_64
GCC target triplet: x86_64

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27313
[Bug fortran/20244] internal compiler error: in fold_convert, at fold-const.c:2003
--- Comment #12 from dwarak dot rajagopal at amd dot com 2005-11-17 17:30 ---
(In reply to comment #9)
> (In reply to comment #8)
> > I got the same ICE with one of the SPEC2006 candidate benchmarks on
> > x86_64-linux-gnu.
>
> Was this before or after my fix for PR 18157 went in? Because this and that
> bug had the same ICE but are really different bugs.
>

Tried with gcc version 4.0.1 20050630 (prerelease) (without the patch) and the current head (with the patch). I see the same ICE both before and after your patch in "wrf" (SPEC 2006).

- Dwarak

--
dwarak dot rajagopal at amd dot com changed:

 What |Removed |Added
   CC |        |dwarak dot rajagopal at amd dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20244