[Bug tree-optimization/45241] [4.5/4.6 Regression] CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-30 Thread changpeng dot fang at amd dot com
--- Comment #9 from changpeng dot fang at amd dot com 2010-08-30 16:37 --- Review approval for the trunk: http://gcc.gnu.org/ml/gcc-patches/2010-08/msg00931.html Review Approval for 4.5 branch: http://gcc.gnu.org/ml/gcc-patches/2010-08/msg02112.html -- http://gcc.gnu.org/bugzilla

[Bug tree-optimization/45241] [4.5/4.6 Regression] CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-30 Thread changpeng dot fang at amd dot com
--- Comment #10 from changpeng dot fang at amd dot com 2010-08-30 16:39 --- r163207 - in /trunk/gcc: ChangeLog testsuite/Ch... * From: cfang at gcc dot gnu dot org * To: gcc-cvs at gcc dot gnu dot org * Date: Thu, 12 Aug 2010 22:18:34 - * Subject: r163207

[Bug tree-optimization/45241] [4.5/4.6 Regression] CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-30 Thread changpeng dot fang at amd dot com
--- Comment #11 from changpeng dot fang at amd dot com 2010-08-30 16:40 --- r163286 - in /branches/gcc-4_5-branch/gcc: Chan... * From: cfang at gcc dot gnu dot org * To: gcc-cvs at gcc dot gnu dot org * Date: Mon, 16 Aug 2010 21:02:30 - * Subject: r163286

[Bug tree-optimization/45241] [4.5/4.6 Regression] CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-30 Thread changpeng dot fang at amd dot com
--- Comment #12 from changpeng dot fang at amd dot com 2010-08-30 16:41 --- Fixed! -- changpeng dot fang at amd dot com changed: What|Removed |Added Status

[Bug target/45391] CPU2006 482.sphinx3: gcc4.6 5% regression from prefetching of vectorized loop

2010-08-24 Thread changpeng dot fang at amd dot com
--- Comment #5 from changpeng dot fang at amd dot com 2010-08-24 22:13 --- For the test case in comment #2, if we don't vectorize the loop, the unroll_factor is incorrectly determined as 1, and insns-to-prefetch ratio (4) will then prevent prefetching, and thus no performance

[Bug tree-optimization/45260] [4.5/4.6 Regression] g++4.5: -prefetch-loop-arrays internal compiler error: in verify_expr, at tree-cfg.c:2541

2010-08-23 Thread changpeng dot fang at amd dot com
--- Comment #6 from changpeng dot fang at amd dot com 2010-08-23 18:59 --- Committed to trunk as Revision: 163475: http://gcc.gnu.org/ml/gcc-cvs/2010-08/msg00688.html Committed to 4.5 branch as Revision: 163483 http://gcc.gnu.org/ml/gcc-cvs/2010-08/msg00696.html -- http

[Bug c/45389] New: CPU2006 cactusADM: gcc 4.6 15% regression from 4.5

2010-08-23 Thread changpeng dot fang at amd dot com
gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45389

[Bug c/45390] New: CPU2006 434.zeusmp: gcc 4.6 7% regression from gcc 4.6

2010-08-23 Thread changpeng dot fang at amd dot com
Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45390

[Bug target/45391] New: CPU2006 482.sphinx3: gcc4.6 5% regression from prefetching of vectorized loop

2010-08-23 Thread changpeng dot fang at amd dot com
prefetching of vectorized loop Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http

[Bug target/45391] CPU2006 482.sphinx3: gcc4.6 5% regression from prefetching of vectorized loop

2010-08-23 Thread changpeng dot fang at amd dot com
--- Comment #2 from changpeng dot fang at amd dot com 2010-08-24 00:03 --- float f (float *x, float *y, float *z, unsigned n) { float ret = 0.0; unsigned i; for (i = 0; i < n; i++) { float diff = x[i] - y[i]; ret -= diff * diff * z[i]; } return ret

[Bug target/45391] CPU2006 482.sphinx3: gcc4.6 5% regression from prefetching of vectorized loop

2010-08-23 Thread changpeng dot fang at amd dot com
--- Comment #3 from changpeng dot fang at amd dot com 2010-08-24 00:22 --- I checked with open64 and did not find any regression. And for the above testcase, open64 generated 3 non-temporal prefetches. As a result, I am guessing that we are just unlucky that the prefetch kicks out

[Bug target/45391] CPU2006 482.sphinx3: gcc4.6 5% regression from prefetching of vectorized loop

2010-08-23 Thread changpeng dot fang at amd dot com
--- Comment #4 from changpeng dot fang at amd dot com 2010-08-24 00:46 --- Oops, the open64-generated code posted in the last comment is for the non-vectorized loop; the vectorized one is similar: .LBB23_f: .loc 1 7 0 movups 0(%r10),%xmm3 # [0] id:65

[Bug tree-optimization/45260] [4.5/4.6 Regression] g++4.5: -prefetch-loop-arrays internal compiler error: in verify_expr, at tree-cfg.c:2541

2010-08-20 Thread changpeng dot fang at amd dot com
--- Comment #5 from changpeng dot fang at amd dot com 2010-08-20 22:48 --- I have a fix: http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01625.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45260

[Bug middle-end/44206] [4.6 Regression] ICE: Inline clone with address taken

2010-08-18 Thread changpeng dot fang at amd dot com
--- Comment #3 from changpeng dot fang at amd dot com 2010-08-18 19:43 --- *** Bug 45269 has been marked as a duplicate of this bug. *** -- changpeng dot fang at amd dot com changed: What|Removed |Added

[Bug c++/45269] CPU2006 450.soplex: verify_cgraph_node failed with -fprofile-generate

2010-08-18 Thread changpeng dot fang at amd dot com
--- Comment #2 from changpeng dot fang at amd dot com 2010-08-18 19:43 --- http://gcc.gnu.org/ml/gcc-cvs/2010-05/msg00406.html Verified. If I back out the above change, the bug goes away. So it is a duplicate of bug 44206 *** This bug has been marked as a duplicate of 44206

[Bug tree-optimization/45260] [4.5/4.6 Regression] g++4.5: -prefetch-loop-arrays internal compiler error: in verify_expr, at tree-cfg.c:2541

2010-08-16 Thread changpeng dot fang at amd dot com
--- Comment #4 from changpeng dot fang at amd dot com 2010-08-16 22:39 --- This bug should be related to VIEW_CONVERT_EXPR. If I use the following statement to filter the prefetch, the bug will go away: if (contains_view_convert_expr_p (ref)) return false; Otherwise

[Bug c/45268] New: CPU2006 458.sjeng: type mismatch in array reference with -fwhole-program -combine

2010-08-12 Thread changpeng dot fang at amd dot com
Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla

[Bug c++/45269] New: CPU2006 450.soplex: verify_cgraph_node failed with -fprofile-generate

2010-08-12 Thread changpeng dot fang at amd dot com
: normal Priority: P3 Component: c++ AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45269

[Bug c/45270] New: CPU2006 435.gromacs: Segmentation fault with -fprofile-generate

2010-08-12 Thread changpeng dot fang at amd dot com
at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45270

[Bug tree-optimization/45260] [4.5/4.6 Regression] g++4.5: -prefetch-loop-arrays internal compiler error: in verify_expr, at tree-cfg.c:2541

2010-08-11 Thread changpeng dot fang at amd dot com
--- Comment #3 from changpeng dot fang at amd dot com 2010-08-12 00:38 --- (In reply to comment #2) It was caused by revision 153878: http://gcc.gnu.org/ml/gcc-cvs/2009-11/msg00094.html I think the same patch was also committed to 4.4 branch. Maybe some prefetch work(s) in 4.5

[Bug tree-optimization/45241] [4.5/4.6 Regression] CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-10 Thread changpeng dot fang at amd dot com
--- Comment #7 from changpeng dot fang at amd dot com 2010-08-10 21:44 --- (In reply to comment #5) (In reply to comment #1) This patch should be a valid fix, because the recognition of the dot_prod pattern is known to fail at this point if the stmt is outside the loop. (I am

[Bug tree-optimization/45239] New: CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-09 Thread changpeng dot fang at amd dot com
org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45239

[Bug tree-optimization/45241] New: CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-09 Thread changpeng dot fang at amd dot com
org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45241

[Bug tree-optimization/45241] CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-09 Thread changpeng dot fang at amd dot com
--- Comment #1 from changpeng dot fang at amd dot com 2010-08-09 17:52 --- This patch should be a valid fix, because the recognition of the dot_prod pattern is known to fail at this point if the stmt is outside the loop. (I am not sure whether we should not see this case

[Bug tree-optimization/45022] No prefetch for the vectorized loop

2010-07-29 Thread changpeng dot fang at amd dot com
--- Comment #4 from changpeng dot fang at amd dot com 2010-07-29 19:14 --- (In reply to comment #1) The misaligned indirect-refs will vanish soon. I saw your patch that removes ALIGNED_INDIRECT_REF. Do you also plan to remove MISALIGNED_INDIRECT_REF? Thanks. -- http

[Bug tree-optimization/45021] Redundant prefetches for some loops (vectorizer produced ones too)

2010-07-28 Thread changpeng dot fang at amd dot com
--- Comment #4 from changpeng dot fang at amd dot com 2010-07-28 18:22 --- Andrew's example is exactly what the prefetch sees for the test case (in the bug description). Unfortunately, the prefetch pass could not recognize that vect_pa.6_24 and vect_pa.20_38 are exactly the same

[Bug tree-optimization/45021] Redundant prefetches for some loops (vectorizer produced ones too)

2010-07-28 Thread changpeng dot fang at amd dot com
--- Comment #5 from changpeng dot fang at amd dot com 2010-07-28 18:28 --- Things are a little more complicated if we change the code to: a[i] = a[i+1] + beta * b[i]; The prefetch pass wants to group a[i] and a[i+1], i.e. they have the same base address with an offset of 4 bytes. -- http
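
The 4-byte offset mentioned in the comment above is just the element size of the array. A minimal C sketch, assuming `a` is an array of `float` (the comment does not state the element type) and using a hypothetical helper `adjacent_offset` that is not part of GCC:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte distance between a[i] and a[i + 1].
   Both references share the base address `a`; only the constant
   offset differs, which is what allows them to be grouped. */
ptrdiff_t adjacent_offset(const float *a, size_t i)
{
    assert(a != NULL);
    return (const char *)&a[i + 1] - (const char *)&a[i];
}
```

On targets where `float` is 4 bytes, the function returns 4 for every `i`.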

[Bug tree-optimization/45022] No prefetch for the vectorized loop

2010-07-22 Thread changpeng dot fang at amd dot com
--- Comment #2 from changpeng dot fang at amd dot com 2010-07-22 20:52 --- (In reply to comment #1) The misaligned indirect-refs will vanish soon. From the prefetching point of view, is there any reason that we can not prefetch for mis-aligned or indirect refs? I understand

[Bug tree-optimization/45021] New: Redundant prefetches for the vectorized loop

2010-07-21 Thread changpeng dot fang at amd dot com
: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45021

[Bug tree-optimization/45022] New: No prefetch for the vectorized loop

2010-07-21 Thread changpeng dot fang at amd dot com
Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45022

[Bug tree-optimization/45021] Redundant prefetches for the vectorized loop

2010-07-21 Thread changpeng dot fang at amd dot com
--- Comment #1 from changpeng dot fang at amd dot com 2010-07-21 18:26 --- The direct reason is that prefetching could not differentiate the base addresses of the vectorized load and store (of a[i]): *vect_pa.6_24 *vect_pa.19_37 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45021

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-07-21 Thread changpeng dot fang at amd dot com
--- Comment #23 from changpeng dot fang at amd dot com 2010-07-21 21:30 --- Fixed -- changpeng dot fang at amd dot com changed: What|Removed |Added Status

[Bug tree-optimization/44955] New: over-prefetched for arrays of complex number

2010-07-15 Thread changpeng dot fang at amd dot com
-prefetched for arrays of complex number Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang

[Bug tree-optimization/44955] over-prefetched for arrays of complex number

2010-07-15 Thread changpeng dot fang at amd dot com
--- Comment #1 from changpeng dot fang at amd dot com 2010-07-15 17:20 --- This is a piece of code that shows the two prefetches for b. mulss %xmm4, %xmm5 addq $8, %rdx prefetcht0 96(%r11) prefetcht0 100(%r11) subss %xmm2, %xmm1
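
Why the second prefetch is usually redundant can be sketched in C. The two addresses are 4 bytes apart, so they normally fall in the same cache line. The helper `same_cache_line` and the 64-byte line size are assumptions made for illustration; neither comes from the bug report:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when both byte addresses map to the
   same cache line. line_size is supplied by the caller (assumed to be
   64 in the usage below). */
int same_cache_line(uintptr_t addr1, uintptr_t addr2, uintptr_t line_size)
{
    assert(line_size > 0);
    return addr1 / line_size == addr2 / line_size;
}
```

With a base address of 0, offsets 96 and 100 share a line. Only when the first address sits in the last 4 bytes of a line (for example base 28, giving 124 and 128) do the two prefetches touch different lines.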

[Bug tree-optimization/44794] pre- and post-loops should not be unrolled.

2010-07-14 Thread changpeng dot fang at amd dot com
--- Comment #4 from changpeng dot fang at amd dot com 2010-07-15 01:50 --- Created an attachment (id=21205) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21205&action=view) Do not unroll pre and post loops I did a quick test on polyhedron before and after applying the preliminary

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-07-08 Thread changpeng dot fang at amd dot com
--- Comment #20 from changpeng dot fang at amd dot com 2010-07-09 01:59 --- I submitted a patch for review to completely fix the problem. The patch is an extension to Christian's speedup.patch. It splits the cost analysis into three small functions and quits further prefetching

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-07-07 Thread changpeng dot fang at amd dot com
--- Comment #19 from changpeng dot fang at amd dot com 2010-07-07 19:00 --- (In reply to comment #18) Changpeng, should this PR be closed now? No. I am still looking at the dependence computation cost. I just found that most of the time is spent in memory allocation and freeing

[Bug tree-optimization/44794] pre- and post-loops should not be unrolled.

2010-07-06 Thread changpeng dot fang at amd dot com
--- Comment #2 from changpeng dot fang at amd dot com 2010-07-06 17:58 --- We also need to handle the post loop of unrolling. Suppose the unroll_factor is 16, then the post-loop should have up to 15 iterations. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794
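
The bound on the post-loop follows from integer division: whatever the unrolled main loop cannot cover is `n % unroll_factor` iterations. A minimal C sketch of that arithmetic, with hypothetical function names and a hand-unrolled factor of 4 standing in for what the compiler would generate:

```c
#include <assert.h>
#include <stddef.h>

/* Iterations left for the post-loop once the main loop has consumed
   the trip count in chunks of unroll_factor. */
size_t post_loop_iterations(size_t n, size_t unroll_factor)
{
    assert(unroll_factor > 0);
    return n % unroll_factor;
}

/* Sum with a main loop unrolled by 4 and a scalar post-loop. */
float sum_unrolled4(const float *x, size_t n)
{
    float s = 0.0f;
    size_t i = 0;

    /* Main loop: handles 4 elements per iteration. */
    for (; i + 4 <= n; i += 4)
        s += x[i] + x[i + 1] + x[i + 2] + x[i + 3];

    /* Post-loop: runs at most 3 times, so unrolling it again only
       adds code size. */
    for (; i < n; i++)
        s += x[i];

    return s;
}
```

With an unroll factor of 16, a trip count of 31 leaves 15 iterations for the post-loop, which is the worst case named in the comment.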

[Bug tree-optimization/44794] pre- and post-loops should not be unrolled.

2010-07-06 Thread changpeng dot fang at amd dot com
--- Comment #3 from changpeng dot fang at amd dot com 2010-07-06 18:35 --- Here is the impact of loop unrolling on the compilation time and code size on polyhedron test_fpu.f90: -O3 -ftree-vectorize -fno-prefetch-loop-arrays -fno-unroll-loops: timing: 12.62s, size: 67069 bytes -O3

[Bug tree-optimization/44794] New: pre- and post-loops should not be unrolled.

2010-07-02 Thread changpeng dot fang at amd dot com
fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-07-02 Thread changpeng dot fang at amd dot com
--- Comment #17 from changpeng dot fang at amd dot com 2010-07-02 23:58 --- (In reply to comment #15) I have opened PR44794 for the unrolling of pre- and post-loop issue. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44576

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-06-30 Thread changpeng dot fang at amd dot com
--- Comment #15 from changpeng dot fang at amd dot com 2010-07-01 00:34 --- Unrolling of the peeled loop is partially the reason for test_fpu.f90 compilation time and code size increase. Vectorization peeled a few iterations of the loop, the prefetching and unrolling passes do

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-06-29 Thread changpeng dot fang at amd dot com
--- Comment #13 from changpeng dot fang at amd dot com 2010-06-30 00:23 --- Here is the current status of this work: patch1: http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02956.html patch2: http://gcc.gnu.org/ml/gcc-patches/2010-06/msg03049.html On my system with -O3 zero_sized_1.f90

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-06-29 Thread changpeng dot fang at amd dot com
--- Comment #14 from changpeng dot fang at amd dot com 2010-06-30 00:36 --- (In reply to comment #7) A good chunk of time seems to be spent in the RTL loop unroller, triggered by array prefetching (testing with -O3 -funroll-loops). Otherwise it might as well be just excessive code

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-06-28 Thread changpeng dot fang at amd dot com
--- Comment #11 from changpeng dot fang at amd dot com 2010-06-29 00:07 --- I have a patch that partially fixes the problem: http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02956.html Note that for this test case, the compile time doubled even though I don't compute the miss rate at all

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-06-28 Thread changpeng dot fang at amd dot com
--- Comment #12 from changpeng dot fang at amd dot com 2010-06-29 00:49 --- Created an attachment (id=21034) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21034&action=view) Early return in miss rate computation The attached patch improves the computation of miss rate. We can stop

[Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling

2010-06-25 Thread changpeng dot fang at amd dot com
--- Comment #4 from changpeng dot fang at amd dot com 2010-06-25 17:08 --- (In reply to comment #3) Created an attachment (id=21001) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21001&action=view) [edit] Potential fix for compile time regression Here is a potential fix. We

[Bug tree-optimization/44503] control flow in the middle of basic block with -fprefetch-loop-arrays

2010-06-14 Thread changpeng dot fang at amd dot com
--- Comment #3 from changpeng dot fang at amd dot com 2010-06-14 18:28 --- Actually, the prefetching is for the following loop: for (i = 0; i < p[2]; i++) q[i] = 0; I do not understand why unrolling of this loop affects another part of the program that uses longjmp. -- http

[Bug tree-optimization/44503] control flow in the middle of basic block with -fprefetch-loop-arrays

2010-06-14 Thread changpeng dot fang at amd dot com
--- Comment #4 from changpeng dot fang at amd dot com 2010-06-14 22:22 --- There is nothing wrong in the prefetch itself. The problem is the __builtin_prefetch call used for the prefetch instruction. Whenever there is a non-local label in the current function, the __builtin_prefetch

[Bug c/44503] New: control flow in the middle of basic block with -fprefetch-loop-arrays

2010-06-11 Thread changpeng dot fang at amd dot com
Product: gcc Version: tree-ssa Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id

[Bug c/44503] control flow in the middle of basic block with -fprefetch-loop-arrays

2010-06-11 Thread changpeng dot fang at amd dot com
--- Comment #1 from changpeng dot fang at amd dot com 2010-06-11 16:32 --- Created an attachment (id=20894) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20894&action=view) prefetching for the while loop? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44503

[Bug tree-optimization/44503] control flow in the middle of basic block with -fprefetch-loop-arrays

2010-06-11 Thread changpeng dot fang at amd dot com
--- Comment #2 from changpeng dot fang at amd dot com 2010-06-11 18:45 --- Bug 39398 looks similar, but that one seems to involve exception handling instead of setjmp. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44503

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-08 Thread changpeng dot fang at amd dot com
--- Comment #21 from changpeng dot fang at amd dot com 2010-06-08 16:23 --- Just for the record, non-constant step prefetching improves 459.GemsFDTD by 5.5% (under -O3 + prefetch) on amd-linux64 systems. And the gains are from the following set of loops: NFT.fppized.f90:1268

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-07 Thread changpeng dot fang at amd dot com
--- Comment #14 from changpeng dot fang at amd dot com 2010-06-07 18:27 --- Here is the current status of my investigation: (1) 465.tonto regression (~9%): The regressions mainly come from loops which have array references with both constant (prefetch_mod = 8) and non-constant

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-07 Thread changpeng dot fang at amd dot com
--- Comment #15 from changpeng dot fang at amd dot com 2010-06-07 18:30 --- Created an attachment (id=20860) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20860&action=view) Don't consider effect of unrolling in the computation of insn-to-prefetch ratio -- http://gcc.gnu.org

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-07 Thread changpeng dot fang at amd dot com
--- Comment #16 from changpeng dot fang at amd dot com 2010-06-07 18:32 --- Created an attachment (id=20861) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20861&action=view) Limit non-constant step prefetching only to the innermost loops -- http://gcc.gnu.org/bugzilla

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-07 Thread changpeng dot fang at amd dot com
--- Comment #17 from changpeng dot fang at amd dot com 2010-06-07 18:37 --- (In reply to comment #15) Created an attachment (id=20860) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20860&action=view) [edit] Don't consider effect of unrolling in the computation of insn-to-prefetch

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-07 Thread changpeng dot fang at amd dot com
--- Comment #19 from changpeng dot fang at amd dot com 2010-06-07 22:30 --- Created an attachment (id=20862) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20862&action=view) Account prefetch_mod and unroll_factor for the computation of the prefetch count Oops. Attached a wrong

[Bug tree-optimization/43529] G++ doesn't optimize away empty loop when index is a double

2010-06-04 Thread changpeng dot fang at amd dot com
--- Comment #2 from changpeng dot fang at amd dot com 2010-06-04 23:15 --- Interesting! What's the difference between 17 and 18? int main() { double i; for(i=0; i<18; i+=1); /* gcc -O3, empty loop not removed */ } int main() { double i; for(i=0; i<17

[Bug tree-optimization/43529] G++ doesn't optimize away empty loop when index is a double

2010-06-04 Thread changpeng dot fang at amd dot com
--- Comment #3 from changpeng dot fang at amd dot com 2010-06-04 23:29 --- (In reply to comment #2) Interesting! What's the difference between 17 and 18? int main() { double i; for(i=0; i<18; i+=1); /* gcc -O3, empty loop not removed */ } The funny thing occurs

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-01 Thread changpeng dot fang at amd dot com
--- Comment #11 from changpeng dot fang at amd dot com 2010-06-01 17:40 --- (In reply to comment #10) Created an attachment (id=20783) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20783&action=view) [edit] experimental patch to have separate values

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-06-01 Thread changpeng dot fang at amd dot com
--- Comment #13 from changpeng dot fang at amd dot com 2010-06-01 19:59 --- (In reply to comment #12) Ok. So I will let you continue to look into that and wait for your results? Do you have any feedback on separate.patch and its influence on performance? + for (; groups

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-28 Thread changpeng dot fang at amd dot com
--- Comment #6 from changpeng dot fang at amd dot com 2010-05-28 16:46 --- (In reply to comment #4) Created an attachment (id=20767) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20767&action=view) [edit] Patch that makes loop invariant prefetches backend specific Actually, I

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-28 Thread changpeng dot fang at amd dot com
--- Comment #7 from changpeng dot fang at amd dot com 2010-05-28 16:56 --- (In reply to comment #5) An alternative approach might be have different values for prefetch-min-insn-to-mem-ratio and min-insn-to-prefetch-ratio depending on constant/non-constant step size. It may

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-28 Thread changpeng dot fang at amd dot com
--- Comment #8 from changpeng dot fang at amd dot com 2010-05-28 18:30 --- (In reply to comment #4) Created an attachment (id=20767) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20767&action=view) [edit] Patch that makes loop invariant prefetches backend specific Three

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-28 Thread changpeng dot fang at amd dot com
--- Comment #9 from changpeng dot fang at amd dot com 2010-05-28 18:36 --- (In reply to comment #8) Looks like this is a fix to the regressions. That is, the regressions are actually caused by the wrong calculation. This bug could be considered fixed, even though performance tuning

[Bug middle-end/44297] New: Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-27 Thread changpeng dot fang at amd dot com
dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44297

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-27 Thread changpeng dot fang at amd dot com
--- Comment #1 from changpeng dot fang at amd dot com 2010-05-27 20:49 --- The regressions are most likely from the patch that added non-constant step prefetching: * From: Andreas Krebbel krebbel at linux dot vnet dot ibm dot com * To: Christian Borntraeger borntraeger at de

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-27 Thread changpeng dot fang at amd dot com
--- Comment #2 from changpeng dot fang at amd dot com 2010-05-27 20:55 --- To me, non-constant step prefetching does not seem to fit into the existing prefetching framework. A non-constant stride prevents any reuse analysis, so prefetching is done more or less blindly. -- http://gcc.gnu.org

[Bug middle-end/44297] Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

2010-05-27 Thread changpeng dot fang at amd dot com
--- Comment #3 from changpeng dot fang at amd dot com 2010-05-27 23:51 --- I did a quick look at 434.zeusmp and found that prefetching for the following simple loop is responsible: linpck.f: 131: c ccode for increment not equal to 1 c ix = 1 smax = abs(sx(1

[Bug tree-optimization/43423] gcc should vectorize this loop through if-conversion

2010-05-24 Thread changpeng dot fang at amd dot com
--- Comment #9 from changpeng dot fang at amd dot com 2010-05-24 22:47 --- (In reply to comment #8) -fgraphite-identity does iteration splitting for this case. Do you know why it could not be vectorized after iteration range splitting? -- http://gcc.gnu.org/bugzilla

[Bug middle-end/44185] [4.6 regression] New prefetch test failures

2010-05-21 Thread changpeng dot fang at amd dot com
--- Comment #6 from changpeng dot fang at amd dot com 2010-05-21 21:36 --- (In reply to comment #5) The fix introduced: FAIL: gcc.dg/tree-ssa/prefetch-7.c scan-assembler-times movnti 18 FAIL: gcc.dg/tree-ssa/prefetch-7.c scan-tree-dump-times optimized ={nt} 18 on Linux/ia32

[Bug middle-end/44185] [4.6 regression] New prefetch test failures

2010-05-18 Thread changpeng dot fang at amd dot com
--- Comment #2 from changpeng dot fang at amd dot com 2010-05-18 19:39 --- I have a patch to fix the test cases: http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01359.html For prefetch-6.c, patch http://gcc.gnu.org/ml/gcc-cvs/2010-05/msg00567.html applies the insn to prefetch ratio

[Bug tree-optimization/43425] gcc should vectorize this loop by substitution

2010-05-07 Thread changpeng dot fang at amd dot com
--- Comment #3 from changpeng dot fang at amd dot com 2010-05-07 21:33 --- I just found that the test case is the same as (similar to) bug 35229. The subject of this bug is wrong. Scalar expansion is not appropriate for this case. Actually the loop can be transformed to: void foo(int n

[Bug tree-optimization/43423] gcc should vectorize this loop through if-conversion

2010-05-07 Thread changpeng dot fang at amd dot com
--- Comment #7 from changpeng dot fang at amd dot com 2010-05-07 21:41 --- (In reply to comment #4) (In reply to comment #3) Subject: Re: gcc should vectorize this loop through iteration range splitting You mean that the problem is the if-conversion of the stores a[i

[Bug tree-optimization/43543] New: Reorder the statements in the loop can vectorize it

2010-03-26 Thread changpeng dot fang at amd dot com
Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43543

[Bug tree-optimization/42906] [4.5 Regression] Empty loop not removed

2010-03-18 Thread changpeng dot fang at amd dot com
--- Comment #20 from changpeng dot fang at amd dot com 2010-03-18 17:24 --- (In reply to comment #19) Splitting critical edges for CDDCE will probably also solve this problem. Richard. Yes, splitting critical edges is an enhancement to CDDCE and can solve this problem

[Bug c/43422] New: reversed loop is not vectorized

2010-03-18 Thread changpeng dot fang at amd dot com
Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43422

[Bug c/43423] New: gcc should vectorize this loop through iteration range splitting

2010-03-18 Thread changpeng dot fang at amd dot com
AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43423

[Bug c/43425] New: enhance scalar expansion to vectorize this loop

2010-03-18 Thread changpeng dot fang at amd dot com
: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43425

[Bug c/43427] New: The loop is not interchanged and thus could not be vectorized.

2010-03-18 Thread changpeng dot fang at amd dot com
: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43427

[Bug tree-optimization/43428] New: vectorizer should invoke loop distribution to partially vectorize this loop

2010-03-18 Thread changpeng dot fang at amd dot com
: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43428

[Bug tree-optimization/32824] Missed reduction vectorizer after store to global is LIM'd

2010-03-17 Thread changpeng dot fang at amd dot com
--- Comment #8 from changpeng dot fang at amd dot com 2010-03-17 21:22 --- Created an attachment (id=20133) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20133&action=view) patch with the testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32824

[Bug tree-optimization/42906] [4.5 Regression] Empty loop not removed

2010-03-16 Thread changpeng dot fang at amd dot com
--- Comment #17 from changpeng dot fang at amd dot com 2010-03-17 00:18 --- (In reply to comment #8) And int foo (int b, int j) { if (b) { int i; for (i = 0; i<1000; ++i) ; j = b; } return j; } With j=b, b is not folded as a phi

[Bug tree-optimization/42906] [4.5 Regression] Empty loop not removed

2010-03-16 Thread changpeng dot fang at amd dot com
--- Comment #18 from changpeng dot fang at amd dot com 2010-03-17 00:22 --- (In reply to comment #16) In this case, the loop itself is empty and we can replace every use of the phi with n (exit value of the iv). I don't think that is done by remove_empty_loop anyways

[Bug middle-end/43238] GCC 4.5 ICE segfault on any -O flag

2010-03-02 Thread changpeng dot fang at amd dot com
--- Comment #4 from changpeng dot fang at amd dot com 2010-03-02 21:56 --- I have verified that the patch proposed in bug 43209 did fix this problem. I am going to check in the change soon. Thanks. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43238

[Bug tree-optimization/43209] [4.5 Regression] ICE in try_improve_iv_set, at tree-ssa-loop-ivopts.c:5238

2010-03-01 Thread changpeng dot fang at amd dot com
--- Comment #5 from changpeng dot fang at amd dot com 2010-03-01 18:02 --- I have a fix for this problem. We should not decrease the cost if the cost is infinite. diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index 74dadf7..9accda9 100644 --- a/gcc/tree-ssa-loop
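
The rule stated in the comment above, that an infinite cost must not be decreased, can be illustrated with a small C sketch. This is not the code from tree-ssa-loop-ivopts.c (the diff is truncated in this listing); `COST_INFINITE` and `decrease_cost` are hypothetical names, and representing costs as plain `int` is an assumption made for brevity:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical sentinel meaning "this candidate cannot be used". */
#define COST_INFINITE INT_MAX

/* Subtract a discount from a cost, leaving an infinite cost untouched.
   Without the guard, COST_INFINITE - discount would look like a large
   but finite cost and could be picked by a later comparison. */
int decrease_cost(int cost, int discount)
{
    assert(discount >= 0);
    if (cost == COST_INFINITE)
        return COST_INFINITE;
    return cost - discount;
}
```

The guard keeps the sentinel stable under any number of adjustments, so later checks against `COST_INFINITE` remain valid.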

[Bug middle-end/43182] GCC does not pull out a[0] from loop that changes a[i] for i:[1,n]

2010-02-26 Thread changpeng dot fang at amd dot com
--- Comment #4 from changpeng dot fang at amd dot com 2010-02-26 18:53 --- Here is another similar case but more general. We know that a(j) and a(i) never access the same memory location. intel ifort can vectorize this triangular loop: do 10 j = 1,n do 20 i = j+1, n

[Bug middle-end/43182] GCC does not pull out a[0] from loop that changes a[i] for i:[1,n]

2010-02-26 Thread changpeng dot fang at amd dot com
--- Comment #6 from changpeng dot fang at amd dot com 2010-02-26 19:06 --- Actually it is a totally different case. Please file a new bug with that case; though there might already be a bug about that one. I could not see the difference even though j is not a compile-time

[Bug middle-end/43182] New: gcc could not vectorize this simple loop (un-handled data-ref)

2010-02-25 Thread changpeng dot fang at amd dot com
Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43182

[Bug middle-end/43184] New: gcc could not vectorize floating point reduction statements

2010-02-25 Thread changpeng dot fang at amd dot com
Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: changpeng dot fang at amd dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43184

[Bug middle-end/43184] gcc could not vectorize floating point reduction statements

2010-02-25 Thread changpeng dot fang at amd dot com
--- Comment #2 from changpeng dot fang at amd dot com 2010-02-26 00:28 --- Subject: RE: gcc could not vectorize floating point reduction statements Thanks for pointing this out. Actually I am working on a fortran program and found the reduction statement. The fortran code can

[Bug tree-optimization/42906] [4.5 Regression] Empty loop not removed

2010-02-16 Thread changpeng dot fang at amd dot com
--- Comment #15 from changpeng dot fang at amd dot com 2010-02-16 19:54 --- Hello, I am not sure whether CD-DCE can fully replace remove_empty_loop. However, I would prefer to keep remove_empty_loop pass. There are two reasons for this proposal: (1) remove_empty_loop was at level -O1