[Bug tree-optimization/25371] -ftree-vectorize results in internal compiler error on AMD64
--- Comment #10 from sebastian dot pop at cri dot ensmp dot fr 2007-05-10 09:32 --- Subject: Re: -ftree-vectorize results in internal compiler error on AMD64 Zdenek's patch for cleaning the dataref analysis is also fixing this bug. http://gcc.gnu.org/ml/gcc-patches/2007-05/msg00634.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25371
[Bug tree-optimization/29738] Missed constant propagation into loops
--- Comment #1 from sebastian dot pop at cri dot ensmp dot fr 2006-11-06 11:48 --- Subject: Re: Missed constant propagation into loops I think the problem is that i is a global variable and thus foo is potentially considered as modifying i. Have you tried void foo (void); void bar (void) { int i, j; i = 0; for (j = 0; j 1; j++) if (i) foo (); } This should be a simpler case that either is correctly simplified or I know what to do for making it work. For the original problem, why don't we propagate constants in # i_3 = V_MUST_DEF i_2; i = 0; # NONLOCAL.6_19 = PHI NONLOCAL.6_9(5), NONLOCAL.6_11(2); # i_18 = PHI i_7(5), i_3(2); i.e. replacing the use of i_3 with just the constant 0? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29738
[Bug middle-end/26900] Number of iterations not know for simple loop
--- Comment #11 from sebastian dot pop at cri dot ensmp dot fr 2006-06-19 07:50 --- Subject: Re: Number of iterations not know for simple loop I thought that this bug should have been fixed by now: http://gcc.gnu.org/ml/gcc-patches/2006-03/msg01749.html what is the status of that patch? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26900
[Bug tree-optimization/27331] [4.2 Regression] segfault in fold_convert with -ftree-vectorize
--- Comment #12 from sebastian dot pop at cri dot ensmp dot fr 2006-06-16 08:55 --- Subject: Re: [4.2 Regression] segfault in fold_convert with -ftree-vectorize rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: The patch is in predcom branch. I should have missed this patch on gcc-patches, sorry. The relevant part (fixing also another similar bug) is below. I think that this is better than what I have proposed, as there is no reason to keep the dr if we know that its access function is not known. So your patch is okay for trunk if it bootstraps, and pass regtest. Thanks. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27331
[Bug tree-optimization/27331] [4.2 Regression] segfault in fold_convert with -ftree-vectorize
--- Comment #9 from sebastian dot pop at cri dot ensmp dot fr 2006-06-15 13:19 --- Subject: Re: [4.2 Regression] segfault in fold_convert with -ftree-vectorize rakdver at gcc dot gnu dot org wrote: Ccing people responsible for data dependence analysis. Thanks, for the ping, this bug was not on my todo list. I'll take a look. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27331
[Bug tree-optimization/27331] [4.2 Regression] segfault in fold_convert with -ftree-vectorize
--- Comment #10 from sebastian dot pop at cri dot ensmp dot fr 2006-06-15 14:52 --- Subject: Re: [4.2 Regression] segfault in fold_convert with -ftree-vectorize You said that you had a fix in predcom, is that fix in your local tree, or have you sent a patch to gcc-patches? Here is a fix for this PR. I don't like the data-ref code that deals with component_refs, as it seems very fragile. Index: tree-data-ref.c === --- tree-data-ref.c (revision 114678) +++ tree-data-ref.c (working copy) @@ -1947,7 +1947,8 @@ create_data_ref (tree memref, tree stmt, object_analysis divided by the size of data type. */ if (!DR_BASE_OBJECT (dr) - || (TREE_CODE (memref) == COMPONENT_REF DR_NUM_DIMENSIONS (dr) == 1)) + || (TREE_CODE (memref) == COMPONENT_REF DR_NUM_DIMENSIONS (dr) == 1 + !automatically_generated_chrec_p (DR_ACCESS_FN (dr, 0 { tree access_fn; tree new_step; -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27331
[Bug tree-optimization/27332] [4.2 Regression] ICE in try_interchange_loops with -ftree-loop-linear
--- Comment #3 from sebastian dot pop at cri dot ensmp dot fr 2006-05-10 12:26 --- Created an attachment (id=11429) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11429action=view) fix proposed fix. I didn't tested it other than making sure it fixed the bug. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27332
[Bug tree-optimization/26435] [4.1/4.2 regression] ICE with -O1 -ftree-loop-linear and higher optimization
--- Comment #5 from sebastian dot pop at cri dot ensmp dot fr 2006-05-10 12:32 --- Created an attachment (id=11430) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11430action=view) first shot at fixing this didn't tested the patch, other than with RUNTESTFLAGS=tree-ssa.exp and the current pr report. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26435
[Bug tree-optimization/19590] IVs with the same evolution not eliminated
--- Comment #12 from sebastian dot pop at cri dot ensmp dot fr 2006-04-10 09:14 --- Created an attachment (id=11235) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11235action=view) proposed fix This patch fixes the problem, but probably it is a more general optimization fix than the one described in the PR. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19590
[Bug bootstrap/27063] New: Fail to build gcc-core-4.2 snapshots
If you download only the gcc-core-4.2 snapshot, for example ftp://ftp.uvsq.fr/pub/gcc/snapshots/4.2-20060401/gcc-core-4.2-20060401.tar.bz2 and do a ../configure --enable-languages=c make make will stop complaining about some objC file that is not in the package: make[2]: *** No rule to make target `../../gcc/objc/objc-act.c', needed by `s-gtype'. Stop. -- Summary: Fail to build gcc-core-4.2 snapshots Product: gcc Version: 4.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: bootstrap AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: sebastian dot pop at cri dot ensmp dot fr http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27063
[Bug tree-optimization/26939] loop number of iterations analysis confused by what PRE did (PRE is doing its job)
--- Comment #12 from sebastian dot pop at cri dot ensmp dot fr 2006-04-02 08:12 --- Created an attachment (id=11184) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11184action=view) fix This patch fixes this problem. I'll bootntestncommit. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26939
[Bug middle-end/23411] [data deps] Distance on outer loops for self output deps
--- Comment #3 from sebastian dot pop at cri dot ensmp dot fr 2006-03-29 20:39 --- Fixed by the recent changes. -- sebastian dot pop at cri dot ensmp dot fr changed: What|Removed |Added Status|NEW |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23411
[Bug middle-end/23412] [data deps] Overflow problem in Omega
--- Comment #3 from sebastian dot pop at cri dot ensmp dot fr 2006-03-29 20:40 --- Fixed on autovect. -- sebastian dot pop at cri dot ensmp dot fr changed: What|Removed |Added Status|NEW |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23412
[Bug middle-end/23413] [data deps] Overflow problem in Omega
--- Comment #2 from sebastian dot pop at cri dot ensmp dot fr 2006-03-29 20:42 --- Fixed on autovect. -- sebastian dot pop at cri dot ensmp dot fr changed: What|Removed |Added Status|NEW |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23413
[Bug tree-optimization/26719] [4.1/4.2 Regression] Computed (integer) table changes with -O
--- Comment #4 from sebastian dot pop at cri dot ensmp dot fr 2006-03-28 22:44 --- Created an attachment (id=11146) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11146action=view) first step with this patch scev returns (int) (char) {0,+,1} but then chrec_convert_aggressive is removing the casts and the result is just {0,+,1}. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26719
[Bug tree-optimization/26859] [4.2 Regression] ICE Segmentation Fault
--- Comment #9 from sebastian dot pop at cri dot ensmp dot fr 2006-03-29 00:27 --- Created an attachment (id=11147) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11147action=view) proposed fix this patch (not tested yet) fixes the problem: it avoids a division by zero. Part of the problem is that convert returns a step for which it sets the overflow flag when converting unsigned ffe to int -2. The patch resets the overflow flag that has no sense for the step of a sequence. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26859
[Bug tree-optimization/23821] [4.0/4.1/4.2 Regression] DOM and VRP creating harder to optimize code
--- Comment #8 from sebastian dot pop at cri dot ensmp dot fr 2006-02-13 08:45 --- Subject: Re: [4.0/4.1/4.2 Regression] DOM and VRP creating harder to optimize code This case reminds me the peeled chrec unification that I had to disable on autovect branch (I probably have to run the transformation as a stand alone pass outside the analyzer for not disturbing the user passes). In that case we're looking at a code like loop x = phi (0, a) a = phi (1, a + 1) endloop such that a simple transformation can make x a simple iv. This case is also quite important, as it occurs about 300 times during a bootstrap. Now for the current problem, we could run a pass just after loop_init for cleaning all these constructs. As suggested, we would have to build an equivalence relation either like VRP, or on demand. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23821
[Bug tree-optimization/25371] -ftree-vectorize results in internal compiler error on AMD64
--- Comment #4 from sebastian dot pop at cri dot ensmp dot fr 2006-01-05 16:32 --- (In reply to comment #3) Vectorizer's dump file can help. vect dump ends with: /home/seb/ex/pr25371.c:26: note: Access function of PHI: {0, +, 1}_4(get_loop_exit_condition if (i_99 D.1700_228) goto L54; else goto L55;) /home/seb/ex/pr25371.c:26: note: === vec_transform_loop === /home/seb/ex/pr25371.c:26: note: === vect_do_peeling_for_alignment === and the problem seems to come from the fact that in vect_create_addr_base_for_vector_ref, we're calling base_offset = force_gimple_operand (base_offset, new_stmt, false, dest); with a base_offset that contains a chrec: ((num_tD.1607 *) (long unsigned intD.4) {0, +, nD.1614_23}_4 * 16B); -- sebastian dot pop at cri dot ensmp dot fr changed: What|Removed |Added CC||sebastian dot pop at cri dot ||ensmp dot fr http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25371
[Bug tree-optimization/24670] New: VRP ICE in compare_name_with_value
This ICE occured during a compile of libfloat with mainline. Attached is a delta reduced test that produces the ICE when compiled with -O2. softfloat/bits64/softfloat.c: In function float128_rem: softfloat/bits64/softfloat.c:4483: internal compiler error: in compare_name_with_value, at tree-vrp.c:3064 __inline__ void shift128Right (int count, long long int *z1Ptr) { long long int z1; if (count == 0); else if (count 64); else z1 = (count 64) ? count : 0; *z1Ptr = z1; } float128_rem () { signed int expDiff; long long int aSig1; long long int sigMean1; if (-64 expDiff) shift128Right (-expDiff, aSig1); add128 (sigMean1); } -- Summary: VRP ICE in compare_name_with_value Product: gcc Version: 4.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: sebastian dot pop at cri dot ensmp dot fr http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24670
[Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3)
--- Comment #56 from sebastian dot pop at cri dot ensmp dot fr 2005-11-03 13:24 --- Subject: Re: [4.0/4.1 Regression] IV-OPTS is O(N^3) So, I'd suggest that we add a --param here for max-loop-nest-depth, and then just not do this stuff on deeper nests, or ignore all the outer loops in that case. Is that feasible? Mark, I think that this has already been fixed by the patch that added PARAM_SCEV_MAX_EXPR_SIZE: we give up when the expressions that we handle are longer than this size (if I remember correctly this defaults to 20). For the example in this bug, { int j0, j1, j2, j3, j4, j5, j6, j7, j8, j9, a; a = 0; for (j0 = 0; j0 2; j0 ++) for (j1 = j0; j1 2; j1 ++) for (j2 = j1; j2 2; j2 ++) for (j3 = j2; j3 2; j3 ++) for (j4 = j3; j4 2; j4 ++) for (j5 = j4; j5 2; j5 ++) for (j6 = j5; j6 2; j6 ++) for (j7 = j6; j7 2; j7 ++) for (j8 = j7; j8 2; j8 ++) for (j9 = j8; j9 2; j9 ++) a += j0 + j1 + j2 + j3 + j4 + j5 + j6 + j7 + j8 + j9; return a; } we would have a scev with 10 dimensions, i.e. an evolution in each dimension, thus an expression with about 10 components. Some numbers for mainline with -O2 (on amd64 with cpufreq): time ./gcc/cc1 -O2 ~/ex/pr18595_10.c real0m0.345s user0m0.089s sys 0m0.017s time ./gcc/cc1 -O2 ~/ex/pr18595_100.c tree PRE : 0.28 (21%) usr 0.02 (50%) sys 0.29 (20%) wall 7742 kB (59%) ggc tree loop bounds : 0.14 (10%) usr 0.00 ( 0%) sys 0.15 (10%) wall 0 kB ( 0%) ggc real0m1.776s user0m1.348s sys 0m0.050s time ./gcc/cc1 -O2 ~/ex/pr18595_200.c tree PRE : 1.53 (24%) usr 0.09 (60%) sys 1.63 (25%) wall 31149 kB (74%) ggc tree loop bounds : 1.21 (19%) usr 0.00 ( 0%) sys 1.21 (18%) wall 0 kB ( 0%) ggc real0m7.077s user0m6.363s sys 0m0.167s time ./gcc/cc1 -O2 ~/ex/pr18595_300.c tree PRE : 4.69 (28%) usr 0.21 (68%) sys 5.03 (29%) wall 69961 kB (80%) ggc tree loop bounds : 4.03 (24%) usr 0.00 ( 0%) sys 4.06 (23%) wall 0 kB ( 0%) ggc real0m18.126s user0m17.022s sys 0m0.333s Now, if we remove PRE, we can even compile the case with 1000 nested loops in a reasonable time: time ./gcc/cc1 -O2 -fno-tree-pre ~/ex/pr18595_300.c tree loop bounds : 0.09 ( 4%) usr 0.00 ( 0%) sys 0.24 ( 9%) wall 0 kB ( 0%) ggc real0m3.184s user0m2.147s sys 0m0.054s time ./gcc/cc1 -O2 -fno-tree-pre ~/ex/pr18595_1000.c tree loop bounds : 2.00 ( 8%) usr 0.00 ( 0%) sys 2.00 ( 8%) wall 0 kB ( 0%) ggc real0m27.171s user0m26.298s sys 0m0.169s So, I think that we can safely close this PR. Seb -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
[Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3)
--- Comment #57 from sebastian dot pop at cri dot ensmp dot fr 2005-11-03 15:02 --- Subject: Re: [4.0/4.1 Regression] IV-OPTS is O(N^3) sebastian dot pop at cri dot ensmp dot fr wrote: So, I think that we can safely close this PR. The previous numbers were with mainline + patch, otherwise mainline gets an ice as described in PR24483. I only now realized this, sorry. Seb -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
[Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3)
--- Comment #58 from sebastian dot pop at cri dot ensmp dot fr 2005-11-03 19:31 --- Subject: Re: [4.0/4.1 Regression] IV-OPTS is O(N^3) Here are again the numbers for mainline with no other patch: time ./gcc/cc1 -O2 ~/ex/pr18595_10.c real0m0.164s user0m0.116s sys 0m0.018s time ./gcc/cc1 -O2 ~/ex/pr18595_100.c tree PRE : 0.33 ( 3%) usr 0.01 ( 3%) sys 0.34 ( 3%) wall 7742 kB ( 2%) ggc tree loop bounds : 0.77 ( 7%) usr 0.03 (10%) sys 0.80 ( 7%) wall 53463 kB (17%) ggc tree iv optimization : 7.70 (74%) usr 0.19 (66%) sys 7.97 (73%) wall 220347 kB (71%) ggc real0m10.912s user0m10.448s sys 0m0.317s For 200 nested loops it begins to swap even with -fno-tree-pre, so we'd need a patch for limiting this. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
[Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3)
--- Comment #59 from sebastian dot pop at cri dot ensmp dot fr 2005-11-03 20:10 --- Subject: Re: [4.0/4.1 Regression] IV-OPTS is O(N^3) Here is a first patch that uses PARAM_SCEV_MAX_EXPR_SIZE for limiting the size of expressions that we want to handle. I will send it to gcc-patches once it bootstraps, etc. time ./gcc/cc1 -O2 ~/ex/pr18595_100.c tree loop bounds : 0.51 (17%) usr 0.03 (23%) sys 0.53 (17%) wall 32717 kB (29%) ggc tree iv optimization : 1.23 (41%) usr 0.06 (46%) sys 1.29 (40%) wall 64971 kB (58%) ggc real0m3.995s user0m3.015s sys 0m0.147s time ./gcc/cc1 -O2 ~/ex/pr18595_200.c tree loop bounds : 2.75 (23%) usr 0.05 (13%) sys 2.80 (22%) wall 72775 kB (22%) ggc tree iv optimization : 3.74 (31%) usr 0.20 (51%) sys 4.30 (33%) wall 210289 kB (64%) ggc real0m13.316s user0m12.163s sys 0m0.423s time ./gcc/cc1 -fno-tree-pre -O2 ~/ex/pr18595_300.c tree loop bounds : 0.16 ( 3%) usr 0.00 ( 0%) sys 0.15 ( 3%) wall 508 kB ( 1%) ggc tree iv optimization : 2.26 (47%) usr 0.08 (62%) sys 2.33 (46%) wall 74743 kB (84%) ggc real0m6.287s user0m4.849s sys 0m0.142s time ./gcc/cc1 -fno-tree-pre -O2 ~/ex/pr18595_1000.c tree loop bounds : 2.92 ( 4%) usr 0.00 ( 0%) sys 2.92 ( 4%) wall 913 kB ( 0%) ggc tree iv optimization : 37.78 (54%) usr 0.91 (65%) sys 41.15 (52%) wall 786700 kB (93%) ggc real1m19.611s user1m10.288s sys 0m1.403s Index: tree-scalar-evolution.c === --- tree-scalar-evolution.c (revision 106440) +++ tree-scalar-evolution.c (working copy) @@ -1982,7 +1982,8 @@ loop_closed_phi_def (tree var) /* Analyze all the parameters of the chrec that were left under a symbolic form, with respect to LOOP. CHREC is the chrec to instantiate. CACHE is the cache of already instantiated values. FLAGS modify the way chrecs are - instantiated. */ + instantiated. SIZE_EXPR is used for computing the size of the expression to + be instantiated, and to stop if it exceeds some limit. */ /* Values for FLAGS. */ enum @@ -1995,12 +1996,17 @@ enum }; static tree -instantiate_parameters_1 (struct loop *loop, tree chrec, int flags, htab_t cache) +instantiate_parameters_1 (struct loop *loop, tree chrec, int flags, htab_t cache, + int size_expr) { tree res, op0, op1, op2; basic_block def_bb; struct loop *def_loop; + /* Give up if the path is longer than the MAX that we allow. */ + if (size_expr++ PARAM_VALUE (PARAM_SCEV_MAX_EXPR_SIZE)) +return chrec_dont_know; + if (automatically_generated_chrec_p (chrec) || is_gimple_min_invariant (chrec)) return chrec; @@ -2068,7 +2074,7 @@ instantiate_parameters_1 (struct loop *l } else if (res != chrec_dont_know) - res = instantiate_parameters_1 (loop, res, flags, cache); + res = instantiate_parameters_1 (loop, res, flags, cache, size_expr); bitmap_clear_bit (already_instantiated, SSA_NAME_VERSION (chrec)); @@ -2078,12 +2084,12 @@ instantiate_parameters_1 (struct loop *l case POLYNOMIAL_CHREC: op0 = instantiate_parameters_1 (loop, CHREC_LEFT (chrec), - flags, cache); + flags, cache, size_expr); if (op0 == chrec_dont_know) return chrec_dont_know; op1 = instantiate_parameters_1 (loop, CHREC_RIGHT (chrec), - flags, cache); + flags, cache, size_expr); if (op1 == chrec_dont_know) return chrec_dont_know; @@ -2094,12 +2100,12 @@ instantiate_parameters_1 (struct loop *l case PLUS_EXPR: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), - flags, cache); + flags, cache, size_expr); if (op0 == chrec_dont_know) return chrec_dont_know; op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), - flags, cache); + flags, cache, size_expr); if (op1 == chrec_dont_know) return chrec_dont_know; @@ -2110,12 +2116,12 @@ instantiate_parameters_1 (struct loop *l case MINUS_EXPR: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), - flags, cache); + flags, cache, size_expr); if (op0 == chrec_dont_know) return chrec_dont_know; op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), - flags, cache); + flags, cache, size_expr); if (op1 == chrec_dont_know) return chrec_dont_know; @@ -2126,12 +2132,12 @@ instantiate_parameters_1 (struct loop *l case MULT_EXPR
[Bug tree-optimization/24262] [4.1 Regression] ICE: verify_ssa failed with -O -msse2 -ftree-vectorize
--- Comment #4 from sebastian dot pop at cri dot ensmp dot fr 2005-10-12 16:26 --- Subject: Re: [4.1 Regression] ICE: verify_ssa failed with -O -msse2 -ftree-vectorize irar at il dot ibm dot com wrote: Here scev analyzer calculates the evolution of 'D.1703_5 * 2 + i_15', where 'D.1703_5 = i_15/2'. Scev doesn't handle division, therefore for D.1703_5 we get unknown scev, but then it's combined with the rest of the expression, erroneously leading to {D.1703_5*2, +, 1}. I don't follow: unknown + something should be folded into unknown, not {D.1703_5*2, +, 1}_x: if D.1703_5 is defined in loop_x, it potentially has an evolution in loop_x. I'm thinking that a test: if (!expr_invariant_in_loop_p (loop, CHREC_LEFT (chrec))) then give up with this case, would solve this bug. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24262
[Bug tree-optimization/24262] [4.1 Regression] ICE: verify_ssa failed with -O -msse2 -ftree-vectorize
--- Comment #5 from sebastian dot pop at cri dot ensmp dot fr 2005-10-12 16:53 --- Subject: Re: [4.1 Regression] ICE: verify_ssa failed with -O -msse2 -ftree-vectorize Sebastian Pop wrote: if (!expr_invariant_in_loop_p (loop, CHREC_LEFT (chrec))) then give up with this case, would solve this bug. We already do this! By the way, why do we need the init == expr check? The following (not tested yet) patch solves this bug. Anybody remembers about why we have the extra check? *** tree-data-ref.c.~2.44.~ 2005-09-24 21:12:03.0 +0200 --- tree-data-ref.c 2005-10-12 18:54:38.0 +0200 *** *** 1124,1130 return false; init = initial_condition_in_loop_num (access_fn, loop-num); ! if (init == expr !expr_invariant_in_loop_p (loop, init)) /* Not enough information: may be not loop invariant. E.g., for a[b[i]], we get a[D], where D=b[i]. EXPR is D, its initial_condition is D, but it depends on i - loop's induction --- 1124,1130 return false; init = initial_condition_in_loop_num (access_fn, loop-num); ! if (!expr_invariant_in_loop_p (loop, init)) /* Not enough information: may be not loop invariant. E.g., for a[b[i]], we get a[D], where D=b[i]. EXPR is D, its initial_condition is D, but it depends on i - loop's induction -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24262
[Bug tree-optimization/23942] [4.1 Regression] loop problem / testcase takes very long time to compile
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-09-21 17:02 --- Subject: Re: [4.1 Regression] loop problem / testcase takes very long time to compile Random break stops things typically somewhere inside 140 nested calls in scev (follow_ssa_edge and friends). I seem to recall there is some backtracking involved, I will check. A patch around these lines should fix the problem: it limits the number of arcs that we walk before giving up. For the moment I'm returning didn't found a path from P to P, when we give up. We have to handle a don't know symbol for this case. I'll test a patch similar to the following tomorow. Index: Makefile.in === RCS file: /cvs/gcc/gcc/gcc/Makefile.in,v retrieving revision 1.1541 diff -d -u -p -r1.1541 Makefile.in --- Makefile.in 14 Sep 2005 09:26:41 - 1.1541 +++ Makefile.in 21 Sep 2005 16:51:48 - @@ -767,7 +767,7 @@ TREE_SSA_LIVE_H = tree-ssa-live.h $(PART PRETTY_PRINT_H = pretty-print.h input.h $(OBSTACK_H) DIAGNOSTIC_H = diagnostic.h diagnostic.def $(PRETTY_PRINT_H) C_PRETTY_PRINT_H = c-pretty-print.h $(PRETTY_PRINT_H) $(C_COMMON_H) $(TREE_H) -SCEV_H = tree-scalar-evolution.h $(GGC_H) tree-chrec.h +SCEV_H = tree-scalar-evolution.h $(GGC_H) tree-chrec.h $(PARAMS_H) LAMBDA_H = lambda.h tree.h vec.h $(GGC_H) TREE_DATA_REF_H = tree-data-ref.h $(LAMBDA_H) VARRAY_H = varray.h $(MACHMODE_H) $(SYSTEM_H) coretypes.h $(TM_H) Index: tree-scalar-evolution.c === RCS file: /cvs/gcc/gcc/gcc/tree-scalar-evolution.c,v retrieving revision 2.38 diff -d -u -p -r2.38 tree-scalar-evolution.c --- tree-scalar-evolution.c 20 Sep 2005 07:09:20 - 2.38 +++ tree-scalar-evolution.c 21 Sep 2005 16:51:48 - @@ -251,6 +251,7 @@ Software Foundation, 51 Franklin Street, #include tree-scalar-evolution.h #include tree-pass.h #include flags.h +#include params.h static tree analyze_scalar_evolution_1 (struct loop *, tree, tree); static tree resolve_mixers (struct loop *, tree); @@ -1022,17 +1023,14 @@ select_loops_exit_conditions (struct loo /* Depth first search algorithm. */ -static bool follow_ssa_edge (struct loop *loop, tree, tree, tree *); +static bool follow_ssa_edge (struct loop *loop, tree, tree, tree *, int); /* Follow the ssa edge into the right hand side RHS of an assignment. Return true if the strongly connected component has been found. */ static bool -follow_ssa_edge_in_rhs (struct loop *loop, - tree at_stmt, - tree rhs, - tree halting_phi, - tree *evolution_of_loop) +follow_ssa_edge_in_rhs (struct loop *loop, tree at_stmt, tree rhs, + tree halting_phi, tree *evolution_of_loop, int limit) { bool res = false; tree rhs0, rhs1; @@ -1050,7 +1048,7 @@ follow_ssa_edge_in_rhs (struct loop *loo case NOP_EXPR: /* This assignment is under the form a_1 = (cast) rhs. */ res = follow_ssa_edge_in_rhs (loop, at_stmt, TREE_OPERAND (rhs, 0), - halting_phi, evolution_of_loop); + halting_phi, evolution_of_loop, limit); *evolution_of_loop = chrec_convert (TREE_TYPE (rhs), *evolution_of_loop, at_stmt); break; @@ -1063,7 +1061,7 @@ follow_ssa_edge_in_rhs (struct loop *loo case SSA_NAME: /* This assignment is under the form: a_1 = b_2. */ res = follow_ssa_edge - (loop, SSA_NAME_DEF_STMT (rhs), halting_phi, evolution_of_loop); + (loop, SSA_NAME_DEF_STMT (rhs), halting_phi, evolution_of_loop, limit); break; case PLUS_EXPR: @@ -1081,7 +1079,7 @@ follow_ssa_edge_in_rhs (struct loop *loo a = b + c. */ res = follow_ssa_edge (loop, SSA_NAME_DEF_STMT (rhs0), halting_phi, -evolution_of_loop); +evolution_of_loop, limit); if (res) *evolution_of_loop = add_to_evolution @@ -1093,7 +1091,7 @@ follow_ssa_edge_in_rhs (struct loop *loo { res = follow_ssa_edge (loop, SSA_NAME_DEF_STMT (rhs1), halting_phi, -evolution_of_loop); +evolution_of_loop, limit); if (res) *evolution_of_loop = add_to_evolution @@ -1109,7 +1107,7 @@ follow_ssa_edge_in_rhs (struct loop *loo a = b + */ res = follow_ssa_edge (loop, SSA_NAME_DEF_STMT (rhs0), halting_phi, -evolution_of_loop); +evolution_of_loop, limit); if (res) *evolution_of_loop = add_to_evolution (loop-num, chrec_convert (type_rhs
[Bug middle-end/23411] [data deps] Distance on outer loops for self output deps
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-09-14 12:38 --- Subject: Re: [data deps] Distance on outer loops for self output deps In this case neither implementation got the dependece right: there are bugs in both implementations. For the following nested loop: loop_1 loop_2 A[5] = ... end_loop_2 end_loop_1 BAD would answer: (1, 1), that would mean that for getting the same access we'd have to run loop_1 once *and* loop_2 once: this is false. BOP would answer: (0, 0), that would mean that neither loop_1 nor loop_2 carry dependences, in other words, both loops are parallel: this is false. The right answer is a set of distance vectors: (0, 1) and (1, 0). For getting to the same element in the array we have to run loop_1 once *or* loop_2 has to run once. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23411
[Bug tree-optimization/23511] [4.1 Regression] Segfault in fold_binary
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-08-22 12:34 --- The following patch fixes the problem. However I cannot boot'n'regtest it for the moment. I will commit it only tomorow once it is validated. seb *** tree-ssa-loop-niter.c.~2.39.~ 2005-08-21 12:48:06.0 +0200 --- tree-ssa-loop-niter.c 2005-08-22 14:30:17.0 +0200 *** *** 1460,1466 if (init == NULL_TREE || step == NULL_TREE || TREE_CODE (init) != INTEGER_CST ! || TREE_CODE (step) != INTEGER_CST) break; utype = unsigned_type_for (type); --- 1460,1468 if (init == NULL_TREE || step == NULL_TREE || TREE_CODE (init) != INTEGER_CST ! || TREE_CODE (step) != INTEGER_CST ! || TYPE_MIN_VALUE (type) == NULL_TREE ! || TYPE_MAX_VALUE (type) == NULL_TREE) break; utype = unsigned_type_for (type); -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23511
[Bug c/8268] no compile time array index checking
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-08-22 12:38 --- (In reply to comment #14) (In reply to comment #9) If we really wanted to tackle this better a compile-time, we'd run a pass to look at all the ARRAY_REFs for those which have an out-of-range index. It wouldn't be terribly hard to stick one in just before we leave SSA form. I'll give this a try. Have a look at tree-data-ref.c:analyze_array_indexes The warning can be implemented in this function. seb -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8268
[Bug middle-end/23410] [4.1 Regression] FAIL: gcc.c-torture/execute/950612-1.c execution, at -Os and -O3
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-08-18 11:51 --- Reduced testcase is: unsigned long long f4 (unsigned long long diff) { return ((unsigned long long) ((signed long long) diff 0 ? -diff : diff)); } main () { int i; for (i = 0; i = 10; i++) if (f4 ((long long) -i) != i) abort (); exit (0); } And the problem is in VRP: Visiting statement: diff.0D.1338_6 = (long long intD.5) diffD.1337_4; (analyze_scalar_evolution (loop_nb = 1) (scalar = diff.0_6) (get_scalar_evolution (scalar = diff.0_6) (scalar_evolution = (long long int) {0, +, }_1)) (set_scalar_evolution (scalar = diff.0_6) (scalar_evolution = (long long int) {0, +, }_1)) ) (instantiate_parameters (loop_nb = 1) (chrec = (long long int) {0, +, }_1) (res = (long long int) {0, +, }_1)) Found new range for diff.0_6: [0, 0] The computation of the min bound is not correct. I'm working on a patch. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23410
[Bug tree-optimization/23433] [4.1 Regression] ICE: tree check: expected real_cst, have integer_cst in const_binop, at fold-const.c:1512
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-08-17 23:23 --- Subject: Re: [4.1 Regression] ICE: tree check: expected real_cst, have integer_cst in const_binop, at fold-const.c:1512 I'm testing this patch on amd64 and i686. I will commit it once validated. Index: tree-chrec.c === RCS file: /cvs/gcc/gcc/gcc/tree-chrec.c,v retrieving revision 2.24 diff -d -u -p -r2.24 tree-chrec.c --- tree-chrec.c15 Aug 2005 12:26:07 - 2.24 +++ tree-chrec.c17 Aug 2005 23:01:09 - @@ -539,6 +539,9 @@ chrec_apply (unsigned var, if (dump_file (dump_flags TDF_DETAILS)) fprintf (dump_file, (chrec_apply \n); + if (TREE_CODE (x) == INTEGER_CST SCALAR_FLOAT_TYPE_P (type)) +x = build_real_from_int_cst (type, x); + if (evolution_function_is_affine_p (chrec)) { /* {a, +, b} (x) - a + b*x. */ -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23433
[Bug tree-optimization/23391] [4.1 regression] Tree checking failure due to scev
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-08-15 12:33 --- Subject: Re: [4.1 regression] Tree checking failure due to scev pinskia at gcc dot gnu dot org wrote: This is related to PR 19899 which was fixed. Yes, PR is related to PR19899, but same pattern occured in several places and the fix to PR19899 was not complete. Fixed now. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23391
[Bug tree-optimization/23391] [4.1 regression] Tree checking failure due to scev
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-08-15 12:35 --- Subject: Re: [4.1 regression] Tree checking failure due to scev Sebastian Pop wrote: Yes, PR is related to PR19899, but same pattern occured in several places and the fix to PR19899 was not complete. Fixed now. We'll probably need this fix for 4.0 branch as well... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23391
[Bug middle-end/23409] New: [meta-bug] data dependence analyzer (BAD vs. BOP)
This meta-bug tracks differences between Banerjee's Analyzer for Data-dependences (BAD) and Bill Pugh's Omega solver. -- Summary: [meta-bug] data dependence analyzer (BAD vs. BOP) Product: gcc Version: 3.3.1 Status: UNCONFIRMED Severity: normal Priority: P2 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: sebastian dot pop at cri dot ensmp dot fr CC: gcc-bugs at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23409
[Bug middle-end/23411] New: [data deps] Distance on outer loops for self output deps
The most frequent case that shows up when bootstrapping autovect branch with BOOT_CFLAGS=-O2 -fcheck-data-deps is the following: Dist vectors from the first dependence analyzer: 10 Omega dist vectors are not the same: 00 Data dependence relation is: (Data Dep: access_fn_A: {0, +, 1}_3 access_fn_B: {0, +, 1}_3 (subscript iterations_that_access_an_element_twice_in_A: 0 last_conflict: scev_not_known; iterations_that_access_an_element_twice_in_B: 0 last_conflict: scev_not_known; (Subscript distance: 0 ) ) distance_vect: 00 direction_vect: == ) This is caused by a loop containing a data ref like the following: loop_2 loop_3 A[{0, +, 1}_3] = ... endloop_3 endloop_2 For this pattern, tree-data-ref.c says the following: /* There is a distance of 1 on all the outer loops: Example: there is a dependence of distance 1 on loop_1 for the array A. | loop_1 | A[5] = ... | endloop */ But now that Omega says that dist is (0, 0) I'm not sure anymore whether this is the standard meaning of distance vectors. AllenKennedy book states: Definition 2.9. Suppose that there is a dependence from statement S1 on iteration i of a loop nest and statement S2 on iteration j, then the dependence distance vector d(i,j) is defined as a vector of length n such that d(i,j) = j - i. -- Summary: [data deps] Distance on outer loops for self output deps Product: gcc Version: 4.1.0 Status: UNCONFIRMED Severity: normal Priority: P2 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: sebastian dot pop at cri dot ensmp dot fr CC: gcc-bugs at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23411
[Bug middle-end/23409] [meta-bug] data dependence analyzer (BAD vs. BOP)
-- What|Removed |Added BugsThisDependsOn||23411 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23409
[Bug middle-end/23411] [data deps] Distance on outer loops for self output deps
-- What|Removed |Added OtherBugsDependingO||23409 nThis|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23411
[Bug middle-end/23412] New: [data deps] Overflow problem in Omega
This pattern shows something strange in the results of Omega. Looking at the step of array accesses it seems like Omega has just no mechanism to handle -1 evolutions i.e. 0 in unsigned with modulo arithmetics. This occurs in gcc/gcc/real.c during bootstrap on amd64-linux. Dist vectors from the first dependence analyzer: 11 Omega dist vectors are not the same: -10 Data dependence relation is: (Data Dep: access_fn_A: {1, +, 4294967295}_5 access_fn_B: {2, +, 4294967295}_5 (subscript iterations_that_access_an_element_twice_in_A: {0, +, 1}_1 last_conflict: 1 iterations_that_access_an_element_twice_in_B: {1, +, 1}_1 last_conflict: 1 (Subscript distance: 1 ) ) distance_vect: -10 direction_vect: -= ) -- Summary: [data deps] Overflow problem in Omega Product: gcc Version: 4.1.0 Status: UNCONFIRMED Severity: normal Priority: P2 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: sebastian dot pop at cri dot ensmp dot fr CC: gcc-bugs at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23412
[Bug middle-end/23409] [meta-bug] data dependence analyzer (BAD vs. BOP)
-- What|Removed |Added BugsThisDependsOn||23412 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23409
[Bug middle-end/23412] [data deps] Overflow problem in Omega
-- What|Removed |Added OtherBugsDependingO||23409 nThis|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23412
[Bug middle-end/23413] New: [data deps] Overflow problem in Omega
This pattern shows something strange in the results of Omega. Looking at the step of array accesses it seems like Omega has just no mechanism to handle -1 evolutions i.e. 0 in unsigned with modulo arithmetics. This occurs in gcc/gcc/real.c during bootstrap on amd64-linux. Dist vectors from the first dependence analyzer: 11 Omega dist vectors are not the same: -10 Data dependence relation is: (Data Dep: access_fn_A: {1, +, 4294967295}_5 access_fn_B: {2, +, 4294967295}_5 (subscript iterations_that_access_an_element_twice_in_A: {0, +, 1}_1 last_conflict: 1 iterations_that_access_an_element_twice_in_B: {1, +, 1}_1 last_conflict: 1 (Subscript distance: 1 ) ) distance_vect: -10 direction_vect: -= ) -- Summary: [data deps] Overflow problem in Omega Product: gcc Version: 4.1.0 Status: UNCONFIRMED Severity: normal Priority: P2 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: sebastian dot pop at cri dot ensmp dot fr CC: gcc-bugs at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23413
[Bug middle-end/23412] [data deps] Overflow problem in Omega
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-08-15 23:16 --- This one is also probably an overflow in Omega, but the dependence problem looks pretty simple... This occurs in gcc/gcc/omega.c on amd64-linux. Dist vectors from the first dependence analyzer: 110 Omega dist vectors are not the same: 0 10922 10922 Data dependence relation is: (Data Dep: access_fn_A: {1, +, 1}_7 access_fn_B: {1, +, 1}_7 (subscript iterations_that_access_an_element_twice_in_A: 0 last_conflict: scev_not_known; iterations_that_access_an_element_twice_in_B: 0 last_conflict: scev_not_known; (Subscript distance: 0 ) ) distance_vect: 0 10922 10922 direction_vect: =++ ) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23412
[Bug tree-optimization/23391] [4.1 regression] Tree checking failure due to scev
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-08-15 00:34 --- Subject: Re: [4.1 regression] Tree checking failure due to scev This patch should fix the problem. There are some more cases that use build_int_cst instead of build_real. I'll propose a more complete patch later. *** tree-scalar-evolution.c.~2.35.~ 2005-08-13 19:06:26.0 +0200 --- tree-scalar-evolution.c 2005-08-15 02:36:50.0 +0200 *** *** 1680,1686 opnd10 = TREE_OPERAND (opnd1, 0); chrec10 = analyze_scalar_evolution (loop, opnd10); chrec10 = chrec_convert (type, chrec10, at_stmt); ! res = chrec_fold_minus (type, build_int_cst (type, 0), chrec10); break; case MULT_EXPR: --- 1680,1688 opnd10 = TREE_OPERAND (opnd1, 0); chrec10 = analyze_scalar_evolution (loop, opnd10); chrec10 = chrec_convert (type, chrec10, at_stmt); ! res = chrec_fold_multiply (type, chrec10, SCALAR_FLOAT_TYPE_P (type) ! ? build_real (type, dconstm1) ! : build_int_cst_type (type, -1)); break; case MULT_EXPR: *** tree-chrec.c.~2.23.~2005-08-13 19:06:26.0 +0200 --- tree-chrec.c2005-08-15 02:20:51.0 +0200 *** *** 284,291 return build_polynomial_chrec (CHREC_VARIABLE (op1), chrec_fold_minus (type, op0, CHREC_LEFT (op1)), ! chrec_fold_multiply (type, CHREC_RIGHT (op1), ! build_int_cst_type (type, -1))); default: { --- 284,292 return build_polynomial_chrec (CHREC_VARIABLE (op1), chrec_fold_minus (type, op0, CHREC_LEFT (op1)), ! chrec_fold_multiply (type, CHREC_RIGHT (op1), SCALAR_FLOAT_TYPE_P (type) ! ? build_real (type, dconstm1) ! : build_int_cst_type (type, -1))); default: { -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23391
[Bug tree-optimization/22236] [4.1 Regression] wrong code for casts and scev
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-07-27 09:01 --- Subject: Re: [4.1 Regression] wrong code for casts and scev dberlin at dberlin dot org wrote: A sequence of unsigned char 1, 2, ..., 255 has to be converted to signed char that would wrap with -fwrapv: 1, 2, ..., 127, -128, ... So here is a patch that keeps the cast if the sequence wraps. You mean keep the cast at all, or keep it in the evolution describing the number of iterations in the loop. Sorry for not being clear. When chrec_convert fails, it simply returns fold_convert (type, chrec), and because the folder does not know about the chrecs at all it just returns a convert_expr (outer_type) {(inner_type)x, +, (inner_type)y}. If you mean keep it in the number of iterations of the loop, that seems wrong: Let's look at the case again void abort(void); static inline void foo (signed char a) { int b = a - 0x7F; if (b 1) abort(); } int main() { unsigned char b; for(b = 0;b 0xFF;b++) foo (b); } Note that foo doesn't *actually* change the value of the passed in parameter. The loop still iterates 0xFF times. Why is the cast affecting the number of iterations in the loop at all? We should still calculate the number of iterations to be 255 in any case. Once inlined, you have a loop with two exits: the first is the natural exit after 255 iterations, the second is a jump outside the loop to a bb that contains the abort function. If we have an evolution {(signed char) 0, +, (signed char) 1} - 127 can be folded to {(schar) -127, +, (schar) 1}, and the loop exit guarded by the if (b 1) becomes true for iteration 128. So, there are two exits: the first one exits after 255 iterations the second one exits after 128, and niter takes the minimum between these two estimations, that is 128. Now the failing step is that it is not possible to convert the unsigned sequence {0, +, 1} that stands for 0, 1, ..., 255, into the signed sequence {0, +, 1} that stands for 0, 1, ..., 127. Here, a part of the sequence has been lost in the translation. The solution here is to keep the cast around the first sequence: (schar) {(uchar)0, +, (uchar)1}. This is always safe with both wrapping and non-wrapping semantics. also, in *all* of the vectorizer testcases, number_of_iterations_in_loop should be determined to be n, an int. The code in those cases provably doesn't wrap (signed integer arithmetic doesn't wrap without -fwrapv), So why do we think it should wrap, when it's not casted at all? The scev_probably_wraps_p analyzer is still too weak for infering that the sequences used by the vectorizer don't wrap. Let's take the example from vect-77.c D.2035_21 = offD.2023_11 + iD.2026_30; D.2036_22 = (long unsigned intD.4) D.2035_21; D.2037_23 = D.2036_22 * 4; D.2038_24 = (aintD.2020 *) D.2037_23; D.2039_25 = ibD.2022_16 + D.2038_24; because you have this cast to unsigned and then the multiplication by 4 and an unknown initial value off_11, even if you know that the main index i_30 doesn't wrap, you still can chose an offset large enough for making D.2037_23 wrap. So we need some extra information about the range of offset if we want to convert the scev of D.2035_21. For the moment this is the point where the analyzer fails to properly convert the scev, and produces: (set_scalar_evolution (scalar = D.2036_22) (scalar_evolution = (long unsigned int) {off_11, +, 1}_1)) Is it possible to assume that a variable converted to a pointer type does not wrap? As in: D.2038_24 = (aintD.2020 *) D.2037_23; -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22236
[Bug tree-optimization/22236] [4.1 Regression] wrong code for casts and scev
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-07-27 09:04 --- Subject: Re: [4.1 Regression] wrong code for casts and scev dberlin at dberlin dot org wrote: I don't think it is possible to properly convert these ivs without knowing an approximation of the number of iterations. I disagree. Okay, we could (and should) use other informations than the number of iterations for proving non-wrappingness ;-) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22236
[Bug tree-optimization/22236] [4.1 Regression] wrong code for casts and scev
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-07-26 10:06 --- Subject: Re: [4.1 Regression] wrong code for casts and scev After inlining, we end up with a loop containing the following code: b.0_3 = (signed char) b_8; D.1621_4 = (int) b.0_3; a_5 = (signed char) D.1621_4; D.1640_6 = (int) a_5; b_7 = D.1640_6 - 127; if (b_7 1) goto L3; else goto L9; that is equivalent to: b_7 = ((int) (signed char) (int) (signed char) b_8) - 127; if (b_7 1) goto L3; else goto L9; with b_8 = (unsigned char) {1, +, 1} b_7 = ((int) (signed char) (int) (signed char) {(uchar)1, +, (uchar)1}) - 127; if (b_7 1) goto L3; else goto L9; A sequence of unsigned char 1, 2, ..., 255 has to be converted to signed char that would wrap with -fwrapv: 1, 2, ..., 127, -128, ... So here is a patch that keeps the cast if the sequence wraps. The remaining problem is that with this patch, chrec_convert gets more picky about the things that it transforms: in some of the testcases of the vectorizer, we get some fails because the number of iterations is not known, making scev_probably_wraps_p to return true, and finally the conversion is kept. I have bootstrapped this patch on x86_64, but there are some regressions: FAIL: gcc.dg/vect/vect-46.c scan-tree-dump-times vectorized 1 loops 1 FAIL: gcc.dg/vect/vect-50.c scan-tree-dump-times vectorized 1 loops 1 FAIL: gcc.dg/vect/vect-50.c scan-tree-dump-times Vectorizing an unaligned access 2 FAIL: gcc.dg/vect/vect-50.c scan-tree-dump-times Alignment of access forced using peeling 1 FAIL: gcc.dg/vect/vect-52.c scan-tree-dump-times vectorized 1 loops 1 FAIL: gcc.dg/vect/vect-52.c scan-tree-dump-times Vectorizing an unaligned access 2 FAIL: gcc.dg/vect/vect-58.c scan-tree-dump-times vectorized 1 loops 1 FAIL: gcc.dg/vect/vect-58.c scan-tree-dump-times Alignment of access forced using peeling 1 FAIL: gcc.dg/vect/vect-60.c scan-tree-dump-times vectorized 1 loops 1 FAIL: gcc.dg/vect/vect-60.c scan-tree-dump-times Vectorizing an unaligned access 2 FAIL: gcc.dg/vect/vect-92.c scan-tree-dump-times vectorized 1 loops 3 FAIL: gcc.dg/vect/vect-92.c scan-tree-dump-times Alignment of access forced using peeling 3 In all these cases, the loop bound is a parameter. If IP-constant propagation is not used, chrec_convert has not enough information for removing the cast. I propose to modify all these testcases to make the loop bound explicit. FAIL: gcc.dg/vect/vect-77.c scan-tree-dump-times vectorized 1 loops 1 FAIL: gcc.dg/vect/vect-77.c scan-tree-dump-times Vectorizing an unaligned access 1 FAIL: gcc.dg/vect/vect-78.c scan-tree-dump-times vectorized 1 loops 1 FAIL: gcc.dg/vect/vect-78.c scan-tree-dump-times Vectorizing an unaligned access 1 For these two regressions, the problem is the same: we end with an evolution: ib_16 + (aint *) ((long unsigned int) {off_11, +, 1}_1 * 4) in which the casts cannot be removed because the offset is not known, and even if the number of iterations is known, chrec_convert cannot prove that it does not overflow. I propose to propagate the offset by hand, or to wait for ipcp ;-) FAIL: gcc.dg/vect/vect-87.c scan-tree-dump-times vectorized 1 loops 1 FAIL: gcc.dg/vect/vect-87.c scan-tree-dump-times Alignment of access forced using peeling 1 FAIL: gcc.dg/vect/vect-88.c scan-tree-dump-times vectorized 1 loops 1 FAIL: gcc.dg/vect/vect-88.c scan-tree-dump-times Alignment of access forced using peeling 1 For these two testcases we'll need IP-value range propagation. I think that we'll have to modify these testcases by inlining the code. Dorit, could you look at vect-{77,78,87,88}.c testcases? Thanks. * tree-cfg.c (print_succ_bbs): Correctly print successors. * tree-chrec.c (chrec_convert): Call scev_probably_wraps_p for checking that the iv does not wrap before converting the iv. * tree-ssa-loop-ivcanon.c: Correct typo in comment. * tree-ssa-loop-ivopts.c (idx_find_step): Add a fixme note. * tree-ssa-loop-niter.c (scev_probably_wraps_p): Check that AT_STMT is not null. (convert_step): Add a comment. Index: tree-cfg.c === RCS file: /cvs/gcc/gcc/gcc/tree-cfg.c,v retrieving revision 2.211 diff -d -u -p -r2.211 tree-cfg.c --- tree-cfg.c 12 Jul 2005 13:20:28 - 2.211 +++ tree-cfg.c 26 Jul 2005 09:59:38 - @@ -4454,7 +4454,7 @@ static void print_pred_bbs (FILE *, basi static void print_succ_bbs (FILE *, basic_block bb); -/* Print the predecessors indexes of edge E on FILE. */ +/* Print on FILE the indexes for the predecessors of basic_block BB. */ static void print_pred_bbs (FILE *file, basic_block bb) @@ -4463,11 +4463,11 @@ print_pred_bbs (FILE *file, basic_block edge_iterator ei; FOR_EACH_EDGE (e, ei, bb-preds) -fprintf (file, bb_%d, e-src-index); +fprintf (file, bb_%d , e-src-index); } -/* Print the successors indexes
[Bug tree-optimization/22236] [4.1 Regression] wrong code for casts and scev
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-07-26 15:15 --- Subject: Re: [4.1 Regression] wrong code for casts and scev Dorit Naishlos wrote: The modifications you suggest will make the tests uninteresting - they were introduced with unknown loop-bound/offset on purpose. In fact, for most of these tests there are already versions with explicit constant values for the loop-bound/offset - Okay, I won't modify any of these testcases. vect-46.c with explicit loop-bound becomes vect-40.c vect-50.c with explicit loop-bound becomes vect-44.c vect-52.c with explicit loop-bound becomes vect-48.c vect-58.c with explicit loop-bound becomes vect-54.c vect-60.c with explicit loop-bound becomes vect-56.c vect-77.c and vect-78.c with explicit offset become vect-75.c and vect-92.c also uses unknown loop-bound on purpose. Can we change something else in the tests to make the evolution-analyzer return something saner? by changing types of variables? by using some flag? I don't think it is possible to properly convert these ivs without knowing an approximation of the number of iterations. We can get this either from -fipcp, but this would make the testcases redundant as you said, or having the IP alias analysis that tells us that the only pointers passed to main1 () in vect-46 point to data whose size is 256, and from this extract an estimation of the number of iterations, or last solution explained below. (by the way, where does it fail the vectorizer? (what are the last things that dump details reports?)) compiling vect-46.c produces the following: (... (set_scalar_evolution (scalar = D.2054_15) (scalar_evolution = (afloat * restrict) {0, +, 4}_1 + pb_14)) ) /home/seb/mainline/gcc/gcc/testsuite/gcc.dg/vect/vect-46.c:30: note: Access function of ptr: (afloat * restrict) {0, +, 4}_1 + pb_14 /home/seb/mainline/gcc/gcc/testsuite/gcc.dg/vect/vect-46.c:30: note: not vectorized: ptr is loop invariant. /home/seb/mainline/gcc/gcc/testsuite/gcc.dg/vect/vect-46.c:30: note: not vectorized: unhandled data ref: D.2055_16 = *D.2054_15 /home/seb/mainline/gcc/gcc/testsuite/gcc.dg/vect/vect-46.c:30: note: bad data references. /home/seb/mainline/gcc/gcc/testsuite/gcc.dg/vect/vect-46.c:30: note: vectorized 0 loops in function. Now that we keep the cast, (afloat * restrict) {(uint) 0, +, (uint) 4}_1 + pb_14 cannot be folded into {(afloat * restrict)pb_14, +, (afloat * restrict)4}_1, that's why the vectorizer cannot recognize the pattern. The main problem is that the code in the loop contains a cast of the signed iv i_18 to unsigned: bb_2 (preds = {bb_3 bb_1 }, succs = {bb_3 bb_5 }) { # TMT.5_17 = PHI TMT.5_27(3), TMT.5_26(1); # i_18 = PHI i_24(3), 0(1); L0:; D.2050_6 = (long unsigned int) i_18; D.2051_7 = D.2050_6 * 4; D.2052_8 = (afloat * restrict) D.2051_7; D.2053_10 = D.2052_8 + pa_9; D.2054_15 = D.2052_8 + pb_14; # VUSE TMT.5_17; D.2055_16 = *D.2054_15; D.2056_21 = D.2052_8 + pc_20; # VUSE TMT.5_17; D.2057_22 = *D.2056_21; D.2058_23 = D.2055_16 * D.2057_22; # TMT.5_27 = V_MAY_DEF TMT.5_17; *D.2053_10 = D.2058_23; i_24 = i_18 + 1; if (n_3 i_24) goto L9; else goto L11; } The solution is to have an estimation of the loop bound based on the fact that i_24 and i_18 do not wrap. From this estimation, I think that the other vars D.2050_6, D.2051_7 and D.2052_8 can be proved to not wrap. I'm working on some code for estimating niter from signed non wrapping ivs. Sebastian -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22236
[Bug tree-optimization/21861] [meta-bug] scalar evolution type conversion
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-06-07 20:02 --- fixed. -- What|Removed |Added Status|NEW |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21861
[Bug tree-optimization/21861] New: [meta-bug] scalar evolution type conversion
At the moment, there are some missed optimizations and errors linked to the same problem: type conversion of induction variables. -- Summary: [meta-bug] scalar evolution type conversion Product: gcc Version: 3.3.1 Status: UNCONFIRMED Severity: normal Priority: P2 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: sebastian dot pop at cri dot ensmp dot fr CC: gcc-bugs at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21861
[Bug tree-optimization/21861] [meta-bug] scalar evolution type conversion
-- What|Removed |Added BugsThisDependsOn||18403, 21029 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21861
[Bug tree-optimization/21029] [4.1 Regression] vrp miscompiles Ada front-end, drops loop exit test in well-defined wrap-around circumstances
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-04-29 19:51 --- Subject: Re: [PR tree-optimization/21029, RFC] harmful chrec type conversions Thanks to Roger Sayle for pointing me at this PR. Alexandre Oliva wrote: This is not a final patch; if the idea is considered sound, I'd simply remove the if and the then-dead remaining code in chrec_convert. The code following the if is not dead, at least it still is used in some cases after hand inlining count_ev_in_wider_type. I would propose this patch that is about the same as yours Index: gcc/ChangeLog from Alexandre Oliva [EMAIL PROTECTED] PR tree-optimization/21029 * tree-chrec.c (chrec_convert): Handle same-precision integral types that differ in signedness like widening conversions. with some more changes as follow: * tree-chrec.c (chrec_convert): Handle same-precision integral types that differ in signedness like widening conversions. Inline count_ev_in_wider_type. * tree-chrec.h (count_ev_in_wider_type): Remove declaration. * tree-scalar-evolution.c (count_ev_in_wider_type): Removed. Zdenek, does this change look right to you? Index: tree-chrec.c === RCS file: /cvs/gcc/gcc/gcc/tree-chrec.c,v retrieving revision 2.15 diff -c -3 -p -r2.15 tree-chrec.c *** tree-chrec.c21 Apr 2005 08:48:51 - 2.15 --- tree-chrec.c29 Apr 2005 19:31:51 - *** tree *** 1036,1085 chrec_convert (tree type, tree chrec) { ! tree ct; ! if (automatically_generated_chrec_p (chrec)) return chrec; ! ct = chrec_type (chrec); if (ct == type) return chrec; ! if (TYPE_PRECISION (ct) TYPE_PRECISION (type)) ! return count_ev_in_wider_type (type, chrec); ! ! switch (TREE_CODE (chrec)) { ! case POLYNOMIAL_CHREC: ! return build_polynomial_chrec (CHREC_VARIABLE (chrec), !chrec_convert (type, ! CHREC_LEFT (chrec)), !chrec_convert (type, ! CHREC_RIGHT (chrec))); ! ! default: ! { ! tree res = fold_convert (type, chrec); ! ! /* Don't propagate overflows. */ ! TREE_OVERFLOW (res) = 0; ! if (CONSTANT_CLASS_P (res)) ! TREE_CONSTANT_OVERFLOW (res) = 0; ! ! /* But reject constants that don't fit in their type after conversion. ! This can happen if TYPE_MIN_VALUE or TYPE_MAX_VALUE are not the ! natural values associated with TYPE_PRECISION and TYPE_UNSIGNED, ! and can cause problems later when computing niters of loops. Note ! that we don't do the check before converting because we don't want ! to reject conversions of negative chrecs to unsigned types. */ ! if (TREE_CODE (res) == INTEGER_CST !TREE_CODE (type) == INTEGER_TYPE !!int_fits_type_p (res, type)) ! res = chrec_dont_know; ! ! return res; ! } } } /* Returns the type of the chrec. */ --- 1036,1104 chrec_convert (tree type, tree chrec) { ! tree ct, base, step; ! struct loop *loop; ! if (automatically_generated_chrec_p (chrec)) return chrec; ! ct = chrec_type (chrec); if (ct == type) return chrec; ! if (!evolution_function_is_affine_p (chrec)) { ! switch (TREE_CODE (chrec)) ! { ! case POLYNOMIAL_CHREC: ! return build_polynomial_chrec (CHREC_VARIABLE (chrec), !chrec_convert (type, ! CHREC_LEFT (chrec)), !chrec_convert (type, ! CHREC_RIGHT (chrec))); ! ! default: ! { ! tree res = fold_convert (type, chrec); ! ! /* Don't propagate overflows. */ ! TREE_OVERFLOW (res) = 0; ! if (CONSTANT_CLASS_P (res)) ! TREE_CONSTANT_OVERFLOW (res) = 0; ! ! /* But reject constants that don't fit in their type after ! conversion. This can happen if TYPE_MIN_VALUE or ! TYPE_MAX_VALUE are not the natural values associated ! with TYPE_PRECISION and TYPE_UNSIGNED, and can cause ! problems later when computing niters of loops. Note ! that we don't do the check before converting because we ! don't want to reject conversions of negative chrecs to ! unsigned types. */ ! if (TREE_CODE (res) == INTEGER_CST !TREE_CODE (type) == INTEGER_TYPE !!int_fits_type_p (res, type)) ! res = chrec_dont_know; ! ! return res
[Bug tree-optimization/20742] [4.0/4.1 Regression] Hang in tree_ssa_iv_optimize_loop
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-04-04 14:50 --- Subject: Re: [4.0/4.1 Regression] Hang in tree_ssa_iv_optimize_loop rakdver at gcc dot gnu dot org wrote: Scev probably should keep track of how large expressions it produces, and just give up if it grows beyond some limit. Okay. What about putting this limit in chrec_fold_plus_1 and chrec_fold_multiply? I think that this will minimize the size of the patch for fixing this problem. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20742
[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-01-27 13:18 --- Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) With the following patch I got some speedup for depth 100. from: tree iv optimization : 2.62 (62%) usr 0.27 (82%) sys 2.92 (62%) wall tree iv optimization : 2.33 (59%) usr 0.25 (83%) sys 2.63 (54%) wall to: tree iv optimization : 1.19 (46%) usr 0.04 (40%) sys 1.26 (45%) wall tree iv optimization : 1.21 (47%) usr 0.05 (45%) sys 1.30 (46%) wall Basically we're reseting too much information each time scev_reset is called. It would even be possible to reset only the part of the scev database that contains symbols instead of erasing everything. Do somebody know how to iterate over all the elements of the hash setting to NULL_TREE the elements that answer true to chrec_contains_symbols? Index: tree-scalar-evolution.c === RCS file: /cvs/gcc/gcc/gcc/tree-scalar-evolution.c,v retrieving revision 2.16 diff -d -u -p -r2.16 tree-scalar-evolution.c --- tree-scalar-evolution.c 18 Jan 2005 11:36:26 - 2.16 +++ tree-scalar-evolution.c 27 Jan 2005 13:12:36 - @@ -2490,7 +2490,7 @@ scev_reset (void) for (i = 1; i current_loops-num; i++) { loop = current_loops-parray[i]; - if (loop) + if (loop chrec_contains_symbols (loop-nb_iterations)) loop-nb_iterations = NULL_TREE; } } -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-01-27 15:12 --- Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: the patch is below (in stronger form -- only removing entries that contain invalidated symbols), and indeed helps with this testcase. Many thanks. It however causes measurable slow down on normal code (see analysis in one of the previous mails). I see. Another idea: would it be possible to insert the invalidated names during the optimization pass instead of invalidating all the symbols? This would avoid the first pass over the scev database that search for symbols. Note that your patch is not entirely correct -- in case loop is unrolled, its number of iterations is changed, so in this case scev_reset should clear its number of iterations unconditionally or the unroller should do that work on the unrolled loops? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-01-27 20:38 --- Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: Another idea: would it be possible to insert the invalidated names during the optimization pass instead of invalidating all the symbols? This would avoid the first pass over the scev database that search for symbols. I don't understand this proposal. Anyway, I see that I was wrong: even if you mark the SSA_NAMEs that are removed in an optimization, and that you invalidate only those symbols, you still have to walk the scev database to ensure that there is no other evolution that references the removed SSA_NAMEs. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-01-25 10:32 --- Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: Adding the instantiation cache was compile time neutral on normal code, so I don't think the effect on scev cost is significant. How that? You end up querying and caching an evolution for every instantiation of an SSA_NAME! I will quantify the number of queries wrt the number of times this information is useful. I think that with my patch, the instantiation cache information is used zero times on a bootstrap. The problem is that we end up calling the instantiate_parameters_1 function 83214+2453273 (outside + recursive) times (for N=100). We also call analyze_scalar_evolution_1 1146317 times. Many of these calls create POLYNOMIAL_CHREC nodes (2894131 calls to build3_stat). Large part of 5142869 of build_int_cst_wide calls is very likely also due to scev analysis. This is not going to be cheap. Removing the instantiation cache definitely would not decrease the number of instantiate_parameters_1 calls (might increase it, in fact, although I don't know how significantly). This also could be a bad use of the scev analysis. For Steven: you can have a O(N**3) algorithm very simply: loop O(N) times loop O(N) times algorithm in O(N) One part of the problem is that loop optimizers need to throw away the scev caches after each change to loops, which leads to recounting the information too much. Allowing to invalidate only changed parts helps, but seems to be relatively costly on normal code -- you need to scan the hash table for things that refer to removed or from some other reason invalidated ssa names, which is slow. Just set the SSA_NAME to not_analyzed_yet or NULL_TREE, and the next time you'll ask for the evolution of this SSA_NAME you'll analyze the evolution from scratch. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-01-25 10:39 --- Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: How? If the reference is left in symbolic form, it means that you know nothing about its value, so the only thing you can do with it is to check its equality/inequality, in code like while (...) { i = something_weird (); a[i] = ...; (a) a[i+1] = ...; (b) a[i] = ...; (c) } The following is probably a more frequent case: i = 0 x = something_weird () or a function parameter loop i = i + 1 a[i] = ... ... = a[i + x] endloop where you end with a symbolic distance vector. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-01-25 11:02 --- Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: (*) I hope; scev is a mess of mutualy recursive functions -- analyze_scalar_evolution calling number_of_iterations calling instantiate_parameters calling analyze_scalar_evolution again, with instantiate_parameters hacked so that the infinite cycle cannot occur is my favourite one. Nobody can say anything sure about behavior of scev -- it is not even well defined what analyze_scalar_evolutions will return to you, It returns to you an evolution that might contain SSA_NAMEs. unless you call instantiate_parameters or resolve_mixers on the result. And once you call instantiate_parameters on the result you're guaranteed that what you get does contain only determined constants, or otherwise the result is chrec_dont_know. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-01-25 12:03 --- Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: --- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-01-25 11:14 --- Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: Adding the instantiation cache was compile time neutral on normal code, so I don't think the effect on scev cost is significant. How that? You end up querying and caching an evolution for every instantiation of an SSA_NAME! which is pretty cheap (constant time operation); note that we do an equivalent lookup in the analyze_scalar_evolution call in any case, also without causing any compile time problems. I haven't measured any slowdown on normal testcases. I will quantify the number of queries wrt the number of times this information is useful. I think that with my patch, the instantiation cache information is used zero times on a bootstrap. I don't think so. Please come up with some numbers, otherwise this discussion is pointless. during a bootstrap: instantiation cache queries: 1146452 instantiation cache uses: 247452 of which 143977 were scev_not_known, the other were SSA_NAMEs. this was counted with a grep|wc with the following patch: Index: tree-scalar-evolution.c === RCS file: /cvs/gcc/gcc/gcc/tree-scalar-evolution.c,v retrieving revision 2.16 diff -d -u -p -r2.16 tree-scalar-evolution.c --- tree-scalar-evolution.c 18 Jan 2005 11:36:26 - 2.16 +++ tree-scalar-evolution.c 25 Jan 2005 12:00:57 - @@ -1898,8 +1898,15 @@ get_instantiated_value (htab_t cache, tr pattern.var = version; info = htab_find (cache, pattern); + fprintf (stdout, IC_query \n); + if (info) -return info-chrec; +{ + fprintf (stdout, IC_used_once \n); + print_generic_expr (stdout, info-chrec, 0); + fprintf (stdout, \n); + return info-chrec; +} else return NULL_TREE; } @@ -1915,6 +1922,8 @@ set_instantiated_value (htab_t cache, tr pattern.var = version; slot = htab_find_slot (cache, pattern, INSERT); + fprintf (stdout, IC_query \n); + if (*slot) info = *slot; else @@ -1990,7 +1999,8 @@ instantiate_parameters_1 (struct loop *l result again. */ bitmap_set_bit (already_instantiated, SSA_NAME_VERSION (chrec)); res = analyze_scalar_evolution (def_loop, chrec); - res = instantiate_parameters_1 (loop, res, allow_superloop_chrecs, cache); + if (res != chrec_dont_know) + res = instantiate_parameters_1 (loop, res, allow_superloop_chrecs, cache); bitmap_clear_bit (already_instantiated, SSA_NAME_VERSION (chrec)); /* Store the correct value to the cache. */ @@ -2000,8 +2010,14 @@ instantiate_parameters_1 (struct loop *l case POLYNOMIAL_CHREC: op0 = instantiate_parameters_1 (loop, CHREC_LEFT (chrec), allow_superloop_chrecs, cache); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, CHREC_RIGHT (chrec), allow_superloop_chrecs, cache); + if (op1 == chrec_dont_know) + return chrec_dont_know; + if (CHREC_LEFT (chrec) != op0 || CHREC_RIGHT (chrec) != op1) chrec = build_polynomial_chrec (CHREC_VARIABLE (chrec), op0, op1); @@ -2010,8 +2026,14 @@ instantiate_parameters_1 (struct loop *l case PLUS_EXPR: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), allow_superloop_chrecs, cache); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), allow_superloop_chrecs, cache); + if (op1 == chrec_dont_know) + return chrec_dont_know; + if (TREE_OPERAND (chrec, 0) != op0 || TREE_OPERAND (chrec, 1) != op1) chrec = chrec_fold_plus (TREE_TYPE (chrec), op0, op1); @@ -2020,8 +2042,14 @@ instantiate_parameters_1 (struct loop *l case MINUS_EXPR: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), allow_superloop_chrecs, cache); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), allow_superloop_chrecs, cache); + if (op1 == chrec_dont_know) +return chrec_dont_know; + if (TREE_OPERAND (chrec, 0) != op0 || TREE_OPERAND (chrec, 1) != op1) chrec = chrec_fold_minus (TREE_TYPE (chrec), op0, op1); @@ -2030,8 +2058,14 @@ instantiate_parameters_1 (struct loop *l
[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-01-25 12:44 --- Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: More seriously -- which of the possibilities? If I have loops like while (...) { while (...) { x_1 = something (); } x_2 = phi (x_1); x_3 = x_2 + 1; } What will analyze_scalar_evolutions return for x_3? There are (at least) three possible valid values: x_3 This would be the answer if analyze_scalar_evolutions would be the identity function. If you want, you could change analyze_scalar_evolutions such that it behaves like that, and decide that the instantiation do the rest of the work (I mean moving the code that is currently in analyze_scalar_evolutions to the instantiation phase). x_2 + 1 If you decide to reconstruct the tree expression, there is no reason to stop on a phi node that has a single argument. Why would you like to get this answer as the reconstructed tree? x_1 + 1 IMO this would be the answer, although I didn't checked. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-01-25 16:26 --- Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) If you decide to reconstruct the tree expression, there is no reason to stop on a phi node that has a single argument. Why would you like to get this answer as the reconstructed tree? because this answer preserves loop closed ssa form -- the answer x_1 + 1 copy propagates the value outsied of the loop. In some applications this choice could make sense. Okay, if it makes sense, you could modify the analyzer such that it stops reconstructing the tree once it is on the phi following a loop. This won't change the information provided after the instantiation. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-01-24 21:36 --- Subject: Re: [4.0 Regression] IV-OPTS is O(N^3) Still there remain some inefficiences within the scev analysis itself. Zdenek, have you tried to revert the patch that caches the instantiation information? This could speed up things a little. On a side note, I wonder how many times we're using this instantiation cache, in other words whether we pay a high price for the scev analysis for some (probably) rare cases... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
[Bug tree-optimization/19224] [4.0 regression] Endless loop compiling simple file: Bug in tree-scalar-evolution.c (instantiate_parameters_1)?
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-01-02 20:37 --- (In reply to comment #5) Caused by an exponential time complexity of instantiate_parameters. I am working on a patch. The following patch solves the exponential time complexity. Sorry for the inconvenient. Sebastian *** instantiate_parameters_1 (struct loop *l *** 1946,1960 result again. Avoid the cyclic instantiation in mixers. */ bitmap_set_bit (already_instantiated, SSA_NAME_VERSION (chrec)); res = analyze_scalar_evolution (def_loop, chrec); ! res = instantiate_parameters_1 (loop, res, allow_superloop_chrecs); bitmap_clear_bit (already_instantiated, SSA_NAME_VERSION (chrec)); return res; case POLYNOMIAL_CHREC: op0 = instantiate_parameters_1 (loop, CHREC_LEFT (chrec), allow_superloop_chrecs); op1 = instantiate_parameters_1 (loop, CHREC_RIGHT (chrec), allow_superloop_chrecs); if (CHREC_LEFT (chrec) != op0 || CHREC_RIGHT (chrec) != op1) chrec = build_polynomial_chrec (CHREC_VARIABLE (chrec), op0, op1); --- 2005,2026 result again. Avoid the cyclic instantiation in mixers. */ bitmap_set_bit (already_instantiated, SSA_NAME_VERSION (chrec)); res = analyze_scalar_evolution (def_loop, chrec); ! if (res != chrec_dont_know) ! res = instantiate_parameters_1 (loop, res, allow_superloop_chrecs); bitmap_clear_bit (already_instantiated, SSA_NAME_VERSION (chrec)); return res; case POLYNOMIAL_CHREC: op0 = instantiate_parameters_1 (loop, CHREC_LEFT (chrec), allow_superloop_chrecs); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, CHREC_RIGHT (chrec), allow_superloop_chrecs); + if (op1 == chrec_dont_know) + return chrec_dont_know; + if (CHREC_LEFT (chrec) != op0 || CHREC_RIGHT (chrec) != op1) chrec = build_polynomial_chrec (CHREC_VARIABLE (chrec), op0, op1); *** instantiate_parameters_1 (struct loop *l *** 1963,1970 --- 2029,2042 case PLUS_EXPR: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), allow_superloop_chrecs); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), allow_superloop_chrecs); + if (op1 == chrec_dont_know) + return chrec_dont_know; + if (TREE_OPERAND (chrec, 0) != op0 || TREE_OPERAND (chrec, 1) != op1) chrec = chrec_fold_plus (TREE_TYPE (chrec), op0, op1); *** instantiate_parameters_1 (struct loop *l *** 1973,1980 --- 2045,2058 case MINUS_EXPR: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), allow_superloop_chrecs); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), allow_superloop_chrecs); + if (op1 == chrec_dont_know) + return chrec_dont_know; + if (TREE_OPERAND (chrec, 0) != op0 || TREE_OPERAND (chrec, 1) != op1) chrec = chrec_fold_minus (TREE_TYPE (chrec), op0, op1); *** instantiate_parameters_1 (struct loop *l *** 1983,1990 --- 2061,2074 case MULT_EXPR: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), allow_superloop_chrecs); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), allow_superloop_chrecs); + if (op1 == chrec_dont_know) + return chrec_dont_know; + if (TREE_OPERAND (chrec, 0) != op0 || TREE_OPERAND (chrec, 1) != op1) chrec = chrec_fold_multiply (TREE_TYPE (chrec), op0, op1); *** instantiate_parameters_1 (struct loop *l *** 2018,2027 --- 2102,2120 case 3: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), allow_superloop_chrecs); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), allow_superloop_chrecs); + if (op1 == chrec_dont_know) + return chrec_dont_know; + op2 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 2), allow_superloop_chrecs); + if (op2 == chrec_dont_know) + return
[Bug tree-optimization/19224] [4.0 regression] Endless loop compiling simple file: Bug in tree-scalar-evolution.c (instantiate_parameters_1)?
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-01-02 21:46 --- (In reply to comment #7) Subject: Re: [4.0 regression] Endless loop compiling simple file: Bug in tree-scalar-evolution.c (instantiate_parameters_1)? not really (well, maybe in this particular case, but not in general). Your patch definitely helps (and you should submit it anyway), but yhe problem is that even with your patch it may happen that instantiate_parameters is called recursively several times for one ssa name if the expression contains it several times, and if this happens for several expressions in a row, you get the exponential complexity. I am just testing the patch below that caches the results to avoid this problem. Richtig. I'm just speeding the chrec_dont_know case, whereas your patch solves the more general problem of huge expressions that have to be instantiated and that don't necessarily lead to unknown evolutions. Thanks. PS: I'll test my patch separately and will post it to gcc-patches. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19224
[Bug tree-optimization/19224] [4.0 regression] Endless loop compiling simple file: Bug in tree-scalar-evolution.c (instantiate_parameters_1)?
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2005-01-02 23:15 --- Subject: Re: [4.0 regression] Endless loop compiling simple file: Bug in tree-scalar-evolution.c (instantiate_parameters_1)? rakdver at gcc dot gnu dot org wrote: --- Additional Comments From rakdver at gcc dot gnu dot org 2005-01-02 18:55 --- Caused by an exponential time complexity of instantiate_parameters. I am working on a patch. The following patch solves the exponential time complexity. Sorry for the inconvenient. Sebastian *** instantiate_parameters_1 (struct loop *l *** 1946,1960 result again. Avoid the cyclic instantiation in mixers. */ bitmap_set_bit (already_instantiated, SSA_NAME_VERSION (chrec)); res = analyze_scalar_evolution (def_loop, chrec); ! res = instantiate_parameters_1 (loop, res, allow_superloop_chrecs); bitmap_clear_bit (already_instantiated, SSA_NAME_VERSION (chrec)); return res; case POLYNOMIAL_CHREC: op0 = instantiate_parameters_1 (loop, CHREC_LEFT (chrec), allow_superloop_chrecs); op1 = instantiate_parameters_1 (loop, CHREC_RIGHT (chrec), allow_superloop_chrecs); if (CHREC_LEFT (chrec) != op0 || CHREC_RIGHT (chrec) != op1) chrec = build_polynomial_chrec (CHREC_VARIABLE (chrec), op0, op1); --- 2005,2026 result again. Avoid the cyclic instantiation in mixers. */ bitmap_set_bit (already_instantiated, SSA_NAME_VERSION (chrec)); res = analyze_scalar_evolution (def_loop, chrec); ! if (res != chrec_dont_know) ! res = instantiate_parameters_1 (loop, res, allow_superloop_chrecs); bitmap_clear_bit (already_instantiated, SSA_NAME_VERSION (chrec)); return res; case POLYNOMIAL_CHREC: op0 = instantiate_parameters_1 (loop, CHREC_LEFT (chrec), allow_superloop_chrecs); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, CHREC_RIGHT (chrec), allow_superloop_chrecs); + if (op1 == chrec_dont_know) + return chrec_dont_know; + if (CHREC_LEFT (chrec) != op0 || CHREC_RIGHT (chrec) != op1) chrec = build_polynomial_chrec (CHREC_VARIABLE (chrec), op0, op1); *** instantiate_parameters_1 (struct loop *l *** 1963,1970 --- 2029,2042 case PLUS_EXPR: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), allow_superloop_chrecs); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), allow_superloop_chrecs); + if (op1 == chrec_dont_know) + return chrec_dont_know; + if (TREE_OPERAND (chrec, 0) != op0 || TREE_OPERAND (chrec, 1) != op1) chrec = chrec_fold_plus (TREE_TYPE (chrec), op0, op1); *** instantiate_parameters_1 (struct loop *l *** 1973,1980 --- 2045,2058 case MINUS_EXPR: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), allow_superloop_chrecs); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), allow_superloop_chrecs); + if (op1 == chrec_dont_know) + return chrec_dont_know; + if (TREE_OPERAND (chrec, 0) != op0 || TREE_OPERAND (chrec, 1) != op1) chrec = chrec_fold_minus (TREE_TYPE (chrec), op0, op1); *** instantiate_parameters_1 (struct loop *l *** 1983,1990 --- 2061,2074 case MULT_EXPR: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), allow_superloop_chrecs); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), allow_superloop_chrecs); + if (op1 == chrec_dont_know) + return chrec_dont_know; + if (TREE_OPERAND (chrec, 0) != op0 || TREE_OPERAND (chrec, 1) != op1) chrec = chrec_fold_multiply (TREE_TYPE (chrec), op0, op1); *** instantiate_parameters_1 (struct loop *l *** 2018,2027 --- 2102,2120 case 3: op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0), allow_superloop_chrecs); + if (op0 == chrec_dont_know) + return chrec_dont_know; + op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1), allow_superloop_chrecs); + if (op1
[Bug tree-optimization/17100] Missed opportunity for value range optimization
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2004-12-19 23:43 --- Patch in autovect-branch: http://gcc.gnu.org/ml/gcc-patches/2004-12/msg01444.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17100
[Bug middle-end/18005] [4.0 Regression] ICE with simple loop with VLA
--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr 2004-10-19 10:03 --- Subject: Re: [4.0 Regression] ICE with simple loop with VLA Patch is here: http://gcc.gnu.org/ml/gcc-patches/2004-10/msg01592.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18005