Re: [PATCH] Builtins handling in IVOPT
Hi, If a pointer typed use is plainly value passed to a func call, it is not an address use, right? But as you said, x86 lea may help here. But that's what you are matching ... (well, for builtins you know will expand that to a memory reference). What I dislike in the patch is the special-casing of some builtins via a target hook. I'd rather say treat all internal functions and all target builtins that way. Or simply all addresses. unless the architecture has lea-type instruction to compute the address, computing say b+4*i incurs some cost, while if mem[b+4*i] is accessed, the computation is for free. Thus, it does not make sense to treat the address computations the same way as memory references (or to treat all functions the same way as builtins which translate to memory references), Zdenek
Re: [PATCH] Builtins handling in IVOPT
Hi, If a pointer typed use is plainly value passed to a func call, it is not an address use, right? But as you said, x86 lea may help here. But that's what you are matching ... (well, for builtins you know will expand that to a memory reference). What I dislike in the patch is the special-casing of some builtins via a target hook. I'd rather say treat all internal functions and all target builtins that way. Or simply all addresses. unless the architecture has lea-type instruction to compute the address, computing say b+4*i incurs some cost, while if mem[b+4*i] is accessed, the computation is for free. Thus, it does not make sense to treat the address computations the same way as memory references (or to treat all functions the same way as builtins which translate to memory references), I understand that, but I think the patch tries to improve code generation not by changing the set of IVs used (thus adjust cost considerations) but by changing the way it rewrites certain address uses. actually, the costs are adjusted in the patch -- the address calculations for the handled builtins are recorded as USE_ADDRESS (not as USE_NONLINEAR_EXPR), so that their costs are calculated in the same way as for memory references, Zdenek
Re: [PATCH] Builtins handling in IVOPT
Hi, This patch works on the intrinsic calls handling issue in IVOPT mentioned here: http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01295.html In find_interesting_uses_stmt, it changes arg = expr __builtin_xxx (arg) to arg = expr; tmp = addr_expr (mem_ref(arg)); __builtin_xxx (tmp, ...) this looks a bit confusing (and wasteful) to me. It would make more sense to just record the argument as USE_ADDRESS and do the rewriting in rewrite_use_address. Zdenek
Re: [PATCH] Fix PR57343
Hi, Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk? PR tree-optimization/57343 * tree-ssa-loop-niter.c (number_of_iterations_ne_max): Do not use multiple_of_p if not TYPE_OVERFLOW_UNDEFINED. (number_of_iterations_cond): Do not build the folded tree. * gcc.dg/torture/pr57343.c: New testcase. ok, Zdenek
Re: [PATCH] Fix PR57396
Hi, Bootstrap and regtest running on x86_64-unknown-linux-gnu. Zdenek, does this look ok? double_int_constant_multiple_p seems to be only used from aff_combination_constant_multiple_p. yes, Zdenek
Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
Hi, I realize you're trying to do the same, but by using the SESE copier, you're implicitly trying to do an incremental update. I think you're going to really need to look at the assumptions of that code and verify that the switch FSA optimization doesn't violate those assumptions. Indeed. I'd rather have a flag to the SESE copier that tells it the region is SEME and in that case make it not update dominators but leave that to the caller (which means, recompute them). It seems the code already handles ME regions if the other exits are not important (we don't have to insert/update PHI nodes in those exit blocks), so it may be that there are even more constraints on those unimportant exits due to the iterative dominator update - I think that the entry block of the region needs to dominate all exit destinations, otherwise get_dominated_by_region is not working correctly. In your case one exit is a back-edge? We should be able to add checking to the code to verify unimportant edges are unimportant enough. I'm sure Zdenek knows the limitations best. as far as I remember, indeed only the single-entry assumption is important (but I do not remember all that much, unfortunately), Zdenek
Re: [PATCH] Fix PR56035
Hi, We ICEd on attached testcase because we didn't properly detected the latch edge. Furthermore, I think the code for re-computing latches is somehow broken at the moment. In fix_loop_structure we have /* If there was no latch, schedule the loop for removal. */ if (!first_latch) loop-header = NULL; /* If there was a single latch and it belongs to the loop of the header, record it. */ else if (latch latch-src-loop_father == loop) loop-latch = latch-src; /* Otherwise there are multiple latches which are eventually disambiguated below. */ else loop-latch = NULL; but I don't think we can use the -loop_father info, because that is already cleared by FOR_EACH_BB (bb) { if (changed_bbs) bb-aux = (void *) (size_t) loop_depth (bb-loop_father); bb-loop_father = current_loops-tree_root; } earlier on. I am not quite sure why we test latch-src-loop_father == loop in the first place. Would things work if that condition is removed? I don't much like your proposed patch (as well as the current state), as according to the specification of fix_loop_structure, we are not allowed to assume anything about the bb -- loop mapping, Zdenek
Re: [PATCH] Fix PR56035
Hi, We ICEd on attached testcase because we didn't properly detected the latch edge. Furthermore, I think the code for re-computing latches is somehow broken at the moment. In fix_loop_structure we have /* If there was no latch, schedule the loop for removal. */ if (!first_latch) loop-header = NULL; /* If there was a single latch and it belongs to the loop of the header, record it. */ else if (latch latch-src-loop_father == loop) loop-latch = latch-src; /* Otherwise there are multiple latches which are eventually disambiguated below. */ else loop-latch = NULL; but I don't think we can use the -loop_father info, because that is already cleared by FOR_EACH_BB (bb) { if (changed_bbs) bb-aux = (void *) (size_t) loop_depth (bb-loop_father); bb-loop_father = current_loops-tree_root; } earlier on. I am not quite sure why we test latch-src-loop_father == loop in the first place. Would things work if that condition is removed? I don't much like your proposed patch (as well as the current state), as according to the specification of fix_loop_structure, we are not allowed to assume anything about the bb -- loop mapping, I tried that approach, too, and it worked (even regtested that successfully). let's go with that, then (as well as updating the missleading comment before the test). I was however worried that we might end up with an edge coming out of BB from different loop heading into the header. In this case setting up loop's latch as the source of the latch edge would be wrong. Actually, the comment suggesting that possibility does not make much sense. A latch (a predecessor of the header H that is dominated by H) belongs to the loop headed by H by definition, so I am not quite sure what the test was supposed to do. The latch block may of course belong to a subloop of the loop, but that is not forbidden (and it is taken care of further in fix_loop_structure through force_single_succ_latches in the situations where we want to avoid this possibility), Zdenek
Re: [PATCH] Fix PR55833 + cheaper checking
Hi, Yes, you should check whether you are in an irreducible loop. This is done by testing EDGE_IRREDUCIBLE_LOOP flag, Alright, I was wondering whether there's any other way. Unfortunately, here I couldn't do something like if (loop_preheader_edge (loop)-flags EDGE_IRREDUCIBLE_LOOP) ... because we're natural loop in a subloop, so I abused the loop exits of this loop. Hopefully I'm not doing something evil. Updated patch attached. Ok if testing passes? Thanks, yes, this is OK, Zdenek 2013-01-14 Richard Biener rguent...@suse.de Marek Polacek pola...@redhat.com PR rtl-optimization/55833 * loop-unswitch.c (unswitch_loops): Move loop verification... (unswitch_single_loop): ...here. Call mark_irreducible_loops. * cfgloopmanip.c (fix_loop_placement): Add IRRED_INVALIDATED parameter. Set it to true when we're removing a loop from hierarchy tree in an irreducible region. (fix_bb_placements): Adjust caller. (fix_loop_placements): Likewise. * gcc.dg/pr55833.c: New test.
Re: [PATCH] Fix PR55833 + cheaper checking
Hi, On Thu, Jan 10, 2013 at 11:19:43PM +0100, Zdenek Dvorak wrote: I agree -- at the very least, unswitch_single_loop should check whether there is any possiblity it could have affected irreducible loops information (this should only be the case when some already existing irreducible loop is altered in the progress). Which is what it (or more precisely, remove_path function used by it) tries to do -- so is should be sufficient to check why this fails for the considered testcase, and make sure the situation is correctly detected, Actually, in this case, we don't call remove_path from unswitch_single_loop at all. I am not quite sure what you mean by that -- remove_path is called unconditionally in unswitch_loop (and fix_loop_placement is only called through remove_path). So, here's another stab at it. In this version, we will call mark_irreducible_loops only in case where we're removing a loop from loop hierarchy tree. Because when we do that (and we're in some irreducible region), the edge that connects those two loops should be marked as EDGE_IRREDUCIBLE_LOOP. And the preheader BB eventually as BB_IRREDUCIBLE_LOOP. Does this look any better? I'm not actually checking whether we really are in a irreducible region, should that be done (how?)? Yes, you should check whether you are in an irreducible loop. This is done by testing EDGE_IRREDUCIBLE_LOOP flag, Zdenek
Re: [PATCH] Fix PR55833 + cheaper checking
Hi, On Thu, Jan 10, 2013 at 6:31 PM, Marek Polacek wrote: + /* We changed the CFG. Recompute irreducible BBs and edges. */ + mark_irreducible_loops (); This is a very expensive fix for a really unusual situation. I don't think this is the right thing to do... I agree -- at the very least, unswitch_single_loop should check whether there is any possiblity it could have affected irreducible loops information (this should only be the case when some already existing irreducible loop is altered in the progress). Which is what it (or more precisely, remove_path function used by it) tries to do -- so is should be sufficient to check why this fails for the considered testcase, and make sure the situation is correctly detected, Zdenek
Re: [PATCH] Fix iv_number_of_iterations (PR rtl-optimization/55838)
Hi, When one (or both) IVs have extend_mode wider than mode, but step doesn't fit into mode (the IV is (subreg:MODE (plus:EXTEND_MODE base (mult:EXTEND_MODE i step)) lowpart) ), such as for EXTEND_MODE SImode, MODE QImode and step e.g. 129, 128, 256 or 517, iv_number_of_iterations can create invalid rtl. I think it is safe to just use the lowpart subreg of the step. The second hunk isn't enough, we use iv0.step resp. iv1.step directly in several other places in the routine, and the first hunk IMHO isn't enough either, if for the above extend_mode SI and mode QI iv1.step is 128, the first hunk will make -128 out of it, but then we negate it and get step 128 out of it again, not valid QImode CONST_INT, and use it e.g. as argument to UMOD. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? I think this is ok, Zdenek
Re: [patch, mips] Fix for PR 54619, GCC aborting with -O -mips16
Hi, I think tree-ssa-loop-ivopts is simply asking for the wrong thing, and needs to be changed. As I say, Sandra had some fixes in this area. This patch: http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00319.html Sadly, that patch has fallen off the bottom of my priority list (some legal wrangling halted all work on the Hexagon port that was the motivating example for it), and meanwhile it's become bit-rotten due to Bill Schmidt's work that touched on the ivopts code as well. If someone else would like to adopt this patch, please do! I agree that it is just plain dumb for ivopts to be trying to precompute address costs for modes that are not even valid. Zdenek, this patch prevents ivopts from passing VOIDmode to address_cost. The patch fixes a buildbreaker in newlib for mips16. I've confirmed that I can do a full mips build with newlib with this patch. I've bootstrapped and reg-tested this patch on x86_64. OK for trunk (without the assert)? Thanks, - Tom 2012-11-14 Tom de Vries t...@codesourcery.com * tree-ssa-loop-ivopts.c (get_use_type): New function. (get_computation_at): Use get_use_type. (get_computation_cost_at): Declare and set mem_mode. Use mem_mode. OK, Zdenek
Re: Ping: RFA: Improve doloop_begin support
Hi, 2012-09-26 Jorn Rennecke joern.renne...@arc.com * loop-doloop.c (doloop_modify): Pass doloop_end pattern to gen_doloop_begin. * loop-doloop.c (doloop_optimize): Pass flag to indicate if loop is entered at top to gen_doloop_end. * config/arm/thumb2.md (doloop_end): Accept extra operand. * config/bfin/bfin.md (doloop_end): Likewise. * config/c6x/c6x.md (doloop_end): Likewise. * config/ia64/ia64.md (doloop_end): Likewise. * config/mep/mep.md (doloop_begin, doloop_end): Likewise. * config/rs6000/rs6000.md (doloop_end): Likewise. * config/s390/s390.md (doloop_end): Likewise. * config/sh/sh.md (doloop_end): Likewise. * config/spu/spu.md (doloop_end): Likewise. * config/tilegx/tilegx.md (doloop_end): Likewise. * config/tilepro/tilepro.md (doloop_end): Likewise. * doc/md.texi (doloop_end): Document new operand. http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01807.html + entered_at_top = loop_preheader_edge (loop)-dest == desc-in_edge-dest; is equivalent to + entered_at_top = loop-header == desc-in_edge-dest; But, I don't think it will do what you want. Loops are canonicalized so that their latch blocks have single successors. So, desc-in_edge-dest will be the latch block, not the header, for the loop entered at the top. I think + entered_at_top = (loop-latch == desc-in_edge-dest + forwarder_block_p (loop-latch)); is what you want. Other than that, the patch seems ok to me, Zdenek
Re: Ping: RFA: Improve doloop_begin support
Hi, The loop appears to be entered at the top, yet both my original and your test fail. Looking what happens with your test in more detail, I find that loop-latch == desc-in_edge-dest is true, but forwarder_block_p (loop-latch) fails: 562 if (dest-loop_father-header == dest) (gdb) 563 return false; (gdb) p dest $7 = (basic_block) 0xb7be8198 (gdb) p dest-loop_father-header $8 = (basic_block) 0xb7be8198 The comment in front of this code says: /* Protect loop latches, headers and preheaders. */ So, presumably, the loop latch will remain a forwarder block precisely because forwarder_block_p denies that it is one. So, may I just write: entered_at_top = (loop-latch == desc-in_edge-dest); no -- you should also test that latch contains no active insns. I.e., factorize out whatever forwarder_block_p does except for the test (dest-loop_father-header == dest) test, Zdenek
Re: Ping: RFA: Improve doloop_begin support
Quoting Zdenek Dvorak rakd...@iuuk.mff.cuni.cz: no -- you should also test that latch contains no active insns. I.e., factorize out whatever forwarder_block_p does except for the test (dest-loop_father-header == dest) test, Like this? * basic-block.h (forwarder_block_p_1): Declare. * cfgrtl.c (orwarder_block_p_1): New function, factored out of ... (orwarder_block_p): ... here. yes (except maybe giving it some more descriptive name than forwarder_block_p_1, say contains_no_active_insn_p or something), Zdenek
Re: [RFC, ivopts] fix bugs in ivopts address cost computation
Hi, (7) If the computed address cost turns out to be 0, the current code (for some unknown reason) is turning that into 1, which can screw up the relative costs of address computations vs other operations like addition. I've come up with the attached patch to try to fix these things. The biggest change is that I have discarded the code for precomputing and caching costs and instead go straight to querying the target back end for the cost for the specific address computation we're handed; this makes the code a lot simpler. I would kind of like to get rid of multiplier_allowed_in_address_p too, but it's being used in a couple places other than the address computation and it seemed better not to mess with that for now. The other fixes are pretty straightforward. Can you split the patch into pieces fixing the above bugs separately? Removing the pre-compute and caching is the most questionable change, the others look like real bugs (the symbol cost might be questionable as well). the changes seem reasonable to me (with the caveat about caching). On the other hand, an important thing to keep in mind is that all these are just heuristics that cannot model the actual costs very realistically. Furthermore, due to interference from further optimizations, there is no guarantee that say the choices of the addressing modes by ivopts will actually be respected. Consequently, improving the cost computation will not necessarily improve the resulting code quality, and beyond some point, any such improvements become essentially irrelevant. So, any changes complicating the model should be backed by benchmarking, as otherwise there is no way to say whether there are actually benefitial or not, Zdenek
Re: Preserve pointer types in ivopts
Yes, that could work. ?Though it might end up being incredibly ugly ;) In the code? Should really only change a few lines in the patch I would have thought. I'll get back to you when I have a new version - thanks for the tip. No, in the GIMPLE IL. Richard. And I can just imagine how happy would other optimizations be about such a change. Are we speaking about something that you would like to get into mainline? All of this looks like a lot of hacks to deal with a rather weird target. Zdenek
Re: Preserve pointer types in ivopts
Hello, On 03/16/2012 02:46 PM, Richard Guenther wrote: In the end what we want is a POINTER_PLUS_EXPR variant that does not make alias-analysis assume the result still points to within the objects the pointer pointed to before the increment/decrement. Hold on, is alias analysis really affected by this? Sure, we create temporary pointers that point outside their base object, but we don't dereference them. Anything value that ends up in a MEM_REF can only point into that object again. it can be affected; by standard pointer arithmetics rules, you cannot create a pointer that would be outside of an object (unless you convert it from an integer). Thus, alias analysis may deduce that if we have something like int a[100]; int *p = a + 10; int *q = p - i; *(q + 5) = 1; a[1] = 2; then q points inside a, and consequently q + 5 = a + 5, hence the assignments do not alias. Ivopts may however produce this (in an equivalent form with casts to unsigned) even if i 10. Zdenek
Re: Preserve pointer types in ivopts
Hi, Currently, tree-ssa-loop-ivopts assumes that pointers and integers can be used interchangeably. It prefers to compute everything in unsigned integer types rather than pointer types. On a new target that I'm working on, this assumption is problematic; casting a pointer to an integer and doing arithmetic on the integer would be much too expensive to be useful. I came up with the patch below, which makes ivopts try harder to preserve pointer types. tree-affine is changed to keep track of base pointers, and do arithmetic (in particular subtraction) properly if base pointers are involved. Bootstrapped and regression tested with all languages on i686-linux. I get FAILs in: * gcc.dg/guality/pr43051-1.c A somewhat contrived testcase; gdb now produces optimized out messages which I think is acceptable? In any case I remain to be convinced that this testcase demonstrates any real problem. * gcc.dg/tree-ssa/reassoc-19.c scan-tree-dump-times reassoc2 \+ 0 This seems to fail because we do POINTER_PLUS (ptr, NEG offset)) rather than (char *)((int)ptr - offset). As far as I can tell this is also not a real problem, but I don't know how to adjust the testcase. Comments? Ok? the reason unsigned integer types are prefered is that possible overflows during the computation have defined semantics. With pointer types, the intermediate steps of the computations could have undefined behavior, possibly confusing further optimizations. Is the patch with this regard? Zdenek
Re: Preserve pointer types in ivopts
Hi, Well, what are our rules for whether overflow on POINTER_PLUS_EXPR is defined or not? A quick search through the headers and docs doesn't turn up anything. Would there be a downside to defining it as wrapping? Can you show an example where a POINTER_PLUS_EXPR produced by ivopts would overflow? Don't have a testcase right now, I've just seen IVOPTS many times in the past initialize an IV with start of array minus some constant, end of array plus some constant or similar (which is fine if the IV is unsigned integer of the size of a pointer, but certainly wouldn't be fine if the IV had pointer type). For pointer arithmetics in the IL we assume the C requirements, pointer arithmetics can be performed only within the same object, so for int a[10]; both of the following are invalid, even in the IL: int *p = a - 1; int *q = a + 11; for example, something like this: int a[1000]; void foo(int n) { int i, *p = a; for (i = 8; i n; i++) *p++ = 10; } ivopts may decide to change this to: int a[1000]; void foo(int n) { int i, *p = a - 8; for (i = 8; i n; i++) p[i] = 10; } which may require one less adition, depending on the available addressing modes. Of course, as written above this has undefined behavior; hence, the casts to unsigned integer, Zdenek
Re: [PATCH] Fix RTL sharing bug in loop unswitching (PR rtl-optimization/52092)
Hi, It seems loop-iv.c happily creates shared RTL in lots of places, loop-unroll.c solves that by unsharing everything it adds, this is an attempt to do the same in unswitching to unshare everything iv_analyze came up with. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? I would suggest to move the unsharing to the point where it is clear that the rtl will added, i.e., loop-unswitch.c:370 nloop = unswitch_loop (loop, bbs[i], cond, cinsn); - nloop = unswitch_loop (loop, bbs[i], copy_rtx_if_shared (cond), cinsn); Zdenek
Re: PR rtl-optimization/49710 (segfault in loop peeling)
Hi, --- 316,333 normally. We may assume that e-dest is not a header of any loop, as it now has exactly one predecessor. */ while (loop_outer (e-src-loop_father) ! dominated_by_p (CDI_DOMINATORS, !e-src-loop_father-latch, e-dest)) unloop (e-src-loop_father, irred_invalidated); + l = e-src-loop_father; + while (l loop_outer (l)) + { + while (loop_outer (loop_outer (l)) + dominated_by_p (CDI_DOMINATORS, + loop_outer (l)-latch, e-dest)) + unloop (loop_outer (l), irred_invalidated); + l = loop_outer (l); + } this will not work when e-src-loop_father is the cancelled loop, since the first loop tested for removal is loop_outer (e-src-loop_father). I would suggest for (l = e-src-loop_father; loop_outer (l); l = f) { f = loop_outer (l); if (dominated_by_p (CDI_DOMINATORS, l-latch, e-dest)) unloop (l, irred_invalidated); } Otherwise ok, Zdenek
Re: PR rtl-optimization/51069 (verify_loop_info failed)
Hi, Just a nit, can't you break out of the loop when irred_invalidated is set to true as well? There is no need to look through any further edges. I.e. perhaps: if (!irred_invalidated) FOR_EACH_EDGE (ae, ei, e-src-succs) if (ae != e ae-dest != EXIT_BLOCK_PTR (ae-flags EDGE_IRREDUCIBLE_LOOP) !TEST_BIT (seen, ae-dest-index)) { irred_invalidated = true; break; } Thanks for looking into this, I'll defer the review to somebody familiar with cfgloopmanip.c though. the change looks fine to me. Sure, though we do have horrible time complexity in case irreducible regions are including recomputing the whole CFG flags after every path removal. Yeah, though trying to keep it up-to-date locally was a nightmare. We actually do not use the information about irreducible regions all that much, so maybe the right approach would be to just compute it when needed, Zdenek
Re: [PATCH, PR50322] Fix test-case ivopts-lt.c to use int of same size as pointer.
Hi, The attached patch fixes PR50322. The test-case is supposed to succeed if the loop counter data-type has the same size as a pointer. The patch defines TYPE to be an int datatype of the same size as a pointer, and uses it. After this fix, there's no need for the avr xfails anymore. tested with avr, x86_64 and x86_64 -m32. what about using uintptr_t? Zdenek Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c === --- gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c (revision 178804) +++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c (working copy) @@ -1,8 +1,18 @@ /* { dg-do compile } */ /* { dg-options -O2 -fdump-tree-ivopts } */ +#if (__SIZEOF_LONG_LONG__ == __SIZEOF_POINTER__) +#define TYPE unsigned long long int +#elif (__SIZEOF_LONG__ == __SIZEOF_POINTER__) +#define TYPE unsigned long int +#elif (__SIZEOF_INT__ == __SIZEOF_POINTER__) +#define TYPE unsigned int +#else +#error Add target support here +#endif + void -f1 (char *p, unsigned long int i, unsigned long int n) +f1 (char *p, TYPE i, TYPE n) { p += i; do @@ -14,8 +24,7 @@ f1 (char *p, unsigned long int i, unsign while (i n); } -/* For the fails on avr see PR tree-optimization/50322. */ -/* { dg-final { scan-tree-dump-times PHI 1 ivopts { xfail { avr-*-* } } } } */ +/* { dg-final { scan-tree-dump-times PHI 1 ivopts } } */ /* { dg-final { scan-tree-dump-times PHI p_ 1 ivopts} } */ -/* { dg-final { scan-tree-dump-times p_\[0-9\]* 1 ivopts { xfail { avr-*-* } } } } */ +/* { dg-final { scan-tree-dump-times p_\[0-9\]* 1 ivopts } } */ /* { dg-final { cleanup-tree-dump ivopts } } */
Re: ivopts improvement
Hi, here's the updated version of the patch. The goal is to remove the 'i' iterator from the following example, by replacing 'i n' with 'p base + n'. void f (char *base, unsigned long int i, unsigned long int n) { char *p = base + i; do { *p = '\0'; p++; i++; } while (i n); } bootstrapped and reg-tested on x864_64, and build and reg-tested on MIPS. I will sent a test-case in a separate email. OK for trunk? OK, Zdenek
Re: [PATCH 2/2] Fix PR47594: Build signed niter expressions
Hi, * tree-ssa-loop-niter.c (number_of_iterations_ne): Use the signed types for the trivial case, then convert to unsigned. (number_of_iterations_lt): Use the original signed types. (number_of_iterations_cond): Same. (find_loop_niter): Build signed integer constant. (loop_niter_by_eval): Same. this is incorrect, or at least very dubious. Number of iterations does not have to fit in the signed variant of the type; and since it is always a nonnegative number, even semantically using an unsigned type seems to be a better choice. What is the purpose of this change? Zdenek
Re: [patch] Fix PR tree-optimization/49471
Hi, It'll collide with Sebastians patch in that area. I suggested a INTEGRAL_TYPE_P check instead of the simple_iv one, it should be cheaper. Zdenek, do you think it will be incorrect in some cases? well, it does not make much sense -- reductions of integral type would be taken into consideration for determining the size of the canonical variable. However, it is not a big issue (the choice of the type is more or less arbitrary, as long as the number of iterations fits into it; selecting the type based on another existing iv is just to avoid unnecessary extensions), Hm, ok. Shouldn't we then simply choose a signed type that can hold niter based on the fact that we know this IV won't overflow? Choosing the biggest of all IVs precision looks indeed odd if we just need to count from zero to niter. we often cannot use a signed type (as long as the initial value is zero), since we could not guarantee that it will not overflow (if the number of iterations is more than half of the range of the type), Zdenek
Re: [patch] Fix PR tree-optimization/49471
Hi, This patch fixes the build failure of cactusADM and dealII spec2006 benchmarks when autopar is enabled. (for powerpc they fail only when -m32 is additionally enabled) The problem originated in canonicalize_loop_ivs, where we iterate the header's phis in order to base all the induction variables on a single control variable. We use the largest precision of the loop's ivs in order to determine the type of the control variable. Since iterating the loop's phis takes into account not only the loop's ivs, but also reduction variables, we got precision values like 80 for x86, or 128 for ppc. The compilers failed to create proper types for these sizes (respectively). The proper behavior for determining the control variable's type is to take into account only the loop's ivs, which is what this patch does. Bootstrap and testsuite pass successfully (as autopar is not enabled by default). No new regressions when the testsuite is run with autopar enabled. No new regressions for the run of spec2006 with autopar enabled, cactusADM and dealII benchmarks now pass successfully with autopar on powerpc and x86. Thanks to Zdenek who helped me figure out the failure/fix. OK for trunk? It'll collide with Sebastians patch in that area. I suggested a INTEGRAL_TYPE_P check instead of the simple_iv one, it should be cheaper. Zdenek, do you think it will be incorrect in some cases? well, it does not make much sense -- reductions of integral type would be taken into consideration for determining the size of the canonical variable. However, it is not a big issue (the choice of the type is more or less arbitrary, as long as the number of iterations fits into it; selecting the type based on another existing iv is just to avoid unnecessary extensions), Zdenek Thanks, Richard. Thanks, Razya ChangeLog: PR tree-optimization/49471 * tree-vect-loop-manip.c (canonicalize_loop_ivs): Add condition to ignore reduction variables when iterating the loop header's phis.
Re: PATCH] PR 49580
Hi, I moved the adjustment of the loop's iterations from gimple_duplicate_sese_tail to tree-parloops.c, right before the call to gimple_duplicate_sese_tail. I repeated the bootstrap, regression and spec runs - no new regressions. OK to commit? OK, Zdenek Index: gcc/tree-parloops.c === --- gcc/tree-parloops.c (revision 174166) +++ gcc/tree-parloops.c (working copy) @@ -1464,6 +1464,8 @@ transform_to_exit_first_loop (struct loop *loop, h gimple phi, nphi, cond_stmt, stmt, cond_nit; gimple_stmt_iterator gsi; tree nit_1; + edge exit_1; + tree new_rhs; split_block_after_labels (loop-header); orig_header = single_succ (loop-header); @@ -1492,6 +1494,38 @@ transform_to_exit_first_loop (struct loop *loop, h control = t; } } + + /* Setting the condition towards peeling the last iteration: +If the block consisting of the exit condition has the latch as +successor, then the body of the loop is executed before +the exit condition is tested. In such case, moving the +condition to the entry, causes that the loop will iterate +one less iteration (which is the wanted outcome, since we +peel out the last iteration). If the body is executed after +the condition, moving the condition to the entry requires +decrementing one iteration. */ + exit_1 = EDGE_SUCC (exit-src, EDGE_SUCC (exit-src, 0) == exit); + if (exit_1-dest == loop-latch) +new_rhs = gimple_cond_rhs (cond_stmt); + else + { +new_rhs = fold_build2 (MINUS_EXPR, TREE_TYPE (gimple_cond_rhs (cond_stmt)), +gimple_cond_rhs (cond_stmt), +build_int_cst (TREE_TYPE (gimple_cond_rhs (cond_stmt)), 1)); +if (TREE_CODE (gimple_cond_rhs (cond_stmt)) == SSA_NAME) + { + basic_block preheader; + gimple_stmt_iterator gsi1; + + preheader = loop_preheader_edge(loop)-src; + gsi1 = gsi_after_labels (preheader); + new_rhs = force_gimple_operand_gsi (gsi1, new_rhs, true, + NULL_TREE,false,GSI_CONTINUE_LINKING); + } + } + gimple_cond_set_rhs (cond_stmt, unshare_expr (new_rhs)); + gimple_cond_set_lhs (cond_stmt, unshare_expr (gimple_cond_lhs (cond_stmt))); + bbs = get_loop_body_in_dom_order (loop); for (n = 0; bbs[n] != loop-latch; n++) Index: gcc/tree-cfg.c === --- gcc/tree-cfg.c(revision 174166) +++ gcc/tree-cfg.c(working copy) @@ -5397,12 +5397,10 @@ gimple_duplicate_sese_tail (edge entry ATTRIBUTE_U int total_freq = 0, exit_freq = 0; gcov_type total_count = 0, exit_count = 0; edge exits[2], nexits[2], e; - gimple_stmt_iterator gsi,gsi1; + gimple_stmt_iterator gsi; gimple cond_stmt; edge sorig, snew; basic_block exit_bb; - basic_block iters_bb; - tree new_rhs; gimple_stmt_iterator psi; gimple phi; tree def; @@ -5483,35 +5481,6 @@ gimple_duplicate_sese_tail (edge entry ATTRIBUTE_U gcc_assert (gimple_code (cond_stmt) == GIMPLE_COND); cond_stmt = gimple_copy (cond_stmt); - /* If the block consisting of the exit condition has the latch as -successor, then the body of the loop is executed before -the exit condition is tested. In such case, moving the -condition to the entry, causes that the loop will iterate -one less iteration (which is the wanted outcome, since we -peel out the last iteration). If the body is executed after -the condition, moving the condition to the entry requires -decrementing one iteration. */ - if (exits[1]-dest == orig_loop-latch) -new_rhs = gimple_cond_rhs (cond_stmt); - else - { -new_rhs = fold_build2 (MINUS_EXPR, TREE_TYPE (gimple_cond_rhs (cond_stmt)), -gimple_cond_rhs (cond_stmt), -build_int_cst (TREE_TYPE (gimple_cond_rhs (cond_stmt)), 1)); - -if (TREE_CODE (gimple_cond_rhs (cond_stmt)) == SSA_NAME) - { - iters_bb = gimple_bb (SSA_NAME_DEF_STMT (gimple_cond_rhs (cond_stmt))); - for (gsi1 = gsi_start_bb (iters_bb); !gsi_end_p (gsi1); gsi_next (gsi1)) - if (gsi_stmt (gsi1) == SSA_NAME_DEF_STMT (gimple_cond_rhs (cond_stmt))) - break; - - new_rhs = force_gimple_operand_gsi (gsi1, new_rhs, true, - NULL_TREE,false,GSI_CONTINUE_LINKING); - } - } - gimple_cond_set_rhs (cond_stmt, unshare_expr (new_rhs)); - gimple_cond_set_lhs (cond_stmt, unshare_expr (gimple_cond_lhs (cond_stmt))); gsi_insert_after (gsi, cond_stmt, GSI_NEW_STMT); sorig = single_succ_edge (switch_bb); 07-05-2011 Razya Ladelsky ra...@il.ibm.com * tree-cfg.c (gimple_duplicate_sese_tail): Remove handling of the loop's number of iterations. * tree-parloops.c
Re: PATCH] PR 49580
Hi, This patch fixes the build failure of gcc spec2006 benchmark. The change is in gimple_duplicate_sese_tail(), where we need to subtract 1 from the loop's number of iterations. The stmt created to change the rhs of the loop's condition, used to be inserted right after the defining stmt of the rhs (an ssa name). This requires special handling of different cases of the defining stmt: 1.gimple_stmt 2.gimple_phi 3.default_def Instead of handling each case separately, if we insert the new stmt at the begining of the loop's preheader, we're sure that it's already been defined. Bootstrap and testsuite pass successfully (as autopar is not enabled by default). No new regressions when the testsuite is run with autopar enabled. No new regressions for the run of spec2006 with autopar enabled, and gcc benchmark now passes successfully.. OK for trunk? actually, I think a better approach would be not to have this kind of pass-specific adjustments in gimple_duplicate_sese_tail at all. The code decreasing the number of iterations in gimple_duplicate_sese_tail only works because parloops does induction variable canonicalization first; should we need it to be used anywhere else, it will break. I.e., parloops should first transform the condition so that it does the comparison with the adjusted value, and then gimple_duplicate_sese_tail could do just code copying and cfg changes, Zdenek
Re: [PATCH PR45098] Disallow NULL pointer in pointer arithmetic
I don't think we should move this kind of undefinedness from C to the GIMPLE semantics. What do other languages allow that we have to support (what did KR C specify?). I don't think there is a formal specification of KR C, just the (somewhat informal) book. On topic of pointer arithmetics, the case of addition is not completely clear. It does say that you can only subtract pointers to members of the same array, though. On topic of addition of integer to a pointer, it says that The construction p + n means the address of the n-th object beyond the one p currently points to. This is true regardless of the kind of object p points to; n is scaled according to the size of the objects p points to, which is determined by the declaration of p. Zdenek
Re: [PATCH PR45098] Disallow NULL pointer in pointer arithmetic
I don't think we should move this kind of undefinedness from C to the GIMPLE semantics. What do other languages allow that we have to support (what did KR C specify?). I don't think there is a formal specification of KR C, just the (somewhat informal) book. On topic of pointer arithmetics, the case of addition is not completely clear. It does say that you can only subtract pointers to members of the same array, though. On topic of addition of integer to a pointer, it says that The construction p + n means the address of the n-th object beyond the one p currently points to. This is true regardless of the kind of object p points to; n is scaled according to the size of the objects p points to, which is determined by the declaration of p. Anyway, I don't think that this should be a matter of lawyer scrutiny of the specifications; rather, we should consider whether there is a situation where a user could reasonably expect NULL + 0 to be valid. In the example by Richard, int __attribute__((noinline)) foo (void *p, int i) { return p + i != NULL; } I think it would be hard to argue that this construction is natural. Zdenek
Re: [PATCH PR45098] Disallow NULL pointer in pointer arithmetic
Interesting. I'd never thought about the generation/use angle to prove a pointer was non-null. ISTM we could use that same logic to infer that more pointers are non-null in extract_range_from_binary_expr. Interested in tackling that improvement, obviously as an independent patch? I'm not familiar with vrp code, but.. something like this? Index: tree-vrp.c === --- tree-vrp.c (revision 173703) +++ tree-vrp.c (working copy) @@ -2273,7 +2273,12 @@ extract_range_from_binary_expr (value_ra { /* For pointer types, we are really only interested in asserting whether the expression evaluates to non-NULL. */ - if (range_is_nonnull (vr0) || range_is_nonnull (vr1)) + if (flag_delete_null_pointer_checks nowrap_type_p (expr_type)) the latter would always return true Btw, I guess you'll miscompile a load of code that is strictly undefined. So I'm not sure we want to do this against our users ... Probably not, at least unless the user explicitly asks for it -- for example, we could have -fdelete-null-pointer-checks=2. In fact, it might be a good idea to implement this flag anyway, since some current uses of flag_delete_null_pointer_checks can lead to miscompilations when user makes an error in their code and would probably appreciate more having their program crash. Oh, and of course it's even wrong. I thing it needs !range_includes_zero (vr1) (which we probably don't have). The offset may be 0 and NULL + 0 is still NULL. actually, the result of NULL + 0 is undefined (pointer arithmetics is only defined for pointers to actual objects, and NULL cannot point to one). Zdenek
Re: [PATCH PR45098] Disallow NULL pointer in pointer arithmetic
Hi, Index: tree-vrp.c === --- tree-vrp.c (revision 173703) +++ tree-vrp.c (working copy) @@ -2273,7 +2273,12 @@ extract_range_from_binary_expr (value_ra { /* For pointer types, we are really only interested in asserting whether the expression evaluates to non-NULL. */ - if (range_is_nonnull (vr0) || range_is_nonnull (vr1)) + if (flag_delete_null_pointer_checks nowrap_type_p (expr_type)) the latter would always return true Btw, I guess you'll miscompile a load of code that is strictly undefined. So I'm not sure we want to do this against our users ... Probably not, at least unless the user explicitly asks for it -- for example, we could have -fdelete-null-pointer-checks=2. In fact, it might be a good idea to implement this flag anyway, since some current uses of flag_delete_null_pointer_checks can lead to miscompilations when user makes an error in their code and would probably appreciate more having their program crash. Oh, and of course it's even wrong. I thing it needs !range_includes_zero (vr1) (which we probably don't have). The offset may be 0 and NULL + 0 is still NULL. actually, the result of NULL + 0 is undefined (pointer arithmetics is only defined for pointers to actual objects, and NULL cannot point to one). It's maybe undefined in C, but is it undefined in the middle-end? Thus, are you sure we never generate it from (void *)((uintptr_t)p + obfuscated_0)? I'm sure we simply fold that to p + obfuscated_0. if we do, we definitely should not -- the only point of such a construction is to bypass the pointer arithmetics restrictions, Zdenek
Re: [PATCH PR45098] Disallow NULL pointer in pointer arithmetic
diff -u gcc/tree-ssa-loop-niter.c (working copy) gcc/tree-ssa-loop-niter.c (working copy) --- gcc/tree-ssa-loop-niter.c (working copy) +++ gcc/tree-ssa-loop-niter.c (working copy) @@ -2875,6 +2875,16 @@ low = lower_bound_in_type (type, type); high = upper_bound_in_type (type, type); + /* In C, pointer arithmetic p + 1 cannot use a NULL pointer, and p - 1 cannot + produce a NULL pointer. The contrary would mean NULL points to an object, + while NULL is supposed to compare unequal with the address of all objects. + Furthermore, p + 1 cannot produce a NULL pointer and p - 1 cannot use a + NULL pointer since that would mean wrapping, which we assume here not to + happen. So, we can exclude NULL from the valid range of pointer + arithmetic. */ + if (int_cst_value (low) == 0) + low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE (type))); + record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true); } OK, I think this is only valid for !flag_delete_null_pointer_checks, on architectures where that isn't the default we have to assume that NULL may point to an object. agreed. Thanks for the correction. Zdenek
Re: [PATCH PR45098, 7/10] Nowrap limits iterations
Hi, As far as I can tell, what is current calculated in i_bound (and assigned to nb_iterations_upper_bound), is the maximum amount of times any statement in the loop is executed, where any includes exit tests. Differently put, the maximum amount of times the loop header is executed. hmm... this is rather confusing, I don't really recall why I gave nb_iterations_upper_bound a different semantics from any other instance of what # of iterations of a loop means. This is confirmed by this comment in tree-vrp.c: /* Try to use estimated number of iterations for the loop to constrain the final value in the evolution. We are interested in the number of executions of the latch, while nb_iterations_upper_bound includes the last execution of the exit test. */ I modified the patch to improved the comment. I think a better fix would be to make the nb_iterations_upper_bound semantics consistent with that of nb_iterations. Let me try to do it, hopefully this should be mostly mechanical, Zdenek
Re: [PATCH PR45098, 7/10] Nowrap limits iterations
Hi, The header block of the loop is bb 4, the latch block is bb 3: ... (gdb) p loop.header.index $4 = 4 (gdb) p loop.latch.index $5 = 3 ... The number of times the latch edge is executed, is 10. But loop-nb_iterations_upper_bound, or max_niter is 11: this is a bit strange, it looks like the # of iterations estimation is setting nb_iterations_upper_bound too conservatively (or I gave nb_iterations_upper_bound a different semantics than I remember -- but both my memory and the comment in cfgloop.h suggest that nb_iterations_upper_bound = nb_iterations, i.e., that it should be 10 in your example), Zdenek
Re: [PATCH, PR49121] [4.7 Regression] FAIL: gcc.dg/tree-ssa/ivopt_infer_2.c scan-tree-dump-times ivopts Replacing 0
Hi, The analysis is correct, and the test case needs to be adapted. I adapted the testcase such that it still replaces the exit test if 'a' is declared as 'char a[400]', but not if it is declared as 'extern char a[]'. ok for trunk? OK, Zdenek
Re: [PATCH PR45098, 7/10] Nowrap limits iterations
Hi, Would it be possible to add the handling of these cases to estimate_numbers_of_iterations, rather than doing it just for ivopts? Yes, I did that now. Thanks, - Tom 2011-05-05 Tom de Vries t...@codesourcery.com PR target/45098 * tree-ssa-loop-niter.c (infer_loop_bounds_from_pointer_arith): New function. (infer_loop_bounds_from_undefined): Use new function. this is OK. * tree-ssa-loop-ivopts.c (may_eliminate_iv): Fix estimated_loop_iterations comparison. I don't think this part is correct, though: Index: gcc/tree-ssa-loop-ivopts.c === --- gcc/tree-ssa-loop-ivopts.c (revision 173734) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -4391,8 +4391,13 @@ may_eliminate_iv (struct ivopts_data *da { if (!estimated_loop_iterations (loop, true, max_niter)) return false; - /* The loop bound is already adjusted by adding 1. */ - if (double_int_ucmp (max_niter, period_value) 0) + /* The max iterations applies also to the number of times the loop + exit condition is executed. The number of distinct values of + the cand is period_value + 1. So, test for + 'period_value + 1 = max_iterations'. + */ + period_value = double_int_add (period_value, double_int_one); + if (double_int_ucmp (max_niter, period_value) 0) return false; } else max_niter is the upper bound on the number of iterations of the loop, i.e., the number of executions of its latch edge. Therefore, the control induction variable of the loop will (at the exit statement) achieve at most max_niter + 1 different values. Conversely, the number of distinct values that the control iv can represent is period + 1 (naming of the period variable is a bit missleading). Thus, the correct test is # of available values = # of required values, equivalently period + 1 = max_niter + 1, equivalently period = max_niter, i.e., the current test. Zdenek
Re: [PATCH, PR45098, 2/10]
Hi, 2011-05-05 Tom de Vries t...@codesourcery.com PR target/45098 * tree-ssa-loop-ivopts.c (seq_cost): Fix call to rtx_cost. OK, Zdenek
Re: [PATCH PR45098, 4/10] Iv init cost.
Hi, Resubmitting with comment. The init cost of an iv will in general not be zero. It will be exceptional that the iv register happens to be initialized with the proper value at no cost. In general, there will at the very least be a regcopy or a const set. OK. Please add a comment explaining this to the code, Zdenek 2011-05-05 Tom de Vries t...@codesourcery.com PR target/45098 * tree-ssa-loop-ivopts.c (determine_iv_cost): Prevent cost_base.cost == 0. Index: gcc/tree-ssa-loop-ivopts.c === --- gcc/tree-ssa-loop-ivopts.c(revision 173380) +++ gcc/tree-ssa-loop-ivopts.c(working copy) @@ -4688,6 +4688,8 @@ determine_iv_cost (struct ivopts_data *d base = cand-iv-base; cost_base = force_var_cost (data, base, NULL); + if (cost_base.cost == 0) + cost_base.cost = COSTS_N_INSNS (1); cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data-speed); cost = cost_step + adjust_setup_cost (data, cost_base.cost);
Re: [PATCH PR45098, 9/10] Cheap shift-add.
Hi, + sa_cost = (TREE_CODE (expr) != MINUS_EXPR + ? shiftadd_cost[speed][mode][m] + : (mult == op1 +? shiftsub1_cost[speed][mode][m] +: shiftsub0_cost[speed][mode][m])); + res = new_cost (sa_cost, 0); + res = add_costs (res, mult == op1 ? cost0 : cost1); just forgetting the cost of the other operand does not seem correct -- what if it contains some more complicated subexpression? Zdenek
Re: [PATCH, SMS 3/3] Skip DEBUG_INSN in loop-doloop.c
Hi, This small fix was inserted to skip DEBUG_INSNs while recognizing doloop pattern in loop-doloop.c file. It's a fix for the already approved do-loop patch (not in mainline yet, http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html) in loop-doloop.c The patch was tested together with the rest of the patches in this series and on top of the patch to support do-loop for ARM (not yet in mainline, but approved http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html). On ppc64-redhat-linux regtest as well as bootstrap with SMS flags enabling SMS also on loops with stage count 1. Regtested on SPU. On arm-linux-gnueabi regtseted on c,c++. Bootstrap c language with SMS flags enabling SMS also on loops with stage count 1. OK for mainline? OK, Zdenek
[patch] PR 48837
Hi, when accumulator transformation is performed on a function like foo(a) { if (a 0) return 1 + foo (a - 1) return bla(); } this becomes foo(a) { int tmp = 0; while (a 0) tm = 1 + tmp; return tmp + bla(); } Before, bla was a tail-call, but after the optimization, it is not (since an addition has to be performed after the result of bla is known). However, we used to mark bla as tail-call, leading to a misscompilation later. Fixed by not marking tail-calls when the transformation is performed. Bootstrapped and regtested on i686. Zdenek PR tree-optimization/48837 * tree-tailcall.c (tree_optimize_tail_calls_1): Do not mark tailcalls when accumulator transformation is performed. * gcc.dg/pr48837.c: New testcase. Index: tree-tailcall.c === --- tree-tailcall.c (revision 173354) +++ tree-tailcall.c (working copy) @@ -1021,6 +1021,14 @@ tree_optimize_tail_calls_1 (bool opt_tailcalls) integer_one_node); } + if (a_acc || m_acc) +{ + /* When the tail call elimination using accumulators is performed, +statements adding the accumulated value are inserted at all exits. +This turns all other tail calls to non-tail ones. */ + opt_tailcalls = false; +} + for (; tailcalls; tailcalls = next) { next = tailcalls-next; Index: testsuite/gcc.dg/pr48837.c === --- testsuite/gcc.dg/pr48837.c (revision 0) +++ testsuite/gcc.dg/pr48837.c (revision 0) @@ -0,0 +1,30 @@ +/* PR tree-optimization/48837 */ +/* { dg-do run } */ +/* { dg-options -O2 } */ + +void abort (void); + +__attribute__((noinline)) +int baz(void) +{ + return 1; +} + +inline const int *bar(const int *a, const int *b) +{ + return *a ? a : b; +} + +int foo(int a, int b) +{ + return a || b ? baz() : foo(*bar(a, b), 1) + foo(1, 0); +} + +int main(void) +{ + if (foo(0, 0) != 2) + abort(); + + return 0; +} +
Re: ivopts improvement
Hi, + /* Determine if the exit test is formulated in terms of the phi or the + increment of the use iv. */ + use_uses_inced_iv += gimple_code (SSA_NAME_DEF_STMT (use-iv-ssa_name)) != GIMPLE_PHI; + + /* Determine if the exit test is before or after the increment of the + cand. */ + use_after_cand_inc += stmt_after_increment (data-current_loop, cand, use-stmt); + + /* For now, we only handle these cases. */ + if (use_after_cand_inc != use_uses_inced_iv) +return false; what is this trying to achieve? USE_USES_INCED_IV is pretty much meaningless -- the value of USE does not depend in any way on whether it comes directly from a PHI node or from some other calculation. it is trying to allow for do { *p = '\0'; i++; /* use_uses_inced_iv == true */ p++; /* use_after_cand_inc == true */ if (!(i n)) break; } while (1); and for do { *p = '\0'; if (!(i n)) break; i++; /* use_uses_inced_iv == false */ p++; /* use_after_cand_inc == false */ } while (1); but not for do { *p = '\0'; i++; /* use_uses_inced_iv == true */ if (!(i n)) break; p++; /* use_after_cand_inc == false */ } while (1); and not for do { *p = '\0'; p++; /* use_after_cand_inc == true */ if (!(i n)) break; i++; /* use_uses_inced_iv == false */ } while (1); In the latter 2 cases, I cannot simply replace i n with p base + n. Why not (other than that the value to compare with varies in these cases, but it always is the value of p in the last iteration of the loop)? One more thing: what is fold_walk_def_plus for? Zdenek
Re: ivopts improvement
Hi, it is trying to allow for do { *p = '\0'; i++; /* use_uses_inced_iv == true */ p++; /* use_after_cand_inc == true */ if (!(i n)) break; } while (1); and for do { *p = '\0'; if (!(i n)) break; i++; /* use_uses_inced_iv == false */ p++; /* use_after_cand_inc == false */ } while (1); but not for do { *p = '\0'; i++; /* use_uses_inced_iv == true */ if (!(i n)) break; p++; /* use_after_cand_inc == false */ } while (1); and not for do { *p = '\0'; p++; /* use_after_cand_inc == true */ if (!(i n)) break; i++; /* use_uses_inced_iv == false */ } while (1); In the latter 2 cases, I cannot simply replace i n with p base + n. Why not (other than that the value to compare with varies in these cases, but it always is the value of p in the last iteration of the loop)? In the 2 latter cases, the value to compare with will be either base + n + 1 or base + n - 1. The code does not handle these cases yet. well, if not, then it certainly handles one of the first two cases incorrectly (since the use_uses_inced_iv test is meaningless). One more thing: what is fold_walk_def_plus for? It tries to fold a plus, and if not successful, finds ssa vars in operands of the plus, substitutes the defining statements of ssa vars for those ssa vars and retries folding. Sorry for not being more clear; I meant, why is it used? Zdenek
Re: ivopts improvement
Hi, /* Whether the loop body includes any function calls. */ bool body_includes_call; + + /* Whether the loop body includes any function calls that possibly have side + effects. */ + bool body_includes_side_effect_call; }; /* An assignment of iv candidates to uses. */ @@ -456,6 +460,20 @@ return exit; } +/* Returns true if single_exit (DATA-current_loop) is the only possible exit. + Uses the same logic as loop_only_exit_p. */ why are you duplicating the functionality, instead of simply caching the result of loop_only_exit_p? +/* Tries to detect + NIT == (use_iv_max USE-iv-base) +? 0 +: (use_iv_max - USE-iv-base) + where + use_iv_real_base == (USE-iv-base - USE-iv-step) + CAND-iv-base == base_ptr + use_iv_real_base + and returns the exclusive upper bound for CAND-var_after: + base_ptr + use_iv_max. */ + +static tree +get_lt_bound (struct iv_use *use, struct iv_cand *cand, tree nit) +{ ... + /* use_iv_real_base == use-iv-base - use-iv-step. */ + use_iv_real_base = fold_build_plus (MINUS_EXPR, use-iv-base, use-iv-step); + + /* cand_iv_base. */ + + /* cand-iv-base == base_ptr + use_iv_real_base. */ ... + /* 0. */ ... This function seriously needs better comments. All that are currently present just give relations between variables that can be as easily seen from the code (but do not at all explain what the variables are supposed to mean), or make no sense (what does the 0. comment mean?) Otherwise the patch looks ok (but I would still like to see get_lt_bound with proper comments, currently I don't really understand what happens there), Zdenek