Re: [PATCH] Builtins handling in IVOPT

2013-11-22 Thread Zdenek Dvorak
Hi,

  If a pointer typed use is plainly value passed to a func call, it is
  not an address use, right? But as you said, x86 lea may help here.
 
 But that's what you are matching ... (well, for builtins you know
 will expand that to a memory reference).
 
 What I dislike in the patch is the special-casing of some builtins
 via a target hook.  I'd rather say treat all internal functions and
 all target builtins that way.  Or simply all addresses.

unless the architecture has lea-type instruction to compute the address,
computing say b+4*i incurs some cost, while if mem[b+4*i] is accessed, the
computation is for free.  Thus, it does not make sense to treat the address
computations the same way as memory references (or to treat all functions
the same way as builtins which translate to memory references),

Zdenek


Re: [PATCH] Builtins handling in IVOPT

2013-11-22 Thread Zdenek Dvorak
Hi,

   If a pointer typed use is plainly value passed to a func call, it is
   not an address use, right? But as you said, x86 lea may help here.
 
  But that's what you are matching ... (well, for builtins you know
  will expand that to a memory reference).
 
  What I dislike in the patch is the special-casing of some builtins
  via a target hook.  I'd rather say treat all internal functions and
  all target builtins that way.  Or simply all addresses.
 
  unless the architecture has lea-type instruction to compute the address,
  computing say b+4*i incurs some cost, while if mem[b+4*i] is accessed, the
  computation is for free.  Thus, it does not make sense to treat the address
  computations the same way as memory references (or to treat all functions
  the same way as builtins which translate to memory references),
 
 I understand that, but I think the patch tries to improve code generation
 not by changing the set of IVs used (thus adjust cost considerations)
 but by changing the way it rewrites certain address uses.

actually, the costs are adjusted in the patch -- the address calculations
for the handled builtins are recorded as USE_ADDRESS (not as 
USE_NONLINEAR_EXPR),
so that their costs are calculated in the same way as for memory references,

Zdenek


Re: [PATCH] Builtins handling in IVOPT

2013-11-21 Thread Zdenek Dvorak
Hi,

 This patch works on the intrinsic calls handling issue in IVOPT mentioned 
 here:
 http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01295.html
 
 In find_interesting_uses_stmt, it changes
 
 arg = expr
 __builtin_xxx (arg)
 
 to
 
 arg = expr;
 tmp = addr_expr (mem_ref(arg));
 __builtin_xxx (tmp, ...)

this looks a bit confusing (and wasteful) to me. It would make more sense to
just record the argument as USE_ADDRESS and do the rewriting in 
rewrite_use_address.

Zdenek


Re: [PATCH] Fix PR57343

2013-05-27 Thread Zdenek Dvorak
Hi,

 Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?
 
   PR tree-optimization/57343
   * tree-ssa-loop-niter.c (number_of_iterations_ne_max): Do not
   use multiple_of_p if not TYPE_OVERFLOW_UNDEFINED.
   (number_of_iterations_cond): Do not build the folded tree.
 
   * gcc.dg/torture/pr57343.c: New testcase.

ok,

Zdenek


Re: [PATCH] Fix PR57396

2013-05-24 Thread Zdenek Dvorak
Hi,

 Bootstrap and regtest running on x86_64-unknown-linux-gnu.
 
 Zdenek, does this look ok?  double_int_constant_multiple_p seems to
 be only used from aff_combination_constant_multiple_p.

yes,

Zdenek


Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)

2013-05-15 Thread Zdenek Dvorak
Hi,

  I realize you're trying to do the same, but by using the SESE copier, you're
  implicitly trying to do an incremental update.  I think you're going to
  really need to look at the assumptions of that code and verify that the
  switch FSA optimization doesn't violate those assumptions.
 
 Indeed.  I'd rather have a flag to the SESE copier that tells it the region
 is SEME and in that case make it not update dominators but leave that
 to the caller (which means, recompute them).  It seems the code already
 handles ME regions if the other exits are not important (we don't have
 to insert/update PHI nodes in those exit blocks), so it may be that there
 are even more constraints on those unimportant exits due to the
 iterative dominator update - I think that the entry block of the region
 needs to dominate all exit destinations, otherwise get_dominated_by_region
 is not working correctly.  In your case one exit is a back-edge?  We should
 be able to add checking to the code to verify unimportant edges are
 unimportant enough.
 
 I'm sure Zdenek knows the limitations best.

as far as I remember, indeed only the single-entry assumption is important
(but I do not remember all that much, unfortunately),

Zdenek


Re: [PATCH] Fix PR56035

2013-01-22 Thread Zdenek Dvorak
Hi,

 We ICEd on attached testcase because we didn't properly detected the
 latch edge.  Furthermore, I think the code for re-computing latches
 is somehow broken at the moment.  In fix_loop_structure we have
   /* If there was no latch, schedule the loop for removal.  */
   if (!first_latch)
   loop-header = NULL;
   /* If there was a single latch and it belongs to the loop of the
header, record it.  */
   else if (latch
   latch-src-loop_father == loop)
   loop-latch = latch-src;
   /* Otherwise there are multiple latches which are eventually
  disambiguated below.  */
   else
   loop-latch = NULL;
 but I don't think we can use the -loop_father info, because that is
 already cleared by
   FOR_EACH_BB (bb)
 {
   if (changed_bbs)
   bb-aux = (void *) (size_t) loop_depth (bb-loop_father);
   bb-loop_father = current_loops-tree_root;
 }
 earlier on.

I am not quite sure why we test  latch-src-loop_father == loop in
the first place.  Would things work if that condition is removed?
I don't much like your proposed patch (as well as the current state),
as according to the specification of fix_loop_structure, we are not
allowed to assume anything about the bb -- loop mapping,

Zdenek


Re: [PATCH] Fix PR56035

2013-01-22 Thread Zdenek Dvorak
Hi,

   We ICEd on attached testcase because we didn't properly detected the
   latch edge.  Furthermore, I think the code for re-computing latches
   is somehow broken at the moment.  In fix_loop_structure we have
 /* If there was no latch, schedule the loop for removal.  */
 if (!first_latch)
 loop-header = NULL;
 /* If there was a single latch and it belongs to the loop of the
  header, record it.  */
 else if (latch
 latch-src-loop_father == loop)
 loop-latch = latch-src;
 /* Otherwise there are multiple latches which are eventually
disambiguated below.  */
 else
 loop-latch = NULL;
   but I don't think we can use the -loop_father info, because that is
   already cleared by
 FOR_EACH_BB (bb)
   {
 if (changed_bbs)
 bb-aux = (void *) (size_t) loop_depth (bb-loop_father);
 bb-loop_father = current_loops-tree_root;
   }
   earlier on.
  
  I am not quite sure why we test  latch-src-loop_father == loop in
  the first place.  Would things work if that condition is removed?
  I don't much like your proposed patch (as well as the current state),
  as according to the specification of fix_loop_structure, we are not
  allowed to assume anything about the bb -- loop mapping,
 
 I tried that approach, too, and it worked (even regtested that
 successfully).

let's go with that, then (as well as updating the missleading comment before 
the test).

 I was however worried that we might end up with an edge
 coming out of BB
 from different loop heading into the header.  In this case setting up
 loop's latch as the source of the latch edge would be wrong.

Actually, the comment suggesting that possibility does not make much sense.
A latch (a predecessor of the header H that is dominated by H) belongs to
the loop headed by H by definition, so I am not quite sure what the test was 
supposed to do.

The latch block may of course belong to a subloop of the loop, but that is not
forbidden (and it is taken care of further in fix_loop_structure through 
force_single_succ_latches
in the situations where we want to avoid this possibility),

Zdenek


Re: [PATCH] Fix PR55833 + cheaper checking

2013-01-15 Thread Zdenek Dvorak
Hi,

  Yes, you should check whether you are in an irreducible loop.  This is done 
  by
  testing EDGE_IRREDUCIBLE_LOOP flag,
 
 Alright, I was wondering whether there's any other way.  Unfortunately,
 here I couldn't do something like
 
 if (loop_preheader_edge (loop)-flags  EDGE_IRREDUCIBLE_LOOP)
   ...
 because we're natural loop in a subloop, so I abused the loop exits of
 this loop.  Hopefully I'm not doing something evil.
 
 Updated patch attached.  Ok if testing passes?  Thanks,

yes, this is OK,

Zdenek

 2013-01-14  Richard Biener  rguent...@suse.de
   Marek Polacek  pola...@redhat.com
 
   PR rtl-optimization/55833
   * loop-unswitch.c (unswitch_loops): Move loop verification...
   (unswitch_single_loop): ...here.  Call mark_irreducible_loops.
   * cfgloopmanip.c (fix_loop_placement): Add IRRED_INVALIDATED parameter.
   Set it to true when we're removing a loop from hierarchy tree in
   an irreducible region.
   (fix_bb_placements): Adjust caller.
   (fix_loop_placements): Likewise.
 
   * gcc.dg/pr55833.c: New test.


Re: [PATCH] Fix PR55833 + cheaper checking

2013-01-14 Thread Zdenek Dvorak
Hi,

 On Thu, Jan 10, 2013 at 11:19:43PM +0100, Zdenek Dvorak wrote:
  I agree -- at the very least, unswitch_single_loop should check whether 
  there
  is any possiblity it could have affected irreducible loops information (this
  should only be the case when some already existing irreducible loop is 
  altered
  in the progress).  Which is what it (or more precisely, remove_path function
  used by it) tries to do -- so is should be sufficient to check why this 
  fails
  for the considered testcase, and make sure the situation is correctly 
  detected,
 
 Actually, in this case, we don't call remove_path from unswitch_single_loop
 at all.

I am not quite sure what you mean by that -- remove_path is called 
unconditionally
in unswitch_loop (and fix_loop_placement is only called through remove_path).

 So, here's another stab at it.  In this version, we will call
 mark_irreducible_loops only in case where we're removing a loop
 from loop hierarchy tree.  Because when we do that (and we're in some
 irreducible region), the edge that connects those two loops
 should be marked as EDGE_IRREDUCIBLE_LOOP.  And the preheader BB
 eventually as BB_IRREDUCIBLE_LOOP.  Does this look any better?
 I'm not actually checking whether we really are in a irreducible
 region, should that be done (how?)?

Yes, you should check whether you are in an irreducible loop.  This is done by
testing EDGE_IRREDUCIBLE_LOOP flag,

Zdenek


Re: [PATCH] Fix PR55833 + cheaper checking

2013-01-10 Thread Zdenek Dvorak
Hi,

 On Thu, Jan 10, 2013 at 6:31 PM, Marek Polacek wrote:
  +  /* We changed the CFG.  Recompute irreducible BBs and edges.  */
  +  mark_irreducible_loops ();
 
 This is a very expensive fix for a really unusual situation.
 
 I don't think this is the right thing to do...

I agree -- at the very least, unswitch_single_loop should check whether there
is any possiblity it could have affected irreducible loops information (this
should only be the case when some already existing irreducible loop is altered
in the progress).  Which is what it (or more precisely, remove_path function
used by it) tries to do -- so is should be sufficient to check why this fails
for the considered testcase, and make sure the situation is correctly detected,

Zdenek


Re: [PATCH] Fix iv_number_of_iterations (PR rtl-optimization/55838)

2013-01-03 Thread Zdenek Dvorak
Hi,

 When one (or both) IVs have extend_mode wider than mode, but step doesn't
 fit into mode (the IV is
 (subreg:MODE (plus:EXTEND_MODE base (mult:EXTEND_MODE i step)) lowpart)
 ), such as for EXTEND_MODE SImode, MODE QImode and step e.g. 129, 128, 256
 or 517, iv_number_of_iterations can create invalid rtl.  I think it is safe
 to just use the lowpart subreg of the step.  The second hunk isn't enough,
 we use iv0.step resp. iv1.step directly in several other places in the
 routine, and the first hunk IMHO isn't enough either, if for the above
 extend_mode SI and mode QI iv1.step is 128, the first hunk will make -128 out 
 of
 it, but then we negate it and get step 128 out of it again, not valid QImode
 CONST_INT, and use it e.g. as argument to UMOD.
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I think this is ok,

Zdenek


Re: [patch, mips] Fix for PR 54619, GCC aborting with -O -mips16

2012-11-15 Thread Zdenek Dvorak
Hi,

  I think tree-ssa-loop-ivopts is simply
  asking for the wrong thing, and needs to be changed.  As I say,
  Sandra had some fixes in this area.
  
  This patch:
  
  http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00319.html
  
  Sadly, that patch has fallen off the bottom of my priority list (some 
  legal wrangling halted all work on the Hexagon port that was the 
  motivating example for it), and meanwhile it's become bit-rotten due to 
  Bill Schmidt's work that touched on the ivopts code as well.  If someone 
  else would like to adopt this patch, please do!
  
  I agree that it is just plain dumb for ivopts to be trying to precompute 
  address costs for modes that are not even valid.
  
 
 Zdenek,
 
 this patch prevents ivopts from passing VOIDmode to address_cost.
 
 The patch fixes a buildbreaker in newlib for mips16. I've confirmed that I can
 do a full mips build with newlib with this patch.
 
 I've bootstrapped and reg-tested this patch on x86_64.
 
 OK for trunk (without the assert)?
 
 Thanks,
 - Tom
 
 2012-11-14  Tom de Vries  t...@codesourcery.com
 
   * tree-ssa-loop-ivopts.c (get_use_type): New function.
   (get_computation_at): Use get_use_type.
   (get_computation_cost_at): Declare and set mem_mode.  Use mem_mode.

OK,

Zdenek


Re: Ping: RFA: Improve doloop_begin support

2012-10-16 Thread Zdenek Dvorak
Hi,

 2012-09-26  Jorn Rennecke  joern.renne...@arc.com
 
 * loop-doloop.c (doloop_modify): Pass doloop_end pattern to
 gen_doloop_begin.
 * loop-doloop.c (doloop_optimize): Pass flag to indicate if loop is
 entered at top to gen_doloop_end.
   * config/arm/thumb2.md (doloop_end): Accept extra operand.
   * config/bfin/bfin.md (doloop_end): Likewise.
   * config/c6x/c6x.md (doloop_end): Likewise.
   * config/ia64/ia64.md (doloop_end): Likewise.
   * config/mep/mep.md (doloop_begin, doloop_end): Likewise.
   * config/rs6000/rs6000.md (doloop_end): Likewise.
   * config/s390/s390.md (doloop_end): Likewise.
   * config/sh/sh.md (doloop_end): Likewise.
   * config/spu/spu.md (doloop_end): Likewise.
   * config/tilegx/tilegx.md (doloop_end): Likewise.
   * config/tilepro/tilepro.md (doloop_end): Likewise.
   * doc/md.texi (doloop_end): Document new operand.
 
 http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01807.html

+  entered_at_top = loop_preheader_edge (loop)-dest == desc-in_edge-dest;

is equivalent to

+  entered_at_top = loop-header == desc-in_edge-dest;

But, I don't think it will do what you want.  Loops are canonicalized so that
their latch blocks have single successors.  So, desc-in_edge-dest will be
the latch block, not the header, for the loop entered at the top.  I think

+  entered_at_top = (loop-latch == desc-in_edge-dest 
+ forwarder_block_p (loop-latch));

is what you want.  Other than that, the patch seems ok to me,

Zdenek


Re: Ping: RFA: Improve doloop_begin support

2012-10-16 Thread Zdenek Dvorak
Hi,

 The loop appears to be entered at the top, yet both my original and your test
 fail.
 
 Looking what happens with your test in more detail, I find that
 
  loop-latch == desc-in_edge-dest
 
 is true, but forwarder_block_p (loop-latch) fails:
 
 562   if (dest-loop_father-header == dest)
 (gdb)
 563 return false;
 (gdb) p dest
 $7 = (basic_block) 0xb7be8198
 (gdb) p dest-loop_father-header
 $8 = (basic_block) 0xb7be8198
 
 The comment in front of this code says:
   /* Protect loop latches, headers and preheaders.  */
 
 So, presumably, the loop latch will remain a forwarder block precisely because
 forwarder_block_p denies that it is one.
 
 So, may I just write:
 entered_at_top = (loop-latch == desc-in_edge-dest);

no -- you should also test that latch contains no active insns.  I.e., factorize
out whatever forwarder_block_p does except for the test 
(dest-loop_father-header == dest) test,

Zdenek


Re: Ping: RFA: Improve doloop_begin support

2012-10-16 Thread Zdenek Dvorak
 Quoting Zdenek Dvorak rakd...@iuuk.mff.cuni.cz:
 
 no -- you should also test that latch contains no active insns.
 I.e., factorize
 out whatever forwarder_block_p does except for the test
 (dest-loop_father-header == dest) test,
 
 Like this?
 
 * basic-block.h (forwarder_block_p_1): Declare.
 * cfgrtl.c (orwarder_block_p_1): New function, factored out of ...
 (orwarder_block_p): ... here.

yes (except maybe giving it some more descriptive name than forwarder_block_p_1,
say contains_no_active_insn_p or something),

Zdenek


Re: [RFC, ivopts] fix bugs in ivopts address cost computation

2012-06-06 Thread Zdenek Dvorak
Hi,

  (7) If the computed address cost turns out to be 0, the current code
  (for some unknown reason) is turning that into 1, which can screw up
  the relative costs of address computations vs other operations like
  addition.
 
  I've come up with the attached patch to try to fix these things.  The
  biggest change is that I have discarded the code for precomputing and
  caching costs and instead go straight to querying the target back end
  for the cost for the specific address computation we're handed; this
  makes the code a lot simpler.  I would kind of like to get rid of
  multiplier_allowed_in_address_p too, but it's being used in a couple
  places other than the address computation and it seemed better not to
  mess with that for now.  The other fixes are pretty straightforward.
 
 Can you split the patch into pieces fixing the above bugs separately?
 Removing the pre-compute and caching is the most questionable change,
 the others look like real bugs (the symbol cost might be questionable as
 well).

the changes seem reasonable to me (with the caveat about caching).  On the
other hand, an important thing to keep in mind is that all these are just
heuristics that cannot model the actual costs very realistically.  Furthermore,
due to interference from further optimizations, there is no guarantee that say
the choices of the addressing modes by ivopts will actually be respected.

Consequently, improving the cost computation will not necessarily improve
the resulting code quality, and beyond some point, any such improvements
become essentially irrelevant.  So, any changes complicating the model
should be backed by benchmarking, as otherwise there is no way to say
whether there are actually benefitial or not,

Zdenek


Re: Preserve pointer types in ivopts

2012-03-16 Thread Zdenek Dvorak
  Yes, that could work. ?Though it might end up being incredibly ugly ;)
 
  In the code? Should really only change a few lines in the patch I would
  have thought. I'll get back to you when I have a new version - thanks
  for the tip.
 
 No, in the GIMPLE IL.
 
 Richard.

And I can just imagine how happy would other optimizations be about such a 
change.
Are we speaking about something that you would like to get into mainline?
All of this looks like a lot of hacks to deal with a rather weird target.

Zdenek


Re: Preserve pointer types in ivopts

2012-03-16 Thread Zdenek Dvorak
Hello,

 On 03/16/2012 02:46 PM, Richard Guenther wrote:
  In the end what we want is a POINTER_PLUS_EXPR variant
  that does not make alias-analysis assume the result still points
  to within the objects the pointer pointed to before the increment/decrement.
 
 Hold on, is alias analysis really affected by this? Sure, we create
 temporary pointers that point outside their base object, but we don't
 dereference them. Anything value that ends up in a MEM_REF can only
 point into that object again.

it can be affected; by standard pointer arithmetics rules, you cannot
create a pointer that would be outside of an object (unless you convert
it from an integer).  Thus, alias analysis may deduce that if we have
something like

int a[100];
int *p = a + 10;
int *q = p - i;
*(q + 5) = 1;
a[1] = 2;

then q points inside a, and consequently q + 5 = a + 5, hence the
assignments do not alias.  Ivopts may however produce this (in an equivalent
form with casts to unsigned) even if i  10.

Zdenek


Re: Preserve pointer types in ivopts

2012-03-15 Thread Zdenek Dvorak
Hi,

 Currently, tree-ssa-loop-ivopts assumes that pointers and integers can
 be used interchangeably. It prefers to compute everything in unsigned
 integer types rather than pointer types.
 On a new target that I'm working on, this assumption is problematic;
 casting a pointer to an integer and doing arithmetic on the integer
 would be much too expensive to be useful. I came up with the patch
 below, which makes ivopts try harder to preserve pointer types.
 tree-affine is changed to keep track of base pointers, and do arithmetic
 (in particular subtraction) properly if base pointers are involved.
 
 Bootstrapped and regression tested with all languages on i686-linux. I
 get FAILs in:
 
 * gcc.dg/guality/pr43051-1.c
   A somewhat contrived testcase; gdb now produces optimized out
   messages which I think is acceptable? In any case I remain to be
   convinced that this testcase demonstrates any real problem.
 * gcc.dg/tree-ssa/reassoc-19.c scan-tree-dump-times reassoc2  \+  0
   This seems to fail because we do POINTER_PLUS (ptr, NEG offset))
   rather than (char *)((int)ptr - offset). As far as I can tell this
   is also not a real problem, but I don't know how to adjust the
   testcase.
 
 Comments? Ok?

the reason unsigned integer types are prefered is that possible overflows
during the computation have defined semantics.  With pointer types, the
intermediate steps of the computations could have undefined behavior, possibly
confusing further optimizations.  Is the patch with this regard?

Zdenek


Re: Preserve pointer types in ivopts

2012-03-15 Thread Zdenek Dvorak
Hi,

  Well, what are our rules for whether overflow on POINTER_PLUS_EXPR is
  defined or not? A quick search through the headers and docs doesn't turn
  up anything. Would there be a downside to defining it as wrapping?
  
  Can you show an example where a POINTER_PLUS_EXPR produced by ivopts
  would overflow?
 
 Don't have a testcase right now, I've just seen IVOPTS many times in the
 past initialize an IV with start of array minus some constant, end of array
 plus some constant or similar (which is fine if the IV is unsigned integer
 of the size of a pointer, but certainly wouldn't be fine if the IV had
 pointer type).  For pointer arithmetics in the IL we assume the C
 requirements, pointer arithmetics can be performed only within the same
 object, so for
 int a[10];
 both of the following are invalid, even in the IL:
 int *p = a - 1;
 int *q = a + 11;

for example, something like this:

int a[1000];

void foo(int n)
{
  int i, *p = a;

  for (i = 8; i  n; i++)
*p++ = 10;
}

ivopts may decide to change this to:

int a[1000];

void foo(int n)
{
  int i, *p = a - 8;

  for (i = 8; i  n; i++)
p[i] = 10;
}

which may require one less adition, depending on the available addressing modes.
Of course, as written above this has undefined behavior; hence, the casts to
unsigned integer,

Zdenek


Re: [PATCH] Fix RTL sharing bug in loop unswitching (PR rtl-optimization/52092)

2012-02-03 Thread Zdenek Dvorak
Hi,

 It seems loop-iv.c happily creates shared RTL in lots of places,
 loop-unroll.c solves that by unsharing everything it adds, this is
 an attempt to do the same in unswitching to unshare everything iv_analyze
 came up with.
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I would suggest to move the unsharing to the point where it is clear that
the rtl will added, i.e.,

loop-unswitch.c:370
  nloop = unswitch_loop (loop, bbs[i], cond, cinsn);
-
  nloop = unswitch_loop (loop, bbs[i], copy_rtx_if_shared (cond), cinsn);

Zdenek


Re: PR rtl-optimization/49710 (segfault in loop peeling)

2012-01-04 Thread Zdenek Dvorak
Hi,

 --- 316,333 
normally.   We may assume that e-dest is not a header of any loop,
as it now has exactly one predecessor.  */
 while (loop_outer (e-src-loop_father)
 !  dominated_by_p (CDI_DOMINATORS,
 !e-src-loop_father-latch, e-dest))
   unloop (e-src-loop_father, irred_invalidated);
 +   l = e-src-loop_father;
 +   while (l  loop_outer (l))
 + {
 +   while (loop_outer (loop_outer (l))
 +   dominated_by_p (CDI_DOMINATORS,
 + loop_outer (l)-latch, e-dest))
 + unloop (loop_outer (l), irred_invalidated);
 +   l = loop_outer (l);
 + }

this will not work when e-src-loop_father is the cancelled loop,
since the first loop tested for removal is loop_outer (e-src-loop_father).

I would suggest

for (l = e-src-loop_father; loop_outer (l); l = f)
  {
f = loop_outer (l);
if (dominated_by_p (CDI_DOMINATORS, l-latch, e-dest))
  unloop (l, irred_invalidated);
  }

Otherwise ok,

Zdenek


Re: PR rtl-optimization/51069 (verify_loop_info failed)

2011-12-28 Thread Zdenek Dvorak
Hi,

 Just a nit, can't you break out of the loop when irred_invalidated is set to
 true as well?  There is no need to look through any further edges.  I.e.
 perhaps:
   if (!irred_invalidated)
 FOR_EACH_EDGE (ae, ei, e-src-succs)
   if (ae != e
  ae-dest != EXIT_BLOCK_PTR
  (ae-flags  EDGE_IRREDUCIBLE_LOOP)
  !TEST_BIT (seen, ae-dest-index))
   {
 irred_invalidated = true;
 break;
   }
 Thanks for looking into this, I'll defer the review to somebody familiar
 with cfgloopmanip.c though.

the change looks fine to me.

 Sure, though we do have horrible time complexity in case irreducible
 regions are including recomputing the whole CFG flags after every path
 removal.

Yeah, though trying to keep it up-to-date locally was a nightmare.  We actually
do not use the information about irreducible regions all that much, so maybe
the right approach would be to just compute it when needed,

Zdenek


Re: [PATCH, PR50322] Fix test-case ivopts-lt.c to use int of same size as pointer.

2011-09-14 Thread Zdenek Dvorak
Hi,

 The attached patch fixes PR50322.
 
 The test-case is supposed to succeed if the loop counter data-type has the 
 same
 size as a pointer. The patch defines TYPE to be an int datatype of the same 
 size
 as a pointer, and uses it. After this fix, there's no need for the avr xfails
 anymore.
 
 tested with avr, x86_64 and x86_64 -m32.

what about using uintptr_t?

Zdenek

 Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
 ===
 --- gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c (revision 178804)
 +++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c (working copy)
 @@ -1,8 +1,18 @@
  /* { dg-do compile } */
  /* { dg-options -O2 -fdump-tree-ivopts } */
  
 +#if (__SIZEOF_LONG_LONG__ == __SIZEOF_POINTER__)
 +#define TYPE unsigned long long int
 +#elif (__SIZEOF_LONG__ == __SIZEOF_POINTER__)
 +#define TYPE unsigned long int
 +#elif (__SIZEOF_INT__ == __SIZEOF_POINTER__)
 +#define TYPE unsigned int
 +#else
 +#error Add target support here
 +#endif
 +
  void
 -f1 (char *p, unsigned long int i, unsigned long int n)
 +f1 (char *p, TYPE i, TYPE n)
  {
p += i;
do
 @@ -14,8 +24,7 @@ f1 (char *p, unsigned long int i, unsign
while (i  n);
  }
  
 -/* For the fails on avr see PR tree-optimization/50322.  */
 -/* { dg-final { scan-tree-dump-times PHI 1 ivopts { xfail { avr-*-* } 
 } } } */
 +/* { dg-final { scan-tree-dump-times PHI 1 ivopts } } */
  /* { dg-final { scan-tree-dump-times PHI p_ 1 ivopts} } */
 -/* { dg-final { scan-tree-dump-times p_\[0-9\]*  1 ivopts { xfail { 
 avr-*-* } } } } */
 +/* { dg-final { scan-tree-dump-times p_\[0-9\]*  1 ivopts } } */
  /* { dg-final { cleanup-tree-dump ivopts } } */



Re: ivopts improvement

2011-08-25 Thread Zdenek Dvorak
Hi,

 here's the updated version of the patch.
 
 The goal is to remove the 'i' iterator from the following example, by
 replacing 'i  n' with 'p  base + n'.
 
 void
 f (char *base, unsigned long int i, unsigned long int n)
 {
   char *p = base + i;
   do
 {
   *p = '\0';
   p++;
   i++;
}
   while (i  n);
 }
 
 bootstrapped and reg-tested on x864_64, and build and reg-tested on MIPS.
 
 I will sent a test-case in a separate email.
 
 OK for trunk?

OK,

Zdenek


Re: [PATCH 2/2] Fix PR47594: Build signed niter expressions

2011-08-02 Thread Zdenek Dvorak
Hi,

   * tree-ssa-loop-niter.c (number_of_iterations_ne): Use the signed types
   for the trivial case, then convert to unsigned.
   (number_of_iterations_lt): Use the original signed types.
   (number_of_iterations_cond): Same.
   (find_loop_niter): Build signed integer constant.
   (loop_niter_by_eval): Same.

this is incorrect, or at least very dubious.  Number of iterations does not have
to fit in the signed variant of the type; and since it is always a nonnegative
number, even semantically using an unsigned type seems to be a better choice.
What is the purpose of this change?

Zdenek


Re: [patch] Fix PR tree-optimization/49471

2011-07-31 Thread Zdenek Dvorak
Hi,

  It'll collide with Sebastians patch in that area.  I suggested a
  INTEGRAL_TYPE_P check instead of the simple_iv one, it
  should be cheaper.  Zdenek, do you think it will be incorrect
  in some cases?
 
  well, it does not make much sense -- reductions of integral type would
  be taken into consideration for determining the size of the canonical
  variable.  However, it is not a big issue (the choice of the type is more
  or less arbitrary, as long as the number of iterations fits into it; 
  selecting
  the type based on another existing iv is just to avoid unnecessary 
  extensions),
 
 Hm, ok.  Shouldn't we then simply choose a signed type that can hold
 niter based on the fact that we know this IV won't overflow?  Choosing
 the biggest of all IVs precision looks indeed odd if we just need to count
 from zero to niter.

we often cannot use a signed type (as long as the initial value is zero),
since we could not guarantee that it will not overflow (if the number of 
iterations
is more than half of the range of the type),

Zdenek


Re: [patch] Fix PR tree-optimization/49471

2011-07-30 Thread Zdenek Dvorak
Hi,

  This patch fixes the build failure of cactusADM and dealII spec2006
  benchmarks when autopar is enabled.
  (for powerpc they fail only when -m32 is additionally enabled)
 
  The problem originated in canonicalize_loop_ivs, where we iterate the
  header's phis in order to base all
  the induction variables on a single control variable.
  We use the largest precision of the loop's ivs in order to determine the
  type of the control variable.
 
  Since iterating the loop's phis takes into account not only the loop's
  ivs, but also reduction variables,
  we got precision values like 80 for x86, or 128 for ppc.
  The compilers failed to create proper types for these sizes
  (respectively).
 
  The proper behavior for determining the control variable's type is to take
  into account only the loop's ivs,
  which is what this patch does.
 
  Bootstrap and testsuite pass successfully (as autopar is not enabled by
  default).
  No new regressions when the testsuite is run with autopar enabled.
  No new regressions for the run of spec2006 with autopar enabled,
 
  cactusADM and dealII benchmarks now pass successfully with autopar on
  powerpc and x86.
 
  Thanks to Zdenek who helped me figure out the failure/fix.
  OK for trunk?
 
 It'll collide with Sebastians patch in that area.  I suggested a
 INTEGRAL_TYPE_P check instead of the simple_iv one, it
 should be cheaper.  Zdenek, do you think it will be incorrect
 in some cases?

well, it does not make much sense -- reductions of integral type would
be taken into consideration for determining the size of the canonical
variable.  However, it is not a big issue (the choice of the type is more
or less arbitrary, as long as the number of iterations fits into it; selecting
the type based on another existing iv is just to avoid unnecessary extensions),

Zdenek

 Thanks,
 Richard.
 
  Thanks,
  Razya
 
  ChangeLog:
 
    PR tree-optimization/49471
    * tree-vect-loop-manip.c (canonicalize_loop_ivs): Add condition to
    ignore reduction variables when iterating the loop header's phis.
 
 
 


Re: PATCH] PR 49580

2011-07-05 Thread Zdenek Dvorak
Hi,

 I moved the adjustment of the loop's iterations from 
 gimple_duplicate_sese_tail
 to tree-parloops.c, right before the call to gimple_duplicate_sese_tail.
 I repeated the bootstrap, regression and spec runs - no new regressions.
 
 OK to commit?

OK,

Zdenek

 Index: gcc/tree-parloops.c
 ===
 --- gcc/tree-parloops.c   (revision 174166)
 +++ gcc/tree-parloops.c   (working copy)
 @@ -1464,6 +1464,8 @@ transform_to_exit_first_loop (struct loop *loop, h
gimple phi, nphi, cond_stmt, stmt, cond_nit;
gimple_stmt_iterator gsi;
tree nit_1;
 +  edge exit_1;
 +  tree new_rhs;
  
split_block_after_labels (loop-header);
orig_header = single_succ (loop-header);
 @@ -1492,6 +1494,38 @@ transform_to_exit_first_loop (struct loop *loop, h
 control = t;
   }
  }
 +
 + /* Setting the condition towards peeling the last iteration:
 +If the block consisting of the exit condition has the latch as
 +successor, then the body of the loop is executed before
 +the exit condition is tested.  In such case, moving the
 +condition to the entry, causes that the loop will iterate
 +one less iteration (which is the wanted outcome, since we
 +peel out the last iteration).  If the body is executed after
 +the condition, moving the condition to the entry requires
 +decrementing one iteration.  */
 +  exit_1 = EDGE_SUCC (exit-src, EDGE_SUCC (exit-src, 0) == exit); 
 +  if (exit_1-dest == loop-latch)
 +new_rhs = gimple_cond_rhs (cond_stmt);
 +  else
 +  {
 +new_rhs = fold_build2 (MINUS_EXPR, TREE_TYPE (gimple_cond_rhs 
 (cond_stmt)),
 +gimple_cond_rhs (cond_stmt),
 +build_int_cst (TREE_TYPE (gimple_cond_rhs 
 (cond_stmt)), 1));
 +if (TREE_CODE (gimple_cond_rhs (cond_stmt)) == SSA_NAME)
 +  {
 + basic_block preheader;
 + gimple_stmt_iterator gsi1;
 +
 + preheader = loop_preheader_edge(loop)-src;
 + gsi1 = gsi_after_labels (preheader);
 + new_rhs = force_gimple_operand_gsi (gsi1, new_rhs, true,
 + 
 NULL_TREE,false,GSI_CONTINUE_LINKING);
 +  }
 +  }
 +  gimple_cond_set_rhs (cond_stmt, unshare_expr (new_rhs));
 +  gimple_cond_set_lhs (cond_stmt, unshare_expr (gimple_cond_lhs 
 (cond_stmt)));
 +  
bbs = get_loop_body_in_dom_order (loop);
  
for (n = 0; bbs[n] != loop-latch; n++)
 Index: gcc/tree-cfg.c
 ===
 --- gcc/tree-cfg.c(revision 174166)
 +++ gcc/tree-cfg.c(working copy)
 @@ -5397,12 +5397,10 @@ gimple_duplicate_sese_tail (edge entry ATTRIBUTE_U
int total_freq = 0, exit_freq = 0;
gcov_type total_count = 0, exit_count = 0;
edge exits[2], nexits[2], e;
 -  gimple_stmt_iterator gsi,gsi1;
 +  gimple_stmt_iterator gsi;
gimple cond_stmt;
edge sorig, snew;
basic_block exit_bb;
 -  basic_block iters_bb;
 -  tree new_rhs;
gimple_stmt_iterator psi;
gimple phi;
tree def;
 @@ -5483,35 +5481,6 @@ gimple_duplicate_sese_tail (edge entry ATTRIBUTE_U
gcc_assert (gimple_code (cond_stmt) == GIMPLE_COND);
cond_stmt = gimple_copy (cond_stmt);
  
 - /* If the block consisting of the exit condition has the latch as
 -successor, then the body of the loop is executed before
 -the exit condition is tested.  In such case, moving the
 -condition to the entry, causes that the loop will iterate
 -one less iteration (which is the wanted outcome, since we
 -peel out the last iteration).  If the body is executed after
 -the condition, moving the condition to the entry requires
 -decrementing one iteration.  */
 -  if (exits[1]-dest == orig_loop-latch)
 -new_rhs = gimple_cond_rhs (cond_stmt);
 -  else
 -  {
 -new_rhs = fold_build2 (MINUS_EXPR, TREE_TYPE (gimple_cond_rhs 
 (cond_stmt)),
 -gimple_cond_rhs (cond_stmt),
 -build_int_cst (TREE_TYPE (gimple_cond_rhs 
 (cond_stmt)), 1));
 -
 -if (TREE_CODE (gimple_cond_rhs (cond_stmt)) == SSA_NAME)
 -  {
 - iters_bb = gimple_bb (SSA_NAME_DEF_STMT (gimple_cond_rhs (cond_stmt)));
 - for (gsi1 = gsi_start_bb (iters_bb); !gsi_end_p (gsi1); gsi_next 
 (gsi1))
 -   if (gsi_stmt (gsi1) == SSA_NAME_DEF_STMT (gimple_cond_rhs 
 (cond_stmt)))
 - break;
 -
 - new_rhs = force_gimple_operand_gsi (gsi1, new_rhs, true,
 - 
 NULL_TREE,false,GSI_CONTINUE_LINKING);
 -  }
 -  }
 -  gimple_cond_set_rhs (cond_stmt, unshare_expr (new_rhs));
 -  gimple_cond_set_lhs (cond_stmt, unshare_expr (gimple_cond_lhs 
 (cond_stmt)));
gsi_insert_after (gsi, cond_stmt, GSI_NEW_STMT);
  
sorig = single_succ_edge (switch_bb);

 
 07-05-2011  Razya Ladelsky  ra...@il.ibm.com
 
   * tree-cfg.c (gimple_duplicate_sese_tail): Remove handling of 
   the loop's number of iterations.
 * tree-parloops.c 

Re: PATCH] PR 49580

2011-06-30 Thread Zdenek Dvorak
Hi,

 This patch fixes the build failure of gcc spec2006 benchmark.
 The change is in  gimple_duplicate_sese_tail(), where we need to subtract 
 1 from the loop's number of iterations.
 The stmt created to change the rhs of the loop's condition, used to be 
 inserted right after the defining stmt of the rhs (an ssa name).
 This requires special handling of different cases of the defining stmt:
 1.gimple_stmt
 2.gimple_phi
 3.default_def
 
 Instead of handling each case separately, if we insert the new stmt at the 
 begining of the loop's preheader, we're sure that 
 it's already been defined.
 
 Bootstrap and testsuite pass successfully (as autopar is not enabled by 
 default).
 No new regressions when the testsuite is run with autopar enabled.
 No new regressions for the run of spec2006 with autopar enabled, and gcc 
 benchmark now passes successfully.. 
 
 OK for trunk? 

actually, I think a better approach would be not to have this kind of 
pass-specific
adjustments in gimple_duplicate_sese_tail at all.  The code decreasing the 
number of
iterations in gimple_duplicate_sese_tail only works because parloops does 
induction
variable canonicalization first; should we need it to be used anywhere else, it 
will break.
I.e., parloops should first transform the condition so that it does the 
comparison with
the adjusted value, and then gimple_duplicate_sese_tail could do just code 
copying
and cfg changes,

Zdenek


Re: [PATCH PR45098] Disallow NULL pointer in pointer arithmetic

2011-06-20 Thread Zdenek Dvorak
 I don't think we should move this kind of undefinedness from C to
 the GIMPLE semantics.  What do other languages allow that
 we have to support (what did KR C specify?).

I don't think there is a formal specification of KR C, just the (somewhat
informal) book.  On topic of pointer arithmetics, the case of addition
is not completely clear.  It does say that you can only subtract pointers
to members of the same array, though.

On topic of addition of integer to a pointer, it says that The construction
p + n means the address of the n-th object beyond the one p currently points 
to. This is true
regardless of the kind of object p points to; n is scaled according to the size 
of the objects p
points to, which is determined by the declaration of p.

Zdenek


Re: [PATCH PR45098] Disallow NULL pointer in pointer arithmetic

2011-06-20 Thread Zdenek Dvorak
  I don't think we should move this kind of undefinedness from C to
  the GIMPLE semantics.  What do other languages allow that
  we have to support (what did KR C specify?).
 
 I don't think there is a formal specification of KR C, just the (somewhat
 informal) book.  On topic of pointer arithmetics, the case of addition
 is not completely clear.  It does say that you can only subtract pointers
 to members of the same array, though.
 
 On topic of addition of integer to a pointer, it says that The construction
 p + n means the address of the n-th object beyond the one p currently points 
 to. This is true
 regardless of the kind of object p points to; n is scaled according to the 
 size of the objects p
 points to, which is determined by the declaration of p.

Anyway, I don't think that this should be a matter of lawyer scrutiny of the 
specifications;
rather, we should consider whether there is a situation where a user could 
reasonably expect
NULL + 0 to be valid.  In the example by Richard,

int __attribute__((noinline)) foo (void *p, int i)
{
  return p + i != NULL;
}

I think it would be hard to argue that this construction is natural.

Zdenek


Re: [PATCH PR45098] Disallow NULL pointer in pointer arithmetic

2011-06-17 Thread Zdenek Dvorak
  Interesting.  I'd never thought about the generation/use angle to prove
  a pointer was non-null.  ISTM we could use that same logic to infer that
  more pointers are non-null in extract_range_from_binary_expr.
 
  Interested in tackling that improvement, obviously as an independent patch?
 
 
  I'm not familiar with vrp code, but.. something like this?
 
  Index: tree-vrp.c
  ===
  --- tree-vrp.c  (revision 173703)
  +++ tree-vrp.c  (working copy)
  @@ -2273,7 +2273,12 @@ extract_range_from_binary_expr (value_ra
         {
           /* For pointer types, we are really only interested in asserting
              whether the expression evaluates to non-NULL.  */
  -         if (range_is_nonnull (vr0) || range_is_nonnull (vr1))
  +         if (flag_delete_null_pointer_checks  nowrap_type_p (expr_type))
 
 the latter would always return true
 
 Btw, I guess you'll miscompile a load of code that is strictly
 undefined.  So I'm not sure we want to do this against our users ...

Probably not, at least unless the user explicitly asks for it -- for example,
we could have -fdelete-null-pointer-checks=2.  In fact, it might be a good idea
to implement this flag anyway, since some current uses of 
flag_delete_null_pointer_checks
can lead to miscompilations when user makes an error in their code and would
probably appreciate more having their program crash.

 Oh, and of course it's even wrong.  I thing it needs 
 !range_includes_zero (vr1) (which we probably don't have).  The
 offset may be 0 and NULL + 0
 is still NULL.

actually, the result of NULL + 0 is undefined (pointer arithmetics is only 
defined
for pointers to actual objects, and NULL cannot point to one).

Zdenek


Re: [PATCH PR45098] Disallow NULL pointer in pointer arithmetic

2011-06-17 Thread Zdenek Dvorak
Hi,

   Index: tree-vrp.c
   ===
   --- tree-vrp.c  (revision 173703)
   +++ tree-vrp.c  (working copy)
   @@ -2273,7 +2273,12 @@ extract_range_from_binary_expr (value_ra
          {
            /* For pointer types, we are really only interested in asserting
               whether the expression evaluates to non-NULL.  */
   -         if (range_is_nonnull (vr0) || range_is_nonnull (vr1))
   +         if (flag_delete_null_pointer_checks  nowrap_type_p 
   (expr_type))
 
  the latter would always return true
 
  Btw, I guess you'll miscompile a load of code that is strictly
  undefined.  So I'm not sure we want to do this against our users ...
 
  Probably not, at least unless the user explicitly asks for it -- for 
  example,
  we could have -fdelete-null-pointer-checks=2.  In fact, it might be a good 
  idea
  to implement this flag anyway, since some current uses of 
  flag_delete_null_pointer_checks
  can lead to miscompilations when user makes an error in their code and 
  would
  probably appreciate more having their program crash.
 
  Oh, and of course it's even wrong.  I thing it needs 
  !range_includes_zero (vr1) (which we probably don't have).  The
  offset may be 0 and NULL + 0
  is still NULL.
 
  actually, the result of NULL + 0 is undefined (pointer arithmetics is only 
  defined
  for pointers to actual objects, and NULL cannot point to one).
 
 It's maybe undefined in C, but is it undefined in the middle-end?  Thus,
 are you sure we never generate it from (void *)((uintptr_t)p + obfuscated_0)?
 I'm sure we simply fold that to p + obfuscated_0.

if we do, we definitely should not -- the only point of such a construction is
to bypass the pointer arithmetics restrictions,

Zdenek


Re: [PATCH PR45098] Disallow NULL pointer in pointer arithmetic

2011-06-16 Thread Zdenek Dvorak
  diff -u gcc/tree-ssa-loop-niter.c (working copy) gcc/tree-ssa-loop-niter.c 
  (working copy)
  --- gcc/tree-ssa-loop-niter.c (working copy)
  +++ gcc/tree-ssa-loop-niter.c (working copy)
  @@ -2875,6 +2875,16 @@
     low = lower_bound_in_type (type, type);
     high = upper_bound_in_type (type, type);
 
  +  /* In C, pointer arithmetic p + 1 cannot use a NULL pointer, and p - 1 
  cannot
  +     produce a NULL pointer.  The contrary would mean NULL points to an 
  object,
  +     while NULL is supposed to compare unequal with the address of all 
  objects.
  +     Furthermore, p + 1 cannot produce a NULL pointer and p - 1 cannot 
  use a
  +     NULL pointer since that would mean wrapping, which we assume here 
  not to
  +     happen.  So, we can exclude NULL from the valid range of pointer
  +     arithmetic.  */
  +  if (int_cst_value (low) == 0)
  +    low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE 
  (type)));
  +
     record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
   }
 
  OK,
 
 I think this is only valid for !flag_delete_null_pointer_checks, on
 architectures where that isn't the default we have to assume that
 NULL may point to an object.

agreed.  Thanks for the correction.

Zdenek


Re: [PATCH PR45098, 7/10] Nowrap limits iterations

2011-05-31 Thread Zdenek Dvorak
Hi,

 As far as I can tell, what is current calculated in i_bound (and assigned to
 nb_iterations_upper_bound), is the maximum amount of times any statement in 
 the
 loop is executed, where any includes exit tests. Differently put, the maximum
 amount of times the loop header is executed.

hmm... this is rather confusing, I don't really recall why I gave
nb_iterations_upper_bound a different semantics from any other instance
of what # of iterations of a loop means.  

 This is confirmed by this comment in tree-vrp.c:
 
   /* Try to use estimated number of iterations for the loop to constrain the
  final value in the evolution.
  We are interested in the number of executions of the latch, while
  nb_iterations_upper_bound includes the last execution of the exit test.  
 */
 
 I modified the patch to improved the comment.

I think a better fix would be to make the nb_iterations_upper_bound semantics
consistent with that of nb_iterations.  Let me try to do it, hopefully this 
should
be mostly mechanical,

Zdenek


Re: [PATCH PR45098, 7/10] Nowrap limits iterations

2011-05-30 Thread Zdenek Dvorak
Hi,

  The header block of the loop is bb 4, the latch block is bb 3:
  ...
  (gdb) p loop.header.index
  $4 = 4
  (gdb) p loop.latch.index
  $5 = 3
  ...
  
  The number of times the latch edge is executed, is 10.
  
  But loop-nb_iterations_upper_bound, or max_niter is 11:

this is a bit strange, it looks like the # of iterations estimation is setting
nb_iterations_upper_bound too conservatively (or I gave 
nb_iterations_upper_bound
a different semantics than I remember -- but both my memory and the comment in 
cfgloop.h
suggest that nb_iterations_upper_bound = nb_iterations, i.e., that it should 
be 10 in your
example),

Zdenek


Re: [PATCH, PR49121] [4.7 Regression] FAIL: gcc.dg/tree-ssa/ivopt_infer_2.c scan-tree-dump-times ivopts Replacing 0

2011-05-24 Thread Zdenek Dvorak
Hi,

 The analysis is correct, and the test case needs to be adapted.
 
 I adapted the testcase such that it still replaces the exit test if 'a' is
 declared as 'char a[400]', but not if it is declared as 'extern char a[]'.
 
 ok for trunk?

OK,

Zdenek


Re: [PATCH PR45098, 7/10] Nowrap limits iterations

2011-05-21 Thread Zdenek Dvorak
Hi,

  Would it be possible to add the handling of these cases
  to estimate_numbers_of_iterations, rather than doing it just for ivopts?
 
 Yes, I did that now.
 
 Thanks,
 - Tom
 
 2011-05-05  Tom de Vries  t...@codesourcery.com
 
   PR target/45098
   * tree-ssa-loop-niter.c (infer_loop_bounds_from_pointer_arith): New
   function.
   (infer_loop_bounds_from_undefined): Use new function.

this is OK.

   * tree-ssa-loop-ivopts.c (may_eliminate_iv): Fix
   estimated_loop_iterations comparison.

I don't think this part is correct, though:

 Index: gcc/tree-ssa-loop-ivopts.c
 ===
 --- gcc/tree-ssa-loop-ivopts.c (revision 173734)
 +++ gcc/tree-ssa-loop-ivopts.c (working copy)
 @@ -4391,8 +4391,13 @@ may_eliminate_iv (struct ivopts_data *da
  {
if (!estimated_loop_iterations (loop, true, max_niter))
  return false;
 -  /* The loop bound is already adjusted by adding 1.  */
 -  if (double_int_ucmp (max_niter, period_value)  0)
 +  /* The max iterations applies also to the number of times the 
 loop
 + exit condition is executed.  The number of distinct values 
 of
 + the cand is period_value + 1.  So, test for
 + 'period_value + 1 = max_iterations'.
 +   */
 +  period_value = double_int_add (period_value, double_int_one);
 +  if (double_int_ucmp (max_niter, period_value)  0)
  return false;
  }
else

max_niter is the upper bound on the number of iterations of the loop, i.e., the 
number
of executions of its latch edge.  Therefore, the control induction variable of 
the loop
will (at the exit statement) achieve at most max_niter + 1 different values.  
Conversely,
the number of distinct values that the control iv can represent is period + 1 
(naming of
the period variable is a bit missleading).  Thus, the correct test is
# of available values = # of required values, equivalently
period + 1 = max_niter + 1, equivalently
period = max_niter, i.e.,
the current test.

Zdenek


Re: [PATCH, PR45098, 2/10]

2011-05-18 Thread Zdenek Dvorak
Hi,

 2011-05-05  Tom de Vries  t...@codesourcery.com
 
   PR target/45098
   * tree-ssa-loop-ivopts.c (seq_cost): Fix call to rtx_cost.

OK,

Zdenek


Re: [PATCH PR45098, 4/10] Iv init cost.

2011-05-18 Thread Zdenek Dvorak
Hi,

 Resubmitting with comment.
 
 The init cost of an iv will in general not be zero. It will be
 exceptional that the iv register happens to be initialized with the
 proper value at no cost. In general, there will at the very least be a
 regcopy or a const set.

OK.  Please add a comment explaining this to the code,

Zdenek

 2011-05-05  Tom de Vries  t...@codesourcery.com
 
   PR target/45098
   * tree-ssa-loop-ivopts.c (determine_iv_cost): Prevent
   cost_base.cost == 0.

 Index: gcc/tree-ssa-loop-ivopts.c
 ===
 --- gcc/tree-ssa-loop-ivopts.c(revision 173380)
 +++ gcc/tree-ssa-loop-ivopts.c(working copy)
 @@ -4688,6 +4688,8 @@ determine_iv_cost (struct ivopts_data *d
  
base = cand-iv-base;
cost_base = force_var_cost (data, base, NULL);
 +  if (cost_base.cost == 0)
 +  cost_base.cost = COSTS_N_INSNS (1);
cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data-speed);
  
cost = cost_step + adjust_setup_cost (data, cost_base.cost);



Re: [PATCH PR45098, 9/10] Cheap shift-add.

2011-05-18 Thread Zdenek Dvorak
Hi,

 +  sa_cost = (TREE_CODE (expr) != MINUS_EXPR
 + ? shiftadd_cost[speed][mode][m]
 + : (mult == op1
 +? shiftsub1_cost[speed][mode][m]
 +: shiftsub0_cost[speed][mode][m]));
 +  res = new_cost (sa_cost, 0);
 +  res = add_costs (res, mult == op1 ? cost0 : cost1);

just forgetting the cost of the other operand does not seem correct -- what
if it contains some more complicated subexpression?

Zdenek


Re: [PATCH, SMS 3/3] Skip DEBUG_INSN in loop-doloop.c

2011-05-08 Thread Zdenek Dvorak
Hi,

 This small fix was inserted to skip DEBUG_INSNs while
 recognizing doloop pattern in loop-doloop.c file.  It's a fix
 for the already approved do-loop patch (not in mainline yet,
 http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html) in loop-doloop.c
 
 The patch was tested together with the rest of the patches in this series
 and on top of the patch to support do-loop for ARM (not yet in mainline,
 but approved http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html).
 On ppc64-redhat-linux regtest as well as bootstrap with SMS flags
 enabling SMS also on loops with stage count 1.  Regtested on SPU.
 On arm-linux-gnueabi regtseted on c,c++. Bootstrap c language with SMS
 flags enabling SMS also on loops with stage count 1.
 
 OK for mainline?

OK,

Zdenek


[patch] PR 48837

2011-05-06 Thread Zdenek Dvorak
Hi,

when accumulator transformation is performed on a function like

foo(a)
{
  if (a  0)
return 1 + foo (a - 1)

  return bla();
}

this becomes

foo(a)
{
  int tmp = 0;

  while (a  0)
tm = 1 + tmp;

  return tmp + bla();
}

Before, bla was a tail-call, but after the optimization, it is not (since an 
addition
has to be performed after the result of bla is known).  However, we used to 
mark bla
as tail-call, leading to a misscompilation later.  Fixed by not marking 
tail-calls
when the transformation is performed.  Bootstrapped and regtested on i686.

Zdenek

PR tree-optimization/48837
* tree-tailcall.c (tree_optimize_tail_calls_1): Do not mark tailcalls
when accumulator transformation is performed.

* gcc.dg/pr48837.c: New testcase.

Index: tree-tailcall.c
===
--- tree-tailcall.c (revision 173354)
+++ tree-tailcall.c (working copy)
@@ -1021,6 +1021,14 @@ tree_optimize_tail_calls_1 (bool opt_tailcalls)
 integer_one_node);
 }
 
+  if (a_acc || m_acc)
+{
+  /* When the tail call elimination using accumulators is performed,
+statements adding the accumulated value are inserted at all exits.
+This turns all other tail calls to non-tail ones.  */
+  opt_tailcalls = false;
+}
+
   for (; tailcalls; tailcalls = next)
 {
   next = tailcalls-next;
Index: testsuite/gcc.dg/pr48837.c
===
--- testsuite/gcc.dg/pr48837.c  (revision 0)
+++ testsuite/gcc.dg/pr48837.c  (revision 0)
@@ -0,0 +1,30 @@
+/* PR tree-optimization/48837 */
+/* { dg-do run } */
+/* { dg-options -O2 } */
+
+void abort (void);
+
+__attribute__((noinline))
+int baz(void)
+{
+  return 1;
+}
+
+inline const int *bar(const int *a, const int *b)
+{
+ return *a ? a : b;
+}
+
+int foo(int a, int b)
+{
+   return a || b ? baz() : foo(*bar(a, b), 1) + foo(1, 0);
+}
+
+int main(void)
+{
+ if (foo(0, 0) != 2)
+   abort();
+
+ return 0;
+}
+


Re: ivopts improvement

2011-03-14 Thread Zdenek Dvorak
Hi,

  +  /* Determine if the exit test is formulated in terms of the phi or the
  + increment of the use iv.  */
  +  use_uses_inced_iv
  += gimple_code (SSA_NAME_DEF_STMT (use-iv-ssa_name)) != GIMPLE_PHI;
  +
  +  /* Determine if the exit test is before or after the increment of the
  + cand.  */
  +  use_after_cand_inc
  += stmt_after_increment (data-current_loop, cand, use-stmt);
  +
  +  /* For now, we only handle these cases.  */
  +  if (use_after_cand_inc != use_uses_inced_iv)
  +return false;
  
  what is this trying to achieve?  USE_USES_INCED_IV is pretty much 
  meaningless --
  the value of USE does not depend in any way on whether it comes
  directly from
  a PHI node or from some other calculation.
 
 it is trying to allow for
 
 do
   {
 *p = '\0';
 i++; /* use_uses_inced_iv == true */
 p++; /* use_after_cand_inc == true */
 if (!(i  n))
   break;
   }
 while (1);
 
 and for
 
 do
   {
 *p = '\0';
 if (!(i  n))
   break;
 i++; /* use_uses_inced_iv == false */
 p++; /* use_after_cand_inc == false */
   }
 while (1);
 
 but not for
 
 do
   {
 *p = '\0';
 i++; /* use_uses_inced_iv == true */
 if (!(i  n))
   break;
 p++; /* use_after_cand_inc == false */
   }
 while (1);
 
 and not for
 
 do
   {
 *p = '\0';
 p++; /* use_after_cand_inc == true */
 if (!(i  n))
   break;
 i++; /* use_uses_inced_iv == false */
   }
 while (1);
 
 In the latter 2 cases, I cannot simply replace i  n with p  base + n.

Why not (other than that the value to compare with varies in these cases, but
it always is the value of p in the last iteration of the loop)?

One more thing: what is fold_walk_def_plus for?

Zdenek


Re: ivopts improvement

2011-03-14 Thread Zdenek Dvorak
Hi,

  it is trying to allow for
 
  do
{
  *p = '\0';
  i++; /* use_uses_inced_iv == true */
  p++; /* use_after_cand_inc == true */
  if (!(i  n))
break;
}
  while (1);
 
  and for
 
  do
{
  *p = '\0';
  if (!(i  n))
break;
  i++; /* use_uses_inced_iv == false */
  p++; /* use_after_cand_inc == false */
}
  while (1);
 
  but not for
 
  do
{
  *p = '\0';
  i++; /* use_uses_inced_iv == true */
  if (!(i  n))
break;
  p++; /* use_after_cand_inc == false */
}
  while (1);
 
  and not for
 
  do
{
  *p = '\0';
  p++; /* use_after_cand_inc == true */
  if (!(i  n))
break;
  i++; /* use_uses_inced_iv == false */
}
  while (1);
 
  In the latter 2 cases, I cannot simply replace i  n with p  base + n.
  
  Why not (other than that the value to compare with varies in these cases, 
  but
  it always is the value of p in the last iteration of the loop)?
 
 In the 2 latter cases, the value to compare with will be either base + n
 + 1 or base + n - 1. The code does not handle these cases yet.

well, if not, then it certainly handles one of the first two cases incorrectly
(since the use_uses_inced_iv test is meaningless).

  One more thing: what is fold_walk_def_plus for?
 
 It tries to fold a plus, and if not successful, finds ssa vars in
 operands of the plus, substitutes the defining statements of ssa vars
 for those ssa vars and retries folding.

Sorry for not being more clear; I meant, why is it used?

Zdenek


Re: ivopts improvement

2011-03-04 Thread Zdenek Dvorak
Hi,

/* Whether the loop body includes any function calls.  */
bool body_includes_call;
 +
 +  /* Whether the loop body includes any function calls that possibly have 
 side
 + effects.  */
 +  bool body_includes_side_effect_call;
  };
  
  /* An assignment of iv candidates to uses.  */
 @@ -456,6 +460,20 @@
return exit;
  }
  
 +/* Returns true if single_exit (DATA-current_loop) is the only possible 
 exit.
 +   Uses the same logic as loop_only_exit_p.  */

why are you duplicating the functionality, instead of simply caching the result
of loop_only_exit_p?

 +/* Tries to detect
 + NIT == (use_iv_max  USE-iv-base)
 +? 0
 +: (use_iv_max - USE-iv-base)
 +   where
 + use_iv_real_base == (USE-iv-base - USE-iv-step)
 +  CAND-iv-base == base_ptr + use_iv_real_base
 +   and returns the exclusive upper bound for CAND-var_after:
 + base_ptr + use_iv_max.  */
 +
 +static tree
 +get_lt_bound (struct iv_use *use, struct iv_cand *cand, tree nit)
 +{
...
 +  /* use_iv_real_base == use-iv-base - use-iv-step.  */
 +  use_iv_real_base = fold_build_plus (MINUS_EXPR, use-iv-base, 
 use-iv-step);
 +
 +  /* cand_iv_base.  */
 +
 +  /* cand-iv-base == base_ptr + use_iv_real_base.  */
...
 +  /* 0.  */
...

This function seriously needs better comments.  All that are currently present 
just
give relations between variables that can be as easily seen from the code (but
do not at all explain what the variables are supposed to mean), or make no sense
(what does the 0. comment mean?)

Otherwise the patch looks ok (but I would still like to see get_lt_bound with 
proper
comments, currently I don't really understand what happens there),

Zdenek