[Bug tree-optimization/32306] [6/7/8 Regression] redundant && || not eliminated

2018-02-19 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32306

Jeffrey A. Law  changed:

   What|Removed |Added

   Target Milestone|6.5 |9.0

--- Comment #38 from Jeffrey A. Law  ---
So I prototyped the idea in c#37.  It helps, but is not sufficient to address
this case.   Given that it doesn't fully address this case, I don't think it's
warranted at this stage in development.  I'll queue it up for gcc-9.   I'm also
going to push this BZ out to gcc-9.

[Bug tree-optimization/32306] [6/7/8 Regression] redundant && || not eliminated

2018-02-16 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32306

--- Comment #37 from Jeffrey A. Law  ---
Another thought for dealing with this BZ:  Use the infrastructure Alex built to
identify blocks that are in effect forwarders (ie, they need not be copied for
jump threading).  Use that knowledge to thread deeper in the CFG each
iteration.

[Bug tree-optimization/32306] [6/7/8 Regression] redundant && || not eliminated

2017-11-24 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32306

--- Comment #36 from Jeffrey A. Law  ---
Just a couple notes.  I'm not currently looking at this, but this is probably
the best bug to track thoughts around how to try and capture secondary effects
of jump threading without re-running all of DOM.

-- Taken from another BZ so that it doesn't get lost --

I looked at ways to introduce iteration without the full DOM pass years ago. 
It was pretty obvious that the most interesting things happened as a result of
exposing degenerate PHIs.  But I wasn't ever able to make a leap from that to a
low overhead iterating jump threader.  What did come out of it was the phi-only
cprop pass which propagates away the degenerate PHIs.

--

And today's thought.  In addition to degenerate PHIs the other key property
which indicates an exposed secondary opportunity is a reduction in the number
of preds for a block.  Particularly when we can drop from N to 1 pred.

ssa--dom-simplify-1 is a good example, particularly if one disables the VRP
threader.   Prior to DOM we'll see:

;;   basic block 2, loop depth 0, count 1073741825 (estimated locally), maybe
hot
;;prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
;;pred:   ENTRY [always]  count:1073741826 (estimated locally)
(FALLTHRU,EXECUTABLE)
  if (x_3(D) > 3)
goto ; [33.00%]
  else
goto ; [67.00%]
;;succ:   3 [33.0% (guessed)]  count:354334800 (estimated locally)
(TRUE_VALUE,EXECUTABLE)
;;4 [67.0% (guessed)]  count:719407025 (estimated locally)
(FALSE_VALUE,EXECUTABLE)

;;   basic block 3, loop depth 0, count 354334802 (estimated locally), maybe
hot
;;prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
;;pred:   2 [33.0% (guessed)]  count:354334800 (estimated locally)
(TRUE_VALUE,EXECUTABLE)
  frob (1);
;;succ:   4 [always (guessed)]  count:354334802 (estimated locally)
(FALLTHRU,EXECUTABLE)

;;   basic block 4, loop depth 0, count 1073741825 (estimated locally), maybe
hot
;;prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;pred:   3 [always (guessed)]  count:354334802 (estimated locally)
(FALLTHRU,EXECUTABLE)
;;2 [67.0% (guessed)]  count:719407025 (estimated locally)
(FALSE_VALUE,EXECUTABLE)
  if (x_3(D) > 2)
goto ; [33.00%]
  else
goto ; [67.00%]
;;succ:   5 [33.0% (guessed)]  count:354334800 (estimated locally)
(TRUE_VALUE,EXECUTABLE)
;;6 [67.0% (guessed)]  count:719407025 (estimated locally)
(FALSE_VALUE,EXECUTABLE)

;;   basic block 5, loop depth 0, count 354334802 (estimated locally), maybe
hot
;;prev block 4, next block 6, flags: (NEW, REACHABLE, VISITED)
;;pred:   4 [33.0% (guessed)]  count:354334800 (estimated locally)
(TRUE_VALUE,EXECUTABLE)
  frob (x_3(D));
;;succ:   6 [always (guessed)]  count:354334802 (estimated locally)
(FALLTHRU,EXECUTABLE)

;;   basic block 6, loop depth 0, count 1073741825 (estimated locally), maybe
hot
;;prev block 5, next block 1, flags: (NEW, REACHABLE, VISITED)
;;pred:   4 [67.0% (guessed)]  count:719407025 (estimated locally)
(FALSE_VALUE,EXECUTABLE)
;;5 [always (guessed)]  count:354334802 (estimated locally)
(FALLTHRU,EXECUTABLE)
  return;
;;succ:   EXIT [always (guessed)]  count:1073741825 (estimated locally)

}


DOM is going to discover the 3->4->5 jump thread easily.But there are no
PHIs in BB4 that would trigger any kind of reanalysis.  But the # preds for BB4
drops from 2 to 1.  This is important as the remaining path into BB4 can be
further optimized after we realize the 3->4->5 jump thread.

It feels as if when we discover a degenerate PHI or the incoming preds drops to
1, then we ought to start a re-analysis.  The blocks involved would start at
the idom of the affected block, covering the dominance tree and the PHI nodes
as the dominance frontier.

I thought I explored that idom/dominance tree/dominance frontier idea years ago
and likely dismissed it as incomplete (consider how scrambled the dominance
tree can get after threading).  But while I could certainly conjure up
scenarios where it's incomplete, it might be "good enough" to catch the
secondary opportunities without a fully iterating DOM.

--

Of course I'm also very interested to evaluate if any of that is necessary with
Aldy's recent work on the backwards threader.