[Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081 --- Comment #13 from CVS Commits --- The master branch has been updated by Andrew Macleod : https://gcc.gnu.org/g:329d2f0df7d6d22c87ab3338b94caef68139cd58 commit r11-8251-g329d2f0df7d6d22c87ab3338b94caef68139cd58 Author: Andrew MacLeod Date: Fri Apr 16 17:08:51 2021 -0400 tree-optimization/100081 - Limit depth of logical expression windback. Limit how many logical expressions GORI will look back through when evaluating outgoing edge range. PR tree-optimization/100081 * gimple-range-cache.h (ranger_cache): Inherit from gori_compute rather than gori_compute_cache. * gimple-range-gori.cc (is_gimple_logical_p): Move to top of file. (range_def_chain::m_logical_depth): New member. (range_def_chain::range_def_chain): Initialize m_logical_depth. (range_def_chain::get_def_chain): Don't build defchains through more than LOGICAL_LIMIT logical expressions. * params.opt (param_ranger_logical_depth): New.
[Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081 --- Comment #12 from Andrew Macleod --- (In reply to Richard Biener from comment #10) > (In reply to Andrew Macleod from comment #8) > > OMG. Don't bother reducing. I can see the problem. > > > > EVRP is fine, but when wrestrict runs, its quite late, and the CFG has > > > > [local count: 28382607]: > > <...> > > _571 = _61 >= _593; > > _3583 = &arr_724 + _3992; > > _2220 = _831 <= _3583; > > _47 = _571 | _2220; > > _2935 = _376 * 2; > > _3966 = &arr_725 + _2935; > > _3024 = _61 >= _3966; > > _4219 = _3992 * 2; > > _4218 = &arr_725 + _4219; > > _1836 = _831 <= _4218; > > _3080 = _1836 | _3024; > > <...> > > _5348 = _5347 & _32080; > > _5349 = _5348 & _32151; > > _5350 = _5349 & _32176; > > _5351 = _5350 & _32488; > > _5352 = _5351 & _33691; > > _5353 = _5352 & _33762; > > _5354 = _5353 & _34753; > > _35662 = _5354 & _34824; > > if (_35662 != 0) > > goto ; [90.00%] > > else > > goto ; [10.00%] > > > > Its a 7200 stmt basic block, made up of calculations and 2614 ORs and 1480 > > ANDs. > > > > A request is made for a range which can be exported from this block, and > > ranger is dutifully trying everything it can to process those blocks. > > > > Each AND/OR is a logical expression which evaluates a TRUE and FALSE range > > for each operands, so it calculates up to 4 ranges for each pair of > > operands. I knew this could get out of hand in pathological cases, so we > > introduced a logical cache to help resolve this and avoid extra work. Its > > actually making this one worse I think. > > Hmm, still the overall work should be linear to produce ranges for all > of the SSA defs in this BB, no? As heuristic you might want to avoid > producing ranges for single-use defs, like those that are just used in > another & or | combination? Wrestrict should only be interested in > ranges for the "tails" of this &| tree (for example _61 in _61 >= _3966). > Since the direction is bottom up, it is no longer linear. This has probably never been explain very well. lets make up a simple example: if (x > 2 && x < 10 || x == 15) for unsigned x turns into: _1 = x_8(D) + 4294967293; _2 = _1 <= 6; _3 = x_8(D) == 15; _4 = _2 | _3; if (_4 != 0) goto ; [INV] else goto ; [INV] and we can calculate the following ranges (note none of them are calculated in advance, only if asked/required) : 2->3 (T) _4 : bool [1, 1] 2->3 (T) x_8(D) : unsigned int [3, 9][15, 15] 2->5 (F) _1 : unsigned int [7, +INF] 2->5 (F) _2 : bool [0, 0] 2->5 (F) _3 : bool [0, 0] 2->5 (F) _4 : bool [0, 0] 2->5 (F) x_8(D) : unsigned int [0, 2][10, 14][16, +INF] When a client asks for the range of x_8 on the true edge, we have to solve [1,1] = _4 != 0, which in turn feeds back to the def of _4 as: [1,1] = _2 | _3 There are 3 possible ways this branch can be taken.. a) _2 = [1, 1], _3 = [1, 1] b) _2 = [0, 0], _3 = [1, 1] c) _2 = [1, 1], _3 = [0, 0] In order to calculate a precise range for x_8, we need to calculate the range of x_8 for both possible values of _2 and _3 and combine them.. I wont trace the actual calculation for each one, but it boils down to _2 = [0, 0] produces x_8 = ~[3, 9] _2 = [1, 1] produces x_8 = [3, 9] _3 = [0, 0] produces x_8 = ~[15, 15] _3 = [1, 1] produces x_8 = [15, 15] Then we combine them with the 2 possible combinations, and produce the final range of unsigned int [3, 9][15, 15]. Once upon a time I tried to "simplify" this a couple of different ways, but in more complex situations, it inevitably fails to produce the correct range.. so instead, we simply do the calculations exactly as the statement requires and combine them. The logical OR spawned 4 requests for the range of x_8.. so when these logical expressions feed each other, we get the exponential growth of computations. The logical cache was suppose to resolve this by caching the true and false values of x_8 for _2 and _3 eliminating the need to recalculate them. More complex cases with many ssa_names feeding through a boolean condition cause it to not function well. As for single use-use defs.. There is nothing special about them. We never produce ranges for anything that is not used an an outgoing edge calculation, regardless of how many uses there are. Those are tagged and we simply use their global value. Furthermore, we never produce ranges for anything that isn't either explicitly requested, or used in a calculation that affects an explicit request. In this case for instance, I forget the name that restrict asked for, but lets say it was _3992. we start at the bottom of the block and work back to the definition of _3992. During the evaluation, we go through many single-use cases which we will need the ranges for as they feed the condition at the bottom and may therefore affect the outcome. Anything above _3992's def is never evaluated. Up until now, I haven't really throttled
[Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081 Martin Liška changed: What|Removed |Added Keywords|needs-reduction | --- Comment #11 from Martin Liška --- > If you want to try it, this should resolve the issue. I can confirm the patch resolves that.
[Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081 --- Comment #10 from Richard Biener --- (In reply to Andrew Macleod from comment #8) > OMG. Don't bother reducing. I can see the problem. > > EVRP is fine, but when wrestrict runs, its quite late, and the CFG has > > [local count: 28382607]: > <...> > _571 = _61 >= _593; > _3583 = &arr_724 + _3992; > _2220 = _831 <= _3583; > _47 = _571 | _2220; > _2935 = _376 * 2; > _3966 = &arr_725 + _2935; > _3024 = _61 >= _3966; > _4219 = _3992 * 2; > _4218 = &arr_725 + _4219; > _1836 = _831 <= _4218; > _3080 = _1836 | _3024; > <...> > _5348 = _5347 & _32080; > _5349 = _5348 & _32151; > _5350 = _5349 & _32176; > _5351 = _5350 & _32488; > _5352 = _5351 & _33691; > _5353 = _5352 & _33762; > _5354 = _5353 & _34753; > _35662 = _5354 & _34824; > if (_35662 != 0) > goto ; [90.00%] > else > goto ; [10.00%] > > Its a 7200 stmt basic block, made up of calculations and 2614 ORs and 1480 > ANDs. > > A request is made for a range which can be exported from this block, and > ranger is dutifully trying everything it can to process those blocks. > > Each AND/OR is a logical expression which evaluates a TRUE and FALSE range > for each operands, so it calculates up to 4 ranges for each pair of > operands. I knew this could get out of hand in pathological cases, so we > introduced a logical cache to help resolve this and avoid extra work. Its > actually making this one worse I think. Hmm, still the overall work should be linear to produce ranges for all of the SSA defs in this BB, no? As heuristic you might want to avoid producing ranges for single-use defs, like those that are just used in another & or | combination? Wrestrict should only be interested in ranges for the "tails" of this &| tree (for example _61 in _61 >= _3966). But yes, if you have any worse than O(n log n) algorithm then artificially limiting it's cost by capping 'n' at some (--param controlled) value is the way to go.
[Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081 Andrew Macleod changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |amacleod at redhat dot com --- Comment #9 from Andrew Macleod --- Created attachment 50619 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50619&action=edit proposed fix I've added a depth limited for logical combinations. In this patch its set to 6.. I'm going to try running it against various scenarios to make sure it doesn't miss anything we do want... and of course I'll bootstrap and regtest it. If you want to try it, this should resolve the issue.
[Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081 --- Comment #8 from Andrew Macleod --- OMG. Don't bother reducing. I can see the problem. EVRP is fine, but when wrestrict runs, its quite late, and the CFG has [local count: 28382607]: <...> _571 = _61 >= _593; _3583 = &arr_724 + _3992; _2220 = _831 <= _3583; _47 = _571 | _2220; _2935 = _376 * 2; _3966 = &arr_725 + _2935; _3024 = _61 >= _3966; _4219 = _3992 * 2; _4218 = &arr_725 + _4219; _1836 = _831 <= _4218; _3080 = _1836 | _3024; <...> _5348 = _5347 & _32080; _5349 = _5348 & _32151; _5350 = _5349 & _32176; _5351 = _5350 & _32488; _5352 = _5351 & _33691; _5353 = _5352 & _33762; _5354 = _5353 & _34753; _35662 = _5354 & _34824; if (_35662 != 0) goto ; [90.00%] else goto ; [10.00%] Its a 7200 stmt basic block, made up of calculations and 2614 ORs and 1480 ANDs. A request is made for a range which can be exported from this block, and ranger is dutifully trying everything it can to process those blocks. Each AND/OR is a logical expression which evaluates a TRUE and FALSE range for each operands, so it calculates up to 4 ranges for each pair of operands. I knew this could get out of hand in pathological cases, so we introduced a logical cache to help resolve this and avoid extra work. Its actually making this one worse I think. Regardless, I know what the issue is. I have 2 things to try. 1) We have a patch in our branch that gives up early.. once it finds the result is going to be varying... I'll give that a shot first, it may stop this lookup quickly. Its possible it won't. if not, then 2) we've also discussed that in ridiculously large combinations of &&/|| we are unlikely to be able to haul a useful range out of it, so limit the depth of logical processing to something in a reasonable range. 4000 logicals operations is not reasonable to look thru. something more akin to 10 maybe at most.. Anyway, I know what the issue is and will have it resolved early next week at the latest.
[Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081 Martin Liška changed: What|Removed |Added Keywords||needs-reduction --- Comment #7 from Martin Liška --- I'm reducing the test-case right now...
[Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081 --- Comment #6 from Aldy Hernandez --- BTW, we're looking as to why there are so many calls to varying_p. Something seems off.
[Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081 --- Comment #5 from Aldy Hernandez --- (In reply to Richard Biener from comment #4) > Or > > bool > irange::symbolic_p () const > { > return (!varying_p () > && !undefined_p () > && (!is_gimple_min_invariant (min ()) > || !is_gimple_min_invariant (max (; > } > > which should be simply > > bool > irange::symbolic_p () const > { > return (m_num_ranges == 1 > && (!is_gimple_min_invariant (min ()) > || !is_gimple_min_invariant (max (; > } > > ? Or do we have symbolic anti ranges represented with two ranges? > > Likewise > > bool > irange::constant_p () const > { > return (!varying_p () > && !undefined_p () > && TREE_CODE (min ()) == INTEGER_CST > && TREE_CODE (max ()) == INTEGER_CST); > } > > err - I thought varying == constant... Those varying_p checks definitely look suspect. You should be able to just look at the min/max as you suggest. However, the undefined_p check must stay because it is really a check for num_ranges > 0, otherwise in the case of undefined_p, the is_gimple_*_invariant would dereference m_base[] which has nothing of interest. Perhaps a more obvious check would be m_num_ranges > 0, instead of the confusing undefined_p.
[Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081 --- Comment #4 from Richard Biener --- Or bool irange::symbolic_p () const { return (!varying_p () && !undefined_p () && (!is_gimple_min_invariant (min ()) || !is_gimple_min_invariant (max (; } which should be simply bool irange::symbolic_p () const { return (m_num_ranges == 1 && (!is_gimple_min_invariant (min ()) || !is_gimple_min_invariant (max (; } ? Or do we have symbolic anti ranges represented with two ranges? Likewise bool irange::constant_p () const { return (!varying_p () && !undefined_p () && TREE_CODE (min ()) == INTEGER_CST && TREE_CODE (max ()) == INTEGER_CST); } err - I thought varying == constant... Note the testcase is fully accounted to rest of compilation: 850.63 ( 97%) 0.12 ( 24%) 864.41 ( 97%) 4351M ( 96%) so not sure where it is actually spent.
[Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081 --- Comment #3 from Richard Biener --- inline bool irange::varying_p () const { if (legacy_mode_p ()) return m_kind == VR_VARYING; if (m_num_ranges != 1) return false; tree l = m_base[0]; tree u = m_base[1]; tree t = TREE_TYPE (l); unsigned prec = TYPE_PRECISION (t); signop sign = TYPE_SIGN (t); if (INTEGRAL_TYPE_P (t)) return (wi::to_wide (l) == wi::min_value (prec, sign) && wi::to_wide (u) == wi::max_value (prec, sign)); if (POINTER_TYPE_P (t)) return (wi::to_wide (l) == 0 && wi::to_wide (u) == wi::max_value (prec, sign)); return true; is truly excessive for !legacy_mode_p () ..., I think the pointer vs. int case is premature optimization and we might see more inlining and optimization when doing unconditional return (wi::to_wide (l) == wi::min_value (prec, sign) && wi::to_wide (u) == wi::max_value (prec, sign)); possibly there are simply too many varying_p calls as well. Like inline bool range_includes_zero_p (const irange *vr) { if (vr->undefined_p ()) return false; if (vr->varying_p ()) return true; return vr->may_contain_p (build_zero_cst (vr->type ())); also inline value_range_kind irange::kind () const { if (legacy_mode_p ()) return m_kind; if (undefined_p ()) return VR_UNDEFINED; if (varying_p ()) return VR_VARYING; return VR_RANGE; looks like quite expensive. IMHO since we have m_kind we should keep it up-to-date and make those checks cheap. And kill that legacy_mode_p () stuff!?
[Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081 --- Comment #2 from Richard Biener --- >From the profile it looks like there's a lot tree INTEGER_CST work being done rather than sticking to wide_ints. That's always (constant-time) more expensive.
[Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081 --- Comment #1 from Martin Liška --- I see the following in perf top: 9.84% cc1plus[.] wide_int_to_tree_1 6.59% cc1plus[.] irange::varying_p 6.13% cc1plus[.] bitmap_bit_p 4.35% cc1plus[.] wi::force_to_size 3.69% cc1plus[.] cache_wide_int_in_type_cache 2.83% cc1plus[.] irange::operator= 2.81% cc1plus[.] get_int_cst_ext_nunits 2.77% cc1plus[.] logical_stmt_cache::cacheable_p 2.77% cc1plus[.] gori_compute::compute_logical_operands_in_chain 2.43% cc1plus[.] gori_compute::compute_operand_range 2.33% cc1plus[.] compare_values_warnv 2.16% cc1plus[.] wi::max_value
[Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081 Martin Liška changed: What|Removed |Added Known to work||10.3.0 Status|UNCONFIRMED |NEW Target Milestone|--- |11.0 Last reconfirmed||2021-04-14 Known to fail||11.0 Ever confirmed|0 |1