[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org --- Comment #15 from Tamar Christina --- Hi All, I created a new issue PR85449 that occurs after this change. IPA seems to be mixing up some of the recursive calls in a specialized function causing an invalid code path.
[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149 Martin Liška changed: What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED Known to fail|8.0.1 | --- Comment #14 from Martin Liška --- New ICE has a new PR85447.
[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149 Martin Liška changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|FIXED |--- --- Comment #13 from Martin Liška --- (In reply to Martin Liška from comment #12) > Created attachment 43973 [details] > Unreduced test-case > > Unfortunately I see one more ICE (reduced from Firefox w/ -O3). I'm > currently reducing that, but hopefully will be possible to debug for > unreduced version. $ ./xg++ -B. -O3 -c ~/Downloads/ice.ii /home/marxin/Downloads/ice.ii:5441:214: internal compiler error: in create_specialized_node, at ipa-cp.c:3870 already_AddRefed nsImageLoadingContent::GetCurrentRequestFinalURI() { nsCOMPtr uri; if (mCurrentRequest) { mCurrentRequest->GetFinalURI(getter_AddRefs(uri)); } return uri.forget(); } ^ 0x1624b99 create_specialized_node ../../gcc/ipa-cp.c:3870 0x16252a2 decide_about_value../../gcc/ipa-cp.c:4699 0x1626440 decide_whether_version_node ../../gcc/ipa-cp.c:4743 0x1626440 ipcp_decision_stage ../../gcc/ipa-cp.c:4906 0x1626440 ipcp_driver ../../gcc/ipa-cp.c:5086
[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149 --- Comment #12 from Martin Liška --- Created attachment 43973 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43973=edit Unreduced test-case Unfortunately I see one more ICE (reduced from Firefox w/ -O3). I'm currently reducing that, but hopefully will be possible to debug for unreduced version.
[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149 Jakub Jelinek changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #11 from Jakub Jelinek --- So hopefully fixed.
[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149 --- Comment #10 from Martin Jambor --- Author: jamborm Date: Wed Apr 11 13:30:53 2018 New Revision: 259319 URL: https://gcc.gnu.org/viewcvs?rev=259319=gcc=rev Log: Improve IPA-CP handling of self-recursive calls 2018-04-11 Martin JamborPR ipa/84149 * ipa-cp.c (propagate_vals_across_pass_through): Expand comment. (cgraph_edge_brings_value_p): New parameter dest_val, check if it is not the same as the source val. (cgraph_edge_brings_value_p): New parameter. (gather_edges_for_value): Pass destination value to cgraph_edge_brings_value_p. (perhaps_add_new_callers): Likewise. (get_info_about_necessary_edges): Likewise and exclude values brought only by self-recursive edges. (create_specialized_node): Redirect only clones of self-calling edges. (+self_recursive_pass_through_p): New function. (find_more_scalar_values_for_callers_subset): Use it. (find_aggregate_values_for_callers_subset): Likewise. (known_aggs_to_agg_replacement_list): Removed. (decide_whether_version_node): Re-calculate known constants for all remaining context clones. Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-cp.c
[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149 --- Comment #9 from Martin Jambor --- I have posted a proposed fix to the mailing list as: https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00419.html (please ignore the stuff I mistakenly pasted to the subject line).
[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149 Martin Jambor changed: What|Removed |Added Attachment #43588|0 |1 is obsolete|| --- Comment #8 from Martin Jambor --- Created attachment 43872 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43872=edit Current patch This patch, which speeds up mcf and which is more consistent with the IPA-CP design and does not miscompile mcf, has passed LTO bootstrap and testing, I will try it on a few bigger applications and submit it shortly afterwards.
[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #7 from Jakub Jelinek --- Any progress here? I don't see the patch posted on gcc-patches.
[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149 Richard Biener changed: What|Removed |Added Keywords||missed-optimization Priority|P3 |P1 --- Comment #6 from Richard Biener --- Seems we have a handle on this bug?
[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149 --- Comment #5 from Martin Jambor --- Created attachment 43588 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43588=edit Prototype patch I have to leave the office now but this is the (only very very lightly tested) prototype patch that fixes the regression as described above. It still needs a bit of care before submitting, however.
[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149 Martin Jambor changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #4 from Martin Jambor --- In my (more recent) checkout of trunk, the estimated benefit is even lower, only 270. Raising devirtualization hint to compensate seems excessive, I need: diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c index ee41a8d55b7..09ba92ef14d 100644 --- a/gcc/ipa-cp.c +++ b/gcc/ipa-cp.c @@ -2612,6 +2612,25 @@ devirtualization_time_bonus (struct cgraph_node *node, res += 7 / ((int)speculative + 1); } + if (res) +/* Devirtualization is likely more important in recursive callers because + those cannot be entirely inlined themselves. */ +for (cgraph_edge *e = node->callees; e; e = e->next_callee) + { + enum availability avail; + cgraph_node *callee = e->callee->function_symbol (); + if (e->caller == callee + && avail >= AVAIL_AVAILABLE) + { + res = 8 * res; + + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Recursive devirtualization bohus %i\n", + res); + break; + } + } + return res; } Which is excessive (but it may make sense to boost IPA-CP bonuses for self-recursive functions a bit since those cannot be dealt with completely by inlining). Nevertless I already have a prototype patch which can explain to IPA-CP that only there is only one remaining value in the contexts for which we did not clone and replace it there too.
[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149 Martin Liška changed: What|Removed |Added Assignee|marxin at gcc dot gnu.org |jamborm at gcc dot gnu.org --- Comment #3 from Martin Liška --- So the problematic predictor change is: - DEF_PREDICTOR (PRED_NULL_RETURN, "null return", HITRATE (91), 0) + DEF_PREDICTOR (PRED_NULL_RETURN, "null return", HITRATE (71), 0) It lower frequency of BB 5: primal_bea_mpp (int64_t m, struct arc_t * arcs, struct arc_t * stop_arcs, int64_t * basket_sizes, struct BASKET * * perm, int thread, struct arc_t * * end_arc, int64_t step, int64_t num_threads, int64_t max_elems) { ... [local count: 12634988]: READY: # DEBUG BEGIN_STMT _113 = *_40; _114 = (sizetype) _113; _115 = _114 + 1; _116 = _115 * 8; _117 = perm_148(D) + _116; _118 = *_117; _118->number = -1; # DEBUG BEGIN_STMT _122 = *_40; if (_122 == 0) goto ; [29.93%] else goto ; [70.07%] [local count: 3781652]: # DEBUG BEGIN_STMT // predicted unlikely by early return (on trees) predictor. goto ; [100.00%] [local count: 8853336]: # DEBUG BEGIN_STMT _127 = (long unsigned int) _122; _128 = perm_148(D) + 8; spec_qsort (_128, _127, 8, cost_compare); # DEBUG BEGIN_STMT _176 = MEM[(struct BASKET * *)perm_148(D) + 8B]; [local count: 12634988]: # _135 = PHI <0B(34), _176(35)> return _135; } It's function defined in pbeampp.c. Then in IPA CP we end up with: Evaluating opportunities for spec_qsort/353. - considering value arc_compare for param #3 int (*) (const void *, const void *) (caller_count: 1) good_cloning_opportunity_p (time: 143, size: 269, freq_sum: 987, scc) -> evaluation: 314, threshold: 500 good_cloning_opportunity_p (time: 942, size: 866, freq_sum: 987, scc) -> evaluation: 643, threshold: 500 Creating a specialized node of spec_qsort/353. replacing param #2 size_t with const 8 replacing param #3 int (*) (const void *, const void *) with const arc_compare Accounting size:173.00, time:191.00 on predicate exec:(true) Accounting size:3.00, time:2.00 on new predicate exec:(not inlined) the new node is spec_qsort.constprop/377. ipa-prop: Discovered an indirect call to a known target (spec_qsort.constprop/377 -> arc_compare/144), for stmt with uid 71 converting indirect call in spec_qsort.constprop to direct call to arc_compare controlled uses count of param 3 bumped down to 8 ipa-prop: Discovered an indirect call to a known target (spec_qsort.constprop/377 -> arc_compare/144), for stmt with uid 218 converting indirect call in spec_qsort.constprop to direct call to arc_compare controlled uses count of param 3 bumped down to 7 ipa-prop: Discovered an indirect call to a known target (spec_qsort.constprop/377 -> arc_compare/144), for stmt with uid 267 converting indirect call in spec_qsort.constprop to direct call to arc_compare controlled uses count of param 3 bumped down to 6 ipa-prop: Discovered an indirect call to a known target (spec_qsort.constprop/377 -> arc_compare/144), for stmt with uid 348 converting indirect call in spec_qsort.constprop to direct call to arc_compare controlled uses count of param 3 bumped down to 5 - considering value cost_compare for param #3 int (*) (const void *, const void *) (caller_count: 1) good_cloning_opportunity_p (time: 143, size: 269, freq_sum: 701, scc) -> evaluation: 223, threshold: 500 good_cloning_opportunity_p (time: 942, size: 866, freq_sum: 701, scc) -> evaluation: 457, threshold: 500 - Creating a specialized node of spec_qsort/353 for all known contexts. replacing param #2 size_t with const 8 Accounting size:173.00, time:191.00 on predicate exec:(true) Accounting size:3.00, time:2.00 on new predicate exec:(not inlined) the new node is spec_qsort.constprop/378. Evaluating opportunities for spec_qsort/353. - adding an extra caller spec_qsort.constprop/377 of spec_qsort.constprop/377 - considering value cost_compare for param #3 int (*) (const void *, const void *) (caller_count: 1) good_cloning_opportunity_p (time: 143, size: 269, freq_sum: 701, scc) -> evaluation: 223, threshold: 500 good_cloning_opportunity_p (time: 942, size: 866, freq_sum: 701, scc) -> evaluation: 457, threshold: 500 Marking node as dead: spec_qsort/353. Here you can see evaluation is 457, which the threshold is 500. Another question is why IPA inline does not inline the call. Answer is here: IPA function summary for primal_bea_mpp.constprop/366 inlinable global time: 127.809798 self size: 134 global size: 132 min size: 10 self stack: 0 global stack:0 size:112.00, time:108.00 size:7.00, time:2.00, executed if:(not inlined) size:2.00, time:2.00, nonconst if:(op3 changed)
[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149 --- Comment #2 from Martin Liška --- I can confirm the regression on a Haswell CPU.
[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149 Martin Liška changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2018-02-05 Assignee|unassigned at gcc dot gnu.org |marxin at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Martin Liška --- Let me take a look.
[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149 Richard Biener changed: What|Removed |Added Target Milestone|--- |8.0