[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888

2018-04-18 Thread tnfchris at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

Tamar Christina  changed:

   What|Removed |Added

 CC||tnfchris at gcc dot gnu.org

--- Comment #15 from Tamar Christina  ---
Hi All,

I created a new issue PR85449 that occurs after this change. IPA seems to be
mixing up some of the recursive calls in a specialized function causing an
invalid code path.

[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888

2018-04-18 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

Martin Liška  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED
  Known to fail|8.0.1   |

--- Comment #14 from Martin Liška  ---
New ICE has a new PR85447.

[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888

2018-04-18 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

Martin Liška  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #13 from Martin Liška  ---
(In reply to Martin Liška from comment #12)
> Created attachment 43973 [details]
> Unreduced test-case
> 
> Unfortunately I see one more ICE (reduced from Firefox w/ -O3). I'm
> currently reducing that, but hopefully will be possible to debug for
> unreduced version.

$ ./xg++ -B. -O3 -c ~/Downloads/ice.ii 

/home/marxin/Downloads/ice.ii:5441:214: internal compiler error: in
create_specialized_node, at ipa-cp.c:3870
 already_AddRefed
nsImageLoadingContent::GetCurrentRequestFinalURI() {   nsCOMPtr uri;  
if (mCurrentRequest) { mCurrentRequest->GetFinalURI(getter_AddRefs(uri));  
}   return uri.forget(); }
   
   
  ^
0x1624b99 create_specialized_node
../../gcc/ipa-cp.c:3870
0x16252a2 decide_about_value
../../gcc/ipa-cp.c:4699
0x1626440 decide_whether_version_node
../../gcc/ipa-cp.c:4743
0x1626440 ipcp_decision_stage
../../gcc/ipa-cp.c:4906
0x1626440 ipcp_driver
../../gcc/ipa-cp.c:5086

[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888

2018-04-18 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

--- Comment #12 from Martin Liška  ---
Created attachment 43973
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43973=edit
Unreduced test-case

Unfortunately I see one more ICE (reduced from Firefox w/ -O3). I'm currently
reducing that, but hopefully will be possible to debug for unreduced version.

[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888

2018-04-11 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from Jakub Jelinek  ---
So hopefully fixed.

[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888

2018-04-11 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

--- Comment #10 from Martin Jambor  ---
Author: jamborm
Date: Wed Apr 11 13:30:53 2018
New Revision: 259319

URL: https://gcc.gnu.org/viewcvs?rev=259319=gcc=rev
Log:
Improve IPA-CP handling of self-recursive calls

2018-04-11  Martin Jambor  

PR ipa/84149
* ipa-cp.c (propagate_vals_across_pass_through): Expand comment.
(cgraph_edge_brings_value_p): New parameter dest_val, check if it is
not the same as the source val.
(cgraph_edge_brings_value_p): New parameter.
(gather_edges_for_value): Pass destination value to
cgraph_edge_brings_value_p.
(perhaps_add_new_callers): Likewise.
(get_info_about_necessary_edges): Likewise and exclude values brought
only by self-recursive edges.
(create_specialized_node): Redirect only clones of self-calling edges.
(+self_recursive_pass_through_p): New function.
(find_more_scalar_values_for_callers_subset): Use it.
(find_aggregate_values_for_callers_subset): Likewise.
(known_aggs_to_agg_replacement_list): Removed.
(decide_whether_version_node): Re-calculate known constants for all
remaining context clones.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/ipa-cp.c

[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888

2018-04-10 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

--- Comment #9 from Martin Jambor  ---
I have posted a proposed fix to the mailing list as:

https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00419.html

(please ignore the stuff I mistakenly pasted to the subject line).

[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888

2018-04-06 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

Martin Jambor  changed:

   What|Removed |Added

  Attachment #43588|0   |1
is obsolete||

--- Comment #8 from Martin Jambor  ---
Created attachment 43872
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43872=edit
Current patch

This patch, which speeds up mcf and which is more consistent with the IPA-CP
design and does not miscompile mcf, has passed LTO bootstrap and testing, I
will try it on a few bigger applications and submit it shortly afterwards.

[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888

2018-04-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek  ---
Any progress here?  I don't see the patch posted on gcc-patches.

[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888

2018-03-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
   Priority|P3  |P1

--- Comment #6 from Richard Biener  ---
Seems we have a handle on this bug?

[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888

2018-03-07 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

--- Comment #5 from Martin Jambor  ---
Created attachment 43588
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43588=edit
Prototype patch

I have to leave the office now but this is the (only very very lightly tested)
prototype patch that fixes the regression as described above.  It still needs a
bit of care before submitting, however.

[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888

2018-03-07 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

Martin Jambor  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #4 from Martin Jambor  ---
In my (more recent) checkout of trunk, the estimated benefit is even
lower, only 270.  Raising devirtualization hint to compensate seems
excessive, I need:

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index ee41a8d55b7..09ba92ef14d 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -2612,6 +2612,25 @@ devirtualization_time_bonus (struct cgraph_node *node,
res += 7 / ((int)speculative + 1);
 }

+  if (res)
+/*  Devirtualization is likely more important in recursive callers because
+   those cannot be entirely inlined themselves.  */
+for (cgraph_edge *e = node->callees; e; e = e->next_callee)
+  {
+   enum availability avail;
+   cgraph_node *callee = e->callee->function_symbol ();
+   if (e->caller == callee
+   && avail >= AVAIL_AVAILABLE)
+ {
+   res = 8 * res;
+
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "Recursive devirtualization bohus %i\n",
+  res);
+   break;
+ }
+  }
+
   return res;
 }

Which is excessive (but it may make sense to boost IPA-CP bonuses for
self-recursive functions a bit since those cannot be dealt with
completely by inlining).  Nevertless I already have a prototype patch which
can explain to IPA-CP that only there is only one remaining value in the
contexts for which we did not clone and replace it there too.

[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888

2018-02-13 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

Martin Liška  changed:

   What|Removed |Added

   Assignee|marxin at gcc dot gnu.org  |jamborm at gcc dot 
gnu.org

--- Comment #3 from Martin Liška  ---
So the problematic predictor change is:

- DEF_PREDICTOR (PRED_NULL_RETURN, "null return", HITRATE (91), 0)
+ DEF_PREDICTOR (PRED_NULL_RETURN, "null return", HITRATE (71), 0)

It lower frequency of BB 5:

primal_bea_mpp (int64_t m, struct arc_t * arcs, struct arc_t * stop_arcs,
int64_t * basket_sizes, struct BASKET * * perm, int thread, struct arc_t * *
end_arc, int64_t step, int64_t num_threads, int64_t max_elems)
{
...
   [local count: 12634988]:
READY:
  # DEBUG BEGIN_STMT
  _113 = *_40;
  _114 = (sizetype) _113;
  _115 = _114 + 1;
  _116 = _115 * 8;
  _117 = perm_148(D) + _116;
  _118 = *_117;
  _118->number = -1;
  # DEBUG BEGIN_STMT
  _122 = *_40;
  if (_122 == 0)
goto ; [29.93%]
  else
goto ; [70.07%]

   [local count: 3781652]:
  # DEBUG BEGIN_STMT
  // predicted unlikely by early return (on trees) predictor.
  goto ; [100.00%]

   [local count: 8853336]:
  # DEBUG BEGIN_STMT
  _127 = (long unsigned int) _122;
  _128 = perm_148(D) + 8;
  spec_qsort (_128, _127, 8, cost_compare);
  # DEBUG BEGIN_STMT
  _176 = MEM[(struct BASKET * *)perm_148(D) + 8B];

   [local count: 12634988]:
  # _135 = PHI <0B(34), _176(35)>
  return _135;
}

It's function defined in pbeampp.c.

Then in IPA CP we end up with:

Evaluating opportunities for spec_qsort/353.
 - considering value arc_compare for param #3 int (*) (const void *,
const void *) (caller_count: 1)
 good_cloning_opportunity_p (time: 143, size: 269, freq_sum: 987, scc) ->
evaluation: 314, threshold: 500
 good_cloning_opportunity_p (time: 942, size: 866, freq_sum: 987, scc) ->
evaluation: 643, threshold: 500
  Creating a specialized node of spec_qsort/353.
replacing param #2 size_t with const 8
replacing param #3 int (*) (const void *, const void *) with const
arc_compare
Accounting size:173.00, time:191.00 on predicate exec:(true)
Accounting size:3.00, time:2.00 on new predicate exec:(not
inlined)
 the new node is spec_qsort.constprop/377.
ipa-prop: Discovered an indirect call to a known target
(spec_qsort.constprop/377 -> arc_compare/144), for stmt with uid 71
converting indirect call in spec_qsort.constprop to direct call to arc_compare
 controlled uses count of param 3 bumped down to 8
ipa-prop: Discovered an indirect call to a known target
(spec_qsort.constprop/377 -> arc_compare/144), for stmt with uid 218
converting indirect call in spec_qsort.constprop to direct call to arc_compare
 controlled uses count of param 3 bumped down to 7
ipa-prop: Discovered an indirect call to a known target
(spec_qsort.constprop/377 -> arc_compare/144), for stmt with uid 267
converting indirect call in spec_qsort.constprop to direct call to arc_compare
 controlled uses count of param 3 bumped down to 6
ipa-prop: Discovered an indirect call to a known target
(spec_qsort.constprop/377 -> arc_compare/144), for stmt with uid 348
converting indirect call in spec_qsort.constprop to direct call to arc_compare
 controlled uses count of param 3 bumped down to 5
 - considering value cost_compare for param #3 int (*) (const void *,
const void *) (caller_count: 1)
 good_cloning_opportunity_p (time: 143, size: 269, freq_sum: 701, scc) ->
evaluation: 223, threshold: 500
 good_cloning_opportunity_p (time: 942, size: 866, freq_sum: 701, scc) ->
evaluation: 457, threshold: 500
 - Creating a specialized node of spec_qsort/353 for all known contexts.
replacing param #2 size_t with const 8
Accounting size:173.00, time:191.00 on predicate exec:(true)
Accounting size:3.00, time:2.00 on new predicate exec:(not
inlined)
 the new node is spec_qsort.constprop/378.

Evaluating opportunities for spec_qsort/353.
 - adding an extra caller spec_qsort.constprop/377 of spec_qsort.constprop/377
 - considering value cost_compare for param #3 int (*) (const void *,
const void *) (caller_count: 1)
 good_cloning_opportunity_p (time: 143, size: 269, freq_sum: 701, scc) ->
evaluation: 223, threshold: 500
 good_cloning_opportunity_p (time: 942, size: 866, freq_sum: 701, scc) ->
evaluation: 457, threshold: 500
  Marking node as dead: spec_qsort/353.

Here you can see evaluation is 457, which the threshold is 500.

Another question is why IPA inline does not inline the call. Answer is here:

IPA function summary for primal_bea_mpp.constprop/366 inlinable
  global time: 127.809798
  self size:   134
  global size: 132
  min size:   10
  self stack:  0
  global stack:0
size:112.00, time:108.00
size:7.00, time:2.00,  executed if:(not inlined)
size:2.00, time:2.00,  nonconst if:(op3 changed)

[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888

2018-02-08 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

--- Comment #2 from Martin Liška  ---
I can confirm the regression on a Haswell CPU.

[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888

2018-02-05 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

Martin Liška  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2018-02-05
   Assignee|unassigned at gcc dot gnu.org  |marxin at gcc dot 
gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Martin Liška  ---
Let me take a look.

[Bug ipa/84149] [8 Regression] SPEC CPU2017 505.mcf/605.mcf ~10% performance regression with r256888

2018-02-01 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |8.0