[Bug rtl-optimization/80197] pgo dramatically pessimizes scimark2 MonteCarlo benchmark

2017-07-24 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80197

--- Comment #9 from Jan Hubicka  ---
The original idea of tracing was that we can pro-actively duplicate tails and
rely on crossjumping to merge the paths back if they did not trigger
context-sensitive optimizations.  Nowadays crossjumping is much weaker than it
used to be because it does not handle RTL generated from SSA well, and we do
not do any gimple-level tail merging after tracer, which we probably should.
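
As a source-level illustration of the tail duplication described above (a
sketch only: tracer works on the CFG using profile data, not on C source, and
the function names here are hypothetical), the two functions below compute the
same result. In the second one the shared tail has been copied into each arm,
so each path is straight-line code that later passes can simplify on its own.
If nothing path-specific is found, the two copies of the tail are what
crossjumping or tail merging would be expected to fold back together.

```c
/* Hypothetical example; not taken from the GCC sources or from scimark2. */

/* Original shape: both arms join before a shared tail. */
int shared_tail(int x, int hot)
{
    int t;
    if (hot)
        t = x + 1;
    else
        t = x * 2;
    return t * 3 + 7;          /* shared tail */
}

/* After tail duplication: the tail is copied into each arm. */
int duplicated_tail(int x, int hot)
{
    if (hot) {
        int t = x + 1;
        return t * 3 + 7;      /* copy of the tail on the hot path */
    } else {
        int t = x * 2;
        return t * 3 + 7;      /* copy of the tail on the cold path */
    }
}
```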

[Bug rtl-optimization/80197] pgo dramatically pessimizes scimark2 MonteCarlo benchmark

2017-03-30 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80197

--- Comment #8 from rguenther at suse dot de  ---
On Wed, 29 Mar 2017, amonakov at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80197
> 
> --- Comment #7 from Alexander Monakov  ---
> No, with fixed-up inlining -ftracer sees reasonable edge probabilities. The
> reason tracer makes things worse here, is that it clones the path leading to a
> 50%/50% conditional branch (and correctly stops at that branch), making the
> tiny BB under that branch ineligible(?) for if-conversion. We go from
> 
> 
> if (!cond) goto 
>   
>   var = VAL;   // this can eventually become a cmov
> 
> 
> to
> 
> 
> if (!cond) goto 
> goto 
> 
> ...
> 
> 
> if (!cond) goto 
>   
>   var = VAL;  // this doesn't become a cmov
> 
> 
> 
> I think in principle if-conversion could still do its job here by duplicating
> the conditional var=VAL assignment under BB0_1.
> 
> Here's a silly compile-only sample where -O2 -ftracer is worse than -O2 due to
> this effect:
> 
> void f(long n, signed char *x)
> {
>   for (; n; n--) {
>     long a=x[n], b;
>     if (!a)
>       a = 42;
>     b = x[a];
>     if (b < 0)
>       b += a;
>     x[b] = 0;
>   }
> }

It's hard for tracer to predict followup optimization opportunities
in the isolated path and weight that against missed if-conversion
opportunities.

Of course the fact that late phiopt runs after tracer / split-paths
doesn't help, nor that phiopt doesn't catch this kind of
if-conversion (the idea was that the RTL pass can do _much_ better
in assessing if-conversion cost/benefit).

Tracer doesn't have a very sophisticated cost model either.

[Bug rtl-optimization/80197] pgo dramatically pessimizes scimark2 MonteCarlo benchmark

2017-03-29 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80197

--- Comment #7 from Alexander Monakov  ---
No, with fixed-up inlining -ftracer sees reasonable edge probabilities. The
reason tracer makes things worse here, is that it clones the path leading to a
50%/50% conditional branch (and correctly stops at that branch), making the
tiny BB under that branch ineligible(?) for if-conversion. We go from


if (!cond) goto 
  
  var = VAL;   // this can eventually become a cmov


to


if (!cond) goto 
goto 

...


if (!cond) goto 
  
  var = VAL;  // this doesn't become a cmov



I think in principle if-conversion could still do its job here by duplicating
the conditional var=VAL assignment under BB0_1.

Here's a silly compile-only sample where -O2 -ftracer is worse than -O2 due to
this effect:

void f(long n, signed char *x)
{
  for (; n; n--) {
    long a=x[n], b;
    if (!a)
      a = 42;
    b = x[a];
    if (b < 0)
      b += a;
    x[b] = 0;
  }
}
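
For reference, the if-conversion that is lost on 'if (b < 0) b += a;' can be
written out by hand. This is a minimal sketch with hypothetical function
names: the second function computes the sum unconditionally and then selects,
which is the shape a conditional move implements. Whether a given compiler
emits a cmov for either form is not guaranteed, and the unconditional add
assumes b + a does not overflow.

```c
/* Hypothetical helpers illustrating the branchy and if-converted shapes. */

/* Branchy form, as written in the sample above. */
long add_if_negative_branchy(long a, long b)
{
    if (b < 0)
        b += a;
    return b;
}

/* If-converted form: compute both candidates, then select one.
   Assumes b + a does not overflow, since the add now always executes. */
long add_if_negative_select(long a, long b)
{
    long sum = b + a;
    return (b < 0) ? sum : b;
}
```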

[Bug rtl-optimization/80197] pgo dramatically pessimizes scimark2 MonteCarlo benchmark

2017-03-29 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80197

--- Comment #6 from rguenther at suse dot de  ---
On Tue, 28 Mar 2017, amonakov at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80197
> 
> --- Comment #5 from Alexander Monakov  ---
> On trunk, manually fixing up inlining is not enough: trunk additionally needs
> -fno-tracer, otherwise crucial if-conversion of 'if (k < 0) k += m1;' is
> prevented.

That means -ftracer has similar issues as path splitting.  -ftracer
simply looks for a path with high probability so supposedly the
distorted profile might be the reason for this?

[Bug rtl-optimization/80197] pgo dramatically pessimizes scimark2 MonteCarlo benchmark

2017-03-28 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80197

--- Comment #5 from Alexander Monakov  ---
On trunk, manually fixing up inlining is not enough: trunk additionally needs
-fno-tracer, otherwise crucial if-conversion of 'if (k < 0) k += m1;' is
prevented.

[Bug rtl-optimization/80197] pgo dramatically pessimizes scimark2 MonteCarlo benchmark

2017-03-27 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80197

Alexander Monakov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #4 from Alexander Monakov  ---
According to my analysis, this is mostly caused by different inlining decisions
with regard to inlining new_Random_seed into MonteCarlo_integrate.  Inlining
happens at profile-generate time, but does not happen at profile-use time. 
This appears to throw off edge probabilities, and also prevents the compiler
from seeing that R->haveRange accessed in Random_nextDouble (which is inlined)
is always 0.

Declaring new_Random_seed (which is called once) as 'inline
__attribute__((always_inline))' makes code generation sane again.
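
A minimal sketch of that workaround, using a hypothetical stand-in for the
scimark2 Random type (only the haveRange field name comes from the discussion
above; the struct layout, the generator and all function names are
assumptions). With the seeding function forced inline, the store of 0 to
haveRange is visible in the caller, so the compiler has a chance to fold the
haveRange test in the inlined sampling code.

```c
/* Hypothetical stand-in for the scimark2 Random state. */
typedef struct {
    int haveRange;
    double left;
    double width;
    unsigned seed;
} SketchRandom;

/* Forced inline, mirroring the always_inline workaround. */
static inline __attribute__((always_inline))
void sketch_seed(SketchRandom *r, unsigned seed)
{
    r->seed = seed;
    r->haveRange = 0;      /* always 0 on this path */
    r->left = 0.0;
    r->width = 1.0;
}

/* Toy linear congruential generator, not the scimark2 generator. */
static double sketch_next(SketchRandom *r)
{
    r->seed = r->seed * 1103515245u + 12345u;
    /* Low 24 bits of the shifted state scaled into [0, 1). */
    double u = (double)((r->seed >> 8) & 0xFFFFFFu) / 16777216.0;
    if (r->haveRange)      /* foldable once sketch_seed is inlined */
        return r->left + u * r->width;
    return u;
}

double sketch_first_sample(unsigned seed)
{
    SketchRandom r;
    sketch_seed(&r, seed);
    return sketch_next(&r);
}
```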

[Bug rtl-optimization/80197] pgo dramatically pessimizes scimark2 MonteCarlo benchmark

2017-03-27 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80197

--- Comment #3 from Uroš Bizjak  ---
(In reply to rguent...@suse.de from comment #2)

> I think that if FDO says either the true or false edge is very likely
> then not if-converting the loop is best?  Or is a well-predicted
> conditional move as good as a well-predicted if?  10% missed branches
> would be more than

Please note that when if-conversion succeeded through noce_try_addcc, we don't
care about prediction anymore. The conversion converts:

        ucomisd %xmm5, %xmm4
        jb      .L17
.L16:
        addl    $1, %ebp
.L17:

to:

        ucomisd %xmm0, %xmm3    # 195   *cmpiudf/2      [length = 4]
        sbbl    $-1, %ebx       # 196   subsi3_carry/1  [length = 3]

IMO, this conversion should always be performed, as it is always a win.
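
At the source level, the effect of this conversion corresponds to adding the
comparison result to the counter instead of branching around the increment.
The sketch below uses hypothetical function names and takes the samples from
arrays rather than from Random_nextDouble. It shows only the equivalence of
the two forms; it does not claim that any particular compiler emits the
ucomisd/sbbl sequence for either one.

```c
/* Hypothetical helpers; the samples come from arrays to keep this standalone. */

/* Branchy form: the increment sits under a data-dependent jump. */
int count_under_curve_branchy(const double *xs, const double *ys, int n)
{
    int under_curve = 0;
    for (int i = 0; i < n; i++) {
        if (xs[i] * xs[i] + ys[i] * ys[i] <= 1.0)
            under_curve++;
    }
    return under_curve;
}

/* Add-with-condition form: the comparison result (0 or 1) is added directly. */
int count_under_curve_addcc(const double *xs, const double *ys, int n)
{
    int under_curve = 0;
    for (int i = 0; i < n; i++)
        under_curve += (xs[i] * xs[i] + ys[i] * ys[i] <= 1.0);
    return under_curve;
}
```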

[Bug rtl-optimization/80197] pgo dramatically pessimizes scimark2 MonteCarlo benchmark

2017-03-27 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80197

--- Comment #2 from rguenther at suse dot de  ---
On Mon, 27 Mar 2017, ubizjak at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80197
> 
> Uroš Bizjak  changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|UNCONFIRMED                 |NEW
>    Last reconfirmed|                            |2017-03-27
>                  CC|                            |jakub at gcc dot gnu.org,
>                    |                            |rguenth at gcc dot gnu.org
>      Ever confirmed|0                           |1
> 
> --- Comment #1 from Uroš Bizjak  ---
> For some reason, recently fixed if-conversion (PR79389) does not trigger in PGO
> case. There is still a jump with -O2:
> 
>         mulsd   %xmm0, %xmm5
>         mulsd   %xmm2, %xmm2
>         addsd   %xmm2, %xmm5
>         ucomisd %xmm5, %xmm4
>         jb      .L17
> .L16:
>         addl    $1, %ebp
> .L17:
>         addl    $1, %edi
>         cmpl    %edi, %ebx
>         je      .L5
> 
> Since this asm corresponds to random operands, the jump can't be predicted:
> 
> for (count=0; count {
> double x= Random_nextDouble(R);
> double y= Random_nextDouble(R);
> 
> if ( x*x + y*y <= 1.0)
>  under_curve ++;
> 
> }
> 
> Based on the discussion in PR79389, and the fact that -O2 and -O3 both compile
> to a jump, I suspect that loop splitting cost model should be fine tuned to
> also handle PGO case. Note that
> 
> Adding some CCs.

Not sure - loop splitting isn't done here and doing it would remove
the if-conversion opportunity.

I think that if FDO says either the true or false edge is very likely
then not if-converting the loop is best?  Or is a well-predicted
conditional move as good as a well-predicted if?  10% missed branches
would be more than

/* When branch is predicted to be taken with probability lower than this
   threshold (in percent), then it is considered well predictable. */
DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
  "predictable-branch-outcome",
  "Maximal estimated outcome of branch considered predictable.",
  2, 0, 50)

so it shouldn't affect if-conversion...

Are we sure we're not hitting some architectural limitation here?  Like
disabling the loop stream cache because of size or the CFG?
(otoh we have calls in the loop(?)).
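
To make the threshold argument concrete, here is a rough sketch of the kind of
check the parameter describes. The function name is hypothetical, and working
in whole percent with an inclusive comparison is an assumption, not GCC's
actual code. A branch counts as well predictable only if one of its outcomes
is expected at most 2% of the time, so a branch taken 10% of the time does not
qualify.

```c
#include <stdbool.h>

/* Hypothetical helper: a branch is treated as well predictable when either
   outcome has a probability no higher than the threshold (in percent).
   The inclusive comparison at the boundary is an assumption. */
bool sketch_branch_predictable(int taken_percent, int threshold_percent)
{
    int not_taken_percent = 100 - taken_percent;
    return taken_percent <= threshold_percent
        || not_taken_percent <= threshold_percent;
}
```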

[Bug rtl-optimization/80197] pgo dramatically pessimizes scimark2 MonteCarlo benchmark

2017-03-27 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80197

Uroš Bizjak  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-03-27
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Uroš Bizjak  ---
For some reason, recently fixed if-conversion (PR79389) does not trigger in PGO
case. There is still a jump with -O2:

        mulsd   %xmm0, %xmm5
        mulsd   %xmm2, %xmm2
        addsd   %xmm2, %xmm5
        ucomisd %xmm5, %xmm4
        jb      .L17
.L16:
        addl    $1, %ebp
.L17:
        addl    $1, %edi
        cmpl    %edi, %ebx
        je      .L5

Since this asm corresponds to random operands, the jump can't be predicted:

for (count=0; count