Re: potential for GHC benchmarks w.r.t. optimisations being incorrect

2018-05-05 Thread Joachim Breitner
Hi,

On Saturday, 2018-05-05 at 12:33 -0400, Daniel Cartwright wrote:
> I write this out of curiosity, as well as concern, over how this may affect 
> GHC.

our performance measurements are pretty non-scientific. For many
years, developers just ran our benchmark suite (nofib) before and
after their change, hopefully on a cleanly built working copy, and
pasted the most interesting numbers in the commit logs. Maybe some went
for coffee to have an otherwise relatively quiet machine (or have some
remote setup), maybe not.

In the end, the run-time performance numbers are often ignored and we
focus on comparing the effects of *dynamic heap allocations*, which
are much more stable across different environments, and which we
believe are a good proxy for actual performance, at least for the kind
of high-level optimizations that we work on in the core-to-core
pipeline. But this assumption is folklore, and not scientifically
investigated.
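For concreteness, a minimal sketch of one way to observe that allocation
figure from inside a program, assuming GHC >= 8.2 and a run with
"+RTS -T" (nofib obtains the same number from the RTS statistics output
instead):

    -- Print the total bytes the program allocated on the heap; run the
    -- compiled binary with "+RTS -T" so the RTS collects statistics.
    import GHC.Stats (getRTSStats, getRTSStatsEnabled, RTSStats(..))

    main :: IO ()
    main = do
      print (sum [1 .. 1000000 :: Int])   -- the workload being measured
      enabled <- getRTSStatsEnabled
      if enabled
        then do
          stats <- getRTSStats
          putStrLn ("allocated_bytes = " ++ show (allocated_bytes stats))
        else putStrLn "run with +RTS -T to enable RTS statistics"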

About two years ago we started collecting performance numbers for
every commit to the GHC repository, and I wrote a tool to print
comparisons: https://perf.haskell.org/ghc/

This runs on a dedicated physical machine, and still the run-time
numbers were varying too widely and gave us many false warnings (and
probably reported many false improvements which we of course were happy
to believe). I have since switched to measuring only dynamic
instruction counts with valgrind. This means that we cannot detect
improvements or regressions due to certain low-level stuff, but we gain
the ability to reliably measure *something* that we expect to change
when we improve (or accidentally worsen) the high-level
transformations.
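
(The measurement is roughly of this shape; the exact valgrind tool and
flags used on perf.haskell.org may differ:

    valgrind --tool=callgrind ./Bench
    callgrind_annotate callgrind.out.<pid>

where the "Ir" (instructions executed) totals are the numbers being
compared, and "./Bench" stands in for a nofib benchmark binary.)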

I wish there were a better way of getting a reliable, stable number
that reflects the actual performance.

Cheers,
Joachim


-- 
Joachim Breitner
  m...@joachim-breitner.de
  http://www.joachim-breitner.de/




Basic Block Layout in the NCG

2018-05-05 Thread Andreas Klebinger

Does anyone have good hints for literature on basic block layout algorithms?
While working on Cmm, I've run into a few examples where the current
algorithm falls apart.


There is a trac ticket https://ghc.haskell.org/trac/ghc/ticket/15124#ticket
where I tracked some of the issues I ran into.

As it stands, the benefit of some Cmm optimizations is far outweighed by
the accidental changes they cause in the layout of basic blocks.

The main problem seems to be that the current codegen considers only the
last jump in a basic block as relevant for code layout.

This works well for linear chains of control flow, but it behaves badly and
somewhat unpredictably when dealing with branch-heavy code where blocks
have more than one successor, or contain calls.

In particular, if we have a loop

A jmp B call C call D

which we enter at block B, coming from block E,
we would like a layout like:

E,B,C,D,A

so that, with some luck, C and D might still be in cache when we return
from the calls.


However, we can currently get:

E,B,A,X,D,X,C

where the X are other, unrelated blocks. This happens because call edges are
invisible to the layout algorithm. It even happens when we have
(conditional) jumps from B to C and from C to D, since these are invisible
as well!


I came across cases where inverting conditions led to big performance
losses because the block layout suddenly got all messed up (~4% slowdown
for the worst offenders).

So I'm looking for solutions there.
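
One direction from the literature that seems relevant is greedy
chain-building over *weighted* control-flow edges, in the spirit of Pettis
and Hansen's profile-guided code positioning. A toy sketch (made-up block
names and weights, not GHC code):

    import Data.List (find, sortBy)
    import Data.Ord (Down (..), comparing)

    type Block = String
    type Edge  = ((Block, Block), Int)  -- (source, target) and weight

    -- Start with singleton chains; repeatedly take the heaviest edge and
    -- join the chain ending in its source to the chain starting with its
    -- target. Leftover chains are appended in arbitrary order.
    layout :: [Block] -> [Edge] -> [Block]
    layout blocks edges = concat (foldl step [[b] | b <- blocks] heavyFirst)
      where
        heavyFirst = map fst (sortBy (comparing (Down . snd)) edges)
        step chains (s, d) =
          case (find ((== s) . last) chains, find ((== d) . head) chains) of
            (Just c1, Just c2) | c1 /= c2
              -> (c1 ++ c2) : filter (\c -> c /= c1 && c /= c2) chains
            _ -> chains

With the loop from above, weighting the entry and call-return edges
heavier than the back edge, this produces the desired order:

    layout ["A","B","C","D","E"]
           [ (("E","B"),100), (("B","C"),100), (("C","D"),100)
           , (("A","B"),90), (("D","A"),10) ]
    -- ==> ["E","B","C","D","A"]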



potential for GHC benchmarks w.r.t. optimisations being incorrect

2018-05-05 Thread Daniel Cartwright
I am admittedly unsure of how GHC's optimisation benchmarks are currently
implemented/carried out, but I feel as though this paper and its findings
could be relevant to GHC devs:

http://cis.upenn.edu/~cis501/papers/producing-wrong-data.pdf

Basically, according to this paper, the cache effects of changing where the
stack starts (which depends on the total size of the environment variables)
are huge for many compiler benchmarks, and adjusting for this effect shows
that gcc -O3 is in actuality only 1% faster than gcc -O2.

Some further thoughts, per http://aftermath.rocks/2016/04/11/wrong_data/ :

"The question they looked at was the following: does the compiler’s -O3
optimization flag result in speedups over -O2? This question is
investigated in the light of measurement biases caused by two sources: Unix
environment size, and linking order. The Unix environment size refers
to the total size of the representation of Unix environment variables (such
as PATH, HOME, etc.). Typically, these variables are part of the memory
image of each process. The call stack begins where the environment ends.
This gives rise to the following hypothesis: changing the sizes of
(unused!) environment variables can change the alignment of variables on
the stack and thus the performance of the program under test due to
different behavior of hardware buffers such as caches or TLBs. (This is the
source of the hypothetical example in the first paragraph, which I made up.
On the machine where I am typing this, my user name appears in 12 of the
environment variables that are set by default. All other things being
equal, another user with a user name of a different length will have an
environment size that differs by a multiple of 12 bytes.)"
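
A quick way to check that parenthetical on one's own machine, sketched in
Haskell (the "+ 2" approximates the "NAME=value\0" representation, so the
byte count is only indicative):

    import Data.List (isInfixOf)
    import System.Environment (getEnv, getEnvironment)

    main :: IO ()
    main = do
      user <- getEnv "USER"
      vars <- getEnvironment
      let hits = length [ () | (_, v) <- vars, user `isInfixOf` v ]
          size = sum [ length k + length v + 2 | (k, v) <- vars ]
      putStrLn (show hits ++ " variables mention " ++ user
                ++ "; the environment is ~" ++ show size ++ " bytes")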

"So does this hypothesis hold? Yes. Using a simple computational kernel the
authors observe that changing the size of the environment can often cause a
slowdown of 33% and, in one particular case, by 300%. On larger benchmarks
the effects are less pronounced but still present. Using the C programs
from the standard SPEC CPU2006 benchmark suite, the effects of -O2 and -O3
optimizations were compared across a wide range of environment sizes. For
several of the programs a wide range of variations was observed, and the
results often included both positive and negative observations. The effects
were not correlated with the environment size. All this means that for some
benchmarks, a compiler engineer might by accident test a purported
optimization in a lucky environment and observe a 10% speedup, while users
of the same optimization in an unlucky environment may have a 10% slowdown
on the same workload."
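
To make the experimental setup concrete, here is a toy sketch of how one
might look for the effect: run the same benchmark binary in environments
padded to different sizes and compare wall-clock times. The binary name and
padding sizes are made up, and timing a subprocess this way is of course
itself noisy:

    import Control.Monad (forM_)
    import Data.Time.Clock (diffUTCTime, getCurrentTime)
    import System.Process (CreateProcess (..), proc, readCreateProcess)

    main :: IO ()
    main = forM_ [0, 1024, 2048, 4096] $ \padding -> do
      -- replace the inherited environment with one of a known size
      let padded = (proc "./Bench" [])
                     { env = Just [("PAD", replicate padding 'x')] }
      t0 <- getCurrentTime
      _  <- readCreateProcess padded ""
      t1 <- getCurrentTime
      putStrLn (show padding ++ " bytes of padding: "
                ++ show (diffUTCTime t1 t0))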

I write this out of curiosity, as well as concern, over how this may affect
GHC.


Re: End of Windows Vista support in GHC-8.6?

2018-05-05 Thread Phyx
Hi Simon,

Whatever happened to this? The wiki was updated, but I don't see a commit
actually removing Vista support.

Did you end up deciding not to do this?

Thanks,
Tamar

On Mon, Mar 5, 2018 at 7:21 PM, Simon Jakobi wrote:

> Thanks everyone!
>
> I have updated https://ghc.haskell.org/trac/ghc/wiki/Platforms/Windows
> accordingly.
>
> Cheers,
> Simon
>
> 2018-03-05 18:29 GMT+01:00 Phyx :
>
>>
>>
>> On Mon, Mar 5, 2018, 17:23 Ben Gamari  wrote:
>>
>>> Simon Jakobi via ghc-devs  writes:
>>>
>>> > Hi!
>>> >
>>> > Given that Vista’s EOL was in April 2017
>>> > <...a-end-of-support>
>>> > I assume that there’s no intention to keep supporting it in GHC-8.6!?
>>> >
>>> > I’m asking because I intend to use a function
>>> > <...405488.aspx>
>>> > that requires Windows 7 or newer for #13362
>>> > <https://ghc.haskell.org/trac/ghc/ticket/13362>.
>>> >
>>> Given that it's EOL'd, dropping Vista sounds reasonable to me.
>>>
>>> Tamar, any objection?
>>>
>>
>> No objections; however, do make sure to test both 32- and 64-bit builds of
>> GHC when you use the API. It's new enough and rare enough that it may not
>> be implemented in both mingw-w64 toolchains (we've had similar issues
>> before).
>>
>> Thanks,
>> Tamar
>>
>>
>>> Cheers,
>>>
>>> - Ben
>>>
>>
>