Re: Measuring performance of GHC

2016-12-07 Thread Michal Terepeta
On Tue, Dec 6, 2016 at 10:10 PM Ben Gamari  wrote:
> [...]
> > How should we proceed? Should I open a new ticket focused on this?
> > (maybe we could try to figure out all the details there?)
> >
> That sounds good to me.

Cool, opened: https://ghc.haskell.org/trac/ghc/ticket/12941 to track
this.

Cheers,
Michal


Re: Measuring performance of GHC

2016-12-07 Thread Ben Gamari
Johannes Waldmann  writes:

> Hi Ben, thanks,
>
>
>>  4. run the build, `cabal configure --ghc-options="-p -hc" $args && cabal 
>> build`
>
> cabal configure $args --ghc-options="+RTS -p -hc -RTS"
>
Ahh, yes, of course. I should have tried this before hitting send.

>> You should end up with a .prof and .hp file.
>
> Yes, that works. - Typical output starts like this
>
> COST CENTRE   MODULE   %time %alloc
>
> SimplTopBinds SimplCore  60.7   57.3
> OccAnal       SimplCore   6.0    6.0
> Simplify      SimplCore   3.0    0.5
>
Ahh yes. So one of the things I neglected to mention is that the
profiled build flavour includes only a few cost centers. One of the
tricky aspects of the cost-center profiler is that it affects
core-to-core optimizations, meaning that the act of profiling may
actually shift around costs. Consequently, by default the build flavour
includes a rather conservative set of cost-centers to avoid distorting the
results and to preserve compiler performance.

Typically when I've profiled the compiler I already have a region of
interest in mind. I simply add `OPTIONS_GHC -fprof-auto` pragmas to the
modules involved. The build system already adds this flag to a few
top-level modules, hence the cost-centers which you observe (see
compiler/ghc.mk; search for GhcProfiled).
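
For instance, a pragma like this at the top of whichever module you care
about (the module and export list here are just illustrative):

  {-# OPTIONS_GHC -fprof-auto #-}
  module SimplCore ( core2core, simplifyExpr ) where

and then rebuild the stage-2 compiler, so that the bindings in that module
get their own cost-centers.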

If you don't have a particular piece of the compiler in mind to study,
you certainly can just pepper every module with cost centers by adding
-fprof-auto to GhcStage2HcOpts (e.g. in mk/build.mk). The resulting
compiler may be a bit slow and you may need to be just a tad more
careful in evaluating the profile.
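
Concretely, that's just one extra line in mk/build.mk (a sketch; put it
next to your other settings):

  GhcStage2HcOpts += -fprof-auto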

It might be nice if we had a more aggressive profiled build flavour
which added cost centers to a larger fraction of the compiler's machinery,
while excluding low-level utilities like FastString, which are critical
to the compiler's performance.

>
> These files are always called ghc.{prof,hp},
> how could this be changed? Ideally, the output file name
> would depend on the package being compiled,
> then the mechanism could probably be used with 'stack' builds.
>
We really should have a way to do this, but sadly we don't at the moment.
Ideally we would also have a way to change the default eventlog
destination path.

> Building executables mentioned in the cabal file will
> already overwrite profiling info from building libraries.
>
Note that you can instruct `cabal` to only build a single component of a
package. For instance, in the case of the `text` package you can build
just the library component with `cabal build text`.

> When I 'cabal build' the 'text' package,
> then the last actual compilation (which leaves
> the profiling info) is for cbits/cbits.c
>
Ahh right. Moreover, there is likely another GHC invocation after that
to link the final library. This is why I typically just use GHC
directly, perhaps stealing the command line produced by `cabal` (with
`-v`).

> I don't see how to build Data/Text.hs alone
> (with ghc, not via cabal), I am getting
> Failed to load interface for ‘Data.Text.Show’
>
Hmm, I'm not sure I see the issue. In the case of `text` I can just run
`ghc` from the source root (ensuring that I set the #include path with
`-I`),

$ git clone git://github.com/bos/text
$ cd text
$ ghc Data/Text.hs -Iinclude
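
With the profiled compiler from my earlier instructions, the same invocation
plus the RTS flags should leave ghc.prof/ghc.hp behind, i.e. something like,

$ $dest/bin/ghc Data/Text.hs -Iinclude +RTS -p -hc -RTS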


However, some other packages (particularly those that make heavy use of
CPP) aren't entirely straightforward. In these cases I often find myself
copying bits from the command line produced by cabal.

Cheers,

- Ben




Re: Measuring performance of GHC

2016-12-07 Thread Joachim Breitner
Hi,

On Wednesday, 07.12.2016, at 11:34 +0100, Johannes Waldmann wrote:
> When I 'cabal build' the 'text' package,
> then the last actual compilation (which leaves
> the profiling info) is for cbits/cbits.c
> 
> I don't see how to build Data/Text.hs alone
> (with ghc, not via cabal), I am getting
> Failed to load interface for ‘Data.Text.Show’

you can run
$ cabal build -v
and then copy’n’paste the command line that you are interested in, add
the flags
+RTS -p -hc -RTS -fforce-recomp
and run that again.
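
The result is an invocation roughly of this shape (a heavily abbreviated
sketch; the command line that cabal prints will be much longer):

$ /path/to/ghc --make Data/Text.hs -Iinclude -O \
    +RTS -p -hc -RTS -fforce-recomp

(-fforce-recomp makes sure GHC actually recompiles the module instead of
reusing the existing .o/.hi files.)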

Greetings,
Joachim


-- 
Joachim “nomeata” Breitner
  m...@joachim-breitner.de • https://www.joachim-breitner.de/
  XMPP: nome...@joachim-breitner.de • OpenPGP-Key: 0xF0FBF51F
  Debian Developer: nome...@debian.org



Re: Measuring performance of GHC

2016-12-07 Thread Johannes Waldmann
Hi Ben, thanks,


>  4. run the build, `cabal configure --ghc-options="-p -hc" $args && cabal 
> build`

cabal configure $args --ghc-options="+RTS -p -hc -RTS"


> You should end up with a .prof and .hp file.

Yes, that works. - Typical output starts like this

COST CENTRE   MODULE   %time %alloc

SimplTopBinds SimplCore  60.7   57.3
OccAnal       SimplCore   6.0    6.0
Simplify      SimplCore   3.0    0.5


These files are always called ghc.{prof,hp},
how could this be changed? Ideally, the output file name
would depend on the package being compiled,
then the mechanism could probably be used with 'stack' builds.

Building executables mentioned in the cabal file will
already overwrite profiling info from building libraries.

When I 'cabal build' the 'text' package,
then the last actual compilation (which leaves
the profiling info) is for cbits/cbits.c

I don't see how to build Data/Text.hs alone
(with ghc, not via cabal), I am getting
Failed to load interface for ‘Data.Text.Show’


- J.


Re: Measuring performance of GHC

2016-12-06 Thread Ben Gamari
Joachim Breitner  writes:

> Hi,
>
> On Tuesday, 06.12.2016, at 17:14 -0500, Ben Gamari wrote:
>> Joachim Breitner  writes:
>> 
>> > Hi,
>> > 
>> > On Tuesday, 06.12.2016, at 19:27, Michal Terepeta wrote:
>> > > (isn't that what perf.haskell.org is doing?)
>> > 
>> > for compiler performance, it only reports the test suite perf test
>> > number so far.
>> > 
>> > If someone modifies the nofib runner to give usable timing results for
>> > the compiler, I can easily track these numbers as well.
>> > 
>> 
>> I have a module [1] that does precisely this for the PITA project (which
>> I still have yet to put up on a public server; I'll try to make time for
>> this soon).
>
> Are you saying that the compile time measurements of a single run of
> the compiler are actually useful?
>
Not really, I generally ignore the compile times. However, knowing
compiler allocations on a per-module basis is quite nice.
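
(If I remember right, those per-module numbers ultimately come from the
<<ghc: ...>> summary lines that GHC emits when given -Rghc-timing; you can
get the same for a single module by hand, e.g.

$ ghc -fforce-recomp -Rghc-timing SomeModule.hs

where SomeModule.hs is of course just a placeholder.)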

> I’d expect we first have to make nofib call the compiler repeatedly.
>
This would be a good idea though.

> Also, shouldn’t this then become part of nofib-analyse?
>
The logic for producing these statistics is implemented by
nofib-analyse's Slurp module today. All the script does is produce the
statistics in a more consistent format.

Cheers,

- Ben





Re: Measuring performance of GHC

2016-12-06 Thread Joachim Breitner
Hi,

On Tuesday, 06.12.2016, at 17:14 -0500, Ben Gamari wrote:
> Joachim Breitner  writes:
> 
> > Hi,
> > 
> > On Tuesday, 06.12.2016, at 19:27, Michal Terepeta wrote:
> > > (isn't that what perf.haskell.org is doing?)
> > 
> > for compiler performance, it only reports the test suite perf test
> > number so far.
> > 
> > If someone modifies the nofib runner to give usable timing results for
> > the compiler, I can easily track these numbers as well.
> > 
> 
> I have a module [1] that does precisely this for the PITA project (which
> I still have yet to put up on a public server; I'll try to make time for
> this soon).

Are you saying that the compile time measurements of a single run of
the compiler are actually useful? I’d expect we first have to make
nofib call the compiler repeatedly.

Also, shouldn’t this then become part of nofib-analyse?

Greetings,
Joachim



-- 
Joachim “nomeata” Breitner
  m...@joachim-breitner.de • https://www.joachim-breitner.de/
  XMPP: nome...@joachim-breitner.de • OpenPGP-Key: 0xF0FBF51F
  Debian Developer: nome...@debian.org



Re: Measuring performance of GHC

2016-12-06 Thread Ben Gamari
Joachim Breitner  writes:

> Hi,
>
> On Tuesday, 06.12.2016, at 19:27, Michal Terepeta wrote:
>> (isn't that what perf.haskell.org is doing?)
>
> for compiler performance, it only reports the test suite perf test
> number so far.
>
> If someone modifies the nofib runner to give usable timing results for
> the compiler, I can easily track these numbers as well.
>
I have a module [1] that does precisely this for the PITA project (which
I still have yet to put up on a public server; I'll try to make time for
this soon).

Cheers,

- Ben

[1] https://github.com/bgamari/ghc-perf-import/blob/master/SummarizeResults.hs





Re: Measuring performance of GHC

2016-12-06 Thread Ben Gamari
Johannes Waldmann  writes:

> Hi,
>
>> ... to compile it with a profiled GHC and look at the report?
>
> How hard is it to build hackage or stackage
> with a profiled ghc? (Does it require ghc magic, or can I do it?)
>
Not terribly hard although it could be made smoother.

To start you'll need to compile a profiled GHC. To do this you simply
need to do something like the following (a condensed command sketch follows
the steps),

 1. install the necessary build dependencies [1]
 2. get the sources [2]
 3. configure the tree to produce a profiled compiler:
   a. cp mk/build.mk.sample mk/build.mk
   b. uncomment the line `BuildFlavour=prof` in mk/build.mk
 4. `./boot && ./configure --prefix=$dest && make && make install`
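
Condensed into commands, the sketch I have in mind is,

  $ cp mk/build.mk.sample mk/build.mk
  $ # edit mk/build.mk and uncomment:  BuildFlavour = prof
  $ ./boot && ./configure --prefix=$dest
  $ make && make install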

Then for a particular package,

 1. get a working directory: `cabal unpack $pkg && cd $pkg-*`
 2. `args="--with-ghc=$dest/bin/ghc 
--allow-newer=base,ghc-prim,template-haskell,..."`
 3. install dependencies: `cabal install --only-dependencies $args .`
 4. run the build, `cabal configure --ghc-options="-p -hc" $args && cabal build`

You should end up with a .prof and .hp file. Honestly, I often skip the
`cabal` step entirely and just use `ghc` to compile a module of interest
directly.


[1] https://ghc.haskell.org/trac/ghc/wiki/Building/Preparation
[2] https://ghc.haskell.org/trac/ghc/wiki/Building/GettingTheSources


>> ... some obvious sub-optimal algorithms in GHC.
>
> obvious to whom? you mean sub-optimality is already known,
> or that it would become obvious once the reports are there?
>
I think "obvious" may have been a bit of a strong word here. There are
sub-optimal algorithms in the compiler and they can be found with a bit
of work. If you have a good testcase tickling such an algorithm, finding
the issue can be quite straightforward; if not, then the process can be a
bit trickier. However, GHC is just another Haskell program and
performance issues are approached just like in any other project.


> Even without profiling - does hackage collect timing information from
> its automated builds?
>
Sadly it doesn't. But...

> What needs to be done to add timing information in places like
> https://hackage.haskell.org/package/obdd-0.6.1/reports/1 ?
>
I've discussed with Herbert the possibility of adding instrumentation to
his matrix builder [3] to collect this sort of information.

As a general note, keep in mind that timings are quite unstable,
dependent upon factors beyond our control at all levels of the stack.
For this reason, I generally prefer to rely on allocations, not
runtimes, while profiling.
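
A cheap way to see the allocation figure for a single compilation, even
without a profiled compiler, is the RTS summary (module name illustrative),

$ ghc -fforce-recomp SomeModule.hs +RTS -s -RTS

whose "bytes allocated in the heap" line is far more stable across runs than
any of the timing numbers.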

As always, don't hesitate to drop by #ghc if you run into trouble.

Cheers,

- Ben


[3] http://matrix.hackage.haskell.org/packages




Re: Measuring performance of GHC

2016-12-06 Thread Joachim Breitner
Hi,

On Tuesday, 06.12.2016, at 19:27, Michal Terepeta wrote:
> (isn't that what perf.haskell.org is doing?)

for compiler performance, it only reports the test suite perf test
number so far.

If someone modifies the nofib runner to give usable timing results for
the compiler, I can easily track these numbers as well.

Greetings,
Joachim

-- 
Joachim “nomeata” Breitner
  m...@joachim-breitner.de • https://www.joachim-breitner.de/
  XMPP: nome...@joachim-breitner.de • OpenPGP-Key: 0xF0FBF51F
  Debian Developer: nome...@debian.org



Re: Measuring performance of GHC

2016-12-06 Thread Ben Gamari
Michal Terepeta  writes:

>> On Tue, Dec 6, 2016 at 2:44 AM Ben Gamari  wrote:
>>
>>I don't have a strong opinion on which of these would be better.
>>However, I would point out that currently the tests/perf/compiler tests
>>are extremely labor-intensive to maintain while doing relatively little
>>to catch performance regressions. There are a few issues here:
>>
>> * some tests aren't very reproducible between runs, meaning that
>>   contributors sometimes don't catch regressions in their local
>>   validations
>> * many tests aren't very reproducible between platforms and all tests
>>   are inconsistent between differing word sizes. This means that we end
>>   up having many sets of expected performance numbers in the testsuite.
>>   In practice nearly all of these except 64-bit Linux are out-of-date.
>> * our window-based acceptance criterion for performance metrics doesn't
>>   catch most regressions, which typically bump allocations by a couple
>>   percent or less (whereas the acceptance thresholds range from 5% to
>>   20%). This means that the testsuite fails to catch many deltas, only
>>   failing when some unlucky person finally pushes the number over the
>>   threshold.
>>
>> Joachim and I discussed this issue a few months ago at Hac Phi; he had
>> an interesting approach to tracking expected performance numbers which
>> may both alleviate these issues and reduce the maintenance burden that
>> the tests pose. I wrote down some terse notes in #12758.
>
> Thanks for mentioning the ticket!
>
Sure!

> To be honest, I'm not a huge fan of having performance tests being
> treated the same as any other tests. IMHO they are quite different:
>
> - They usually need a quiet environment (e.g., cannot run two different
>   tests at the same time). But with ordinary correctness tests, I can
>   run as many as I want concurrently.
>
This is absolutely true; if I had a nickel for every time I saw the
testsuite fail, only to pass upon re-running I would be able to fund a
great deal of GHC development ;)

> - The output is not really binary (correct vs incorrect) but some kind of a
>   number (or collection of numbers) that we want to track over time.
>
Yes, and this is more or less the idea which the ticket is supposed to
capture; we track performance numbers in the GHC repository in git
notes and have Harbormaster (or some other stable test environment)
maintain them. Exact metrics would be recorded for every commit and we
could warn during validate if something changes suspiciously (e.g. look
at the mean and variance of the metric over the past N commits and
squawk if the commit bumps the metric more than some number of sigmas).
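
In code, that check could be as small as the following sketch (the window
size and threshold are made up for illustration):

  -- | Is a freshly measured metric suspicious given the recorded history?
  isSuspicious :: Double -> [Double] -> Bool
  isSuspicious new history = abs (new - mean) > nSigmas * sigma
    where
      nSigmas = 3
      window  = take 20 history                 -- last N recorded values
      n       = fromIntegral (length window)
      mean    = sum window / n
      sigma   = sqrt (sum [(x - mean)^2 | x <- window] / n)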

This sort of scheme could be implemented in either the testsuite or
nofib. It's not clear that one is better than the other (although we
would want to teach the testsuite driver to run performance tests
serially).

> - The decision whether to fail is harder. Since output might be noisy, you
>   need to have either quite relaxed bounds (and miss small
>   regressions) or try to enforce stronger bounds (and suffer from the
>   flakiness and maintenance overhead).
>
Yep. That is right.

> So for the purpose of:
>   "I have a small change and want to check its effect on compiler
>   performance and expect, e.g., ~1% difference"
> the model running of benchmarks separately from tests is much nicer. I
> can run them when I'm not doing anything else on the computer and then
> easily compare the results. (that's what I usually do for nofib). For
> tracking the performance over time, one could set something up to run
> the benchmarks when idle. (isn't that what perf.haskell.org is
> doing?)
>
> Due to that, if we want to extend tests/perf/compiler to support this
> use case, I think we should include there benchmarks that are *not*
> tests (and are not included in ./validate), but there's some easy tool
> to run all of them and give you a quick comparison of what's changed.
>
When you put it like this it does sound like nofib is the natural choice
here.

> To a certain degree this would be then orthogonal to the improvements
> suggested in the ticket. But we could probably reuse some things
> (e.g., dumping .csv files for perf metrics?)
>
Indeed.
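
Something as simple as one row per (commit, test, metric) would probably be
enough to share between the two, e.g. (values entirely made up):

  commit,test,metric,value
  0f1e2d3c,T1969,bytes allocated,664872456
  0f1e2d3c,T3064,bytes allocated,108377812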

> How should we proceed? Should I open a new ticket focused on this?
> (maybe we could try to figure out all the details there?)
>
That sounds good to me.

Cheers,

- Ben




Re: Measuring performance of GHC

2016-12-06 Thread Michal Terepeta
> On Tue, Dec 6, 2016 at 2:44 AM Ben Gamari  wrote:
> Michal Terepeta  writes:
>
> [...]
>>
>> Looking at the comments on the proposal from Moritz, most people would
>> prefer to extend/improve nofib or `tests/perf/compiler` tests. So I guess
>> the main question is - what would be better:
>> - Extending nofib with modules that are compile only (i.e., not
>>   runnable) and focus on stressing the compiler?
>> - Extending `tests/perf/compiler` with ability to run all the tests and do
>>   easy "before and after" comparisons?
>>
>I don't have a strong opinion on which of these would be better.
>However, I would point out that currently the tests/perf/compiler tests
>are extremely labor-intensive to maintain while doing relatively little
>to catch performance regressions. There are a few issues here:
>
> * some tests aren't very reproducible between runs, meaning that
>   contributors sometimes don't catch regressions in their local
>   validations
> * many tests aren't very reproducible between platforms and all tests
>   are inconsistent between differing word sizes. This means that we end
>   up having many sets of expected performance numbers in the testsuite.
>   In practice nearly all of these except 64-bit Linux are out-of-date.
> * our window-based acceptance criterion for performance metrics doesn't
>   catch most regressions, which typically bump allocations by a couple
>   percent or less (whereas the acceptance thresholds range from 5% to
>   20%). This means that the testsuite fails to catch many deltas, only
>   failing when some unlucky person finally pushes the number over the
>   threshold.
>
> Joachim and I discussed this issue a few months ago at Hac Phi; he had
> an interesting approach to tracking expected performance numbers which
> may both alleviate these issues and reduce the maintenance burden that
> the tests pose. I wrote down some terse notes in #12758.

Thanks for mentioning the ticket!

To be honest, I'm not a huge fan of having performance tests treated the
same as any other tests. IMHO they are quite different:

- They usually need a quiet environment (e.g., cannot run two different
  tests at the same time). But with ordinary correctness tests, I can run
  as many as I want concurrently.

- The output is not really binary (correct vs incorrect) but some kind of a
  number (or collection of numbers) that we want to track over time.

- The decision whether to fail is harder. Since output might be noisy, you
  need to have either quite relaxed bounds (and miss small regressions) or
  try to enforce stronger bounds (and suffer from the flakiness and
  maintenance overhead).

So for the purpose of:
  "I have a small change and want to check its effect on compiler
  performance and expect, e.g., ~1% difference"
the model of running benchmarks separately from tests is much nicer. I can
run them when I'm not doing anything else on the computer and then easily
compare the results. (That's what I usually do for nofib.) For tracking the
performance over time, one could set something up to run the benchmarks when
idle. (Isn't that what perf.haskell.org is doing?)

Due to that, if we want to extend tests/perf/compiler to support this use
case, I think we should include benchmarks there that are *not* tests (and
are not included in ./validate), but have some easy tool to run all of them
and give you a quick comparison of what's changed.

To a certain degree this would then be orthogonal to the improvements
suggested in the ticket. But we could probably reuse some things (e.g.,
dumping .csv files for perf metrics?)

How should we proceed? Should I open a new ticket focused on this? (maybe we
could try to figure out all the details there?)

Thanks,
Michal


Re: Measuring performance of GHC

2016-12-06 Thread Johannes Waldmann
Hi,

> ... to compile it with a profiled GHC and look at the report?

How hard is it to build hackage or stackage
with a profiled ghc? (Does it require ghc magic, or can I do it?)

> ... some obvious sub-optimal algorithms in GHC.

Obvious to whom? Do you mean the sub-optimality is already known,
or that it would become obvious once the reports are there?

Even without profiling - does hackage
collect timing information from its automated builds?

What needs to be done to add timing information in places like
https://hackage.haskell.org/package/obdd-0.6.1/reports/1 ?

- J.W.



Re: Measuring performance of GHC

2016-12-06 Thread Moritz Angermann

> |  - One of the core issues I see in day to day programming (even though
> |not necessarily with haskell right now) is that the spare time I
> |  have
> |to file bug reports, boil down performance regressions etc. and file
> |them with open source projects is not paid for and hence minimal.
> |Hence whenever the tools I use make it really easy for me to file a
> |bug, performance regression or fix something that takes the least
> |  time
> |the chances of me being able to help out increase greatly.  This was
> |  one
> |of the ideas behind using just pull requests.
> |E.g. This code seems to be really slow, or has subjectively
> |  regressed in
> |compilation time. I also feel confident I can legally share this
> |  code
> |snipped. So I just create a quick pull request with a short
> |  description,
> |and then carry on with what ever pressing task I’m trying to solve
> |  right
> |now.
> 
> There's the same difficulty at the other end too - people who might fix perf 
> regressions are typically not paid for either.  So they (eg me) tend to focus 
> on things where there is a small repro case, which in turn costs work to 
> produce.  Eg #12745 which I fixed recently in part because thomie found a 
> lovely small example.
> 
> So I'm a bit concerned that lowering the barrier to entry for perf reports 
> might not actually lead to better perf.  (But undeniably the suite we built 
> up would be a Good Thing, so we'd be a bit further forward.)
> 
> Simon

I did not intend to imply that there was a surplus of time on the other end :)

Whether this would result in a bunch of tiny test cases that can pinpoint the
underlying issue, I’m not certain.  But say we tagged the test cases
(e.g. uses TH, uses GADTs, uses X, Y and Z) and ran these samples on every
commit or every other commit (whatever the available hardware would allow the
test suite to run on, and maybe even backtested where possible); then
regressions w.r.t. subsets might be identifiable. E.g. commit  made testcases
predominantly with GADTs spike.

Worst case scenario, we have to declare defeat and decide that this approach
has not produced any viable results, and we wasted contributors' time on
providing the samples.  On the other hand, we would never know without the
samples, as they would never have been provided in the first place.

Cheers,
 moritz


RE: Measuring performance of GHC

2016-12-06 Thread Simon Peyton Jones via ghc-devs

|  - One of the core issues I see in day to day programming (even though
|not necessarily with haskell right now) is that the spare time I
|  have
|to file bug reports, boil down performance regressions etc. and file
|them with open source projects is not paid for and hence minimal.
|Hence whenever the tools I use make it really easy for me to file a
|bug, performance regression or fix something that takes the least
|  time
|the chances of me being able to help out increase greatly.  This was
|  one
|of the ideas behind using just pull requests.
|E.g. This code seems to be really slow, or has subjectively
|  regressed in
|compilation time. I also feel confident I can legally share this
|  code
|snipped. So I just create a quick pull request with a short
|  description,
|and then carry on with what ever pressing task I’m trying to solve
|  right
|now.

There's the same difficulty at the other end too - people who might fix perf 
regressions are typically not paid for either.  So they (eg me) tend to focus 
on things where there is a small repro case, which in turn costs work to 
produce.  Eg #12745 which I fixed recently in part because thomie found a 
lovely small example.

So I'm a bit concerned that lowering the barrier to entry for perf reports 
might not actually lead to better perf.  (But undeniably the suite we built up 
would be a Good Thing, so we'd be a bit further forward.)

Simon


Re: Measuring performance of GHC

2016-12-05 Thread Moritz Angermann
Hi,

I see the following challenges here, which have partially been touched
on by the discussion in the mentioned proposal.

- The tests we are looking at might be quite time-intensive (lots of
  modules that take substantial time to compile).  Is this practical to
  run when people locally execute nofib to get *some* idea of the
  performance implications?  Where is the threshold for the total
  execution time of running nofib?

- One of the core issues I see in day to day programming (even though
  not necessarily with haskell right now) is that the spare time I have
  to file bug reports, boil down performance regressions etc. and file
  them with open source projects is not paid for and hence minimal.
  Hence whenever the tools I use make it really easy for me to file a
  bug, performance regression or fix something that takes the least time,
  the chances of me being able to help out increase greatly.  This was one
  of the ideas behind using just pull requests.
  E.g. This code seems to be really slow, or has subjectively regressed in
  compilation time. I also feel confident I can legally share this code
  snippet. So I just create a quick pull request with a short description,
  and then carry on with whatever pressing task I’m trying to solve right
  now.

- Making sure that measurements are reliable. (E.g. running on a dedicated
  machine with no other applications interfering.) I assume Joachim has
  quite some experience here.

Thanks.

Cheers,
 Moritz


> On Dec 6, 2016, at 9:44 AM, Ben Gamari  wrote:
> 
> Michal Terepeta  writes:
> 
>> Interesting! I must have missed this proposal.  It seems that it didn't meet
>> with much enthusiasm though (but it also proposes to have a completely
>> separate
>> repo on github).
>> 
>> Personally, I'd be happy with something more modest:
>> - A collection of modules/programs that are more representative of real
>>  Haskell programs and stress various aspects of the compiler.
>>  (this seems to be a weakness of nofib, where >90% of modules compile
>>  in less than 0.4s)
> 
> This would be great.
> 
>> - A way to compile all of those and do "before and after" comparisons
>>  easily. To measure the time, we should probably try to compile each
>>  module at least a few times. (it seems that this is not currently
>>  possible with `tests/perf/compiler` and
>>  nofib only compiles the programs once AFAICS)
>> 
>> Looking at the comments on the proposal from Moritz, most people would
>> prefer to
>> extend/improve nofib or `tests/perf/compiler` tests. So I guess the main
>> question is - what would be better:
>> - Extending nofib with modules that are compile only (i.e., not
>>  runnable) and focus on stressing the compiler?
>> - Extending `tests/perf/compiler` with ability to run all the tests and do
>>  easy "before and after" comparisons?
>> 
> I don't have a strong opinion on which of these would be better.
> However, I would point out that currently the tests/perf/compiler tests
> are extremely labor-intensive to maintain while doing relatively little
> to catch performance regressions. There are a few issues here:
> 
> * some tests aren't very reproducible between runs, meaning that
>   contributors sometimes don't catch regressions in their local
>   validations
> * many tests aren't very reproducible between platforms and all tests
>   are inconsistent between differing word sizes. This means that we end
>   up having many sets of expected performance numbers in the testsuite.
>   In practice nearly all of these except 64-bit Linux are out-of-date.
> * our window-based acceptance criterion for performance metrics doesn't
>   catch most regressions, which typically bump allocations by a couple
>   percent or less (whereas the acceptance thresholds range from 5% to
>   20%). This means that the testsuite fails to catch many deltas, only
>   failing when some unlucky person finally pushes the number over the
>   threshold.
> 
> Joachim and I discussed this issue a few months ago at Hac Phi; he had
> an interesting approach to tracking expected performance numbers which
> may both alleviate these issues and reduce the maintenance burden that
> the tests pose. I wrote down some terse notes in #12758.
> 
> Cheers,
> 
> - Ben



Re: Measuring performance of GHC

2016-12-05 Thread Ben Gamari
Michal Terepeta  writes:

> Interesting! I must have missed this proposal.  It seems that it didn't meet
> with much enthusiasm though (but it also proposes to have a completely
> separate
> repo on github).
>
> Personally, I'd be happy with something more modest:
> - A collection of modules/programs that are more representative of real
>   Haskell programs and stress various aspects of the compiler.
>   (this seems to be a weakness of nofib, where >90% of modules compile
>   in less than 0.4s)

This would be great.

> - A way to compile all of those and do "before and after" comparisons
>   easily. To measure the time, we should probably try to compile each
>   module at least a few times. (it seems that this is not currently
>   possible with `tests/perf/compiler` and
>   nofib only compiles the programs once AFAICS)
>
> Looking at the comments on the proposal from Moritz, most people would
> prefer to
> extend/improve nofib or `tests/perf/compiler` tests. So I guess the main
> question is - what would be better:
> - Extending nofib with modules that are compile only (i.e., not
>   runnable) and focus on stressing the compiler?
> - Extending `tests/perf/compiler` with ability to run all the tests and do
>   easy "before and after" comparisons?
>
I don't have a strong opinion on which of these would be better.
However, I would point out that currently the tests/perf/compiler tests
are extremely labor-intensive to maintain while doing relatively little
to catch performance regressions. There are a few issues here:

 * some tests aren't very reproducible between runs, meaning that
   contributors sometimes don't catch regressions in their local
   validations
 * many tests aren't very reproducible between platforms and all tests
   are inconsistent between differing word sizes. This means that we end
   up having many sets of expected performance numbers in the testsuite.
   In practice nearly all of these except 64-bit Linux are out-of-date.
 * our window-based acceptance criterion for performance metrics doesn't
   catch most regressions, which typically bump allocations by a couple
   percent or less (whereas the acceptance thresholds range from 5% to
   20%). This means that the testsuite fails to catch many deltas, only
   failing when some unlucky person finally pushes the number over the
   threshold.

Joachim and I discussed this issue a few months ago at Hac Phi; he had
an interesting approach to tracking expected performance numbers which
may both alleviate these issues and reduce the maintenance burden that
the tests pose. I wrote down some terse notes in #12758.

Cheers,

- Ben




Re: Measuring performance of GHC

2016-12-05 Thread Ben Gamari
Michal Terepeta  writes:

> Hi everyone,
>
> I've been running nofib a few times recently to see the effect of some
> changes
> on compile time (not the runtime of the compiled program). And I've started
> wondering how representative nofib is when it comes to measuring compile
> time
> and compiler allocations? It seems that most of the nofib programs compile
> really quickly...
>
> Is there some collections of modules/libraries/applications that were put
> together with the purpose of benchmarking GHC itself and I just haven't
> seen/found it?
>
Sadly no; I've put out a number of calls for minimal programs (e.g.
small, fairly free-standing real-world applications) but the response
hasn't been terribly strong. I frankly can't blame people for not
wanting to take the time to strip out dependencies from their working
programs. Joachim and I have previously discussed the possibility of
manually collecting a set of popular Hackage libraries on a regular
basis for use in compiler performance characterization.

Cheers,

- Ben





Re: Measuring performance of GHC

2016-12-05 Thread Michal Terepeta
On Mon, Dec 5, 2016 at 12:00 PM Moritz Angermann 
wrote:

> Hi,
>
> I’ve started the GHC Performance Regression Collection Proposal[1]
> (Rendered [2]) a while ago with the idea of having a trivially
> community-curated set of small[3] real-world examples with performance
> regressions. I might be at fault here for not describing this to the best
> of my abilities. Thus if there is interest, and this sounds like a useful
> idea, maybe we should still pursue this proposal?
>
> Cheers,
>  moritz
>
> [1]: https://github.com/ghc-proposals/ghc-proposals/pull/26
> [2]:
> https://github.com/angerman/ghc-proposals/blob/prop/perf-regression/proposals/-perf-regression.rst
> [3]: for some definition of small
>

Interesting! I must have missed this proposal.  It seems that it didn't meet
with much enthusiasm though (but it also proposes to have a completely
separate repo on github).

Personally, I'd be happy with something more modest:
- A collection of modules/programs that are more representative of real
  Haskell programs and stress various aspects of the compiler.
  (this seems to be a weakness of nofib, where >90% of modules compile in
  less than 0.4s)
- A way to compile all of those and do "before and after" comparisons
  easily. To measure the time, we should probably try to compile each
  module at least a few times.
  (it seems that this is not currently possible with `tests/perf/compiler`
  and nofib only compiles the programs once AFAICS)

Looking at the comments on the proposal from Moritz, most people would
prefer to extend/improve nofib or `tests/perf/compiler` tests. So I guess the
main question is - what would be better:
- Extending nofib with modules that are compile only (i.e., not runnable)
  and focus on stressing the compiler?
- Extending `tests/perf/compiler` with ability to run all the tests and do
  easy "before and after" comparisons?

Personally, I'm slightly leaning towards `tests/perf/compiler` since this
would allow sharing the same module as a test for `validate` and to be used
for comparing the performance of the compiler before and after a change.

What do you think?

Thanks,
Michal


Re: Measuring performance of GHC

2016-12-05 Thread Moritz Angermann
Hi,

I’ve started the GHC Performance Regression Collection Proposal[1] (Rendered [2])
a while ago with the idea of having a trivially community-curated set of small[3]
real-world examples with performance regressions. I might be at fault here for
not describing this to the best of my abilities. Thus if there is interest, and
this sounds like a useful idea, maybe we should still pursue this proposal?

Cheers,
 moritz

[1]: https://github.com/ghc-proposals/ghc-proposals/pull/26
[2]: 
https://github.com/angerman/ghc-proposals/blob/prop/perf-regression/proposals/-perf-regression.rst
[3]: for some definition of small

> On Dec 5, 2016, at 6:31 PM, Simon Peyton Jones via ghc-devs 
> <ghc-devs@haskell.org> wrote:
> 
> If not, maybe we should create something? IMHO it sounds reasonable to have
> 
> separate benchmarks for:
> 
> - Performance of GHC itself.
> 
> - Performance of the code generated by GHC.
> 
>  
> I think that would be great, Michal.  We have a small and unrepresentative 
> sample in testsuite/tests/perf/compiler
>  
> Simon
>  
> From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Michal 
> Terepeta
> Sent: 04 December 2016 19:47
> To: ghc-devs <ghc-devs@haskell.org>
> Subject: Measuring performance of GHC
>  
> Hi everyone,
> 
>  
> 
> I've been running nofib a few times recently to see the effect of some changes
> 
> on compile time (not the runtime of the compiled program). And I've started
> 
> wondering how representative nofib is when it comes to measuring compile time
> 
> and compiler allocations? It seems that most of the nofib programs compile
> 
> really quickly...
> 
>  
> 
> Is there some collections of modules/libraries/applications that were put
> 
> together with the purpose of benchmarking GHC itself and I just haven't
> 
> seen/found it?
> 
>  
> 
> If not, maybe we should create something? IMHO it sounds reasonable to have
> 
> separate benchmarks for:
> 
> - Performance of GHC itself.
> 
> - Performance of the code generated by GHC.
> 
>  
> 
> Thanks,
> 
> Michal
> 
>  
> 
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs



RE: Measuring performance of GHC

2016-12-05 Thread Simon Peyton Jones via ghc-devs
If not, maybe we should create something? IMHO it sounds reasonable to have
separate benchmarks for:
- Performance of GHC itself.
- Performance of the code generated by GHC.

I think that would be great, Michal.  We have a small and unrepresentative 
sample in testsuite/tests/perf/compiler

Simon

From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Michal 
Terepeta
Sent: 04 December 2016 19:47
To: ghc-devs <ghc-devs@haskell.org>
Subject: Measuring performance of GHC

Hi everyone,

I've been running nofib a few times recently to see the effect of some changes
on compile time (not the runtime of the compiled program). And I've started
wondering how representative nofib is when it comes to measuring compile time
and compiler allocations? It seems that most of the nofib programs compile
really quickly...

Is there some collections of modules/libraries/applications that were put
together with the purpose of benchmarking GHC itself and I just haven't
seen/found it?

If not, maybe we should create something? IMHO it sounds reasonable to have
separate benchmarks for:
- Performance of GHC itself.
- Performance of the code generated by GHC.

Thanks,
Michal



Re: Measuring performance of GHC

2016-12-04 Thread David Turner
Seems like a good idea, for sure. I have not, but I might eventually.

On 4 Dec 2016 21:52, "Joachim Breitner"  wrote:

> Hi,
>
> did you try to compile it with a profiled GHC and look at the report? I
> would not be surprised if it would point to some obvious sub-optimal
> algorithms in GHC.
>
> Greetings,
> Joachim
>
> On Sunday, 04.12.2016, at 20:04, David Turner wrote:
> > Nod nod.
> >
> > amazonka-ec2 has a particularly painful module containing just a
> > couple of hundred type definitions and associated instances and
> > stuff. None of the types is enormous. There's an issue open on
> > GitHub[1] where I've guessed at some possible better ways of
> > splitting the types up to make GHC's life easier, but it'd be great
> > if it didn't need any such shenanigans. It's a bit of a pathological
> > case: auto-generated 15kLoC and lots of deriving, but I still feel it
> > should be possible to compile with less than 2.8GB RSS.
> >
> > [1] https://github.com/brendanhay/amazonka/issues/304
> >
> > Cheers,
> >
> > David
> >
> > On 4 Dec 2016 19:51, "Alan & Kim Zimmerman" 
> > wrote:
> > I agree.
> >
> > I find compilation time on things with large data structures, such as
> > working with the GHC AST via the GHC API get pretty slow.
> >
> > To the point where I have had to explicitly disable optimisation on
> > HaRe, otherwise the build takes too long.
> >
> > Alan
> >
> >
> > On Sun, Dec 4, 2016 at 9:47 PM, Michal Terepeta wrote:
> > > Hi everyone,
> > >
> > > I've been running nofib a few times recently to see the effect of
> > > some changes
> > > on compile time (not the runtime of the compiled program). And I've
> > > started
> > > wondering how representative nofib is when it comes to measuring
> > > compile time
> > > and compiler allocations? It seems that most of the nofib programs
> > > compile
> > > really quickly...
> > >
> > > Is there some collections of modules/libraries/applications that
> > > were put
> > > together with the purpose of benchmarking GHC itself and I just
> > > haven't
> > > seen/found it?
> > >
> > > If not, maybe we should create something? IMHO it sounds reasonable
> > > to have
> > > separate benchmarks for:
> > > - Performance of GHC itself.
> > > - Performance of the code generated by GHC.
> > >
> > > Thanks,
> > > Michal
> > >
> > >
> > > ___
> > > ghc-devs mailing list
> > > ghc-devs@haskell.org
> > > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> > >
> >
> >
> > ___
> > ghc-devs mailing list
> > ghc-devs@haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> >
> >
> > ___
> > ghc-devs mailing list
> > ghc-devs@haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> --
> Joachim “nomeata” Breitner
>   m...@joachim-breitner.de • https://www.joachim-breitner.de/
>   XMPP: nome...@joachim-breitner.de • OpenPGP-Key: 0xF0FBF51F
>   Debian Developer: nome...@debian.org
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
>


Re: Measuring performance of GHC

2016-12-04 Thread Joachim Breitner
Hi,

did you try to compile it with a profiled GHC and look at the report? I
would not be surprised if it would point to some obvious sub-optimal
algorithms in GHC.

Greetings,
Joachim

On Sunday, 04.12.2016, at 20:04, David Turner wrote:
> Nod nod.
> 
> amazonka-ec2 has a particularly painful module containing just a
> couple of hundred type definitions and associated instances and
> stuff. None of the types is enormous. There's an issue open on
> GitHub[1] where I've guessed at some possible better ways of
> splitting the types up to make GHC's life easier, but it'd be great
> if it didn't need any such shenanigans. It's a bit of a pathological
> case: auto-generated 15kLoC and lots of deriving, but I still feel it
> should be possible to compile with less than 2.8GB RSS.
>  
> [1] https://github.com/brendanhay/amazonka/issues/304
> 
> Cheers,
> 
> David
> 
> On 4 Dec 2016 19:51, "Alan & Kim Zimmerman" 
> wrote:
> I agree.
> 
> I find compilation time on things with large data structures, such as
> working with the GHC AST via the GHC API get pretty slow.
> 
> To the point where I have had to explicitly disable optimisation on
> HaRe, otherwise the build takes too long.
> 
> Alan
> 
> 
> On Sun, Dec 4, 2016 at 9:47 PM, Michal Terepeta wrote:
> > Hi everyone,
> > 
> > I've been running nofib a few times recently to see the effect of
> > some changes
> > on compile time (not the runtime of the compiled program). And I've
> > started
> > wondering how representative nofib is when it comes to measuring
> > compile time
> > and compiler allocations? It seems that most of the nofib programs
> > compile
> > really quickly...
> > 
> > Is there some collections of modules/libraries/applications that
> > were put
> > together with the purpose of benchmarking GHC itself and I just
> > haven't
> > seen/found it?
> > 
> > If not, maybe we should create something? IMHO it sounds reasonable
> > to have
> > separate benchmarks for:
> > - Performance of GHC itself.
> > - Performance of the code generated by GHC.
> > 
> > Thanks,
> > Michal
> > 
> > 
> > ___
> > ghc-devs mailing list
> > ghc-devs@haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> > 
> 
> 
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> 
> 
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
-- 
Joachim “nomeata” Breitner
  m...@joachim-breitner.de • https://www.joachim-breitner.de/
  XMPP: nome...@joachim-breitner.de • OpenPGP-Key: 0xF0FBF51F
  Debian Developer: nome...@debian.org



Re: Measuring performance of GHC

2016-12-04 Thread David Turner
Nod nod.

amazonka-ec2 has a particularly painful module containing just a couple of
hundred type definitions and associated instances and stuff. None of the
types is enormous. There's an issue open on GitHub[1] where I've guessed at
some possible better ways of splitting the types up to make GHC's life
easier, but it'd be great if it didn't need any such shenanigans. It's a
bit of a pathological case: auto-generated 15kLoC and lots of deriving, but
I still feel it should be possible to compile with less than 2.8GB RSS.

[1] https://github.com/brendanhay/amazonka/issues/304

Cheers,

David

On 4 Dec 2016 19:51, "Alan & Kim Zimmerman"  wrote:

I agree.

I find compilation time on things with large data structures, such as
working with the GHC AST via the GHC API get pretty slow.

To the point where I have had to explicitly disable optimisation on HaRe,
otherwise the build takes too long.

Alan


On Sun, Dec 4, 2016 at 9:47 PM, Michal Terepeta 
wrote:

> Hi everyone,
>
> I've been running nofib a few times recently to see the effect of some
> changes
> on compile time (not the runtime of the compiled program). And I've started
> wondering how representative nofib is when it comes to measuring compile
> time
> and compiler allocations? It seems that most of the nofib programs compile
> really quickly...
>
> Is there some collections of modules/libraries/applications that were put
> together with the purpose of benchmarking GHC itself and I just haven't
> seen/found it?
>
> If not, maybe we should create something? IMHO it sounds reasonable to have
> separate benchmarks for:
> - Performance of GHC itself.
> - Performance of the code generated by GHC.
>
> Thanks,
> Michal
>
>
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
>



Re: Measuring performance of GHC

2016-12-04 Thread Alan & Kim Zimmerman
I agree.

I find that compilation time on things with large data structures, such as
code working with the GHC AST via the GHC API, gets pretty slow.

To the point where I have had to explicitly disable optimisation on HaRe,
otherwise the build takes too long.
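
(Concretely, disabling optimisation just means building with -O0, for
instance

$ cabal configure --disable-optimization

or `ghc-options: -O0` in the cabal file; illustrative only, the exact HaRe
setup may differ.)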

Alan


On Sun, Dec 4, 2016 at 9:47 PM, Michal Terepeta 
wrote:

> Hi everyone,
>
> I've been running nofib a few times recently to see the effect of some
> changes
> on compile time (not the runtime of the compiled program). And I've started
> wondering how representative nofib is when it comes to measuring compile
> time
> and compiler allocations? It seems that most of the nofib programs compile
> really quickly...
>
> Is there some collections of modules/libraries/applications that were put
> together with the purpose of benchmarking GHC itself and I just haven't
> seen/found it?
>
> If not, maybe we should create something? IMHO it sounds reasonable to have
> separate benchmarks for:
> - Performance of GHC itself.
> - Performance of the code generated by GHC.
>
> Thanks,
> Michal
>
>
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
>


Measuring performance of GHC

2016-12-04 Thread Michal Terepeta
Hi everyone,

I've been running nofib a few times recently to see the effect of some changes
on compile time (not the runtime of the compiled program). And I've started
wondering how representative nofib is when it comes to measuring compile time
and compiler allocations? It seems that most of the nofib programs compile
really quickly...

Is there some collection of modules/libraries/applications that was put
together with the purpose of benchmarking GHC itself and I just haven't
seen/found it?

If not, maybe we should create something? IMHO it sounds reasonable to have
separate benchmarks for:
- Performance of GHC itself.
- Performance of the code generated by GHC.

Thanks,
Michal