[Haskell-cafe] Encouraging performance with ghc 7.4

2012-01-18 Thread Eugene Kirpichov
Hi cafe,

Just wanted to inform you that I've been benchmarking my compute-intensive
stuff on ghc 7.0.4 vs 7.4.0, and 7.4.0 gave a large speedup - one program
that took 2.9s on a particular input is now taking 2.2s. This result is
repeatable.

So I encourage people to try out GHC 7.4.0. Some stuff may stop compiling
(I had to do 3 basically one-line fixes in 3 packages before I got mine to
compile), but I didn't encounter other problems.

Below is +RTS -s of two runs: one compiled with ghc 7.4.0 and another with
7.0.4.

I can make a more detailed comparison if it's useful and if someone tells
me how - I thought about including +RTS -p, but Simon Marlow wrote recently
that it has started giving meaningful results only in 7.4, so comparison
with 7.0.4 would be unfair.

However in this case I'm sure that the program is heavily compute-bound by
the following two functions (because it's always compute-bound by them):
1) https://github.com/jkff/timeplot/blob/master/Tools/TimePlot/Plots.hs#L226
2) https://github.com/jkff/timeplot/blob/master/Tools/TimePlot/Conf.hs -
localToUTC and strptime.

jkff@jkff-laptop ~/projects/logs/... $ tplot -if e.trace -dk 'within[.]
cumsum 10' -o e.png +RTS -s
tplot -if e.trace -dk within[.] cumsum 10 -o e.png +RTS -s
   2,751,756,296 bytes allocated in the heap
 135,129,112 bytes copied during GC
  33,149,720 bytes maximum residency (22 sample(s))
   1,755,868 bytes maximum slop
  56 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  4994 collections, 0 parallel,  0.22s,  0.23s elapsed
  Generation 1:22 collections, 0 parallel,  0.08s,  0.09s elapsed

  INIT  time0.01s  (  0.00s elapsed)
  MUT   time2.61s  (  2.91s elapsed)
  GCtime0.30s  (  0.32s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time2.91s  (  3.22s elapsed)

  %GC time  10.1%  (9.8% elapsed)

  Alloc rate1,051,262,083 bytes per MUT second

  Productivity  89.6% of total user, 80.9% of total elapsed

jkff@jkff-laptop ~/projects/logs/... $ tplot -if e.trace -dk 'within[.]
cumsum 10' -o e.png +RTS -s
   2,161,811,620 bytes allocated in the heap
 107,589,660 bytes copied during GC
  34,799,400 bytes maximum residency (22 sample(s))
   1,721,152 bytes maximum slop
  58 MB total memory in use (0 MB lost due to fragmentation)

Tot time (elapsed)  Avg pause  Max pause
  Gen  0  3899 colls, 0 par0.13s0.14s 0.s0.0003s
  Gen  122 colls, 0 par0.08s0.09s 0.0043s0.0482s

  INITtime0.00s  (  0.00s elapsed)
  MUT time2.03s  (  2.28s elapsed)
  GC  time0.21s  (  0.23s elapsed)
  EXITtime0.00s  (  0.00s elapsed)
  Total   time2.26s  (  2.51s elapsed)

  %GC time   9.3%  (9.1% elapsed)

  Alloc rate1,056,397,390 bytes per MUT second

  Productivity  90.7% of total user, 81.5% of total elapsed


-- 
Eugene Kirpichov
Principal Engineer, Mirantis Inc. http://www.mirantis.com/
Editor, http://fprog.ru/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Encouraging performance with ghc 7.4

2012-01-18 Thread Eugene Kirpichov
I got even more encouraging results on another input - 5 vs 7 seconds:

=== GHC 7.4.0 ==
jkff@jkff-laptop ~/projects/logs/... $ tplot -if lat.trace -dk 'within[.]
duration quantile 10 0.25,0.5,0.75,0.9,0.95' -o lat3.png +RTS -s
   2,809,230,872 bytes allocated in the heap
 358,393,440 bytes copied during GC
  42,478,364 bytes maximum residency (68 sample(s))
   1,833,848 bytes maximum slop
 113 MB total memory in use (0 MB lost due to fragmentation)

Tot time (elapsed)  Avg pause  Max pause
  Gen  0  4740 colls, 0 par0.39s0.40s 0.0001s0.0005s
  Gen  168 colls, 0 par0.42s0.48s 0.0071s0.1157s

  INITtime0.00s  (  0.00s elapsed)
  MUT time3.76s  (  4.01s elapsed)
  GC  time0.82s  (  0.88s elapsed)
  EXITtime0.00s  (  0.00s elapsed)
  Total   time4.59s  (  4.89s elapsed)

  %GC time  17.8%  (18.0% elapsed)

  Alloc rate744,774,183 bytes per MUT second

  Productivity  82.2% of total user, 77.2% of total elapsed

 GHC 7.0.4 ===
jkff@jkff-laptop ~/projects/logs/... $ tplot -if lat.trace -dk 'within[.]
duration quantile 10 0.25,0.5,0.75,0.9,0.95' -o lat3.png +RTS -s
tplot -if lat.trace -dk within[.] duration quantile 10
0.25,0.5,0.75,0.9,0.95 -o lat3.png +RTS -s
   4,024,048,244 bytes allocated in the heap
 419,997,300 bytes copied during GC
  44,546,920 bytes maximum residency (70 sample(s))
   1,840,208 bytes maximum slop
 113 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  6995 collections, 0 parallel,  0.63s,  0.64s elapsed
  Generation 1:70 collections, 0 parallel,  0.39s,  0.43s elapsed

  INIT  time0.01s  (  0.00s elapsed)
  MUT   time5.96s  (  6.17s elapsed)
  GCtime1.02s  (  1.08s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time6.99s  (  7.24s elapsed)

  %GC time  14.6%  (14.9% elapsed)

  Alloc rate674,478,767 bytes per MUT second

  Productivity  85.3% of total user, 82.3% of total elapsed


On Wed, Jan 18, 2012 at 8:22 PM, Eugene Kirpichov ekirpic...@gmail.comwrote:

 Hi cafe,

 Just wanted to inform you that I've been benchmarking my compute-intensive
 stuff on ghc 7.0.4 vs 7.4.0, and 7.4.0 gave a large speedup - one program
 that took 2.9s on a particular input is now taking 2.2s. This result is
 repeatable.

 So I encourage people to try out GHC 7.4.0. Some stuff may stop compiling
 (I had to do 3 basically one-line fixes in 3 packages before I got mine to
 compile), but I didn't encounter other problems.

 Below is +RTS -s of two runs: one compiled with ghc 7.4.0 and another with
 7.0.4.

 I can make a more detailed comparison if it's useful and if someone tells
 me how - I thought about including +RTS -p, but Simon Marlow wrote recently
 that it has started giving meaningful results only in 7.4, so comparison
 with 7.0.4 would be unfair.

 However in this case I'm sure that the program is heavily compute-bound by
 the following two functions (because it's always compute-bound by them):
 1)
 https://github.com/jkff/timeplot/blob/master/Tools/TimePlot/Plots.hs#L226
 2) https://github.com/jkff/timeplot/blob/master/Tools/TimePlot/Conf.hs -
 localToUTC and strptime.

 jkff@jkff-laptop ~/projects/logs/... $ tplot -if e.trace -dk 'within[.]
 cumsum 10' -o e.png +RTS -s
 tplot -if e.trace -dk within[.] cumsum 10 -o e.png +RTS -s
2,751,756,296 bytes allocated in the heap
  135,129,112 bytes copied during GC
   33,149,720 bytes maximum residency (22 sample(s))
1,755,868 bytes maximum slop
   56 MB total memory in use (0 MB lost due to fragmentation)

   Generation 0:  4994 collections, 0 parallel,  0.22s,  0.23s elapsed
   Generation 1:22 collections, 0 parallel,  0.08s,  0.09s elapsed

   INIT  time0.01s  (  0.00s elapsed)
   MUT   time2.61s  (  2.91s elapsed)
   GCtime0.30s  (  0.32s elapsed)
   EXIT  time0.00s  (  0.00s elapsed)
   Total time2.91s  (  3.22s elapsed)

   %GC time  10.1%  (9.8% elapsed)

   Alloc rate1,051,262,083 bytes per MUT second

   Productivity  89.6% of total user, 80.9% of total elapsed

 jkff@jkff-laptop ~/projects/logs/... $ tplot -if e.trace -dk 'within[.]
 cumsum 10' -o e.png +RTS -s
2,161,811,620 bytes allocated in the heap
  107,589,660 bytes copied during GC
   34,799,400 bytes maximum residency (22 sample(s))
1,721,152 bytes maximum slop
   58 MB total memory in use (0 MB lost due to fragmentation)

 Tot time (elapsed)  Avg pause  Max
 pause
   Gen  0  3899 colls, 0 par0.13s0.14s 0.s
  0.0003s
   Gen  122 colls, 0 par0.08s0.09s 0.0043s
  0.0482s

   INITtime0.00s  (  0.00s elapsed)
   MUT time2.03s  (  2.28s elapsed)
   GC  time0.21s  (  0.23s elapsed)