Simon Marlow writes:
Not so much code size, but data size (heap size, to be more
precise).
Of course.
There was some talk about storing tags in pointers for 6.8. I couldn't
find the reference, but I wonder whether that would help my situation?
It would be interesting to know how much time is spent in the GC - run
the program with +RTS -sstderr.
MUT time decreases a bit (131 to 127s) for x86_64, but GC time
increases a lot (98 to 179s).
i686 version:
94,088,199,152 bytes allocated in the heap
22,294,740,756 bytes copied during GC (scavenged)
2,264,823,784 bytes copied during GC (not scavenged)
124,747,644 bytes maximum residency (4138 sample(s))
179962 collections in generation 0 ( 67.33s)
4138 collections in generation 1 ( 30.92s)
248 Mb total memory in use
INIT time 0.00s ( 0.00s elapsed)
MUT time 131.53s (133.03s elapsed)
GC time 98.25s (100.13s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 229.78s (233.16s elapsed)
%GC time 42.8% (42.9% elapsed)
Alloc rate 715,345,865 bytes per MUT second
Productivity 57.2% of total user, 56.4% of total elapsed
x86_64 version:
173,790,326,352 bytes allocated in the heap
59,874,348,560 bytes copied during GC (scavenged)
5,424,298,832 bytes copied during GC (not scavenged)
247,477,744 bytes maximum residency (9856 sample(s))
331264 collections in generation 0 (111.51s)
9856 collections in generation 1 ( 67.80s)
582 Mb total memory in use
INIT time 0.00s ( 0.00s elapsed)
MUT time 127.20s (127.76s elapsed)
GC time 179.32s (179.63s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 306.52s (307.39s elapsed)
%GC time 58.5% (58.4% elapsed)
Alloc rate 1,366,233,874 bytes per MUT second
Productivity 41.5% of total user, 41.4% of total elapsed
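For a quick side-by-side of the two reports, here is a small Haskell sketch that computes the 64-bit to 32-bit ratios. The record type and the names (Run, run32, run64, ratio, gcShare, report) are made up for illustration; the numbers are copied from the two -sstderr reports above.

```haskell
-- Figures copied from the i686 and x86_64 +RTS -sstderr reports.
-- All names here are illustrative only.
data Run = Run
  { allocated    :: Integer  -- bytes allocated in the heap
  , maxResidency :: Integer  -- bytes maximum residency
  , mutTime      :: Double   -- MUT time in seconds
  , gcTime       :: Double   -- GC time in seconds
  }

run32, run64 :: Run
run32 = Run  94088199152 124747644 131.53  98.25   -- i686
run64 = Run 173790326352 247477744 127.20 179.32   -- x86_64

-- Ratio of a 64-bit figure to the corresponding 32-bit figure.
ratio :: (Run -> Double) -> Double
ratio f = f run64 / f run32

-- Fraction of mutator-plus-GC time spent in the collector.
gcShare :: Run -> Double
gcShare r = gcTime r / (mutTime r + gcTime r)

report :: [(String, Double)]
report =
  [ ("allocation ratio",    ratio (fromIntegral . allocated))
  , ("max residency ratio", ratio (fromIntegral . maxResidency))
  , ("GC time ratio",       ratio gcTime)
  , ("MUT time ratio",      ratio mutTime)
  ]
```

Evaluating report in GHCi gives roughly 1.85 for allocation, just under 2 for maximum residency and about 1.8 for GC time, while MUT time is nearly unchanged. That seems consistent with the heap-size explanation, since pointers are twice as wide on x86_64.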
I've also added results from the 64-bit ghc-6.8.20071011 binary
snapshot, which show some nice improvements, with one benchmark
improving by 30%(!).
151,807,589,712 bytes allocated in the heap
50,687,462,360 bytes copied during GC (scavenged)
4,472,003,520 bytes copied during GC (not scavenged)
256,532,480 bytes maximum residency (6805 sample(s))
289342 collections in generation 0 ( 89.30s)
6805 collections in generation 1 ( 60.26s)
602 Mb total memory in use
INIT time 0.00s ( 0.00s elapsed)
MUT time 83.79s ( 84.36s elapsed)
GC time 149.57s (151.10s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 233.35s (235.47s elapsed)
%GC time 64.1% (64.2% elapsed)
Alloc rate 1,811,779,785 bytes per MUT second
Productivity 35.9% of total user, 35.6% of total elapsed
I'll add some more benchmarks
And I did. Below is a bit more detail from the log. The rc hash
counts traverse a bytestring, hashing fixed-size words into Integers.
As you can see, I haven't yet gotten the SPECIALIZE pragma to work
correctly :-). The global alignment is the previous test,
performing global (Needleman-Wunsch) alignment on pairs of sequences
of length 100 (short) or 1000 (long), implementing the dynamic
programming matrix as a list of lists.
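To make the list-of-lists formulation concrete, here is a minimal sketch of global alignment scoring in that style. It is not the benchmark's actual code: the scoring values, the use of String instead of bytestrings, and the names nextRow, nwMatrix and nwScore are all assumptions for illustration.

```haskell
import Data.List (zipWith4)

-- Hypothetical scoring scheme; the benchmark's real scores are not given.
matchScore, mismatchScore, gapPenalty :: Int
matchScore    = 1
mismatchScore = -1
gapPenalty    = -1

score :: Char -> Char -> Int
score a b = if a == b then matchScore else mismatchScore

-- Compute one matrix row from the previous row and one character of the
-- first sequence. The row is defined lazily in terms of itself, so each
-- cell can look at its left neighbour.
nextRow :: String -> [Int] -> Char -> [Int]
nextRow ys prev x = row
  where
    row   = first : zipWith4 cell ys prev (drop 1 prev) row
    first = case prev of
              (p : _) -> p + gapPenalty
              []      -> error "nextRow: empty previous row"
    cell y diag up left =
      maximum [diag + score x y, up + gapPenalty, left + gapPenalty]

-- The full dynamic programming matrix as a list of rows.
nwMatrix :: String -> String -> [[Int]]
nwMatrix xs ys = scanl (nextRow ys) firstRow xs
  where
    firstRow = take (length ys + 1) (iterate (+ gapPenalty) 0)

-- The global alignment score is the bottom-right cell.
nwScore :: String -> String -> Int
nwScore xs ys = last (last (nwMatrix xs ys))
```

Every cell in this representation is a boxed value behind a list cons cell, so the matrix is mostly pointers. That is probably why this benchmark is so sensitive to word size and GC behaviour.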
Start: Fri Oct 12 08:48:36 CEST 2007
Linux nmd 2.6.20-16-generic #2 SMP Fri Aug 31 00:55:27 UTC 2007 i686 GNU/Linux
ghc 6.6
--- Sequence bench ---
rc hash counts int (8) . OK, passed 10 tests, CPU time: 34.526157s
rc hash counts int (16) . OK, passed 10 tests, CPU time: 34.746172s
rc hash counts (16) . OK, passed 10 tests, CPU time: 34.642164s
rc hash counts (32) . OK, passed 10 tests, CPU time: 35.378212s
Sequence bench totals, CPU time: 139.292705s, wall clock: 139 secs
--- Alignment bench ---
global alignment, short . OK, passed 10 tests, CPU time: 2.696168s
global alignment, long .. OK, passed 10 tests, CPU time: 90.481655s
Alignment bench totals, CPU time: 93.177823s, wall clock: 94 secs
Total for all tests, CPU time: 232.474528s, wall clock: 233 secs
End: Fri Oct 12 08:52:29 CEST 2007

Start: Fri Oct 12 09:52:33 CEST 2007
Linux nmd.imr.no 2.6.22-13-generic #1 SMP Thu Oct 4 17:52:26 GMT 2007 x86_64 GNU/Linux
ghc 6.6.1
--- Sequence bench ---
rc hash counts int (8) . OK, passed 10 tests, CPU time: 36.634289s
rc hash counts int (16) . OK, passed 10 tests, CPU time: 36.590286s
rc hash counts (16) . OK, passed 10 tests, CPU time: 36.946309s
rc hash counts (32) . OK, passed 10 tests, CPU time: 37.402338s
Sequence bench totals, CPU time: 147.577222s, wall clock: 148 secs
--- Alignment