At Sat, 14 May 2016 22:08:32 -0400, Gabriel Scherer wrote:
> Short version:
>   Racket has relatively poor worst-case GC pause times on a specific
>   benchmark:
>     https://gitlab.com/gasche/gc-latency-experiment/blob/master/main.rkt
> 
>   1) Is this expected? What is the GC algorithm for Racket's old generation?

Racket mostly uses mark-and-sweep for the old generation. It sometimes
compacts, but not usually. (I tried disabling compaction in the GC, and
it doesn't seem to affect the benchmark.)

Although Racket's incremental GC mode can help in some cases, as you
noted, it doesn't help out of the box on this benchmark.

>   2) Can you make it better?

Only a little, in the short term.


> Long version:

Beware that Racket strings are Unicode strings stored as UCS-4. So, I
think the benchmark makes Racket allocate 4 times as much as the OCaml
and Haskell variants.

From a GC perspective, that maybe shouldn't matter too much, because
the data is mostly atomic and not traversed. Still, it reduces
locality. Making matters worse, the GC in Racket v6.5 is more eager than
intended to return pages to the OS (and the size of those pages is
proportional to the size of the message); making the GC less eager to
return pages cuts the peak GC time from 225ms to 160ms on my OS X
machine. The effect is likely OS-specific.

Changing the benchmark to use a byte string reduces the peak GC time to
140ms. The 20ms saved reflects some general locality effects, but maybe
also some remaining weakness in the GC's handling of newly unused
pages.
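
To illustrate the difference, here's a minimal sketch of the two
allocation strategies; the message size and constructor names here are
hypothetical, not taken from the benchmark itself:

```racket
#lang racket/base

;; Hypothetical fixed message size, standing in for the benchmark's payloads.
(define message-size 1024)

;; Unicode string: Racket stores these as UCS-4, so roughly 4 bytes
;; per character are allocated.
(define (make-message/string)
  (make-string message-size #\f))

;; Byte string: 1 byte per element, so about a quarter of the
;; allocation for the same payload length.
(define (make-message/bytes)
  (make-bytes message-size (char->integer #\f)))
```

Swapping one constructor for the other is the whole change; the rest of
the benchmark's loop can stay as-is.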


A side issue is that inserting consecutive numbers into an immutable
hash table is a bad case for Racket's implementation in v6.5. (I've
been meaning to look at this for a while.) It ends up creating
intermediate nodes that are larger than average, which again reduces
locality. After adjusting the implementation to avoid trouble with
sequential hash codes, the peak GC time ends up around 100ms on my
machine.
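
For reference, a reduction of the problematic access pattern looks like
this; `insert-sequential` is a name I'm making up here, not something
from the benchmark:

```racket
#lang racket/base

;; Insert consecutive integer keys into an immutable hash table.
;; Fixnum keys hash to themselves, so sequential keys mean sequential
;; hash codes -- the pattern that produced oversized intermediate
;; nodes in the v6.5 implementation.
(define (insert-sequential n)
  (for/fold ([h (hash)]) ([i (in-range n)])
    (hash-set h i i)))
```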


Meanwhile, incremental GC doesn't help because it doesn't adapt well to
ramping up from a standstill. The current, first cut at incremental mode
is mostly useful in a steady state (typical for a video game). If I
increase the message count by a factor of 10, incremental GC mode
starts helping a little:

 % env PLT_INCREMENTAL_GC=y racket -W "debug@GC" main.rkt |& grep MAJ
 GC: 0:MAJ @ 51,218K(+37,101K)...; free 3,087K(-3,087K) 17ms @ 331
 GC: 0:MAJ @ 107,847K(+29,624K)...; free 9,304K(-19,560K) 42ms @ 483
 GC: 0:MAJ @ 209,809K(+38,814K)...; free 6,715K(-22,635K) 56ms @ 781
 GC: 0:MAJ @ 416,454K(+44,697K)...; free 9,512K(-6,920K) 86ms @ 1477
 GC: 0:MAJ @ 824,407K(+76,520K)...; free 182,074K(-182,074K) 113ms @ 2888
 GC: 0:MAJ @ 1,294,379K(+98,068K)...; free 417,187K(-417,187K) 114ms @ 5066
 GC: 0:MAJ @ 1,763,263K(+120,704K)...; free 647,816K(-647,816K) 64ms @ 8067
 GC: 0:MAJ @ 2,028,264K(+117,847K)...; free 885,256K(-901,640K) 43ms @ 10918
 GC: 0:MAJ @ 2,035,484K(+127,011K)...; free 912,992K(-912,992K) 47ms @ 13779
 GC: 0:MAJ @ 2,026,836K(+135,659K)...; free 890,614K(-890,614K) 41ms @ 16503
 ....

A program can request incremental mode and encourage more incremental
work through the `collect-garbage` function. Adding

   (when (zero? (modulo i 100))
     (collect-garbage 'incremental)
     (collect-garbage 'minor))

to the `for/fold` loop brings peak GC time down to around 30ms (also
using the GC and HAMT improvements), but the program takes twice as
long to run. Using 50 instead of 100 in the `modulo` brings the peak
time to 20ms, but then the program takes three times as long to run.
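
In context, the hints end up in the loop body like this; `message-count`
and `make-message` are placeholders for the benchmark's own definitions:

```racket
#lang racket/base

;; Hypothetical stand-ins for the benchmark's parameters.
(define message-count 10000)
(define (make-message) (make-bytes 1024 (char->integer #\f)))

(define table
  (for/fold ([h (hash)]) ([i (in-range message-count)])
    (when (zero? (modulo i 100))
      ;; Ask the GC to do a slice of incremental major-collection
      ;; work, then a minor collection, so major-GC work is spread
      ;; across the loop instead of landing in one long pause.
      (collect-garbage 'incremental)
      (collect-garbage 'minor))
    (hash-set h i (make-message))))
```

The trade-off is exactly the one described above: smaller peak pauses,
but more total GC work, so the program runs longer overall.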
