You might be interested in my gcstats package, which will do some of these statistics for you, and may allow you to run larger heaps with data gathering.
Sam

On Sat, May 14, 2016, 9:09 PM Gabriel Scherer <[email protected]> wrote:

> Hi racket-devel,
>
> Short version:
> Racket has relatively poor worst-case GC pause times on a specific benchmark:
> https://gitlab.com/gasche/gc-latency-experiment/blob/master/main.rkt
>
> 1) Is this expected? What is the GC algorithm for Racket's old generation?
> 2) Can you make it better?
>
> Long version:
>
> ## Context
>
> James Fisher has a blog post on a case where GHC's runtime system
> imposed unpleasant latencies/pauses on their Haskell program:
>
> https://blog.pusher.com/latency-working-set-ghc-gc-pick-two/
>
> The blog post proposes a very simple, synthetic benchmark that exhibits
> the issue -- namely, if the old generation of the GC uses a copying
> strategy, those copies can incur long pauses when many large objects
> are live in the old generation.
>
> I ported this synthetic benchmark to OCaml and confirmed that the
> OCaml GC suffers from no such issue, as its old generation uses a
> mark-and-sweep strategy that does not copy old memory. The Haskell
> benchmark has worst-case latencies around 50ms, which James Fisher
> finds excessive. The OCaml benchmark has worst-case latencies around
> 3ms.
>
> Max New ported the benchmark to Racket, and I later modified his
> version; the results I see on my machine are relatively bad: the
> worst-case pause time is between 120ms and 220ms in my tests.
>
> I think the results are mostly unrelated to the specific edge case
> that this benchmark was designed to exercise (copies of large objects
> in the old generation): if I change the inserted strings to be of size
> 1 instead of 1024, I still observe fairly high latencies -- such as
> 120ms. So I am mostly observing high latencies simply by inserting and
> removing things in an immutable hash in a loop.
> ## Reproducing
>
> The benchmark fills an (immutable) associative structure with strings
> of length 1024 (the idea is to have relatively high memory usage per
> pointer, so as to see large copy times), keeping at most 200,000
> strings in the working set. In total, it inserts 1,000,000 strings
> (and thus removes 800,000, one after each insertion past the first
> 200,000). We measure latencies rather than throughput, so the
> performance details of the associative map structure do not matter.
>
> My benchmark code in Haskell, OCaml and Racket can be found here:
> https://gitlab.com/gasche/gc-latency-experiment.git
> https://gitlab.com/gasche/gc-latency-experiment/tree/master
> The Makefile contains my scripts to compile, run and analyze each
> language's version.
>
> To run the Racket benchmark with instrumentation:
>
>     PLTSTDERR=debug@GC racket main.rkt 2> racket-gc-log
>
> To extract the pause times from the resulting log file (in the format
> produced by Racket 6.5), I do:
>
>     cat racket-gc-log | grep -v total | cut -d' ' -f7 | sort -n
>
> Piping `| uniq --count` after that produces a poor man's histogram of
> latencies. I get the following result on my machine:
>
>       1 0ms
>       2 1ms
>       1 2ms
>       1 3ms
>       2 4ms
>       1 5ms
>       1 6ms
>       3 8ms
>      12 9ms
>       1 11ms
>       2 12ms
>      38 13ms
>     126 14ms
>      43 15ms
>      13 16ms
>      19 17ms
>       4 18ms
>       1 19ms
>       1 21ms
>       1 48ms
>       1 68ms
>       1 70ms
>       1 133ms
>       1 165ms
>       1 220ms
>       1 227ms
>       1 228ms
>
> ## Non-incremental vs. incremental GC
>
> We experimented with PLT_INCREMENTAL_GC=1; on my machine, this does
> not decrease the worst-case pause time. On Asumu Takikawa's beefier
> machine, I think the pause times decreased a bit -- but they remained
> well above 50ms. Because the incremental GC consumes noticeably more
> memory, I am unable to test with both PLT_INCREMENTAL_GC and
> PLTSTDERR=debug@GC enabled -- my system runs out of memory.
> If I reduce the benchmark size by half (half the working set, half
> the number of iterations), I can run the incremental GC with debugging
> enabled. On this half-size instance, I observe the following results.
>
> For the *non-incremental* GC:
>
>      2 1ms
>      1 2ms
>      2 3ms
>      2 4ms
>      1 5ms
>      1 6ms
>      9 8ms
>      2 9ms
>      1 10ms
>     38 13ms
>     43 14ms
>     13 15ms
>      8 16ms
>      5 17ms
>      6 18ms
>      1 44ms
>      1 66ms
>      1 75ms
>      2 126ms
>      1 136ms
>      1 142ms
>
> For the *incremental* GC:
>
>       2 1ms
>       1 2ms
>       2 3ms
>       3 4ms
>       1 5ms
>      38 6ms
>     155 7ms
>     136 8ms
>      78 9ms
>      56 10ms
>      28 11ms
>      16 12ms
>       2 14ms
>       1 15ms
>       1 16ms
>       2 20ms
>       1 32ms
>       1 41ms
>       1 61ms
>       1 101ms
>       1 148ms
>
> As you can see, the incremental GC helps, in that the distribution of
> pauses shifts toward shorter pause times: it performs more, but
> shorter, pauses. However, the worst-case pauses do not improve -- in
> fact, they are even a bit worse.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Racket Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to [email protected].
> To post to this group, send email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/racket-dev/CAPFanBE6%2BMVxG44nHnw_HCsaNwZ-HSp4L7%2BVT-v-QJ%3Div-EK1g%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.
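A note for anyone reproducing the analysis: the extraction pipeline can be sanity-checked against a small synthetic log before running the full benchmark. The log lines below are fabricated stand-ins whose only intended property is that the pause duration sits in the seventh space-separated field, matching the `cut -d' ' -f7` in the message; the real field layout of the Racket 6.5 GC logger may differ in other respects.

```shell
# Build a tiny synthetic GC log (made-up values; only the position of
# the "Nms" pause field matters to the pipeline).
cat > racket-gc-log <<'EOF'
GC: 0:min @ 12,400K(+35,184K)[+1,024K]; free 5,292K(-5,292K) 14ms @ 1000
GC: 0:min @ 13,100K(+34,484K)[+1,024K]; free 5,100K(-5,100K) 14ms @ 1050
GC: 0:MAJ @ 90,000K(+12,000K)[+2,048K]; free 40,000K(-40,000K) 220ms @ 2000
GC: total memory use 90,000K
EOF

# Drop the "total" summary line, keep field 7 (the pause time), sort
# numerically, and count duplicates for the poor man's histogram.
grep -v total racket-gc-log | cut -d' ' -f7 | sort -n | uniq -c
```

On the synthetic log this yields two buckets, `2 14ms` and `1 220ms` (modulo `uniq -c`'s leading padding); on a real log the same pipeline produces histograms like the ones shown above.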
