Optimising a Groovy script

Paul Moore Tue, 28 Mar 2017 13:25:46 -0700

I'm very much a newbie with Groovy, so I apologise in advance if this
is not the right place for questions like this. I couldn't find
anywhere else that looked like a better option - if there is somewhere
I should have asked, feel free to redirect me.


I want to write a simulation script using Groovy - this is something
of a hobby challenge for me, I have a friend who has done a similar
task in C++, and I'm looking for a more user-friendly language to
write the code, while not losing too much performance over the C
version.

The code is basically to simulate "a game". The particular game is
defined by the user, as a function that generates scores. The program
runs the game many times, and summarises the distribution of the
results. Basically, a Monte Carlo simulation. My current code in
Groovy for this, using "roll 3 dice and add up the results" as the
target game, looks as follows:

@Grapes(
    @Grab(group='org.apache.commons', module='commons-math3', version='3.6.1')
)
import org.apache.commons.math3.random.MersenneTwister

def benchmark = { closure ->
  start = System.currentTimeMillis()
  closure.call()
  now = System.currentTimeMillis()
  now - start
}

rng = new MersenneTwister()

int roll() {
    rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3
}

int N = 1000000
def results = [:]

def time = benchmark {
    N.times {
        int n = roll()
        results[n] = results.containsKey(n) ? results[n] + 1 : 1
    }
}

println "Took ${time/100} sec"
for (e in results.sort()) {
    println "$e.key: $e.value"
}

This does exactly what I want, but takes about 5 seconds to run the
simulation on my PC. My friend's C++ code runs a similar simulation in
about 0.1 second. That's a massive penalty for Groovy, and likely
means that for more realistic simulations (which would be a lot more
complex than 3d6!) I wouldn't even be close to competitive.

The code above is completely unoptimised. I know that Groovy's dynamic
programming features can introduce some overhead, but I also get the
impression from the documentation that by careful use of exact types,
and similar techniques, this can be speeded up a lot (the docs claim
potentially better than C performance in some cases).

What should I be looking at to optimise the above code? The areas I
can think of are:

1. The RNG. I assume that the apache commons code is pretty efficient,
though. I do want a reasonably decent RNG, and I'd heard that the JVM
RNG is not sufficient for simulation. For now I'm assuming that this
is sufficiently good.
2. The roll() function. This is the core of the inner loop, and likely
the big bottleneck. I've declared the type, which I guess is the first
step, but I don't know what else I can do here. I tried a
CompileStatic annotation, but that gave me errors about referencing
the rng variable. I'm not sure what that implies - is my code doing
something wrong in how it references the global rng variable?
3. Collecting the results in a map is likely not ideal. Is there a
better data structure I should be using? I basically want to be able
to count how many times each result appears - results will be integers
(in certain cases I might want non-integers but I can handle them
exceptionally) but I don't necessarily know the range in advance (so
I'd avoid a static array unless it gives significant performance
benefits - when I did a quick test, I got about a second faster
runtime, noticeable, but not enough to get me anywhere near my sub-1
second target)

I tried using the GProf profiler to see if that shed any light on what
I should do. When I ran with a realistic number of iterations it took
forever and then failed with an out of memory error. Dropping it to
10000 iterations, I got

 %    cumulative   self            self     total    self    total
self    total
time   seconds    seconds  calls  ms/call  ms/call  min ms  min ms
max ms  max ms  name
34.5        0.04     0.04  10000     0.00     0.00    0.00    0.00
0.91    1.13  blackpool$_run_closure2$_closure3.roll
18.5        0.06     0.02  39984     0.00     0.00    0.00    0.00
0.53    0.53  java.lang.Integer.plus
15.9        0.08     0.02  10000     0.00     0.01    0.00    0.00
0.28    1.26  blackpool$_run_closure2$_closure3.doCall
11.0        0.10     0.01  30000     0.00     0.00    0.00    0.00
0.36    0.36  org.apache.commons.math3.random.MersenneTwister.nextInt
 5.1        0.10     0.00  10000     0.00     0.00    0.00    0.00
0.19    0.19  java.util.LinkedHashMap.containsKey
 4.4        0.11     0.00  10000     0.00     0.00    0.00    0.00
0.02    0.02  java.util.LinkedHashMap.putAt
 4.0        0.12     0.00      1     5.25   125.62    5.25  125.62
5.25  125.62  java.lang.Integer.times
 3.9        0.12     0.00   9984     0.00     0.00    0.00    0.00
0.02    0.02  java.util.LinkedHashMap.getAt
 2.2        0.12     0.00      1     2.85   128.48    2.85  128.48
2.85  128.48  blackpool$_run_closure2.doCall

But I don't really know how to interpret that. (Also, am I somehow
using GProf wrongly? It seems like it shouldn't run out of memory
profiling a 5-second program run...)

Can anyone offer any advice on what I should be looking at here?

Thanks,
Paul

Optimising a Groovy script

Reply via email to