I've realized one big slowdown in your code -- `random` by default uses a parameter to find the `current-pseudo-random-generator`. By removing the parameter lookup from the loop, I cut the time by about 33%.
My new version is at the same Gist url, with some very minor additional speedups. Sam On Tue, Nov 12, 2013 at 9:05 AM, Sam Tobin-Hochstadt <sa...@cs.indiana.edu> wrote: > I now know why your custom RNG was slow. Racket fixnums are (a) > signed and (b) have a width 1 less than the width of the host > architecture: [1] > > -> (fixnum? (- (expt 2 64) 1)) > #f > -> (fixnum? (- (expt 2 63) 1)) > #f > -> (fixnum? (- (expt 2 62) 1)) > #t > > Therefore, your random number generation calculation is constantly > overflowing into bignums and allocating. > > [1] > http://www.cs.utah.edu/plt/snapshots/current/doc/reference/numbers.html?q=fixnum#%28tech._fixnum%29 > > Sam > > On Tue, Nov 12, 2013 at 8:59 AM, Sam Tobin-Hochstadt > <sa...@cs.indiana.edu> wrote: >> Here's a start: https://gist.github.com/samth/7431213 >> >> This is a Typed Racket version of your program, converted to use >> flvectors instead of f64vectors. I haven't taken a look at the code >> for your random number generator yet, but you could try doing the >> timing by creating a large array of random numbers ahead of time, and >> then timing the loop just referencing into that array. >> >> Sam >> >> On Tue, Nov 12, 2013 at 7:52 AM, William Cushing >> <william.cush...@gmail.com> wrote: >>> I'm in the process of benchmarking various systems for their ability to >>> handle the inner loop of compilations of MCMC from a higher level >>> probabilistic modeling language (BLOG). >>> >>> Attached are examples of the kind of output (instantiations of MCMC) we'd >>> like to be able to generate given the tried-and-true Bayes Net linking the >>> probabilities of earthquake, burglary, john calling, and mary calling. >>> Classic AI fare. >>> >>> Predictably, C is fastest. But Stalin and SBCL also put in a very fine >>> showing. >>> >>> For 50,000,000 iterations on my machine, I get the following semi-serious, >>> semi-bogus timings for a random assortment of systems (petite chez just for >>> fun): >>> >>> 1 SBCL 2.20 >>> 2 Bigloo 5.25 >>> 3 Chicken 11.49 >>> 4 Stalin 2.15 >>> 5 Racket 16.67 >>> 6 Petite-Chez 30.42 >>> 7 GCC 0.82 >>> 8 Clang 0.92 >>> >>> There is much about the comparison that isn't fair. Chiefly, this is >>> primarily a test of speed of random number generation, but I haven't >>> controlled for the algorithm being used to generate random numbers. Stalin >>> may very well be using GLIBC random number generation, for example. I know >>> SBCL uses MT; in fact, the SBCL experts responded to my little encoding by >>> halving the shown runtime figure via dynamically linking in a new and >>> improved (SIMD-aware) implementation of MT. Which is quite impressive, as >>> that puts it within 30% of my C version...and my C version is "totally >>> cheating" by employing a simple little LCPRNG. >>> >>> Porting the same generator to Racket in the obvious way more than triples >>> its running time. (!) >>> (Is it the global var? Is the conversion to double being compiled as a real >>> division?) >>> >>> In any case, even just sticking with Racket's default PRNG, it seems the >>> performance of a simple number crunching loop leaves something to be >>> desired. I have done what I can to try and shave off time here and there. >>> Using unsafe ops, and threading most of the state through the parameters of >>> a tail-recursive loop (thus avoiding set!), I managed to shave off 3 seconds >>> from my original encoding (so about 15%), which isn't too shabby. Of >>> course, for even less effort, Stalin is already 800% times faster. >>> >>> I couldn't get typed/racket to type check, and /no-check doesn't seem to >>> help performance at all (not surprising). I suspect that getting >>> typed/racket to be able to process this file might help substantially...but >>> I don't really understand its type theory enough to help it figure this >>> benchmark out. >>> >>> I though Racket experts might have some suggestions for how to convince >>> Racket to run a simple number crunching loop without quite so much in the >>> way of constant-factor slowdown. >>> >>> -Will >>> >>> >>> ____________________ >>> Racket Users list: >>> http://lists.racket-lang.org/users >>> ____________________ Racket Users list: http://lists.racket-lang.org/users