On Sun, Feb 17, 2019 at 10:52 AM Fabien COELHO <coe...@cri.ensmp.fr> wrote:
> > I'm trying to use random_zipfian() for benchmarking of skewed data sets, > > and I ran head-first into an issue with rather excessive CPU costs. > > [...] This happens because generalizedHarmonicNumber() does this: > > > > for (i = n; i > 1; i--) > > ans += pow(i, -s); > > > > where n happens to be 1000000000 (range passed to random_zipfian), so > > the loop takes quite a bit of time. > > If you find a better formula for the harmonic number, you are welcome > and probably get your name on it:-) > There are pretty good approximations for s > 1.0 using Riemann zeta function and Euler derived a formula for the s = 1 case. I also noticed that i is int in this function, but n is int64. That seems like an oversight. Regards, Ants Aasma