Hello Alik,

So I would be in favor of expanding the documentation but notrestricting the parameter beyond avoiding value 1.0.I have removed restriction and expanded documentation in attaching patch v5.

`I've done some math investigations, which consisted in spending one hour`

`with Christian, a statistician colleague of mine. He took an old book out`

`of a shelf, opened it to page 550 (roughly in the middle), and explained`

`to me how to build a real zipfian distribution random generator.`

`The iterative method is for parameter a>1 and works for unbounded values.`

`It is simple to add a bound. In practice the iterative method is quite`

`effective, i.e. number of iterations is typically small, at least if the`

`bound is large and if parameter a is not too close to 1.`

`I've attached a python3 script which implements the algorithm. It looks`

`like magic. Beware that a C implementation should take care of float and`

`int overflows.`

# usage: a, #values, #tests sh> zipf.py 1.5 1000 1000000 # after 1.7 seconds c = [391586, 138668, 75525, 49339, 35222, 26621, ... ... 11, 13, 12, 11, 16] (1.338591 iterations per draw) sh> zipf.py 1.1 1000 1000000 # after 3.1 seconds c = [179302, 83927, 53104, 39015, 30557, 25164, ... ... 82, 95, 93, 81, 80] (2.681451 iterations per draw)

`I think that this method should be used for a>1, and the other very rough`

`one can be kept for parameter a in [0, 1), a case which does not make much`

`sense to a mathematician as it diverges if unbounded.`

-- Fabien.

#! /usr/bin/env python3 # # generate Zipf distribution # # method taken from: # Luc Devroye, # "Non-Uniform Random Variate Generation" # p. 550-551. # Springer 1986 # # the method works for an infinite bound, the finite bound condition has been # added. a = 1.1 N = 1000000 M = 1 import sys if len(sys.argv) >= 3: a = float(sys.argv[1]) N = int(sys.argv[2]) if len(sys.argv) >= 4: M = int(sys.argv[3]) # beware, a close to 1 and n small (eg 100) leads to large number of iterations # i.e. rejection probability is high when a -> 1 # - 1.001: 280 # - 1.002: 139.2 # - 1.005: 55.9 # - 1.010: 28.4 # - 1.020: 14.8 # - 1.050: 6.2 # - 1.100: 3.5 # however if n is larger the number of iterations decreases significantly from random import random from math import exp def zipfgen(a, N): assert a > 1.0, "a must be greater than 1" b = 2.0 ** (a - 1.0) i = 0 # count iterations while True: i += 1 u, v = random(), random() try: x = int(u ** (- 1.0 / (a - 1.0))) t = (1.0 + 1.0 / x) ** (a - 1.0) # reject if too large or out of bound if v * x * (t - 1.0) / (b - 1.0) <= t / b and x <= N: break except OverflowError: # on u ** ... pass return (x, i) if M == 1: x, i = zipfgen(a, N) print("X = %d (%d)" % (x, i)) else: c = [0 for i in range(0, N)] cost = 0 for i in range(0, M): x, i = zipfgen(a, N) # assert 1 <= x and x <= N, "x = %d" % x cost += i c[x-1] += 1 print("c = %s (%f iterations per draw)" % (c, cost/M))

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers