This is less a Python question and more an optimization/probability question. Imagine that you have a list of objects and their frequency in a population, e.g.
lst = [('a', 0.01), ('b', 0.05), ('c', 0.50), ('d', 0.30), ('e', 0.04), ('f', 0.10)]

and you want to draw n items from that list (duplicates allowed), following that probability distribution. The fastest algorithm that I have been able to devise runs in O(n * log(len(lst))). Can anyone think of a solution with a better time complexity? If not, is there an obviously better way to do this (assuming n is big and the list size is small)? Here is the code:

from random import random
from bisect import bisect

def draw(n, lst):
    # build the cumulative distribution
    ps = []
    last = 0
    for p in lst:
        ps.append(last + p)
        last += p
    # ps = [0.01, 0.06, 0.56, 0.86, 0.90, 1.00]

    chosen = [0] * len(lst)  # track how often each item is drawn
    for i in range(n):
        r = random()
        chosen[bisect(ps, r)] += 1  # binary search and increment

    # rescale the counts back to relative frequencies
    result = []
    for c in chosen:
        result.append(float(c) / n)
    return result

lst = [0.01, 0.05, 0.50, 0.30, 0.04, 0.10]
print(draw(10000, lst))
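For what it's worth, here is a minimal sanity check I use, assuming the draw() above: since the function returns rescaled counts rather than the drawn items themselves, for large n the result should be close to the input probabilities. The 0.01 tolerance is just an arbitrary choice for illustration.

# sanity check: empirical frequencies should approach the input distribution
lst = [0.01, 0.05, 0.50, 0.30, 0.04, 0.10]
freqs = draw(100000, lst)
for p, f in zip(lst, freqs):
    assert abs(p - f) < 0.01, (p, f)  # 0.01 is an arbitrary tolerance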