On 14 May 2018 at 13:27, Chris Angelico <ros...@gmail.com> wrote:
> On Mon, May 14, 2018 at 9:59 PM, Paul Moore <p.f.mo...@gmail.com> wrote:
>> The problem is that supplying cum_weights allows the code to run in
>> O(log n) by using bisection. This is significantly faster on large
>> populations. Adding a test that the cumulative weights are
>> nondecreasing would add an O(n) step to the code.
>
> Hang on - are the 'n' and 'log n' there referring to the same n?
Yes. It's the number of elements in the sample population (which is the
same as the number of entries in the weights/cum_weights arrays). See
https://github.com/python/cpython/blob/master/Lib/random.py#L382 for
details, but basically: calculating cum_weights from weights costs O(n),
and locating the right index into the population by doing a bisection
search (bisect.bisect) on the cum_weights sequence costs O(log n). Using
the cum_weights argument rather than the weights argument skips the O(n)
step.

If it's possible to check that cum_weights is nondecreasing in O(log n)
time (either directly here, or in bisect.bisect), then the check wouldn't
affect the algorithmic complexity of that case (it would affect the
constants, but I assume we don't care too much about that). But I don't
know of a way of doing that.

Improving the documentation is, of course, free of runtime cost. And
making it clear that "you should only use cum_weights if you know what
you're doing, and in particular it doesn't cost you O(n) to work them
out" would seem entirely reasonable to me.

Paul
--
https://mail.python.org/mailman/listinfo/python-list
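P.S. For anyone following along, here's a minimal sketch of the two-step
structure being discussed. This is not the stdlib implementation, just an
illustration: `one_draw` is a made-up helper name, and the population and
weights are arbitrary.

```python
import random
from bisect import bisect
from itertools import accumulate

population = ['a', 'b', 'c', 'd']
weights = [10, 5, 30, 5]

# Passing weights: choices() must first build cumulative weights -- O(n).
cum_weights = list(accumulate(weights))   # -> [10, 15, 45, 50]

# With cum_weights in hand, each individual draw is one bisection -- O(log n).
total = cum_weights[-1]

def one_draw():
    # random() * total is uniform in [0, total); bisect returns the index
    # of the first cumulative weight strictly greater than that value.
    return population[bisect(cum_weights, random.random() * total)]

# A nondecreasing sanity check on cum_weights is itself O(n), which is why
# adding it would wipe out the benefit of supplying cum_weights directly.
assert all(a <= b for a, b in zip(cum_weights, cum_weights[1:]))
```

Passing cum_weights to random.choices() skips the accumulate step, so the
per-call cost is dominated by the bisections alone.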