On 14 May 2018 at 13:27, Chris Angelico <ros...@gmail.com> wrote:
> On Mon, May 14, 2018 at 9:59 PM, Paul Moore <p.f.mo...@gmail.com> wrote:
>> The problem is that supplying cum_weights allows the code to run in
>> O(log n) by using bisection. This is significantly faster on large
>> populations. Adding a test that the cumulative weights are
>> nondecreasing would add an O(n) step to the code.
>
> Hang on - are the 'n' and 'log n' there referring to the same n?
Yes. It's the number of elements in the sample population (which is the
same as the number of entries in the weights/cum_weights arrays). See
https://github.com/python/cpython/blob/master/Lib/random.py#L382 for
details, but basically: calculating cum_weights from weights costs O(n),
and locating the right index into the population by doing a bisection
search (bisect.bisect) on the cum_weights sequence costs O(log n). Using
the cum_weights argument rather than the weights argument skips the O(n)
step.

If it's possible to check that cum_weights is nondecreasing in O(log n)
time (either directly here, or in bisect.bisect), then the check wouldn't
affect the algorithmic complexity of that case (it would affect the
constants, but I assume we don't care too much about that). But I don't
know of a way of doing that.

Improving the documentation is, of course, free of runtime cost. And
making it clear that "you should only use cum_weights if you know what
you're doing, and in particular it doesn't cost you O(n) to work them
out" would seem entirely reasonable to me.

Paul
--
https://mail.python.org/mailman/listinfo/python-list
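P.S. For anyone following along, here's a minimal sketch of the two-step
structure being discussed. This is not the stdlib implementation, just an
illustration: `one_draw` is a made-up helper name, and the population and
weights are arbitrary.

```python
import random
from bisect import bisect
from itertools import accumulate

population = ['a', 'b', 'c', 'd']
weights = [10, 5, 30, 5]

# Passing weights: choices() must first build cumulative weights -- O(n).
cum_weights = list(accumulate(weights))   # -> [10, 15, 45, 50]

# With cum_weights in hand, each individual draw is one bisection -- O(log n).
total = cum_weights[-1]

def one_draw():
    # random() * total is uniform in [0, total); bisect returns the index
    # of the first cumulative weight strictly greater than that value.
    return population[bisect(cum_weights, random.random() * total)]

# A nondecreasing sanity check on cum_weights is itself O(n), which is why
# adding it would wipe out the benefit of supplying cum_weights directly.
assert all(a <= b for a, b in zip(cum_weights, cum_weights[1:]))
```

Passing cum_weights to random.choices() skips the accumulate step, so the
per-call cost is dominated by the bisections alone.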