[issue18844] allow weights in random.choice

2017-03-31 Thread Donald Stufft

Changes by Donald Stufft :


--
pull_requests: +1022

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-10-29 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 09a87b16d5e5 by Raymond Hettinger in branch '3.6':
Issue #18844:  Strengthen tests to include a case with unequal weighting
https://hg.python.org/cpython/rev/09a87b16d5e5

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-10-29 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 32bfc8b6 by Raymond Hettinger in branch '3.6':
Issue #18844: Make the various ways for specifing weights produce the same 
results.
https://hg.python.org/cpython/rev/32bfc8b6

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-10-13 Thread Roundup Robot

Roundup Robot added the comment:

New changeset d4e715e725ef by Raymond Hettinger in branch '3.6':
Issue #18844:  Add more tests
https://hg.python.org/cpython/rev/d4e715e725ef

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-10-11 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 433cff92d565 by Raymond Hettinger in branch '3.6':
Issue #18844:  Fix-up examples for random.choices().  Remove over-specified 
test.
https://hg.python.org/cpython/rev/433cff92d565

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-09-26 Thread Raymond Hettinger

Raymond Hettinger added the comment:

###
# Flipping a biased coin

from collections import Counter
from random import choices

print(Counter(choices(range(2), [0.9, 0.1], k=1000)))

###
# Bootstrapping

'From a small statistical sample infer a 90% confidence interval for the mean'
# http://statistics.about.com/od/Applications/a/Example-Of-Bootstrapping.htm

from statistics import mean
from random import choices

data = 1, 2, 4, 4, 10
means = sorted(mean(choices(data, k=5)) for i in range(20))
print('The sample mean of {:.1f} has a 90% confidence interval from {:.1f} to 
{:.1f}'.format(
  mean(data), means[1], means[-2]))

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-09-26 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Equidistributed examples:

choices(c.execute('SELECT name FROM Employees').fetchall(), k=20)
choices(['hearts', 'diamonds', 'spades', 'clubs'], k=5)
choices(list(product(card_facevalues, suits)), k=5)

Weighted selection examples:

  Counter(choices(['red', 'black', 'green'], [18, 18, 2], k=3800))   # american 
roulette
  Counter(choices(['hit', 'miss'], [5, 1], k=600))   # russian 
roulette
  choices(fetch('employees'), fetch('years_of_service'), k=100)  # tenure 
weighted
  choices(cohort, map(cancer_risk, map(risk_factors, cohort)), k=50) # risk 
weighted

Star unpacking example:

   transpose = lambda s: zip(*s)
   craps = [(2, 1), (3, 2), (4, 3), (5, 4), (6, 5), (7, 6), (8, 5), (9, 4), 
(10, 3), (11, 2), (12, 1)]
   print(choices(*transpose(craps), k=10))

Comparative APIs from other languages:

http://www.mathworks.com/help/stats/randsample.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.choice.html
https://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html
https://reference.wolfram.com/language/ref/RandomChoice.html

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-09-26 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 39a4be5e003d by Raymond Hettinger in branch '3.6':
Issue #18844: Make the number of selections a keyword-only argument for 
random.choices().
https://hg.python.org/cpython/rev/39a4be5e003d

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-09-07 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Using a generator doesn't prevents state to be saved and restored.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-09-07 Thread Raymond Hettinger

Raymond Hettinger added the comment:

There isn't really an option to return a generator because it conflicts the 
rest of the module that uses lists elsewhere and that allows state to be saved 
and restored before and after any function call.  One of the design reviewers 
also said that the generator form would harder for students to use.

I left the text in the examples section unchanged because it is still valid 
(showing how to make a cumulative distribution and how to build a fast 
alternative for the special case of small integer weights). Before the 3.6 
release, I expect to expand this section to provide recipes for a MCMC 
application (using choices() with a passed-in CDF) and some other examples 
suggested by the design reviewers.

The optimization hacks in the other patch don't seem worth it.  The current 
code is readable and runs fast (the principal steps are all C functions).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-09-06 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

1. Returning a list instead of an iterator looks unpythonic to me. Values 
generated sequentially, there are no advantages of returning a list.

2. An implementation lacks optimizations used in my patch.

3. The documentation still contains a receipt for weighted choice. It is 
incompatible with new function.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-09-06 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Thanks Davin.

--
resolution:  -> fixed
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-09-06 Thread Roundup Robot

Roundup Robot added the comment:

New changeset a5856153d942 by Raymond Hettinger in branch 'default':
Issue #18844: Add random.weighted_choices()
https://hg.python.org/cpython/rev/a5856153d942

--
nosy: +python-dev

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-09-06 Thread Davin Potts

Davin Potts added the comment:

I've gone through the patch -- looks good to me.

--
nosy: +davin

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-09-06 Thread Raymond Hettinger

Changes by Raymond Hettinger :


Added file: http://bugs.python.org/file44407/weighted_choice2.diff

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-09-06 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Latest draft patch attached (w/o tests or docs).
Incorporates consultation from Alan Downey and Jake Vanderplas.

* Population and weights are separate arguments (like numpy.random.choice() and 
sample() in R).  Matches the way data would arrive in Pandas.  Easily extracted 
from a Counter or dict using keys() and values().  Suitable for applications 
that sample the population multiple times but using different weights.  See 
https://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html and 
http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.choice.html

* Excludes a replacement=False option. That use case necessarily has integer 
weights and may be better suited to the existing random.sample() rather than 
trying to recompute a CDF on every iteration as we would have to in this 
function.

* Allows cumulative_weights to be submitted instead of individual weights.  
This supports uses cases where the CDF already exists (as in the ThinkBayes 
examples) and where we want to periodically reuse the same CDF for repeated 
samples of the same population -- this occurs in resampling applications, Gibbs 
sampling, and Monte Carlo Markov Chain applications.  Per Jake, "MCMC/Gibbs 
Sampling approaches generally boil down to a simple weighted coin toss at each 
step" and "It's definitely common to do aggregation of multiple samples, e.g. 
to compute sample statistics"

* The API allows the weights to be integers, fractions, decimals, or floats.  
Likewise, the population and weights can be any Sequence.  Population elements 
need not be hashable.

* Returning a list means that the we don't have to save state in mid-stream 
(that is why we can't use a generator).  A list feeds nicely into Counters, 
mean, median, stdev, etc for summary statistics.  Returning a list parallels 
what random.sample() does, keeping the module internally consistent.

* Default uniform weighting falls back to random.choice() which would be more 
efficient than bisecting.

* Bisecting tends to beat other approaches in the general case.  See 
http://eli.thegreenplace.net/2010/01/22/weighted-random-generation-in-python

* Incorporates error checks for len(population)==len(cum_weights) and for 
conflicting specification of both weights and cumulative weights.

There API is not perfect and there are some aspects that give me heartburn.  1) 
Not saving the computed CDF is waste and forces the user to pre-build the CDF 
if they want to save it for later use (the API could return both the selections 
and the CDF but that would be awkward and atypical).  2) For the common case of 
having small integer weights on a small population, the bisecting approach is 
slower than using random.choice on a population expanded to include the 
selections multiple times in proportion to their weights (that said, short of 
passing in a flag, there is no cheap easy way for this function to detect that 
case and give it a fast path).  3) Outputting a list is inefficient if all 
you're doing with result is summarizing it with a Counter, histogram tool, 
mean, median, or stdev.  4)  There is no cheap way to check to see if the user 
supplied cum_weights is sorted or if the weights contain negative values.

--
Added file: http://bugs.python.org/file44394/weighted_choice.diff

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-08-15 Thread Raymond Hettinger

Raymond Hettinger added the comment:

FWIW, I have four full days set aside for the upcoming pre-feature release 
sprint which is dedicated to taking time to thoughtfully evaluate pending 
feature requests.  In the meantime, I'm contacting Alan Downey for a 
consultation for the best API for this.  As mentioned previously, the generator 
version isn't compatible with the design of the rest of the module that allows 
streams to have their state saved and restored at arbitrary points in the 
sequence.  One API would be to create a list all at once (like random.sample 
does).  Another would be to have two steps (like str.maketrans and 
str.translate).  Ideally, the API should integrate neatly with 
collections.Counter as a possible input for the weighting.  Hopefully, Alan can 
also comment on the relative frequency of small integer weightings versus the 
general case (the former benefits from a design using random.choice() applied 
to Counter.elements() and the latter benefits from a design with accumulate() 
and bisect()).  Note, this
  is a low priority feature (no real demonstrated need, there is already a 
recipe for it in the docs, and once the best API have been determined, the code 
is so simple that any of us could implement it in only a few minutes).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-08-15 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Raymond, any chance to get weighted random choices generator in 3.6? Less than 
month is left to feature code freeze.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-06-07 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Raymond, do you have a time for this issue?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-04-07 Thread Steven Basart

Steven Basart added the comment:

Left in a line of code that was supposed to be removed. Fixed.

--
Added file: http://bugs.python.org/file42393/weighted_choice_v5.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-04-07 Thread Steven Basart

Changes by Steven Basart :


Removed file: http://bugs.python.org/file42392/weighted_choice_v5.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-04-07 Thread Steven Basart

Steven Basart added the comment:

Re-implemented with suggested improvements taken into account. Thanks 
@mark.dickinson and @pitrou for the suggestions.  

I also removed the redundant "fast path" portion for this code since it doesn't 
deal with generators anyways.

Let me know additional thoughts about it.

--
Added file: http://bugs.python.org/file42392/weighted_choice_v5.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-04-07 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> Suggestion: if you want to go that way, return a single number if `amount` is 
> not provided (so make the default value for `amount` None rather than 1). If 
> `amount=1` is explicitly given, a list containing one item should be returned.

+1

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-04-07 Thread Mark Dickinson

Mark Dickinson added the comment:

> One to make it return a single number if amount == 1 and the other to check 
> that the amount > 1.

Suggestion: if you want to go that way, return a single number if `amount` is 
not provided (so make the default value for `amount` None rather than 1). If 
`amount=1` is explicitly given, a list containing one item should be returned.

I also think there's no reason to raise an exception when `amount = 0`: just 
return an empty list.

For comparison, here's NumPy's "uniform" generator, which generates a scalar if 
the "size" parameter is not given, and an array if "size" is given, even if 
it's 1.

>>> np.random.uniform()
0.4964992470265117
>>> np.random.uniform(size=1)
array([ 0.64817717])
>>> np.random.uniform(size=0)
array([], dtype=float64)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-04-07 Thread Mark Dickinson

Mark Dickinson added the comment:

> One to make it return a single number if amount == 1 and the other to check 
> that the amount > 1.

I think that's a dangerous API. Any code making a call to "weighted_choice(..., 
amount=n)" for variable n now has to be prepared to deal with two possible 
result types. It would be easy to introduce buggy code that fails in the corner 
case n = 1.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-04-06 Thread Steven Basart

Steven Basart added the comment:

I reuploaded the file.  The spacing on the if amount < 1 was off.  Hopefully 
its fixed now.

--
Added file: http://bugs.python.org/file42386/weighted_choice_v4.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-04-06 Thread Steven Basart

Changes by Steven Basart :


Removed file: http://bugs.python.org/file42385/weighted_choice_v4.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-04-06 Thread Steven Basart

Steven Basart added the comment:

Okay so I added a few lines of code.  One to make it return a single number if 
amount == 1 and the other to check that the amount > 1.

The main difference I've noticed between this implementation and previous 
versions compared to say R is that in R they provide a boolean flag to ask if 
sampling with replacement.

Here's there documentation and source code:
https://github.com/wch/r-source/blob/e5b21d0397c607883ff25cca379687b86933d730/src/library/base/man/sample.Rd

https://github.com/wch/r-source/blob/e5b21d0397c607883ff25cca379687b86933d730/src/library/base/R/sample.R

Maybe someone else can comment more on the use cases.  I can only say for 
myself that I've needed this function plenty of times when working with samples 
that have a non uniform distribution.

--
Added file: http://bugs.python.org/file42385/weighted_choice_v3.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-04-06 Thread Westley Martínez

Westley Martínez added the comment:

I still like Serhiy's implementation more. A function that returns a list 
instead of the item is unnatural and doesn't fit with the rest of the module.

I think there's need to be some discussion about use cases. What do users 
actually want? Maybe post this on the ideas list.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-04-01 Thread Christian Kleineidam

Christian Kleineidam added the comment:

A user can use map(), filter(), zip() without knowing anything about 
generators. In most cases those function will do their magic and provide a 
finite number of outputs. 

The weighted_choice_generator on the other hand isn't as easy to use. If the 
user wants 5 values from it, they need to know about `take()` from itertools or 
call `next()`.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-03-30 Thread Steven Basart

Steven Basart added the comment:

Hey serhiy.storchaka

I can edit the code to output just one value if called with simply a list and 
then return a list of values if called with the optional amount parameter.  My 
code also needs to check that amount >= 1.  

My code was mostly just to restart this discussion as I personally like the 
idea of the function for weighted choice and would like it to be standard in 
the random library. 

I have no qualms with adding both weighted_choice and weighted_choice_generator 
but my concern is mostly that you are asking too much and it won't go through 
by trying to add two functions at the same time.  The other thing is that I 
believe that weighted_choice could suffice with just one function call.

I just think my last concern is that generators are different from the other 
functions in random.py.  Whereas they are more intuitive and accepted in the 
builtins like map and zip etc.  There isn't any other functions in the random 
library that return that type of object when called. They instead return a 
numerical result.  

Those are my concerns and hence why I rewrote the code.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-03-30 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I disagree.

My patch adds two functions because they serve two different purposes. 
weighted_choice() returns one random value as other functions in the random 
module. weighted_choice_generator() provides more efficient way to generate 
random values, since startup cost is more significant than for other random 
value generators. Generators are widely used in Python, especially in Python 3. 
If they considered confusing, we should deprecate builtins map(), filter(), 
zip() and the itertools module at first place.

Your function, Steven, returns a list containing one random value by default. 
It does not match the interface of other functions in the random module. It 
matches the interface of NumPy random module. In Python you need two separate 
functions, one that returns single value,  and other that returns a list of 
values. But returning iterator and generating values by demand is more 
preferable in Python 3. Generatorsa are more flexible. With 
weighted_choice_generator() it is easy to get the result of your function: 
list(islice(weighted_choice_generator(data), amount)). But generating dynamic 
amount of values with your interface is impossible.

Raymond, if you have now free time, could you please make a review of 
weighted_choice_generator_2.patch?

--
keywords: +needs review
versions: +Python 3.6 -Python 3.5

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-03-30 Thread Steven Basart

Steven Basart added the comment:

Hello rhettinger.  I filled out the form thanks for letting me know about it.  
Is there anything else I have to do?

Hey serhiy.storchaka

There were several things "wrong" with the previous implementation in my 
opinion. 

1st they tried to add too much.  Which would if allowed would clutter up the 
random library if every function had both it's implementation as well as an 
accompanied generator.  The other problem being that both were attempted to be 
made as callable to the public API.  I would prefer the generator if present to 
be hidden and would also have to be more sophisticated to be able to check if 
it was being called with new input.

2nd by adding in the generator to the pulbic API of the random library it makes 
it far more confusing and obfuscates the true purpose of this function anyways 
which is to get a weighted choice.  

So basically there is nothing wrong with generators but they don't necessarily 
belong here so I removed it to try to get back to the core principles of what 
the function should be doing, by making it simpler.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-03-29 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

What is wrong with generators?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-03-29 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Thanks for the patch.   Can I get you to fill out a contributor agreement?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-03-29 Thread Steven Basart

Steven Basart added the comment:

The entire function of weighted choice. I removed the generator and replaced it 
by adding an optional argument to specify an amount by which you want to call 
this function.

--
Added file: http://bugs.python.org/file42323/weighted_choice_v3.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2016-03-29 Thread Steven Basart

Steven Basart added the comment:

Reopen this idea but removing the generator from weighted choice.

--
nosy: +progressive-o
Added file: http://bugs.python.org/file42322/weighted_choice_v3.diff

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2014-09-14 Thread Christian Kleineidam

Christian Kleineidam added the comment:

I like the idea of adding a weights keyword to choice and creating an 
additional choice_generator() that also takes weights.

A choice_generator() could take a further argument to allow unique choices and 
be a generator version of sample().

In some cases you want to draw randomly from a sequence till you draw an item 
that fulfills certain criteria. At the moment neither the sample nor the 
shuffle method seems to be optimal for that use case.

Given that items are commonly drawn from an urn in math, urn seems a good 
alternative for choice_generator().

random.urn(data, *, weights=None, unique=False)

--
nosy: +Christian.Kleineidam

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2014-08-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Updated patch. Synchronized with tip and added optimizations.

--
Added file: http://bugs.python.org/file36331/weighted_choice_generator_2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2014-08-10 Thread Raymond Hettinger

Raymond Hettinger added the comment:

I'm adverse to adding the generator magic and the level of complexity in this 
patch.  Please aim for the approach I outlined above (one function to build 
cumulative weights and another function to choose the value).

Since this isn't a new problem, please take a look at how other languages and 
packages have solved the problem.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2014-08-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Other languages have no such handly feature as generators. NumPy provides the 
size parameter to all functions and generates a bunch of random numbers at 
time. This doesn't look pythonic (Python stdlib prefers iterators).

I believe a generator is most Pythonic and most handly solution of this issue 
on Python. And it is efficient enough.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2014-08-10 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I agree with Serhiy. There is nothing magic about generators in Python. Also, 
the concept of an infinite stream of random numbers (or random whatevers) is 
perfectly common (/dev/urandom being an obvious example); it is not a notion we 
are inventing.

By contrast, the two-function approach only makes things clumsier for people 
since they have to remember to combine them.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2014-08-10 Thread Raymond Hettinger

Raymond Hettinger added the comment:

When I get a chance, I'll work up an approach that is consistent with the rest 
of the module in terms of implementation, restartability, and API.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2014-08-06 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Raymond, what is your opinion?

--
versions: +Python 3.5 -Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2014-08-06 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I don't want to speak for Raymond, but the proposed API looks good, and it 
seems Roulette Wheel 2 should be the implementation choice given its 
characteristics (simple, reasonably good and balanced performance).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2014-08-06 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Roulette Wheel 2 has twice slower initializations than Roulette Wheel, but 
then generates every new item twice faster.

It is possible to implement hybrid generator, which yields first item using 
Roulette Wheel, and then rescales cumulative_dist and continues with 
Roulette Wheel 2. It will be so fast as Roulette Wheel for generating only 
one item and so fast as Roulette Wheel 2 for generating multiple items.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2014-08-06 Thread Antoine Pitrou

Antoine Pitrou added the comment:

The setup cost of RW2 should always be within a small constant multiplier of 
RW's, so I'm not sure it's worth the hassle to complicate things. But it's your 
patch :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2014-08-06 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Non-generator weighted_choice() function is purposed to produce exactly one 
item. This is a use case for such optimization.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2014-07-23 Thread Mark Dickinson

Mark Dickinson added the comment:

Closed issue 22048 as a duplicate of this one.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2014-07-23 Thread Mark Dickinson

Changes by Mark Dickinson dicki...@gmail.com:


--
nosy: +dkorchem

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-24 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Most existing implementation produce just index. That is why weighted_choice() 
accepts singular weights list and returns index. On the other hand, I think 
working with mapping will be wished feature too (especially because Counter is 
in stdlib). Indexable sequences and mappings are similar. In both cases 
weighted_choice() returns value which can be used as index/key of input 
argument.

If you need choice an element from some sequence, just use 
seq[weighted_choice(weights)]. Actually weighted_choice() has no common code 
with choice() and has too different use cases. They should be dissimilar as far 
as possible. Perhaps we even should avoid the choice part in function names 
(are there any ideas?) to accent this.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-24 Thread Madison May

Madison May added the comment:

You have me convinced, Serhiy.  I see the value in making the two functions 
distinct.

For naming purposes, perhaps weighted_index() would be more descriptive.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-15 Thread Madison May

Madison May added the comment:

Serhiy, from a technical standpoint, your latest patch looks like a solid 
solution.  From an module design standpoint we still have a few options to 
think through, though. What if random.weighted_choice_generator was moved to 
random.choice_generator and refactored to take an array of weights as an 
optional argument?  Likewise, random.weighted_choice could still be implemented 
with an optional arg to random.choice.  Here's the pros and cons of each 
implementation as I see them.

Implementation: weighted_choice_generator + weighted_choice
Pros: 
Distinct functions help indicate that weighted_choice should be used in a 
different manner than choice -- [weighted_choice(x) for _ in range(n)] isn't 
efficient.
Can take Mapping or Sequence as argument.
Has a single parameter
Cons:
Key, not value, is returned
Requires two new functions
Dissimilar to random.choice
Long function name (weighted_choice_generator)

Implementation: choice_generator + optional arg to choice
Pros: 
Builds on existing code layout
Value returned directly
Only a single new function required
More compact function name

Cons:
Difficult to support Mappings
Two args required for choice_generator and random.choice
Users may use [choice(x, weights) for _ in range(n)] expecting efficient results

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-15 Thread Westley Martínez

Westley Martínez added the comment:

I think Storchaka's solution is more transparent and I agree with him on the 
point that the choice generator should be exposed.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-15 Thread Madison May

Madison May added the comment:

  I think Storchaka's solution is more transparent and I agree with him on the 
 point that the choice generator should be exposed.

Valid point -- transparency should be priority #1

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-13 Thread Eli Bendersky

Changes by Eli Bendersky eli...@gmail.com:


--
nosy:  -eli.bendersky

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-12 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thank you Neil. It is interesting.

Vose's alias method has followed disadvantages (in comparison with the roulette 
wheel selection proposed above):

1. It operates with probabilities and uses floats, therefore it can be a little 
less accurate.

2. It consumes two random number (an integer and a float) for generating one 
sample. It can be fixed however (in the cost of additional precision lost).

3. While it has same time and memory O(n) cost for initialization, it has 
larger multiplication, Vose's alias method requires several times larger time 
and memory for initialization.

4. It requires more memory in process of generating samples.

However it has an advantage. It really has constant time cost to generate each 
sample.

Here are some benchmark results. Roulette Wheel is proposed above 
implementation. Roulette Wheel 2 is its modification with normalized 
cumulative sums. It has twice more initialization time, but 1.5-2x faster 
generates each sample. Vose's Alias is an implementation of Vose's alias 
method directly translated from Java. Vose's Alias 2 is optimized 
implementation which uses Python specific.

Second column is a size of distribution, third column is initialization time 
(in milliseconds), fourth column is time to generate each sample (in 
microseconds), fifth column is a number of generated samples after which this 
method will overtake Roulette Wheel (including initialization time).

Roulette Wheel  10 0.059 7.165 0
Roulette Wheel 210 0.076 4.105 5
Vose's Alias10 0.12913.206 -
Vose's Alias 2  10 0.105 6.50169
Roulette Wheel 100 0.128 8.651 0
Roulette Wheel 2   100 0.198 4.63017
Vose's Alias   100 0.69112.839 -
Vose's Alias 2 100 0.441 6.547   148
Roulette Wheel1000 0.71910.949 0
Roulette Wheel 2  1000 1.458 5.177   128
Vose's Alias  1000 6.61413.052 -
Vose's Alias 21000 3.704 6.531   675
Roulette Wheel   1 7.49513.249 0
Roulette Wheel 2 114.961 6.051  1037
Vose's Alias 169.93713.830 -
Vose's Alias 2   137.017 6.746  4539
Roulette Wheel  1073.98816.180 0
Roulette Wheel 210   148.176 8.182  9275
Vose's Alias10   690.09913.808259716
Vose's Alias 2  10   391.367 7.095 34932
Roulette Wheel 100   743.41519.493 0
Roulette Wheel 2   100  1505.409 8.930 72138
Vose's Alias   100  7017.66913.798   1101673
Vose's Alias 2 100  4044.746 7.152267507

As you can see Vose's alias method has very large initialization time. 
Non-optimized version will never overtake Roulette Wheel with small 
distributions (10), and even optimized version will never overtake 
Roulette Wheel with small distributions (10). Only with very large 
distributions Vose's alias method has an advantage (when you needs very larger 
number of samples).

Because for generating only one sample we need a method with fastest 
initialization we need Roulette Wheel implementation. And because large 
distributions are rare, I think there is no need in alternative implementation. 
In worst case for generating 100 samples from 100-elements distribution 
the difference between Roulette Wheel and Vose's Alias 2 is a difference 
between 20 and 11 seconds.

--
Added file: http://bugs.python.org/file31734/wcg_bench.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The proposed patch add two methods to the Random class and two module level 
functions: weighted_choice() and weighted_choice_generator().

weighted_choice(data) accepts either mapping or sequence and returns a key or 
index x with probability which is proportional to data[x].

If you need several elements with same distribution, use 
weighted_choice_generator(data) which returns an iterator which produces random 
keys or indices of the data. It is more faster than calling 
weighted_choice(data) repeatedly and is more flexible than generating a list of 
random values at specified size (as in NumPy).

--
Added file: http://bugs.python.org/file31732/weighted_choice_generator.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-11 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
stage:  - patch review

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-11 Thread Neil Girdhar

Neil Girdhar added the comment:

Should this really be implemented using the cumulative distribution and binary 
search algorithm?  Vose's Alias Method has the same initialization and memory 
usage cost (O(n)), but is constant time to generate each sample.

An excellent tutorial is here: http://www.keithschwarz.com/darts-dice-coins/

--
nosy: +neil.g

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-01 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

 A more efficient approach for many use-cases would do the precomputation 
 once, returning some kind of 'distribution' object from which samples can be 
 generated.

I like the idea about adding a family of distribution generators. They should 
check input parameters and make a precomputation and then generate infinite 
sequence of specially distributed random numbers.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-01 Thread Madison May

Madison May added the comment:

[Raymond Hettinger]
 The sticking point is going to be that we don't want to recompute the 
 cumulative weights for every call to weighted_choice.

 So there should probably be two functions:

  cw = make_cumulate_weights(weight_list) 
  x = choice(choice_list, cw)

That's pretty much how I broke things up when I decided to test out 
optimization with lru_cache.  That version of the patch is now attached.

[Serhiy Storchaka]
 I like the idea about adding a family of distribution generators. 
 They should check input parameters and make a precomputation and then  
 generate infinite sequence of specially distributed random numbers.

Would these distribution generators be implemented internally (see attached 
patch) or publicly exposed?

--
Added file: http://bugs.python.org/file31546/weighted_choice_v2.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-01 Thread Madison May

Changes by Madison May madison@students.olin.edu:


Removed file: http://bugs.python.org/file31546/weighted_choice_v2.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-01 Thread Madison May

Changes by Madison May madison@students.olin.edu:


Added file: http://bugs.python.org/file31547/weighted_choice_v2.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-01 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

 Would these distribution generators be implemented internally (see attached 
 patch) or publicly exposed?

See issue18900. Even if this proposition will be rejected I think we should 
publicly expose weighted choice_generator(). A generator or a builder which 
returns function are only ways how efficiently implement this feature. Use 
lru_cache isn't good because several choice generators can be used in a program 
and because it left large data in a cache long time after it was used.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-01 Thread Madison May

Madison May added the comment:

 Use lru_cache isn't good because several choice generators can be used in a 
 program and because it left large data in a cache long time after it was used.

Yeah, I just did a quick search of the stdlib and only found one instance of 
lru_cache in use -- another sign that lru_cache is a bad choice.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-01 Thread Raymond Hettinger

Raymond Hettinger added the comment:

 I like the idea about adding a family of distribution generators

Let's stay focused on the OP's feature request for a weighted version of 
choice().

For the most part, it's not a good idea to just add a family of anything to 
the standard library.  We wait for user requests and use cases to guide the 
design and error on the side of less, rather than more.  This helps avoid 
bloat.   Also, it would be a good idea to start something like this as a 
third-party to module to let it iterate and mature before deciding whether 
there was sufficient user uptake to warrant inclusion in the standard library.

For the current request, we should also do some research on existing solutions 
in other languages.  This isn't new territory.  What do R, SciPy, Fortran, 
Matlab or other statistical packages already do?  Their experiences can be used 
to inform our design.  Alan Kay's big criticism of Python developers is that 
they have a strong propensity invent from scratch rather than taking advantage 
of the mountain of work done by the developers who came before them.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-01 Thread Madison May

Madison May added the comment:

 What do R, SciPy, Fortran, Matlab or other statistical packages already do? 

Numpy avoids recalculating the cumulative distribution by introducing a 'size' 
argument to numpy.random.choice().  The cumulative distribution is calculated 
once, then 'size' random choices are generated and returned.

Their overall implementation is quite similar to the method suggested in the 
python docs.  

 choices, weights = zip(*weighted_choices)
 cumdist = list(itertools.accumulate(weights))
 x = random.random() * cumdist[-1]
 choices[bisect.bisect(cumdist, x)]

The addition of a 'size' argument to random.choice() has already been discussed 
(and rejected) in Issue18414, but this was on the grounds that the standard 
idiom for generating a list of random choices ([random.choice(seq) for i in 
range(k)]) is obvious and efficient.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-01 Thread Westley Martínez

Westley Martínez added the comment:

Honestly, I think adding weights to any of the random functions are trivial 
enough to implement as is.  Just because something becomes a common task does 
not mean it ought to be added to the stdlib.

Anyway, from a user point of view, I think it'd be useful to be able to send a 
sequence to a function that'll weight the sequence for use by random.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-09-01 Thread Madison May

Madison May added the comment:

Just ran across a great blog post on the topic of weighted random generation 
from Eli Bendersky for anyone interested:
http://eli.thegreenplace.net/2010/01/22/weighted-random-generation-in-python/

--
nosy: +eli.bendersky

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-08-31 Thread Raymond Hettinger

Raymond Hettinger added the comment:

+1 for the overall idea.  I'll take a detailed look at the patch when I get a 
chance.

--
assignee:  - rhettinger

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-08-31 Thread Raymond Hettinger

Raymond Hettinger added the comment:

The sticking point is going to be that we don't want to recompute the 
cumulative weights for every call to weighted_choice.

So there should probably be two functions:

  cw = make_cumulate_weights(weight_list) 
  x = choice(choice_list, cw)

This is similar to what was done with string.maketrans() and str.translate().

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-08-31 Thread Westley Martínez

Changes by Westley Martínez aniko...@gmail.com:


--
nosy: +anikom15

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-08-30 Thread Mark Dickinson

Mark Dickinson added the comment:

[Madison May]
  - Should negative weights cause a ValueError to be raised, or should they be 
 converted to 0s?
  - Should passing a list full of zeros as the weights arg raise a ValueError 
 or be treated as if no weights arg was passed?

Both those seem like clear error conditions to me, though I think it would be 
fine if the second condition produced a ZeroDivisionError rather than a 
ValueError.

I'm not 100% sold on the feature request.  For one thing, the direct 
implementation is going to be inefficient for repeated sampling, building the 
table of cumulative sums each time random.choice is called.  A more efficient 
approach for many use-cases would do the precomputation once, returning some 
kind of 'distribution' object from which samples can be generated.  (Walker's 
aliasing method is one route for doing this efficiently, though there are 
others.)  I agree that this is a commonly needed and commonly requested 
operation;  I'm just not convinced either that an efficient implementation fits 
well into the random module, or that it makes sense to add an inefficient 
implementation.

--
nosy: +tim.peters

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-08-30 Thread Madison May

Madison May added the comment:

[Mark Dickinson]
 Both those seem like clear error conditions to me, though I think it would be 
 fine if the second condition produced a ZeroDivisionError rather than a 
 ValueError.

Yeah, in hindsight it makes sense that both of those conditions should raise 
errors.  After all: Explicit is better than implicit.

As far as optimization goes, could we potentially use functools.lru_cache to 
cache the cumulative distribution produced by the weights argument and optimize 
repeated sampling? 

Without @lru_cache:
 timeit.timeit(x = choice(list(range(100)), list(range(100))), setup=from 
 random import choice, number=10)
36.7109281539997

With @lru_cache(max=128):
 timeit.timeit(x = choice(list(range(100)), list(range(100))), setup=from 
 random import choice, number=10)
6.6788657720007905

Of course it's a contrived example, but you get the idea.

Walker's aliasing method looks intriguing.  I'll have to give it a closer look. 
 

I agree that an efficient implementation would be preferable but would feel out 
of place in random because of the return type.  I still believe a relatively 
inefficient addition to random.choice would be valuable, though.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-08-26 Thread Alan Isaac

New submission from Alan Isaac:

The need for weighted random choices is so common that it is addressed as a 
common task in the docs:
http://docs.python.org/dev/library/random.html

This enhancement request is to add an optional argument to random.choice, which 
must be a sequence of non-negative numbers (the weights) having the same length 
as the main argument.

--
messages: 196229
nosy: aisaac
priority: normal
severity: normal
status: open
title: allow weights in random.choice
type: enhancement

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-08-26 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
components: +Library (Lib)
nosy: +mark.dickinson, rhettinger, serhiy.storchaka
versions: +Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-08-26 Thread Madison May

Madison May added the comment:

+1. I've found myself in need of this feature often enough to wonder why it's 
not part of the stdlib.

--
nosy: +madison.may

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-08-26 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Agreed with the feature request. The itertools dance won't be easy to 
understand, for many people.

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18844] allow weights in random.choice

2013-08-26 Thread Madison May

Madison May added the comment:

I realize its probably quite early to begin putting a patch together, but 
here's some preliminary code for anyone interested.  It builds off of the 
common task example in the docs and adds in validation for the weights list.

There are a few design decisions I'd like to hash out.  
In particular: 

  - Should negative weights cause a ValueError to be raised, or should they be 
converted to 0s?
  - Should passing a list full of zeros as the weights arg raise a ValueError 
or be treated as if no weights arg was passed?

--
keywords: +patch
Added file: http://bugs.python.org/file31479/weighted_choice.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18844
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com