New submission from Raymond Hettinger <raymond.hettin...@gmail.com>:

Weighted sampling without replacement isn't easy for a user to do with the current tooling. There is a StackOverflow question about how to do it. Also, this service is currently offered by numpy.

Use it like this:

    >>> sample(['katniss', 'prim', 'gale', 'peeta'], weights=[20, 1, 42, 10], k=2)
    ['prim', 'peeta']

Here's an efficient implementation similar to how numpy does it:

--- a/Lib/random.py
+++ b/Lib/random.py
@@ -331,7 +331,7 @@ class Random(_random.Random):
             j = _int(random() * (i+1))
             x[i], x[j] = x[j], x[i]
 
-    def sample(self, population, k, *, counts=None):
+    def sample(self, population, k, *, counts=None, weights=None):
         """Chooses k unique random elements from a population sequence or set.
 
         Returns a new list containing elements from the population while
@@ -392,6 +392,18 @@ class Random(_random.Random):
         if not isinstance(population, _Sequence):
             raise TypeError("Population must be a sequence. For dicts or sets, use sorted(d).")
         n = len(population)
+        if weights is not None:
+            if counts is not None:
+                raise TypeError('Cannot specify both counts and weights')
+            weights = list(weights)
+            positions = range(n)
+            indices = []
+            while (needed := k - len(indices)):
+                for i in self.choices(positions, weights, k=needed):
+                    if weights[i]:
+                        weights[i] = 0.0
+                        indices.append(i)
+            return [population[i] for i in indices]
         if counts is not None:
             cum_counts = list(_accumulate(counts))
             if len(cum_counts) != n:

----------
components: Library (Lib)
messages: 368458
nosy: mark.dickinson, rhettinger, tim.peters
priority: normal
severity: normal
status: open
title: Add optional weights to random.sample()
type: enhancement
versions: Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40569>
_______________________________________
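
For anyone who wants to try the rejection loop from the patch without rebuilding Lib/random.py, here is a minimal standalone sketch. The function name weighted_sample and its rng parameter are made up for illustration and are not part of the proposed API. The up-front check on the number of positive weights is also an addition that the patch does not have; without it the loop would not terminate when fewer than k items carry a positive weight.

```python
import random


def weighted_sample(population, weights, k, rng=random):
    """Pick k distinct items, favouring those with larger weights.

    Hypothetical helper that mirrors the loop in the patch; rng can be
    the random module itself or any random.Random instance.
    """
    n = len(population)
    weights = list(weights)  # private copy, because entries are zeroed below
    if len(weights) != n:
        raise ValueError("The number of weights does not match the population")
    # Assumption added for this sketch: refuse requests that cannot be
    # satisfied, since the loop below would otherwise spin forever.
    if k > sum(1 for w in weights if w > 0):
        raise ValueError("Sample larger than the number of positively weighted items")
    positions = range(n)
    indices = []
    while (needed := k - len(indices)):
        # Draw a batch with replacement, then reject repeats: a position
        # whose weight is already zero was picked earlier.
        for i in rng.choices(positions, weights, k=needed):
            if weights[i]:
                weights[i] = 0.0
                indices.append(i)
    return [population[i] for i in indices]


if __name__ == "__main__":
    names = ['katniss', 'prim', 'gale', 'peeta']
    print(weighted_sample(names, weights=[20, 1, 42, 10], k=2))
```

Each pass asks choices() for exactly as many draws as are still missing, so the result can never exceed k items, and zeroing a weight after a pick keeps that position out of every later batch.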