[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-03 Thread Chris Angelico
On Tue, Aug 4, 2020 at 10:54 AM Steven D'Aprano  wrote:
> Yes, I remember the last time I played poker with some friends, and the
> dealer handed me the deck of cards and asked me to take a sample of 52
> cards *wink*

Most dealers want you to shuffle the deck *in place*. Although I'd be
highly amused to watch a group of computer scientists playing poker,
and starting out with a Fisher-Yates...

For the case of "create a new list that is a random permutation of
these items", I don't personally see a problem with (1) create a new
list, and then (2) shuffle that new list. If the naming bothers you,
don't call it shuffled_numbers at all - call it something based on its
purpose later on!
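
A minimal sketch of the two-step approach described above: build a new list, then shuffle that list in place. The variable name `deck` is only an illustration and does not come from the thread.

```python
import random

# Step 1: create a new list from the source (a range, as in the thread).
deck = list(range(10, 10 ** 5))

# Step 2: shuffle that new list in place; random.shuffle() returns None.
random.shuffle(deck)

# The list still holds exactly the same items, just in a new order.
assert sorted(deck) == list(range(10, 10 ** 5))
```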

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FAQFXWDZUV6MSAGH5XHYGT2JLOIE6EIP/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-03 Thread Christopher Barker
On Mon, Aug 3, 2020 at 5:50 PM Steven D'Aprano  wrote:

> According to my testing in Python 3.8, the version with sample is about
> 10% slower than the "shuffled" helper I gave.


I got similar results, but my conclusion was that 10% isn’t significant :-)

Is it likely to be run in an inner loop?

- CHB
-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/SYCF7J5CWZQXKJ4OWEFSSTID3VIT3HZH/


[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-03 Thread Steven D'Aprano
On Mon, Aug 03, 2020 at 03:04:40AM -, raymond.hettin...@gmail.com wrote:
> Steven D'Aprano wrote:
> > > This is easily solved with a three-line helper:
> > def shuffled(iterable):
>  ...
> > I have implemented this probably a half a dozen times, and I expect 
> > others have too.
> 

> FWIW, we've already documented a clean way to do it, 
> https://docs.python.org/3/library/random.html#random.shuffle , "To 
> shuffle an immutable sequence and return a new shuffled list, use 
> sample(x, k=len(x)) instead."

Yes, I remember the last time I played poker with some friends, and the 
dealer handed me the deck of cards and asked me to take a sample of 52 
cards *wink*

While you are technically correct that a sample of N from a sequence of 
length N is equivalent to shuffling, that's not a particularly obvious 
thing to do, and the semantics of shuffling and sampling are not the 
same. Hence the need to document it.
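
To make the equivalence concrete, here is a small sketch showing that sampling all N items from a sequence of length N yields a permutation of it. The 52-item tuple `cards` is an assumed stand-in for a deck, not something from the documentation.

```python
import random

cards = tuple(range(52))  # an immutable "deck"; a tuple cannot be shuffled in place

# Sampling all N items without replacement gives a random permutation as a new list.
hand = random.sample(cards, k=len(cards))

assert len(hand) == len(cards)
assert sorted(hand) == list(cards)   # same items, each exactly once
assert cards == tuple(range(52))     # the original is unchanged
```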

According to my testing in Python 3.8, the version with sample is about 
10% slower than the "shuffled" helper I gave. That wouldn't be too bad 
if the operation was fast, but for a sequence of 30,000 items on my 
computer, that takes nearly half a second. So a 10% slowdown is quite 
significant.
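
For anyone who wants to check the comparison on their own machine, a rough timing sketch follows. The function names `shuffled_via_sample` and `compare` are made up for this illustration, and the results will vary with Python version and hardware, so no particular ratio is implied.

```python
import random
import timeit


def shuffled(iterable):
    """Return a new shuffled list (the helper from earlier in the thread)."""
    result = list(iterable)
    random.shuffle(result)
    return result


def shuffled_via_sample(seq):
    """Return a new shuffled list using the documented sample() idiom."""
    return random.sample(seq, k=len(seq))


def compare(n=30_000, number=20):
    """Time both approaches on range(n); return seconds per call for each."""
    population = range(n)
    t_shuffle = timeit.timeit(lambda: shuffled(population), number=number) / number
    t_sample = timeit.timeit(lambda: shuffled_via_sample(population), number=number) / number
    return t_shuffle, t_sample


if __name__ == "__main__":
    t_shuffle, t_sample = compare()
    print(f"shuffle: {t_shuffle * 1e3:.2f} ms, sample: {t_sample * 1e3:.2f} ms")
    print(f"sample/shuffle ratio: {t_sample / t_shuffle:.2f}")
```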

I think I'll continue using my shuffled helper function, and while I 
personally won't re-raise this issue, I'll continue to give it my 
support next time somebody raises it.


-- 
Steven
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/BDMLDARYAUQ3O3KJQ4X4JDKQWXKZREC6/


[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-03 Thread Steven D'Aprano
On Mon, Aug 03, 2020 at 08:50:32AM -, raymond.hettin...@gmail.com wrote:

> Please also consider that we thought about all of this when sample() 
> was first created.  The current API is intentional.  As you noted, 
> this suggestion was also already rejected on the bug tracker.  So, 
> this thread seems like an attempt to second guess that outcome as well 
> as the original design decision.  If you're going to do something like 
> that, save it for something important :-)

The difficulty is judging when something is important or not, and 
that's part of the purpose of posting here :-)


-- 
Steven
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/BYMIYCVCFZI7VTAALWNOKM4FVELZMAMH/


[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-03 Thread Christopher Barker
On Sun, Aug 2, 2020 at 8:05 PM  wrote:

> FWIW, we've already documented a clean way to do it,
> https://docs.python.org/3/library/random.html#random.shuffle , "To
> shuffle an immutable sequence and return a new shuffled list, use sample(x,
> k=len(x)) instead."
>

One downside of this is that it won't work on a non-sized iterable, though I
suppose that's not really an important use case. It is a use case, though,
'cause while a shuffled collection is going to be sized by definition, the
source could be a generator or some other non-sized iterable. But it's not
hard to "realize" the iterable first by making it a list or tuple.
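
The point about non-sized iterables can be illustrated with a short sketch. The generator `numbers` is a made-up example; the idea is only that `len()` fails on it, so the iterable has to be turned into a list first.

```python
import random


def numbers():
    """A generator: iterable, but it has no len()."""
    yield from range(10)


# len() on a generator raises TypeError, so sample(x, k=len(x)) cannot be used directly.
error = None
try:
    random.sample(numbers(), k=len(numbers()))
except TypeError as exc:
    error = exc

# Realizing the iterable first gives sample() the sized sequence it needs.
items = list(numbers())
result = random.sample(items, k=len(items))
assert sorted(result) == list(range(10))
```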

My other question was about performance. Without looking at the code, I
thought it *might* be faster to shuffle than build up a list with multiple
samples, but in profiling, the sample version is only about 14% slower
(for this one example :-) ).

In [13]: def shuffled_1(it):
    ...:     result = list(it)
    ...:     random.shuffle(result)
    ...:     return result

In [14]: def shuffled_2(it):
    ...:     return random.sample(it, k=len(it))

In [15]: %timeit shuffled_1(population)
3.71 ms ± 30.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [16]: %timeit shuffled_2(population)
4.23 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

where:

In [17]: population
Out[17]: range(0, 1)

So yeah, this is a fine solution.

-CHB


Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/B46HMCSSNUTH4W7EZNQX7AO63XYIB5ZX/


[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-03 Thread Ricky Teachey
On Mon, Aug 3, 2020 at 8:26 AM Ram Rachum  wrote:

> On Mon, Aug 3, 2020 at 3:20 PM  wrote:
>
>> Ram Rachum wrote:
>> > I notice that the random.sample function doesn't have a default behavior
>> > set when you don't specify k. This is fortunate, because we could make
>> > that behavior just automatically take the length of the first argument. So
>> > we could do this:
>> > shuffled_numbers = random.sample(range(10, 10 ** 5))
>> > What do you think?
>>
>> This is bad API design.  The most likely user mistake is to omit the *k*
>> argument.  We want that to be an error.  It is common to sample from large
>> populations, we don't want the default to do anything terrible — for
>> example, you're in a Jupyter notebook and type "sample(range(10_000_000))"
>> and forget to enter the sample size.
>>
>> Also, having *k* default to the population size would be surprisingly
>> inconsistent given that choices() has a default k=1.  API design principle:
>> don't have unexpectedly different defaults in related functions.
>>
>
> Hmm, yes, I agree with both these points.
>
> I do think that `sample(x, k=len(x))` is cumbersome when `x` is not a
> variable but defined inline. But I guess I'll let this one go.
>

I've found it cumbersome in the past myself, but an easy way around that
now is the walrus:

`sample(_:=[1,2,3], len(_))`
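
A slightly expanded sketch of the same walrus idea, using a descriptive name instead of `_`. It needs Python 3.8 or later; the name `population` is chosen here only for illustration.

```python
import random

# Bind the inline population to a name, then reuse that name for k.
# Arguments are evaluated left to right, so `population` is bound before len() runs.
shuffled_numbers = random.sample(population := [1, 2, 3], len(population))

assert sorted(shuffled_numbers) == [1, 2, 3]
```

Note that the name bound by the walrus stays in the enclosing scope afterwards, which may or may not be wanted.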

---
Ricky.

"I've never met a Kentucky man who wasn't either thinking about going home
or actually going home." - Happy Chandler
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/SKBUCZPABJM7B7MEE6IYLQCTOYHZLRUS/


[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-03 Thread Ram Rachum
On Mon, Aug 3, 2020 at 3:20 PM  wrote:

> Ram Rachum wrote:
> > I notice that the random.sample function doesn't have a default behavior
> > set when you don't specify k. This is fortunate, because we could make
> > that behavior just automatically take the length of the first argument. So
> > we could do this:
> > shuffled_numbers = random.sample(range(10, 10 ** 5))
> > What do you think?
>
> This is bad API design.  The most likely user mistake is to omit the *k*
> argument.  We want that to be an error.  It is common to sample from large
> populations, we don't want the default to do anything terrible — for
> example, you're in a Jupyter notebook and type "sample(range(10_000_000))"
> and forget to enter the sample size.
>
> Also, having *k* default to the population size would be surprisingly
> inconsistent given that choices() has a default k=1.  API design principle:
> don't have unexpectedly different defaults in related functions.
>

Hmm, yes, I agree with both these points.

I do think that `sample(x, k=len(x))` is cumbersome when `x` is not a
variable but defined inline. But I guess I'll let this one go.
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/V3A6KXSFJTKQ43MMBTNYKK4PQP4WNNGX/


[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-03 Thread raymond . hettinger
Ram Rachum wrote:
> I notice that the random.sample function doesn't have a default behavior
> set when you don't specify k. This is fortunate, because we could make
> that behavior just automatically take the length of the first argument. So
> we could do this:
> shuffled_numbers = random.sample(range(10, 10 ** 5))
> What do you think?

This is bad API design.  The most likely user mistake is to omit the *k* 
argument.  We want that to be an error.  It is common to sample from large 
populations, we don't want the default to do anything terrible — for example, 
you're in a Jupyter notebook and type "sample(range(10_000_000))" and forget to 
enter the sample size.

Also, having *k* default to the population size would be surprisingly 
inconsistent given that choices() has a default k=1.  API design principle: 
don't have unexpectedly different defaults in related functions.

Lastly, in-line shuffling is not the primary use case.  If there 
were a default argument, it should cater to the principal use case.  API 
design principle:  don't do anything weird or unexpected by default.

IMO you're trying too hard to jam a round peg into a square hole. There isn't a 
substantive problem being solved — being explicit by writing "sample(p, 
len(p))"  instead of "sample(p)" isn't an undue burden.
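
The behaviors discussed above can be checked with a short sketch: `choices()` defaults to `k=1`, `sample()` treats a missing `k` as an error, and the explicit form returns a full shuffled copy. The list `p` and the flag `missing_k_is_error` are illustrative names only.

```python
import random

p = list(range(10))

# choices() samples with replacement and defaults to k=1.
one = random.choices(p)
assert len(one) == 1

# sample() has no default for k, so forgetting it fails immediately.
try:
    random.sample(p)
except TypeError:
    missing_k_is_error = True
else:
    missing_k_is_error = False

# Being explicit gives the full shuffled copy.
full = random.sample(p, len(p))
assert sorted(full) == p
```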

Please also consider that we thought about all of this when sample() was first 
created.  The current API is intentional.  As you noted, this suggestion was 
also already rejected on the bug tracker.  So, this thread seems like an 
attempt to second guess that outcome as well as the original design decision.  
If you're going to do something like that, save it for something important :-)


Raymond
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/K4RQTFYD43OHQTSCWC32R2KYFQGXHR36/


[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-02 Thread raymond . hettinger
Steven D'Aprano wrote:
> > This is easily solved with a three-line helper:
> def shuffled(iterable):
 ...
> I have implemented this probably a half a dozen times, and I expect 
> others have too.

FWIW, we've already documented a clean way to do it, 
https://docs.python.org/3/library/random.html#random.shuffle , "To shuffle an 
immutable sequence and return a new shuffled list, use sample(x, k=len(x)) 
instead."

>>> from random import sample
>>> data = 'random module'
>>> ''.join(sample(data, len(data)))
'uaemdor odmln'

Given that we already have shuffle() and sample(), I really don't think we need 
a third way to do it.  How about we save API extensions for ideas that add 
genuinely new, useful capabilities?

Raymond
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/VVKZU6ABPBYZORXMURCIHBZZRNRREMIS/


[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-02 Thread Ram Rachum
I submitted a patch now, but Serhiy showed me that it's already been
proposed before, and rejected by Raymond Hettinger and Terry Reedy in
issues 26393 and 27964.

On Sun, Aug 2, 2020 at 8:05 AM Steven D'Aprano  wrote:

> On Sat, Aug 01, 2020 at 08:54:16PM +0300, Ram Rachum wrote:
>
> > When writing some code now, I needed to produce a shuffled version of
> > `range(10, 10 ** 5)`.
> >
> > This is one way to do it:
> >
> > shuffled_numbers = list(range(10, 10 ** 5))
> > random.shuffle(shuffled_numbers)
> >
> >
> > I don't like it because (1) it's too imperative and (2) I'm calling the
> > list "shuffled" even before it's shuffled.
>
> This is easily solved with a three-line helper:
>
> def shuffled(iterable):
>     L = list(iterable)
>     random.shuffle(L)
>     return L
>
> I have implemented this probably a half a dozen times, and I expect
> others have too. I agree with Alex that this would make a nice addition
> to the random module.
>
>
> --
> Steven
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/QSJGCZLIGPELSSBKX7TJJIT7ZPPV4LXT/


[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-01 Thread Steven D'Aprano
On Sat, Aug 01, 2020 at 08:54:16PM +0300, Ram Rachum wrote:

> When writing some code now, I needed to produce a shuffled version of
> `range(10, 10 ** 5)`.
> 
> This is one way to do it:
> 
> shuffled_numbers = list(range(10, 10 ** 5))
> random.shuffle(shuffled_numbers)
> 
> 
> I don't like it because (1) it's too imperative and (2) I'm calling the
> list "shuffled" even before it's shuffled.

This is easily solved with a three-line helper:

def shuffled(iterable):
    L = list(iterable)
    random.shuffle(L)
    return L

I have implemented this probably a half a dozen times, and I expect 
others have too. I agree with Alex that this would make a nice addition 
to the random module.


-- 
Steven
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/47JMNMYPEETQFKPDK4OVLGM2IXCQ4GIA/


[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-01 Thread Neil Girdhar
Can you not just use 
https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.random_permutation
 ?
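
For readers who would rather not add a third-party dependency, here is a standard-library sketch of a function in the same spirit. It assumes behavior similar to the linked `random_permutation` (a tuple of `r` items, defaulting to the full length); the name `random_permutation_sketch` is hypothetical and this is not the library's own code.

```python
import random


def random_permutation_sketch(iterable, r=None):
    """Hypothetical helper: return a random r-length permutation as a tuple."""
    pool = tuple(iterable)              # realize the iterable so it has a length
    r = len(pool) if r is None else r   # default to a full-length permutation
    return tuple(random.sample(pool, k=r))


perm = random_permutation_sketch(range(10, 20))
assert sorted(perm) == list(range(10, 20))
```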

On Saturday, August 1, 2020 at 2:26:23 PM UTC-4 Ram Rachum wrote:

> I would also prefer a `random.shuffled` function. The reason I didn't 
> propose it is because there's usually more resistance for adding new 
> functions. But in my view that'll be the best solution.
>
> On Sat, Aug 1, 2020 at 9:17 PM Alex Hall  wrote:
>
>> I agree that calling random.shuffle imperatively is annoying. But I don't 
>> think your proposed solution is readable. You're not taking a sample. A 
>> sample generally implies a strict subset, usually quite a small one.
>>
>> I've often thought there should just be a `random.shuffled()` function 
>> which returns a shuffled copy, similar to `.sort()` and `sorted()` or 
>> `.reverse()` and `reversed()`.
>>
>> On Sat, Aug 1, 2020 at 7:59 PM Ram Rachum  wrote:
>>
>>> When writing some code now, I needed to produce a shuffled version of 
>>> `range(10, 10 ** 5)`.
>>>
>>> This is one way to do it: 
>>>
>>> shuffled_numbers = list(range(10, 10 ** 5))
>>> random.shuffle(shuffled_numbers)
>>>
>>>
>>> I don't like it because (1) it's too imperative and (2) I'm calling the 
>>> list "shuffled" even before it's shuffled.
>>>
>>> Another solution is this: 
>>>
>>> shuffled_numbers = random.sample(range(10, 10 ** 5), k=len(range(10, 10 ** 5)))
>>>
>>> This is better because it solves the 2 points above. However, it is 
>>> quite cumbersome.
>>>
>>> I notice that the `random.sample` function doesn't have a default 
>>> behavior set when you don't specify `k`. This is fortunate, because we 
>>> could make that behavior just automatically take the length of the first 
>>> argument. So we could do this: 
>>>
>>> shuffled_numbers = random.sample(range(10, 10 ** 5))
>>>
>>> What do you think? 
>>>
>>>
>>> Thanks,
>>> Ram.
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/RG346EKJMPYZAI6PHCCZRKOIJUIML3HB/


[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-01 Thread Ram Rachum
I would also prefer a `random.shuffled` function. The reason I didn't
propose it is that there's usually more resistance to adding new
functions. But in my view that would be the best solution.

On Sat, Aug 1, 2020 at 9:17 PM Alex Hall  wrote:

> I agree that calling random.shuffle imperatively is annoying. But I don't
> think your proposed solution is readable. You're not taking a sample. A
> sample generally implies a strict subset, usually quite a small one.
>
> I've often thought there should just be a `random.shuffled()` function
> which returns a shuffled copy, similar to `.sort()` and `sorted()` or
> `.reverse()` and `reversed()`.
>
> On Sat, Aug 1, 2020 at 7:59 PM Ram Rachum  wrote:
>
>> When writing some code now, I needed to produce a shuffled version of
>> `range(10, 10 ** 5)`.
>>
>> This is one way to do it:
>>
>> shuffled_numbers = list(range(10, 10 ** 5))
>> random.shuffle(shuffled_numbers)
>>
>>
>> I don't like it because (1) it's too imperative and (2) I'm calling the
>> list "shuffled" even before it's shuffled.
>>
>> Another solution is this:
>>
>> shuffled_numbers = random.sample(range(10, 10 ** 5), k=len(range(10, 10 ** 5)))
>>
>> This is better because it solves the 2 points above. However, it is quite
>> cumbersome.
>>
>> I notice that the `random.sample` function doesn't have a default
>> behavior set when you don't specify `k`. This is fortunate, because we
>> could make that behavior just automatically take the length of the first
>> argument. So we could do this:
>>
>> shuffled_numbers = random.sample(range(10, 10 ** 5))
>>
>> What do you think?
>>
>>
>> Thanks,
>> Ram.
>> ___
>> Python-ideas mailing list -- python-ideas@python.org
>> To unsubscribe send an email to python-ideas-le...@python.org
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-ideas@python.org/message/OHLXVKIBMNSQO6BCFK6LEHSYNXDB6OQJ/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/RFJHHVAWYAMSRRD5ZYGH7VHAOKMNAR67/


[Python-ideas] Re: Default behavior for random.sample when no k

2020-08-01 Thread Alex Hall
I agree that calling random.shuffle imperatively is annoying. But I don't
think your proposed solution is readable. You're not taking a sample. A
sample generally implies a strict subset, usually quite a small one.

I've often thought there should just be a `random.shuffled()` function
which returns a shuffled copy, similar to `.sort()` and `sorted()` or
`.reverse()` and `reversed()`.
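
A sketch of the analogy, under the assumption that such a function would simply copy and shuffle. `shuffled()` here is hypothetical and does not exist in the `random` module; it is defined locally for the comparison with `sorted()`.

```python
import random


def shuffled(iterable):
    """Hypothetical counterpart to random.shuffle(), as sorted() is to list.sort()."""
    result = list(iterable)
    random.shuffle(result)
    return result


numbers = [3, 1, 2]

# In-place method vs. copy-returning function, for sorting:
ordered = sorted(numbers)      # new list; `numbers` is unchanged
numbers_copy = list(numbers)
numbers_copy.sort()            # mutates in place, returns None

# The same pairing for shuffling:
mixed = shuffled(numbers)      # new list; `numbers` is unchanged
random.shuffle(numbers_copy)   # mutates in place, returns None

assert numbers == [3, 1, 2]
assert sorted(mixed) == ordered
```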

On Sat, Aug 1, 2020 at 7:59 PM Ram Rachum  wrote:

> When writing some code now, I needed to produce a shuffled version of
> `range(10, 10 ** 5)`.
>
> This is one way to do it:
>
> shuffled_numbers = list(range(10, 10 ** 5))
> random.shuffle(shuffled_numbers)
>
>
> I don't like it because (1) it's too imperative and (2) I'm calling the
> list "shuffled" even before it's shuffled.
>
> Another solution is this:
>
> shuffled_numbers = random.sample(range(10, 10 ** 5), k=len(range(10, 10 ** 5)))
>
> This is better because it solves the 2 points above. However, it is quite
> cumbersome.
>
> I notice that the `random.sample` function doesn't have a default behavior
> set when you don't specify `k`. This is fortunate, because we could make
> that behavior just automatically take the length of the first argument. So
> we could do this:
>
> shuffled_numbers = random.sample(range(10, 10 ** 5))
>
> What do you think?
>
>
> Thanks,
> Ram.
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/UMPJOTCY4SK5LFBPWFBJYFXLBF76EA2S/