Re: [Numpy-discussion] Pull Request Review: R-like sample function

2011-09-02 Thread Christopher Jordan-Squire
I made the changes discussed here and pushed them to pull request.

https://github.com/numpy/numpy/pull/143#issuecomment-1980897

I changed the new function's name from sample to choice and added the
size=1 default as Robert suggested. I also reverted all the changes
for sample, random, and ranf and added them to the reference docs.

-Chris JS

On Thu, Sep 1, 2011 at 10:55 PM, Nathaniel Smith n...@pobox.com wrote:
 On Thu, Sep 1, 2011 at 8:31 PM, Christopher Jordan-Squire
 cjord...@uw.edu wrote:
 On Thu, Sep 1, 2011 at 11:14 PM, Robert Kern robert.k...@gmail.com wrote:
 On Thu, Sep 1, 2011 at 22:07, Christopher Jordan-Squire cjord...@uw.edu 
 wrote:

 So in the mean time, are there any suggestions for what this R sample
 function should be called, since random.sample is apparently taken?

 If you default to size=1 (which you probably should anyways), then
 np.random.choice() makes sense, in analogy to random.choice() from the
 standard library.

 Alright. I can make that change tomorrow. I'd prefer np.sample in the
 long-run, for compatibility with R. (False friends are loathsome
 things.)  How does one petition to get function names deprecated?

 I was about to argue that random.choice was a better name anyway,
 but then I remembered that the standard library random.sample exists
 and does something similar. So instead I'd like to argue that making
 this compatible with Python is more important than making it
 compatible with R :-).

 Specifically, 'np.random.sample(array_like, k)', with no further
 arguments, should perform random sampling *without* replacement.
 Ideally it should also efficiently handle the case where array_like is
 an xrange object. Docs are here:
 http://docs.python.org/library/random.html

 -- Nathaniel
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Pull Request Review: R-like sample function

2011-09-02 Thread Christopher Jordan-Squire
On Thu, Sep 1, 2011 at 10:55 PM, Nathaniel Smith n...@pobox.com wrote:
 On Thu, Sep 1, 2011 at 8:31 PM, Christopher Jordan-Squire
 cjord...@uw.edu wrote:
 On Thu, Sep 1, 2011 at 11:14 PM, Robert Kern robert.k...@gmail.com wrote:
 On Thu, Sep 1, 2011 at 22:07, Christopher Jordan-Squire cjord...@uw.edu 
 wrote:

 So in the mean time, are there any suggestions for what this R sample
 function should be called, since random.sample is apparently taken?

 If you default to size=1 (which you probably should anyways), then
 np.random.choice() makes sense, in analogy to random.choice() from the
 standard library.

 Alright. I can make that change tomorrow. I'd prefer np.sample in the
 long-run, for compatibility with R. (False friends are loathsome
 things.)  How does one petition to get function names deprecated?

 I was about to argue that random.choice was a better name anyway,
 but then I remembered that the standard library random.sample exists
 and does something similar. So instead I'd like to argue that making
 this compatible with Python is more important than making it
 compatible with R :-).

 Specifically, 'np.random.sample(array_like, k)', with no further
 arguments, should perform random sampling *without* replacement.
 Ideally it should also efficiently handle the case where array_like is
 an xrange object. Docs are here:
 http://docs.python.org/library/random.html


I mentioned this at the end of my initial response to Josef. I agree
that my use of sample is more consistent with the python random
library's usage, except that I don't want to have the same name but a
different default. I think that also counts as a 'false friend'. So I
think np.random.choice is a better name for right now. Simpler to have
the same function with the same defaults but a different name than the
same function with the same name and different defaults.

-Chris


 -- Nathaniel
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Pull Request Review: R-like sample function

2011-09-02 Thread Nathaniel Smith
On Fri, Sep 2, 2011 at 10:14 AM, Christopher Jordan-Squire
cjord...@uw.edu wrote:
 I made the changes discussed here and pushed them to pull request.

 https://github.com/numpy/numpy/pull/143#issuecomment-1980897

I think you mean https://github.com/numpy/numpy/pull/151

 I changed the new function's name from sample to choice and added the
 size=1 default as Robert suggested. I also reverted all the changes
 for sample, random, and ranf and added them to the reference docs.

I still think the default should be sampling without replacement, so
as to reduce confusion later when we make this available as
np.random.sample.

You should also deprecate the current 'sample' alias (I believe this
is basically a matter of documenting that it is deprecated, and making
sure that it fires a deprecation warning when used). But perhaps you
want to do that in a separate pull request, I dunno.

-- Nathaniel
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Pull Request Review: R-like sample function

2011-09-02 Thread Christopher Jordan-Squire
On Fri, Sep 2, 2011 at 12:58 PM, Nathaniel Smith n...@pobox.com wrote:
 On Fri, Sep 2, 2011 at 10:14 AM, Christopher Jordan-Squire
 cjord...@uw.edu wrote:
 I made the changes discussed here and pushed them to pull request.

 https://github.com/numpy/numpy/pull/143#issuecomment-1980897

 I think you mean https://github.com/numpy/numpy/pull/151


Yes. Sorry for the mix-up.

 I changed the new function's name from sample to choice and added the
 size=1 default as Robert suggested. I also reverted all the changes
 for sample, random, and ranf and added them to the reference docs.

 I still think the default should be sampling without replacement, so
 as to reduce confusion later when we make this available as
 np.random.sample.


np.random.sample could be just a call to np.random.choice(blah, blah,
replace=False, blah). I'm not opposed to python compatibility, I just
would like a function with the defaults I want.

 You should also deprecate the current 'sample' alias (I believe this
 is basically a matter of documenting that it is deprecated, and making
 sure that it fires a deprecation warning when used). But perhaps you
 want to do that in a separate pull request, I dunno.


Yes, I think I need a separate pull request after getting some
agreement that it should be changed.

-Chris

 -- Nathaniel
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Pull Request Review: R-like sample function

2011-09-01 Thread Christopher Jordan-Squire
On Thu, Sep 1, 2011 at 10:01 PM,  josef.p...@gmail.com wrote:
 On Thu, Sep 1, 2011 at 6:02 PM, Christopher Jordan-Squire
 cjord...@uw.edu wrote:
 Hi--I've just submitted a numpy 2.0 pull request for a function sample
 in np.random. It's essentially an implementation of R's sample
 function. It allows possibly non-uniform, possibly without-replacement
 sampling from a given 1-D array-like. This is very useful for quickly
 and cleanly creating samples from, for example, a list of strings or a
 list of non-contiguous, non-evenly spaced integers. Both occur in data
 analysis with categorical data.

 It is, essentially, a convenience function that wraps a number of
 existing ways to take a random sample. I think it belongs in
 numpy.random rather than scipy.stats because it's just a random
 sampler, rather than a probability distribution. It isn't possible to
 define a scipy.stats discrete random variable on strings--it would
 have to instead be done on the indices of the list containing the
 possible samples. And (as far as I can tell) the scipy.stats
 distributions can't be used for sampling without replacement.

 https://github.com/numpy/numpy/pull/151

 I don't think you can kill numpy.random.random and similar mixed in
 with an adding a new function commit.


Killjoy.

 First these functions would need to be deprecated.


I discussed this with a few other people, and they suggested that it
could be alright since it's for numpy 2.0 rather than numpy 1.x. For
the 2.0 version it would be perfectly reasonable to have a break with
the API. (Though, as I said, it's not a break with the API.)

 it does not break the API as the previous function was not in the docs

 This is a doc bug, I assume. I don't think it means users/developers
 don't rely on it.


You apparently don't subscribe to the view that the API is infallible
revelation revealed.

(That's a joke, if it's not obvious.)

 searching for np.random.random shows 120 threads in my gmail reader,
 python uses random.random()
 dir(np.random) shows it
 I copied it from mailing list examples. It's used quite a bit in
 scipy, as I saw because of your work.

 I also find the historical multiplicity of aliases confusing, but
 which names should be deprecated would at least require a discussion
 and a separate commit.

 Josef


I hadn't thought about the random.random connection. I'm fine with
leaving random.random as an alias for random.random_sample. I just
wanted to claim random.sample for my own function.

I can't think of many other instances of aliased functions like that
in numpy, though--but perhaps I'm not thinking hard enough. It
certainly seemed strange to have 4 names for the same function.

Now that you mention the standard library connection, my use of
random.sample seems more in line with the standard library random
package than the alias to random.random_sample. Though that would
suggest using the default replace=True, which I'd prefer not to do.

-Chris JS



 -Chris JS
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Pull Request Review: R-like sample function

2011-09-01 Thread Robert Kern
On Thu, Sep 1, 2011 at 21:39, Christopher Jordan-Squire cjord...@uw.edu wrote:
 On Thu, Sep 1, 2011 at 10:01 PM,  josef.p...@gmail.com wrote:

 First these functions would need to be deprecated.

 I discussed this with a few other people, and they suggested that it
 could be alright since it's for numpy 2.0 rather than numpy 1.x. For
 the 2.0 version it would be perfectly reasonable to have a break with
 the API. (Though, as I said, it's not a break with the API.)

Yes it is. A very long-standing API. The fact that you had to go
remove a number of actual uses of the aliases should have told you
this. The documentation is not the API. You cannot remove these
aliases without a deprecation period lasting one full minor release.
2.0 is not license to make backwards-incompatible changes solely for
aesthetic reasons. There is no reason not to follow the standard
deprecation schedule here.

 I can't think of many other instances of aliased functions like that
 in numpy, though--but perhaps I'm not thinking hard enough. It
 certainly seemed strange to have 4 names for the same function.

numpy.random was actually replacing multiple libraries at once. The
aliases kind of accreted.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Pull Request Review: R-like sample function

2011-09-01 Thread Christopher Jordan-Squire
On Thu, Sep 1, 2011 at 10:48 PM, Robert Kern robert.k...@gmail.com wrote:
 On Thu, Sep 1, 2011 at 21:39, Christopher Jordan-Squire cjord...@uw.edu 
 wrote:
 On Thu, Sep 1, 2011 at 10:01 PM,  josef.p...@gmail.com wrote:

 First these functions would need to be deprecated.

 I discussed this with a few other people, and they suggested that it
 could be alright since it's for numpy 2.0 rather than numpy 1.x. For
 the 2.0 version it would be perfectly reasonable to have a break with
 the API. (Though, as I said, it's not a break with the API.)

 Yes it is. A very long-standing API. The fact that you had to go
 remove a number of actual uses of the aliases should have told you
 this. The documentation is not the API. You cannot remove these
 aliases without a deprecation period lasting one full minor release.
 2.0 is not license to make backwards-incompatible changes solely for
 aesthetic reasons. There is no reason not to follow the standard
 deprecation schedule here.


Then I was misinformed, and I hadn't realized it was a long-standing use.

I suspected it might be problematic for np.random.random. But
np.random.sample was only used once, and np.random.ranf not at all.
Those two didn't appear in the statsmodels code base either, I don't
think. (And random.random only appeared once in a docstring.)

So in the mean time, are there any suggestions for what this R sample
function should be called, since random.sample is apparently taken?

-Chris

 I can't think of many other instances of aliased functions like that
 in numpy, though--but perhaps I'm not thinking hard enough. It
 certainly seemed strange to have 4 names for the same function.

 numpy.random was actually replacing multiple libraries at once. The
 aliases kind of accreted.

 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
   -- Umberto Eco
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Pull Request Review: R-like sample function

2011-09-01 Thread Robert Kern
On Thu, Sep 1, 2011 at 22:07, Christopher Jordan-Squire cjord...@uw.edu wrote:

 So in the mean time, are there any suggestions for what this R sample
 function should be called, since random.sample is apparently taken?

If you default to size=1 (which you probably should anyways), then
np.random.choice() makes sense, in analogy to random.choice() from the
standard library.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Pull Request Review: R-like sample function

2011-09-01 Thread Christopher Jordan-Squire
On Thu, Sep 1, 2011 at 11:14 PM, Robert Kern robert.k...@gmail.com wrote:
 On Thu, Sep 1, 2011 at 22:07, Christopher Jordan-Squire cjord...@uw.edu 
 wrote:

 So in the mean time, are there any suggestions for what this R sample
 function should be called, since random.sample is apparently taken?

 If you default to size=1 (which you probably should anyways), then
 np.random.choice() makes sense, in analogy to random.choice() from the
 standard library.

Alright. I can make that change tomorrow. I'd prefer np.sample in the
long-run, for compatibility with R. (False friends are loathsome
things.)  How does one petition to get function names deprecated?

-Chris JS


 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
   -- Umberto Eco
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Pull Request Review: R-like sample function

2011-09-01 Thread Robert Kern
On Thu, Sep 1, 2011 at 22:31, Christopher Jordan-Squire cjord...@uw.edu wrote:
 On Thu, Sep 1, 2011 at 11:14 PM, Robert Kern robert.k...@gmail.com wrote:
 On Thu, Sep 1, 2011 at 22:07, Christopher Jordan-Squire cjord...@uw.edu 
 wrote:

 So in the mean time, are there any suggestions for what this R sample
 function should be called, since random.sample is apparently taken?

 If you default to size=1 (which you probably should anyways), then
 np.random.choice() makes sense, in analogy to random.choice() from the
 standard library.

 Alright. I can make that change tomorrow. I'd prefer np.sample in the
 long-run, for compatibility with R. (False friends are loathsome
 things.)

We have a number of software packages we might want to be compatible
with and a limited English vocabulary. There is always going to be
some overlap with someone that is misleading in some contexts. It's a
concern, but not an overriding one.

 How does one petition to get function names deprecated?

Ask here. Obtain general agreement. Write a patch. Not necessarily in
that order.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Pull Request Review: R-like sample function

2011-09-01 Thread Nathaniel Smith
On Thu, Sep 1, 2011 at 8:31 PM, Christopher Jordan-Squire
cjord...@uw.edu wrote:
 On Thu, Sep 1, 2011 at 11:14 PM, Robert Kern robert.k...@gmail.com wrote:
 On Thu, Sep 1, 2011 at 22:07, Christopher Jordan-Squire cjord...@uw.edu 
 wrote:

 So in the mean time, are there any suggestions for what this R sample
 function should be called, since random.sample is apparently taken?

 If you default to size=1 (which you probably should anyways), then
 np.random.choice() makes sense, in analogy to random.choice() from the
 standard library.

 Alright. I can make that change tomorrow. I'd prefer np.sample in the
 long-run, for compatibility with R. (False friends are loathsome
 things.)  How does one petition to get function names deprecated?

I was about to argue that random.choice was a better name anyway,
but then I remembered that the standard library random.sample exists
and does something similar. So instead I'd like to argue that making
this compatible with Python is more important than making it
compatible with R :-).

Specifically, 'np.random.sample(array_like, k)', with no further
arguments, should perform random sampling *without* replacement.
Ideally it should also efficiently handle the case where array_like is
an xrange object. Docs are here:
http://docs.python.org/library/random.html

-- Nathaniel
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion