Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-17 Thread josef.pktd
On Thu, Oct 16, 2014 at 10:50 PM, Nathaniel Smith n...@pobox.com wrote:
 On Fri, Oct 17, 2014 at 2:35 AM,  josef.p...@gmail.com wrote:
 On Thu, Oct 16, 2014 at 3:39 PM, Nathaniel Smith n...@pobox.com wrote:
 On Thu, Oct 16, 2014 at 6:30 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:


 On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:
 
  On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:
 
  Regarding names: shuffle/permutation is a terrible naming convention
  IMHO and shouldn't be propagated further. We already have a good
  naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs.
  reversed, etc.
 
  So, how about:
 
  scramble + scrambled shuffle individual entries within each
  row/column/..., as in Warren's suggestion.
 
  shuffle + shuffled to do what shuffle, permutation do now (mnemonic:
  these break a 2d array into a bunch of 1d cards, and then shuffle
  those cards).
 
  permuted remains indefinitely, with the docstring: Deprecated alias
  for 'shuffled'.
 
  That sounds good to me.  (I might go with 'randomize' instead of
  'scramble', but that's a second-order decision for the API.)

 I hesitate to use names like randomize because they're less
 informative than they seem -- if asked what this operation does
 to an array, then it would be natural to say it randomizes the
 array. But if told that the random module has a function called
 randomize, then that's not very informative -- everything in random
 randomizes something somehow.

 I had some similar concerns (hence my original disarrange), but
 randomize seemed more likely to be found when searching or browsing the
 docs, and while it might be a bit too generic-sounding, it does feel like a
 natural verb for the process.   On the other hand, permute and permuted
 are even more natural and unambiguous.  Any objections to those?  (The
 existing function is permutation.)
 [...]
 By the way, permutation has a feature not yet mentioned here: if the
 argument is an integer 'n', it generates a permutation of arange(n).  In
 this case, it acts like matlab's randperm function.  Unless we replicate
 that in the new function, we shouldn't deprecate permutation.

 I guess we could do something like:

 permutation(n):

 Return a random permutation on n items. Equivalent to permuted(arange(n)).

 Note: for backwards compatibility, a call like permutation(an_array)
 currently returns the same as shuffled(an_array). (This is *not*
 equivalent to permuted(an_array).) This functionality is deprecated.

 OTOH np.random.permute as a name does have a downside: someday we'll
 probably add a function called np.permute (for applying a given
 permutation in place -- the O(n) algorithm for this is useful and
 tricky), and having two functions with the same name and very
 different semantics would be pretty confusing.

 I like `permute`. That's the one term I'm looking for first.

 If np.permute does some kind of deterministic permutation or pivoting,
 then I wouldn't find it confusing if np.random.permute does random
 permutation.

 Yeah, but:

 from ... import permute
 # 500 lines later
 def foo(...):
     permute(...)  # what the heck is this

 It definitely *can* be confusing; basically everything else in
 np.random has a name that suggests randomness even without seeing the
 full path.

I usually/always avoid importing names from random into the module namespace
and write

np.random.xxx

instead. Compare:

from numpy.random import power
power(...)

>>> power(5, 3)
array([ 0.93771162,  0.96180884,  0.80191961])

???

and f and beta and gamma, ...

>>> bytes(10)
'\xa3\xf0%\x88\x11\xda\x0e\x81\x0c\x8e'
>>> bytes(5)
'\xb0B\x8e\xa1\x80'



 It's not a huge deal, though.

 (I definitely don't like scrambled, sounds like eggs or cable TV that
 needs to be unscrambled.)

 I vote that in this kind of bikeshed we try to restrict ourselves to
 arguments that we can at least pretend are motivated by some
 technical/UX concern ;-). (I guess unscrambling eggs would be
 technically impressive tho ;-))

Ignoring the eggs, it still sounds like a cheap encryption and is a
word I would never look for when looking for something to implement a
permutation test.

Josef



 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-16 Thread Warren Weckesser
On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:

 On Sun, Oct 12, 2014 at 5:14 PM, Sebastian se...@sebix.at wrote:
 
  On 2014-10-12 16:54, Warren Weckesser wrote:
 
 
  On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern robert.k...@gmail.com wrote:
 
  On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser
  warren.weckes...@gmail.com wrote:
 
   A small wart in this API is the meaning of
  
 shuffle(a, independent=False, axis=None)
  
   It could be argued that the correct behavior is to leave the
   array unchanged. (The current behavior can be interpreted as
   shuffling a 1-d sequence of monolithic blobs; the axis argument
   specifies which axis of the array corresponds to the
   sequence index.  Then `axis=None` means the argument is
   a single monolithic blob, so there is nothing to shuffle.)
   Or an error could be raised.
  
   What do you think?
 
  It seems to me a perfectly good reason to have two methods instead of
  one. I can't imagine when I wouldn't be using a literal True or False
  for this, so it really should be two different methods.
 
 
 
  I agree, and my first inclination was to propose a different method
  (and I had the bikeshedding conversation with myself about the name:
  disarrange, scramble, disorder, randomize, ashuffle, some
  other variation of the word shuffle, ...), but I figured the first
  thing folks would say is Why not just add options to shuffle?  So,
  choose your battles and all that.
 
  What do other folks think of making a separate method?
 
  I'm not a fan of more methods with similar functionality in Numpy. It's
  already hard to keep an overview of the existing functions and all their
  possible applications and variants. The axis=None proposal for shuffling
  all items is very intuitive.
 
  I think we don't want to take the path of matlab: a huge number of
  powerful functions, but few people know of their powerful possibilities.

 I totally agree with this principle, but I think this is an exception
 to the rule, b/c unfortunately in this case the function that we *do*
 have is weird and inconsistent with how most other functions in numpy
 work. It doesn't vectorize! Cf. 'sort' or how a 'shuffle' gufunc
 (k,)->(k,) would work. Also, it's easy to implement the current
 'shuffle' in terms of any 1d shuffle function with no explicit loops,
 whereas Warren's disarrange requires an explicit loop. So, we really
 implemented the wrong one, oops. What this means going forward,
 though, is that our only options are either to implement both
 behaviours with two functions, or else to give up on having the more
 natural behaviour altogether. I think the former is the lesser of two
 evils.

 Regarding names: shuffle/permutation is a terrible naming convention
 IMHO and shouldn't be propagated further. We already have a good
 naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs.
 reversed, etc.

 So, how about:

 scramble + scrambled shuffle individual entries within each
 row/column/..., as in Warren's suggestion.

 shuffle + shuffled to do what shuffle, permutation do now (mnemonic:
 these break a 2d array into a bunch of 1d cards, and then shuffle
 those cards).

 permuted remains indefinitely, with the docstring: Deprecated alias
 for 'shuffled'.



That sounds good to me.  (I might go with 'randomize' instead of
'scramble', but that's a second-order decision for the API.)
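
For concreteness, here is a rough sketch of the two behaviours being named. The name scrambled comes from the proposal quoted above, but the body is only an illustrative assumption (an explicit loop over 1-d slices using np.random.shuffle), and the default axis=-1 is likewise an assumption, not something agreed in this thread.

```python
import numpy as np


def scrambled(a, axis=-1):
    """Return a copy of `a` with each 1-d slice along `axis` shuffled
    independently. Hypothetical helper, not an existing NumPy function."""
    out = np.array(a, copy=True)
    moved = np.moveaxis(out, axis, -1)  # view onto `out`, target axis last
    for index in np.ndindex(moved.shape[:-1]):
        np.random.shuffle(moved[index])  # in-place shuffle of one 1-d slice
    return out


a = np.arange(12).reshape(3, 4)

# Current behaviour: the rows move around as whole "cards".
cards = a.copy()
np.random.shuffle(cards)

# Proposed behaviour: entries move around within each row.
mixed = scrambled(a, axis=1)
```

With shuffle every row of cards is still one of the original rows, whereas in mixed each row holds the same values as before but in a new order.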

Warren


-n



Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-16 Thread Jaime Fernández del Río
On Thu, Oct 16, 2014 at 8:39 AM, Warren Weckesser 
warren.weckes...@gmail.com wrote:



 On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:

 On Sun, Oct 12, 2014 at 5:14 PM, Sebastian se...@sebix.at wrote:
 
  On 2014-10-12 16:54, Warren Weckesser wrote:
 
 
  On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern robert.k...@gmail.com wrote:
 
  On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser
  warren.weckes...@gmail.com wrote:
 
   A small wart in this API is the meaning of
  
 shuffle(a, independent=False, axis=None)
  
   It could be argued that the correct behavior is to leave the
   array unchanged. (The current behavior can be interpreted as
   shuffling a 1-d sequence of monolithic blobs; the axis argument
   specifies which axis of the array corresponds to the
   sequence index.  Then `axis=None` means the argument is
   a single monolithic blob, so there is nothing to shuffle.)
   Or an error could be raised.
  
   What do you think?
 
  It seems to me a perfectly good reason to have two methods instead of
  one. I can't imagine when I wouldn't be using a literal True or False
  for this, so it really should be two different methods.
 
 
 
  I agree, and my first inclination was to propose a different method
  (and I had the bikeshedding conversation with myself about the name:
  disarrange, scramble, disorder, randomize, ashuffle, some
  other variation of the word shuffle, ...), but I figured the first
  thing folks would say is Why not just add options to shuffle?  So,
  choose your battles and all that.
 
  What do other folks think of making a separate method?
 
  I'm not a fan of more methods with similar functionality in Numpy. It's
  already hard to keep an overview of the existing functions and all their
  possible applications and variants. The axis=None proposal for shuffling
  all items is very intuitive.
 
  I think we don't want to take the path of matlab: a huge number of
  powerful functions, but few people know of their powerful possibilities.

 I totally agree with this principle, but I think this is an exception
 to the rule, b/c unfortunately in this case the function that we *do*
 have is weird and inconsistent with how most other functions in numpy
 work. It doesn't vectorize! Cf. 'sort' or how a 'shuffle' gufunc
 (k,)->(k,) would work. Also, it's easy to implement the current
 'shuffle' in terms of any 1d shuffle function with no explicit loops,
 whereas Warren's disarrange requires an explicit loop. So, we really
 implemented the wrong one, oops. What this means going forward,
 though, is that our only options are either to implement both
 behaviours with two functions, or else to give up on having the more
 natural behaviour altogether. I think the former is the lesser of two
 evils.

 Regarding names: shuffle/permutation is a terrible naming convention
 IMHO and shouldn't be propagated further. We already have a good
 naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs.
 reversed, etc.

 So, how about:

 scramble + scrambled shuffle individual entries within each
 row/column/..., as in Warren's suggestion.

 shuffle + shuffled to do what shuffle, permutation do now (mnemonic:
 these break a 2d array into a bunch of 1d cards, and then shuffle
 those cards).

 permuted remains indefinitely, with the docstring: Deprecated alias
 for 'shuffled'.



 That sounds good to me.  (I might go with 'randomize' instead of
 'scramble', but that's a second-order decision for the API.)


So the only little detail left is someone actually rolling up his/her
sleeves and creating a PR... ;-)

The current shuffle and permutation are implemented here:

https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L4551

It's in Cython, so it is a good candidate for anyone wanting to contribute
to numpy, but wary of C code.

Jaime





 Warren


 -n





-- 
(\__/)
( O.o)
(  ) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-16 Thread Nathaniel Smith
On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser
warren.weckes...@gmail.com wrote:

 On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:

 Regarding names: shuffle/permutation is a terrible naming convention
 IMHO and shouldn't be propagated further. We already have a good
 naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs.
 reversed, etc.

 So, how about:

 scramble + scrambled shuffle individual entries within each
 row/column/..., as in Warren's suggestion.

 shuffle + shuffled to do what shuffle, permutation do now (mnemonic:
 these break a 2d array into a bunch of 1d cards, and then shuffle
 those cards).

 permuted remains indefinitely, with the docstring: Deprecated alias
 for 'shuffled'.

 That sounds good to me.  (I might go with 'randomize' instead of 'scramble',
 but that's a second-order decision for the API.)

I hesitate to use names like randomize because they're less
informative than they seem -- if asked what this operation does
to an array, then it would be natural to say it randomizes the
array. But if told that the random module has a function called
randomize, then that's not very informative -- everything in random
randomizes something somehow.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-16 Thread Warren Weckesser
On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:
 
  On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:
 
  Regarding names: shuffle/permutation is a terrible naming convention
  IMHO and shouldn't be propagated further. We already have a good
  naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs.
  reversed, etc.
 
  So, how about:
 
  scramble + scrambled shuffle individual entries within each
  row/column/..., as in Warren's suggestion.
 
  shuffle + shuffled to do what shuffle, permutation do now (mnemonic:
  these break a 2d array into a bunch of 1d cards, and then shuffle
  those cards).
 
  permuted remains indefinitely, with the docstring: Deprecated alias
  for 'shuffled'.
 
  That sounds good to me.  (I might go with 'randomize' instead of
  'scramble', but that's a second-order decision for the API.)

 I hesitate to use names like randomize because they're less
 informative than they seem -- if asked what this operation does
 to an array, then it would be natural to say it randomizes the
 array. But if told that the random module has a function called
 randomize, then that's not very informative -- everything in random
 randomizes something somehow.



I had some similar concerns (hence my original disarrange), but
randomize seemed more likely to be found when searching or browsing the
docs, and while it might be a bit too generic-sounding, it does feel like a
natural verb for the process.   On the other hand, permute and permuted
are even more natural and unambiguous.  Any objections to those?  (The
existing function is permutation.)

Whatever the names, the docstrings for the four functions should be
cross-referenced in their See Also sections to help users find the
appropriate function.

By the way, permutation has a feature not yet mentioned here: if the
argument is an integer 'n', it generates a permutation of arange(n).  In
this case, it acts like matlab's randperm function.  Unless we replicate
that in the new function, we shouldn't deprecate permutation.
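
To make the two existing behaviours of permutation concrete, here is a small sketch using only np.random.permutation as it exists today. The fixed seed is just there to make a run repeatable and is not part of any proposal.

```python
import numpy as np

np.random.seed(0)  # fixed seed only so that the run is repeatable

# Integer argument: a random permutation of arange(n), like randperm.
p = np.random.permutation(5)

# Array argument: a shuffled copy. For a 2-d array whole rows are moved,
# and the input array itself is left untouched.
a = np.arange(6).reshape(3, 2)
b = np.random.permutation(a)
```

Any replacement would need to cover the integer case somewhere, which is why deprecating permutation outright would lose functionality.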

Warren



 -n



Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-16 Thread Nathaniel Smith
On Thu, Oct 16, 2014 at 6:30 PM, Warren Weckesser
warren.weckes...@gmail.com wrote:


 On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:
 
  On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:
 
  Regarding names: shuffle/permutation is a terrible naming convention
  IMHO and shouldn't be propagated further. We already have a good
  naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs.
  reversed, etc.
 
  So, how about:
 
  scramble + scrambled shuffle individual entries within each
  row/column/..., as in Warren's suggestion.
 
  shuffle + shuffled to do what shuffle, permutation do now (mnemonic:
  these break a 2d array into a bunch of 1d cards, and then shuffle
  those cards).
 
  permuted remains indefinitely, with the docstring: Deprecated alias
  for 'shuffled'.
 
  That sounds good to me.  (I might go with 'randomize' instead of
  'scramble', but that's a second-order decision for the API.)

 I hesitate to use names like randomize because they're less
 informative than they seem -- if asked what this operation does
 to an array, then it would be natural to say it randomizes the
 array. But if told that the random module has a function called
 randomize, then that's not very informative -- everything in random
 randomizes something somehow.

 I had some similar concerns (hence my original disarrange), but
 randomize seemed more likely to be found when searching or browsing the
 docs, and while it might be a bit too generic-sounding, it does feel like a
 natural verb for the process.   On the other hand, permute and permuted
 are even more natural and unambiguous.  Any objections to those?  (The
 existing function is permutation.)
[...]
 By the way, permutation has a feature not yet mentioned here: if the
 argument is an integer 'n', it generates a permutation of arange(n).  In
 this case, it acts like matlab's randperm function.  Unless we replicate
 that in the new function, we shouldn't deprecate permutation.

I guess we could do something like:

permutation(n):

Return a random permutation on n items. Equivalent to permuted(arange(n)).

Note: for backwards compatibility, a call like permutation(an_array)
currently returns the same as shuffled(an_array). (This is *not*
equivalent to permuted(an_array).) This functionality is deprecated.

OTOH np.random.permute as a name does have a downside: someday we'll
probably add a function called np.permute (for applying a given
permutation in place -- the O(n) algorithm for this is useful and
tricky), and having two functions with the same name and very
different semantics would be pretty confusing.
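
To illustrate what applying a given permutation in place could look like, here is a minimal pure-Python sketch that follows the cycles of the permutation. Everything about it is an assumption made for illustration: the name apply_permutation_inplace is hypothetical, the convention chosen is new[i] = old[perm[i]], and it uses an auxiliary list of n flags, so it is O(n) in time but not the flag-free variant.

```python
def apply_permutation_inplace(a, perm):
    """Rearrange the list `a` in place so that new a[i] == old a[perm[i]].

    Hypothetical helper: walks each cycle of `perm` once, so every element
    is moved exactly one time.
    """
    n = len(a)
    if len(perm) != n:
        raise ValueError("perm must have the same length as a")
    visited = [False] * n  # auxiliary flags, one per position
    for start in range(n):
        if visited[start]:
            continue
        saved = a[start]  # the value displaced at the head of the cycle
        i = start
        while True:
            visited[i] = True
            j = perm[i]
            if j == start:
                a[i] = saved  # close the cycle with the saved value
                break
            a[i] = a[j]
            i = j


items = ["a", "b", "c", "d"]
apply_permutation_inplace(items, [2, 0, 3, 1])
# items is now ['c', 'a', 'd', 'b']
```

Nothing here is random, which is the point of the naming concern: a deterministic np.permute and a random np.random.permute would do quite different things under the same name.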

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-16 Thread josef.pktd
On Thu, Oct 16, 2014 at 3:39 PM, Nathaniel Smith n...@pobox.com wrote:
 On Thu, Oct 16, 2014 at 6:30 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:


 On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:
 
  On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:
 
  Regarding names: shuffle/permutation is a terrible naming convention
  IMHO and shouldn't be propagated further. We already have a good
  naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs.
  reversed, etc.
 
  So, how about:
 
  scramble + scrambled shuffle individual entries within each
  row/column/..., as in Warren's suggestion.
 
  shuffle + shuffled to do what shuffle, permutation do now (mnemonic:
  these break a 2d array into a bunch of 1d cards, and then shuffle
  those cards).
 
  permuted remains indefinitely, with the docstring: Deprecated alias
  for 'shuffled'.
 
  That sounds good to me.  (I might go with 'randomize' instead of
  'scramble', but that's a second-order decision for the API.)

 I hesitate to use names like randomize because they're less
 informative than they seem -- if asked what this operation does
 to an array, then it would be natural to say it randomizes the
 array. But if told that the random module has a function called
 randomize, then that's not very informative -- everything in random
 randomizes something somehow.

 I had some similar concerns (hence my original disarrange), but
 randomize seemed more likely to be found when searching or browsing the
 docs, and while it might be a bit too generic-sounding, it does feel like a
 natural verb for the process.   On the other hand, permute and permuted
 are even more natural and unambiguous.  Any objections to those?  (The
 existing function is permutation.)
 [...]
 By the way, permutation has a feature not yet mentioned here: if the
 argument is an integer 'n', it generates a permutation of arange(n).  In
 this case, it acts like matlab's randperm function.  Unless we replicate
 that in the new function, we shouldn't deprecate permutation.

 I guess we could do something like:

 permutation(n):

 Return a random permutation on n items. Equivalent to permuted(arange(n)).

 Note: for backwards compatibility, a call like permutation(an_array)
 currently returns the same as shuffled(an_array). (This is *not*
 equivalent to permuted(an_array).) This functionality is deprecated.

 OTOH np.random.permute as a name does have a downside: someday we'll
 probably add a function called np.permute (for applying a given
 permutation in place -- the O(n) algorithm for this is useful and
 tricky), and having two functions with the same name and very
 different semantics would be pretty confusing.

I like `permute`. That's the one term I'm looking for first.

If np.permute does some kind of deterministic permutation or pivoting,
then I wouldn't find it confusing if np.random.permute does random
permutation.

(I definitely don't like scrambled, sounds like eggs or cable TV that
needs to be unscrambled.)

Josef



 -n



Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-16 Thread Nathaniel Smith
On Fri, Oct 17, 2014 at 2:35 AM,  josef.p...@gmail.com wrote:
 On Thu, Oct 16, 2014 at 3:39 PM, Nathaniel Smith n...@pobox.com wrote:
 On Thu, Oct 16, 2014 at 6:30 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:


 On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:
 
  On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:
 
  Regarding names: shuffle/permutation is a terrible naming convention
  IMHO and shouldn't be propagated further. We already have a good
  naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs.
  reversed, etc.
 
  So, how about:
 
  scramble + scrambled shuffle individual entries within each
  row/column/..., as in Warren's suggestion.
 
  shuffle + shuffled to do what shuffle, permutation do now (mnemonic:
  these break a 2d array into a bunch of 1d cards, and then shuffle
  those cards).
 
  permuted remains indefinitely, with the docstring: Deprecated alias
  for 'shuffled'.
 
  That sounds good to me.  (I might go with 'randomize' instead of
  'scramble', but that's a second-order decision for the API.)

 I hesitate to use names like randomize because they're less
 informative than they seem -- if asked what this operation does
 to an array, then it would be natural to say it randomizes the
 array. But if told that the random module has a function called
 randomize, then that's not very informative -- everything in random
 randomizes something somehow.

 I had some similar concerns (hence my original disarrange), but
 randomize seemed more likely to be found when searching or browsing the
 docs, and while it might be a bit too generic-sounding, it does feel like a
 natural verb for the process.   On the other hand, permute and permuted
 are even more natural and unambiguous.  Any objections to those?  (The
 existing function is permutation.)
 [...]
 By the way, permutation has a feature not yet mentioned here: if the
 argument is an integer 'n', it generates a permutation of arange(n).  In
 this case, it acts like matlab's randperm function.  Unless we replicate
 that in the new function, we shouldn't deprecate permutation.

 I guess we could do something like:

 permutation(n):

 Return a random permutation on n items. Equivalent to permuted(arange(n)).

 Note: for backwards compatibility, a call like permutation(an_array)
 currently returns the same as shuffled(an_array). (This is *not*
 equivalent to permuted(an_array).) This functionality is deprecated.

 OTOH np.random.permute as a name does have a downside: someday we'll
 probably add a function called np.permute (for applying a given
 permutation in place -- the O(n) algorithm for this is useful and
 tricky), and having two functions with the same name and very
 different semantics would be pretty confusing.

 I like `permute`. That's the one term I'm looking for first.

 If np.permute does some kind of deterministic permutation or pivoting,
 then I wouldn't find it confusing if np.random.permute does random
 permutation.

Yeah, but:

from ... import permute
# 500 lines later
def foo(...):
    permute(...)  # what the heck is this

It definitely *can* be confusing; basically everything else in
np.random has a name that suggests randomness even without seeing the
full path.

It's not a huge deal, though.

 (I definitely don't like scrambled, sounds like eggs or cable TV that
 needs to be unscrambled.)

I vote that in this kind of bikeshed we try to restrict ourselves to
arguments that we can at least pretend are motivated by some
technical/UX concern ;-). (I guess unscrambling eggs would be
technically impressive tho ;-))

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread Eelco Hoogendoorn
Thanks Warren, I think these are sensible additions.

I would argue for treating the None-False condition (axis=None with
independent=False) as an error. Indeed I agree one might argue the correct
behavior is to 'shuffle' the singleton block of data, which does nothing;
but it's more likely to come up as an unintended error than as a natural
outcome of parametrized behavior.

On Sun, Oct 12, 2014 at 3:31 AM, John Zwinck jzwi...@gmail.com wrote:

 On Sun, Oct 12, 2014 at 6:51 AM, Warren Weckesser
 warren.weckes...@gmail.com wrote:
  I created an issue on github for an enhancement
  to numpy.random.shuffle:
  https://github.com/numpy/numpy/issues/5173

 I like this idea.  I was a bit surprised there wasn't something like
 this already.

  A small wart in this API is the meaning of
 
shuffle(a, independent=False, axis=None)
 
  It could be argued that the correct behavior is to leave the
  array unchanged. (The current behavior can be interpreted as
  shuffling a 1-d sequence of monolithic blobs; the axis argument
  specifies which axis of the array corresponds to the
  sequence index.  Then `axis=None` means the argument is
  a single monolithic blob, so there is nothing to shuffle.)
  Or an error could be raised.

 Let's think about it from the other direction: if a user wants to
 shuffle all the elements as if it were 1-d, as you point out they
 could do this:

   shuffle(a, axis=None, independent=True)

 But that's a lot of typing.  Maybe we should just let this do the same
 thing:

   shuffle(a, axis=None)

 That seems to be in keeping with the other APIs taking axis as you
 mentioned.  To me, independent has no relevance when the array is
 1-d, it can simply be ignored.

 John Zwinck


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread John Zwinck
On Sun, Oct 12, 2014 at 3:51 PM, Eelco Hoogendoorn
hoogendoorn.ee...@gmail.com wrote:
 I would argue to treat the None-False condition as an error. Indeed I agree
 one might argue the correct behavior is to 'shuffle' the singleton block of
 data, which does nothing; but it's more likely to come up as an unintended
 error than as a natural outcome of parametrized behavior.

I'm interested to know why you think axis=None should raise an error
if independent=False when independent=False is the default.  What I
mean is, if someone uses this function and wants axis=None (which
seems not totally unusual), why force them to always type in the
boilerplate independent=True to make it work?

John Zwinck


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread Stefan van der Walt
Hi Warren

On 2014-10-12 00:51:56, Warren Weckesser warren.weckes...@gmail.com wrote:
 A small wart in this API is the meaning of

   shuffle(a, independent=False, axis=None)

 It could be argued that the correct behavior is to leave the
 array unchanged.

I like the suggested changes.  Since independent loses its meaning
when axis is None, I would expect this to have the same effect as
`shuffle(a, independent=True, axis=None)`.  I think a shuffle function
that doesn't shuffle will confuse a lot of people!
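
As a rough sketch of the behaviour expected here for axis=None, the following shuffles every element of an array as if it were 1-d. The helper name shuffle_all is hypothetical, and routing through np.random.permutation on the flattened data is just one assumed way to get that effect with functions that already exist.

```python
import numpy as np


def shuffle_all(a):
    """Shuffle every element of `a` in place, ignoring its shape.

    Hypothetical helper sketching what shuffle(a, axis=None) might do.
    """
    a[...] = np.random.permutation(a.ravel()).reshape(a.shape)


a = np.arange(6).reshape(2, 3)
shuffle_all(a)
```

The array keeps its shape and its set of values; only the positions change, so elements can cross row boundaries.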

Stéfan


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread Eelco Hoogendoorn
yeah, a shuffle function that does not shuffle indeed seems like a major
source of bugs to me.

Indeed one could argue that setting axis=None should suffice to give a
clear enough declaration of intent; though I wouldn't mind typing the extra
bit to ensure consistent semantics.



Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread Robert Kern
On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser
warren.weckes...@gmail.com wrote:

 A small wart in this API is the meaning of

   shuffle(a, independent=False, axis=None)

 It could be argued that the correct behavior is to leave the
 array unchanged. (The current behavior can be interpreted as
 shuffling a 1-d sequence of monolithic blobs; the axis argument
 specifies which axis of the array corresponds to the
 sequence index.  Then `axis=None` means the argument is
 a single monolithic blob, so there is nothing to shuffle.)
 Or an error could be raised.

 What do you think?

It seems to me a perfectly good reason to have two methods instead of
one. I can't imagine when I wouldn't be using a literal True or False
for this, so it really should be two different methods.

That said, I would just make the axis=None behavior the same for both
methods. axis=None does *not* mean "treat this like a single
monolithic blob" in any of the methods that take an axis= argument; it
means "flatten the array and do the operation on the single flattened
axis". I think the latter behavior is a reasonable interpretation of
axis=None for both methods.
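
To illustrate that convention, here is a minimal sketch (not part of the original message; the array `a` is only an example). Existing functions such as `np.sort` and `np.sum` already read `axis=None` as "operate on the flattened array":

```python
import numpy as np

a = np.array([[3, 1, 2],
              [0, 5, 4]])

# axis=None: the array is flattened first, then sorted as one 1-d sequence.
flat_sorted = np.sort(a, axis=None)

# axis=0 / axis=1: each 1-d slice along that axis is sorted on its own.
cols_sorted = np.sort(a, axis=0)
rows_sorted = np.sort(a, axis=1)

# Reductions follow the same rule: axis=None reduces over every element.
total = np.sum(a, axis=None)
```

Under this reading, a shuffle with axis=None would simply shuffle all elements of the array, whichever shuffling style is in use.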

-- 
Robert Kern


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread Warren Weckesser
On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern robert.k...@gmail.com wrote:

 On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:

  A small wart in this API is the meaning of
 
shuffle(a, independent=False, axis=None)
 
  It could be argued that the correct behavior is to leave the
  array unchanged. (The current behavior can be interpreted as
  shuffling a 1-d sequence of monolithic blobs; the axis argument
  specifies which axis of the array corresponds to the
  sequence index.  Then `axis=None` means the argument is
  a single monolithic blob, so there is nothing to shuffle.)
  Or an error could be raised.
 
  What do you think?

 It seems to me a perfectly good reason to have two methods instead of
 one. I can't imagine when I wouldn't be using a literal True or False
 for this, so it really should be two different methods.



I agree, and my first inclination was to propose a different method (and I
had the bikeshedding conversation with myself about the name: disarrange,
scramble, disorder, randomize, ashuffle, some other variation of
the word shuffle, ...), but I figured the first thing folks would say is
"Why not just add options to shuffle?"  So, choose your battles and all
that.

What do other folks think of making a separate method?



 That said, I would just make the axis=None behavior the same for both
 methods. axis=None does *not* mean treat this like a single
 monolithic blob in any of the axis=-having methods; it means flatten
 the array and do the operation on the single flattened axis. I think
 the latter behavior is a reasonable interpretation of axis=None for
 both methods.



Sounds good to me.

Warren






Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread josef.pktd
On Sun, Oct 12, 2014 at 10:54 AM, Warren Weckesser
warren.weckes...@gmail.com wrote:


 On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern robert.k...@gmail.com wrote:

 On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:

  A small wart in this API is the meaning of
 
shuffle(a, independent=False, axis=None)
 
  It could be argued that the correct behavior is to leave the
  array unchanged. (The current behavior can be interpreted as
  shuffling a 1-d sequence of monolithic blobs; the axis argument
  specifies which axis of the array corresponds to the
  sequence index.  Then `axis=None` means the argument is
  a single monolithic blob, so there is nothing to shuffle.)
  Or an error could be raised.
 
  What do you think?

 It seems to me a perfectly good reason to have two methods instead of
 one. I can't imagine when I wouldn't be using a literal True or False
 for this, so it really should be two different methods.



 I agree, and my first inclination was to propose a different method (and I
 had the bikeshedding conversation with myself about the name: disarrange,
 scramble, disorder, randomize, ashuffle, some other variation of the
 word shuffle, ...), but I figured the first thing folks would say is Why
 not just add options to shuffle?  So, choose your battles and all that.

 What do other folks think of making a separate method?

I'm not a fan of many similar functions.

What's the difference between permute, shuffle and scramble?
And how do I find or remember which is which?





 That said, I would just make the axis=None behavior the same for both
 methods. axis=None does *not* mean treat this like a single
 monolithic blob in any of the axis=-having methods; it means flatten
 the array and do the operation on the single flattened axis. I think
 the latter behavior is a reasonable interpretation of axis=None for
 both methods.



 Sounds good to me.

+1 (since all the arguments have already been given)


Josef
- Why does sort treat columns independently instead of sorting rows?
- because there is lexsort
- Oh, lexsort, I haven't thought about it in 5 years. It's not even next
to sort in the pop up code completion





Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread Warren Weckesser
On Sun, Oct 12, 2014 at 11:20 AM, josef.p...@gmail.com wrote:

 On Sun, Oct 12, 2014 at 10:54 AM, Warren Weckesser
 warren.weckes...@gmail.com wrote:
 
 
  On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern robert.k...@gmail.com
 wrote:
 
  On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser
  warren.weckes...@gmail.com wrote:
 
   A small wart in this API is the meaning of
  
 shuffle(a, independent=False, axis=None)
  
   It could be argued that the correct behavior is to leave the
   array unchanged. (The current behavior can be interpreted as
   shuffling a 1-d sequence of monolithic blobs; the axis argument
   specifies which axis of the array corresponds to the
   sequence index.  Then `axis=None` means the argument is
   a single monolithic blob, so there is nothing to shuffle.)
   Or an error could be raised.
  
   What do you think?
 
  It seems to me a perfectly good reason to have two methods instead of
  one. I can't imagine when I wouldn't be using a literal True or False
  for this, so it really should be two different methods.
 
 
 
  I agree, and my first inclination was to propose a different method (and I
  had the bikeshedding conversation with myself about the name: disarrange,
  scramble, disorder, randomize, ashuffle, some other variation of the
  word shuffle, ...), but I figured the first thing folks would say is Why
  not just add options to shuffle?  So, choose your battles and all that.
 
  What do other folks think of making a separate method?

 I'm not a fan of many similar functions.

 What's the difference between permute, shuffle and scramble?



The difference between `shuffle` and the new method being proposed is
explained in the first email in this thread.
`np.random.permutation` with an array argument returns a shuffled copy of
the array; it does not modify its argument. (It should also get an `axis`
argument when `shuffle` gets an `axis` argument.)
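
For readers who want to see the in-place versus copy distinction directly, here is a small sketch (the seed and the example array are arbitrary choices, not from the thread):

```python
import numpy as np

rng = np.random.RandomState(12345)  # seeded only so the sketch is repeatable

a = np.arange(12).reshape(4, 3)
original = a.copy()

# permutation: returns a new array with the rows reordered; `a` is untouched.
p = rng.permutation(a)
unchanged = np.array_equal(a, original)

# shuffle: reorders the rows of `a` in place and returns None.
result = rng.shuffle(a)
```

In both cases only the order of the rows changes; the contents of each row stay together.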


 And how do I find or remember which is which?



You could start with `help(np.random)` (or `np.random?` in ipython).

Warren





 
 
 


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread josef.pktd
On Sun, Oct 12, 2014 at 11:33 AM, Warren Weckesser
warren.weckes...@gmail.com wrote:


 On Sun, Oct 12, 2014 at 11:20 AM, josef.p...@gmail.com wrote:

 On Sun, Oct 12, 2014 at 10:54 AM, Warren Weckesser
 warren.weckes...@gmail.com wrote:
 
 
  On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern robert.k...@gmail.com
  wrote:
 
  On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser
  warren.weckes...@gmail.com wrote:
 
   A small wart in this API is the meaning of
  
 shuffle(a, independent=False, axis=None)
  
   It could be argued that the correct behavior is to leave the
   array unchanged. (The current behavior can be interpreted as
   shuffling a 1-d sequence of monolithic blobs; the axis argument
   specifies which axis of the array corresponds to the
   sequence index.  Then `axis=None` means the argument is
   a single monolithic blob, so there is nothing to shuffle.)
   Or an error could be raised.
  
   What do you think?
 
  It seems to me a perfectly good reason to have two methods instead of
  one. I can't imagine when I wouldn't be using a literal True or False
  for this, so it really should be two different methods.
 
 
 
  I agree, and my first inclination was to propose a different method (and I
  had the bikeshedding conversation with myself about the name: disarrange,
  scramble, disorder, randomize, ashuffle, some other variation of the
  word shuffle, ...), but I figured the first thing folks would say is Why
  not just add options to shuffle?  So, choose your battles and all that.
 
  What do other folks think of making a separate method?

 I'm not a fan of many similar functions.

 What's the difference between permute, shuffle and scramble?



 The difference between `shuffle` and the new method being proposed is
 explained in the first email in this thread.
 `np.random.permutation` with an array argument returns a shuffled copy of
 the array; it does not modify its argument. (It should also get an `axis`
 argument when `shuffle` gets an `axis` argument.)


 And how do I find or remember which is which?



 You could start with `doc(np.random)` (or `np.random?` in ipython).

If you have to check the docstring each time, then there is something wrong.
In my opinion all docstrings should be read only once.

It's like a Windows program where the GUI menus are not **self-explanatory**.

What did Save-As do?

Josef





Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread Warren Weckesser
On Sat, Oct 11, 2014 at 6:51 PM, Warren Weckesser 
warren.weckes...@gmail.com wrote:

 I created an issue on github for an enhancement
 to numpy.random.shuffle:
 https://github.com/numpy/numpy/issues/5173
 I'd like to get some feedback on the idea.

 Currently, `shuffle` shuffles the first dimension of an array
 in-place.  For example, shuffling a 2D array shuffles the rows:

 In [227]: a
 Out[227]:
 array([[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11]])

 In [228]: np.random.shuffle(a)

 In [229]: a
 Out[229]:
 array([[ 0,  1,  2],
        [ 9, 10, 11],
        [ 3,  4,  5],
        [ 6,  7,  8]])


 To add an axis keyword, we could (in effect) apply `shuffle` to
 `a.swapaxes(axis, 0)`.  For a 2-D array, `axis=1` would shuffle
 the columns:

 In [232]: a = np.arange(15).reshape(3,5)

 In [233]: a
 Out[233]:
 array([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]])

 In [234]: axis = 1

 In [235]: np.random.shuffle(a.swapaxes(axis, 0))

 In [236]: a
 Out[236]:
 array([[ 3,  2,  4,  0,  1],
        [ 8,  7,  9,  5,  6],
        [13, 12, 14, 10, 11]])

 So that's the first part--adding an `axis` keyword.

 The other part of the enhancement request is to add a shuffle
 behavior that shuffles the 1-d slices *independently*.  That is,
 for a 2-d array, shuffling with `axis=0` would apply a different
 shuffle to each column.  In the github issue, I defined a
 function called `disarrange` that implements this behavior:

 In [240]: a
 Out[240]:
 array([[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11],
        [12, 13, 14]])

 In [241]: disarrange(a, axis=0)

 In [242]: a
 Out[242]:
 array([[ 6,  1,  2],
        [ 3, 13, 14],
        [ 9, 10,  5],
        [12,  7,  8],
        [ 0,  4, 11]])

 Note that each column has been shuffled independently.

 This behavior is analogous to how `sort` handles the `axis`
 keyword.  `sort` sorts the 1-d slices along the given axis
 independently.

 In the github issue, I suggested the following signature
 for `shuffle` (but I'm not too fond of the name `independent`):

   def shuffle(a, independent=False, axis=0)

 If `independent` is False, the current behavior of `shuffle`
 is used.  If `independent` is True, each 1-d slice is shuffled
 independently (in the same way that `sort` sorts each 1-d
 slice).

 Like most functions that take an `axis` argument, `axis=None`
 means to shuffle the flattened array.  With `independent=True`,
 it would act like `np.random.shuffle(a.flat)`, e.g.

 In [247]: a
 Out[247]:
 array([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]])

 In [248]: np.random.shuffle(a.flat)

 In [249]: a
 Out[249]:
 array([[ 0, 14,  9,  1, 13],
        [ 2,  8,  5,  3,  4],
        [ 6, 10,  7, 12, 11]])


 A small wart in this API is the meaning of

   shuffle(a, independent=False, axis=None)

 It could be argued that the correct behavior is to leave the
 array unchanged. (The current behavior can be interpreted as
 shuffling a 1-d sequence of monolithic blobs; the axis argument
 specifies which axis of the array corresponds to the
 sequence index.  Then `axis=None` means the argument is
 a single monolithic blob, so there is nothing to shuffle.)
 Or an error could be raised.

 What do you think?

 Warren




It is clear from the comments so far that, when `axis` is None, the result
should be a shuffle of all the elements in the array, for both methods of
shuffling (whether implemented as a new method or with a boolean argument
to `shuffle`).  Forget I ever suggested doing nothing or raising an error.
:)
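
As a rough illustration of the semantics agreed on so far, here is a sketch of both shuffling styles with an `axis` argument, including `axis=None`. The names `shuffle_blocks` and `shuffle_independent`, the helper `_shuffle_flat`, and the `rng` parameter are all hypothetical, chosen only for this example; nothing here is existing or proposed NumPy API.

```python
import numpy as np


def _shuffle_flat(a, rng):
    # axis=None: shuffle every element, ignoring the array's shape.
    a[...] = rng.permutation(a.ravel()).reshape(a.shape)


def shuffle_blocks(a, axis=0, rng=np.random):
    """Current style: reorder whole slices along `axis`, in place."""
    if axis is None:
        _shuffle_flat(a, rng)
        return
    order = rng.permutation(a.shape[axis])
    a[...] = np.take(a, order, axis=axis)  # np.take returns a copy


def shuffle_independent(a, axis=0, rng=np.random):
    """New style: shuffle each 1-d slice along `axis` separately, in place."""
    if axis is None:
        _shuffle_flat(a, rng)
        return
    view = np.moveaxis(a, axis, -1)  # a view; writes go through to `a`
    for idx in np.ndindex(view.shape[:-1]):
        view[idx] = rng.permutation(view[idx])
```

With `axis=None` the two functions do the same thing, which matches the conclusion above. The explicit Python loop in `shuffle_independent` is written for clarity, not speed.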

Josef's comment reminded me that `numpy.random.permutation` returns a
shuffled copy of the array (when its argument is an array).  This function
should also get an `axis` argument.  `permutation` shuffles the same way
`shuffle` does--it simply makes a copy and then calls `shuffle` on the
copy.  If a new method is added for the new shuffling style, then it would
be consistent to also add a new method that uses the new shuffling style
and returns a copy of the shuffled array.   We would then have four
methods:

                        In-place      Copy
Current shuffle style   shuffle       permutation
New shuffle style       (name TBD)    (name TBD)

(All of them will have an `axis` argument.)
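
The "copy" column can be built from the "in-place" column by the copy-then-shuffle pattern described above. A minimal sketch, assuming a hypothetical helper named `shuffled_copy` (it mirrors the description in this message, not NumPy's actual source):

```python
import numpy as np


def shuffled_copy(a, inplace_shuffle=np.random.shuffle):
    """Return a shuffled copy of `a`, leaving `a` itself unchanged."""
    out = np.array(a, copy=True)
    inplace_shuffle(out)  # any in-place shuffling function can be plugged in
    return out
```

Passing a different in-place function as `inplace_shuffle` would give the copy-returning counterpart of the new shuffling style in the same way.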

I suspect this will make some folks prefer the approach of adding a boolean
argument to `shuffle` and `permutation`.

Warren


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread Sebastian

On 2014-10-12 16:54, Warren Weckesser wrote:


 On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern robert.k...@gmail.com wrote:

 On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:

  A small wart in this API is the meaning of
 
shuffle(a, independent=False, axis=None)
 
  It could be argued that the correct behavior is to leave the
  array unchanged. (The current behavior can be interpreted as
  shuffling a 1-d sequence of monolithic blobs; the axis argument
  specifies which axis of the array corresponds to the
  sequence index.  Then `axis=None` means the argument is
  a single monolithic blob, so there is nothing to shuffle.)
  Or an error could be raised.
 
  What do you think?

 It seems to me a perfectly good reason to have two methods instead of
 one. I can't imagine when I wouldn't be using a literal True or False
 for this, so it really should be two different methods.



 I agree, and my first inclination was to propose a different method
 (and I had the bikeshedding conversation with myself about the name:
 disarrange, scramble, disorder, randomize, ashuffle, some
 other variation of the word shuffle, ...), but I figured the first
 thing folks would say is Why not just add options to shuffle?  So,
 choose your battles and all that.

 What do other folks think of making a separate method?

I'm not a fan of more methods with similar functionality in NumPy. It's
already hard to keep track of the existing functions and all their possible
applications and variants. The axis=None proposal for shuffling all
items is very intuitive.

I think we don't want to take the path of matlab: a huge number of
powerful functions, but few people know what they are capable of.

regards,
Sebastian




Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread josef.pktd
On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser
warren.weckes...@gmail.com wrote:


 It is clear from the comments so far that, when `axis` is None, the result
 should be a shuffle of all the elements in the array, for both methods of
 shuffling (whether implemented as a new method or with a boolean argument to
 `shuffle`).  Forget I ever suggested doing nothing or raising an error. :)

 Josef's comment reminded me that `numpy.random.permutation`

which kind of proves my point

I sometimes have problems finding `shuffle` because I want a function
that does permutation.

Josef

returns a
 shuffled copy of the array (when its argument is an array).  This function
 should also get an `axis` argument.  `permutation` shuffles the same way
 `shuffle` does--it simply makes a copy and then calls `shuffle` on the copy.
 If a new method is added for the new shuffling style, then it would be
 consistent to also add a new method that uses the new shuffling style and
 returns a copy of the shuffled array.   We would then have four
 methods:

                         In-place      Copy
 Current shuffle style   shuffle       permutation
 New shuffle style       (name TBD)    (name TBD)

 (All of them will have an `axis` argument.)

 I suspect this will make some folks prefer the approach of adding a boolean
 argument to `shuffle` and `permutation`.

 Warren



Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread Warren Weckesser
On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser 
warren.weckes...@gmail.com wrote:



 It is clear from the comments so far that, when `axis` is None, the result
 should be a shuffle of all the elements in the array, for both methods of
 shuffling (whether implemented as a new method or with a boolean argument
 to `shuffle`).  Forget I ever suggested doing nothing or raising an error.
 :)

 Josef's comment reminded me that `numpy.random.permutation` returns a
 shuffled copy of the array (when its argument is an array).  This function
 should also get an `axis` argument.  `permutation` shuffles the same way
 `shuffle` does--it simply makes a copy and then calls `shuffle` on the
 copy.  If a new method is added for the new shuffling style, then it would
 be consistent to also add a new method that uses the new shuffling style
 and returns a copy of the shuffled array.   We would then have four
 methods:

                         In-place      Copy
 Current shuffle style   shuffle       permutation
 New shuffle style       (name TBD)    (name TBD)

 (All of them will have an `axis` argument.)



That table makes me think that, *if* we go with new methods, the names
should be `shuffleXXX` and `permutationXXX`, where `XXX` is a common suffix
that is to be determined.  That will ensure that the names appear together
in alphabetical lists, and should show up together as options in
tab-completion or code-completion.

Warren



Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread Jaime Fernández del Río
On Sun, Oct 12, 2014 at 9:29 AM, Warren Weckesser 
warren.weckes...@gmail.com wrote:



 On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser 
 warren.weckes...@gmail.com wrote:



 On Sat, Oct 11, 2014 at 6:51 PM, Warren Weckesser 
 warren.weckes...@gmail.com wrote:

 I created an issue on github for an enhancement
 to numpy.random.shuffle:
 https://github.com/numpy/numpy/issues/5173
 I'd like to get some feedback on the idea.

 Currently, `shuffle` shuffles the first dimension of an array
 in-place.  For example, shuffling a 2D array shuffles the rows:

 In [227]: a
 Out[227]:
 array([[ 0,  1,  2],
[ 3,  4,  5],
[ 6,  7,  8],
[ 9, 10, 11]])

 In [228]: np.random.shuffle(a)

 In [229]: a
 Out[229]:
 array([[ 0,  1,  2],
[ 9, 10, 11],
[ 3,  4,  5],
[ 6,  7,  8]])


 To add an axis keyword, we could (in effect) apply `shuffle` to
 `a.swapaxes(axis, 0)`.  For a 2-D array, `axis=1` would shuffles
 the columns:

 In [232]: a = np.arange(15).reshape(3,5)

 In [233]: a
 Out[233]:
 array([[ 0,  1,  2,  3,  4],
[ 5,  6,  7,  8,  9],
[10, 11, 12, 13, 14]])

 In [234]: axis = 1

 In [235]: np.random.shuffle(a.swapaxes(axis, 0))

 In [236]: a
 Out[236]:
 array([[ 3,  2,  4,  0,  1],
[ 8,  7,  9,  5,  6],
[13, 12, 14, 10, 11]])

 So that's the first part--adding an `axis` keyword.

 The other part of the enhancement request is to add a shuffle
 behavior that shuffles the 1-d slices *independently*.  That is,
 for a 2-d array, shuffling with `axis=0` would apply a different
 shuffle to each column.  In the github issue, I defined a
 function called `disarrange` that implements this behavior:

 In [240]: a
 Out[240]:
 array([[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11],
        [12, 13, 14]])

 In [241]: disarrange(a, axis=0)

 In [242]: a
 Out[242]:
 array([[ 6,  1,  2],
        [ 3, 13, 14],
        [ 9, 10,  5],
        [12,  7,  8],
        [ 0,  4, 11]])

 Note that each column has been shuffled independently.

 This behavior is analogous to how `sort` handles the `axis`
 keyword.  `sort` sorts the 1-d slices along the given axis
 independently.

 In the github issue, I suggested the following signature
 for `shuffle` (but I'm not too fond of the name `independent`):

   def shuffle(a, independent=False, axis=0)

 If `independent` is False, the current behavior of `shuffle`
 is used.  If `independent` is True, each 1-d slice is shuffled
 independently (in the same way that `sort` sorts each 1-d
 slice).

 Like most functions that take an `axis` argument, `axis=None`
 means to shuffle the flattened array.  With `independent=True`,
 it would act like `np.random.shuffle(a.flat)`, e.g.

 In [247]: a
 Out[247]:
 array([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]])

 In [248]: np.random.shuffle(a.flat)

 In [249]: a
 Out[249]:
 array([[ 0, 14,  9,  1, 13],
        [ 2,  8,  5,  3,  4],
        [ 6, 10,  7, 12, 11]])


 A small wart in this API is the meaning of

   shuffle(a, independent=False, axis=None)

 It could be argued that the correct behavior is to leave the
 array unchanged. (The current behavior can be interpreted as
 shuffling a 1-d sequence of monolithic blobs; the axis argument
 specifies which axis of the array corresponds to the
 sequence index.  Then `axis=None` means the argument is
 a single monolithic blob, so there is nothing to shuffle.)
 Or an error could be raised.

 What do you think?

 Warren




 It is clear from the comments so far that, when `axis` is None, the
 result should be a shuffle of all the elements in the array, for both
 methods of shuffling (whether implemented as a new method or with a boolean
 argument to `shuffle`).  Forget I ever suggested doing nothing or raising
 an error. :)

 Josef's comment reminded me that `numpy.random.permutation` returns a
 shuffled copy of the array (when its argument is an array).  This function
 should also get an `axis` argument.  `permutation` shuffles the same way
 `shuffle` does--it simply makes a copy and then calls `shuffle` on the
 copy.  If a new method is added for the new shuffling style, then it would
 be consistent to also add a new method that uses the new shuffling style
 and returns a copy of the shuffled array.  We would then have four
 methods:

                         In-place     Copy
 Current shuffle style   shuffle      permutation
 New shuffle style       (name TBD)   (name TBD)

 (All of them will have an `axis` argument.)



 That table makes me think that, *if* we go with new methods, the names
 should be `shuffleXXX` and `permutationXXX`, where `XXX` is a common suffix
 that is to be determined.  That will ensure that the names appear together
 in alphabetical lists, and should show up together as options in
 tab-completion or code-completion.


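Just to add some noise to a productive conversation: if you add a 'copy'
flag to shuffle, then all the functionality is in one place, and
'permutation' can either be deprecated, or trivially implemented in terms
of the new 'shuffle'.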

Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread Stephan Hoyer
On Sun, Oct 12, 2014 at 10:56 AM, Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 Just to add some noise to a productive conversation: if you add a 'copy'
 flag to shuffle, then all the functionality is in one place, and
 'permutation' can either be deprecated, or trivially implemented in terms
 of the new 'shuffle'.


+1

Unfortunately, shuffle has the better name, but permutation has the better
default behavior.

(also, I think inplace might be a less ambiguous name for the argument
than copy)
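
To make the flag idea concrete, here is a minimal sketch of a single function
covering both behaviours. The name `shuffle_with_flag`, the `rng` parameter
and the choice of `inplace=True` as the default are assumptions made for
illustration only; nothing in the thread settles them. It assumes NumPy 1.17
or later for `np.random.default_rng`.

```python
import numpy as np

def shuffle_with_flag(a, inplace=True, rng=None):
    """Shuffle along the first axis, either in place or on a copy.

    Hypothetical helper: with inplace=True it behaves like `shuffle`
    (returns None); with inplace=False it behaves like `permutation`
    applied to an array (returns a shuffled copy, input untouched).
    """
    rng = np.random.default_rng() if rng is None else rng
    target = a if inplace else np.array(a, copy=True)
    rng.shuffle(target)
    return None if inplace else target

a = np.arange(12).reshape(4, 3)
b = shuffle_with_flag(a, inplace=False, rng=np.random.default_rng(0))
# `a` is unchanged; `b` holds the same rows in a random order.
```

With such a flag, `permutation` for array input would reduce to a call with
`inplace=False`.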
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread Nathaniel Smith
On Sun, Oct 12, 2014 at 5:14 PM, Sebastian se...@sebix.at wrote:

 On 2014-10-12 16:54, Warren Weckesser wrote:


 On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern robert.k...@gmail.com wrote:

 On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:

  A small wart in this API is the meaning of
 
shuffle(a, independent=False, axis=None)
 
  It could be argued that the correct behavior is to leave the
  array unchanged. (The current behavior can be interpreted as
  shuffling a 1-d sequence of monolithic blobs; the axis argument
  specifies which axis of the array corresponds to the
  sequence index.  Then `axis=None` means the argument is
  a single monolithic blob, so there is nothing to shuffle.)
  Or an error could be raised.
 
  What do you think?

 It seems to me a perfectly good reason to have two methods instead of
 one. I can't imagine when I wouldn't be using a literal True or False
 for this, so it really should be two different methods.



 I agree, and my first inclination was to propose a different method
 (and I had the bikeshedding conversation with myself about the name:
 disarrange, scramble, disorder, randomize, ashuffle, some
 other variation of the word shuffle, ...), but I figured the first
 thing folks would say is "Why not just add options to shuffle?"  So,
 choose your battles and all that.

 What do other folks think of making a separate method?

 I'm not a fan of more methods with similar functionality in Numpy. It's
 already hard to keep an overview of the existing functions and all their
 possible applications and variants. The axis=None proposal for shuffling
 all items is very intuitive.

 I think we don't want to take the path of matlab: a huge number of
 powerful functions, but few people know of their powerful possibilities.

I totally agree with this principle, but I think this is an exception
to the rule, b/c unfortunately in this case the function that we *do*
have is weird and inconsistent with how most other functions in numpy
work. It doesn't vectorize! Cf. 'sort' or how a 'shuffle' gufunc
(k,)->(k,) would work. Also, it's easy to implement the current
'shuffle' in terms of any 1d shuffle function, with no explicit loops,
whereas Warren's disarrange requires an explicit loop. So, we really
implemented the wrong one, oops. What this means going forward,
though, is that our only options are either to implement both
behaviours with two functions, or else to give up on having the more
natural behaviour altogether. I think the former is the lesser of two
evils.
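
To illustrate the "no explicit loops" point, here is a minimal sketch of the
current first-axis behaviour built from nothing but a 1-d shuffle of the row
indices. The name `shuffle_first_axis` and the `rng` parameter are
hypothetical, and the sketch assumes NumPy 1.17 or later for
`np.random.default_rng`.

```python
import numpy as np

def shuffle_first_axis(a, rng=None):
    """Reorder the sub-arrays of `a` along axis 0, in place, without a loop."""
    rng = np.random.default_rng() if rng is None else rng
    # A 1-d shuffle of the index vector is the only random step needed.
    order = rng.permutation(a.shape[0])
    # Fancy indexing builds a reordered copy, which is then written back.
    a[...] = a[order]

a = np.arange(12).reshape(4, 3)
shuffle_first_axis(a, rng=np.random.default_rng(0))
# The rows of `a` are in a random order; each row is still intact.
```

Going the other way, shuffling every 1-d slice independently on top of this
behaviour, needs one call per slice, which is the explicit loop mentioned
above.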

Regarding names: shuffle/permutation is a terrible naming convention
IMHO and shouldn't be propagated further. We already have a good
naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs.
reversed, etc.

So, how about:

scramble + scrambled to shuffle individual entries within each
row/column/..., as in Warren's suggestion.

shuffle + shuffled to do what shuffle, permutation do now (mnemonic:
these break a 2d array into a bunch of 1d cards, and then shuffle
those cards).

permuted remains indefinitely, with the docstring: Deprecated alias
for 'shuffled'.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


[Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-11 Thread Warren Weckesser
I created an issue on github for an enhancement
to numpy.random.shuffle:
https://github.com/numpy/numpy/issues/5173
I'd like to get some feedback on the idea.

Currently, `shuffle` shuffles the first dimension of an array
in-place.  For example, shuffling a 2D array shuffles the rows:

In [227]: a
Out[227]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [228]: np.random.shuffle(a)

In [229]: a
Out[229]:
array([[ 0,  1,  2],
       [ 9, 10, 11],
       [ 3,  4,  5],
       [ 6,  7,  8]])


To add an axis keyword, we could (in effect) apply `shuffle` to
`a.swapaxes(axis, 0)`.  For a 2-D array, `axis=1` would shuffle
the columns:

In [232]: a = np.arange(15).reshape(3,5)

In [233]: a
Out[233]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [234]: axis = 1

In [235]: np.random.shuffle(a.swapaxes(axis, 0))

In [236]: a
Out[236]:
array([[ 3,  2,  4,  0,  1],
       [ 8,  7,  9,  5,  6],
       [13, 12, 14, 10, 11]])

So that's the first part--adding an `axis` keyword.
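
As a minimal sketch of that first part, the `swapaxes` trick can be wrapped
in a small helper. The name `shuffle_along_axis` and the `rng` parameter are
hypothetical (they are not part of the proposal), and the sketch assumes
NumPy 1.17 or later for `np.random.default_rng`.

```python
import numpy as np

def shuffle_along_axis(a, axis=0, rng=None):
    """Shuffle `a` in place along `axis`, keeping each sub-array intact."""
    rng = np.random.default_rng() if rng is None else rng
    # swapaxes returns a view, so shuffling the view's first axis
    # reorders the data of `a` itself.
    rng.shuffle(np.swapaxes(a, axis, 0))

a = np.arange(15).reshape(3, 5)
shuffle_along_axis(a, axis=1, rng=np.random.default_rng(0))
# The columns of `a` are now in a random order; each column is still intact.
```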

The other part of the enhancement request is to add a shuffle
behavior that shuffles the 1-d slices *independently*.  That is,
for a 2-d array, shuffling with `axis=0` would apply a different
shuffle to each column.  In the github issue, I defined a
function called `disarrange` that implements this behavior:

In [240]: a
Out[240]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [241]: disarrange(a, axis=0)

In [242]: a
Out[242]:
array([[ 6,  1,  2],
       [ 3, 13, 14],
       [ 9, 10,  5],
       [12,  7,  8],
       [ 0,  4, 11]])

Note that each column has been shuffled independently.

This behavior is analogous to how `sort` handles the `axis`
keyword.  `sort` sorts the 1-d slices along the given axis
independently.
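
One way such a function could be written is sketched below. This is not
necessarily the implementation from the github issue; the `rng` parameter and
the function body are assumptions, and the sketch assumes NumPy 1.17 or later
for `np.random.default_rng`.

```python
import numpy as np

def disarrange(a, axis=-1, rng=None):
    """Shuffle each 1-d slice of `a` along `axis` independently, in place."""
    rng = np.random.default_rng() if rng is None else rng
    # Move the target axis to the end; the result is a view of `a`.
    b = np.moveaxis(a, axis, -1)
    # Visit every 1-d slice and give each one its own shuffle.
    for idx in np.ndindex(b.shape[:-1]):
        rng.shuffle(b[idx])

a = np.arange(15).reshape(5, 3)
disarrange(a, axis=0, rng=np.random.default_rng(0))
# Each column still holds its original values, in an independent random order.
```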

In the github issue, I suggested the following signature
for `shuffle` (but I'm not too fond of the name `independent`):

  def shuffle(a, independent=False, axis=0)

If `independent` is False, the current behavior of `shuffle`
is used.  If `independent` is True, each 1-d slice is shuffled
independently (in the same way that `sort` sorts each 1-d
slice).

Like most functions that take an `axis` argument, `axis=None`
means to shuffle the flattened array.  With `independent=True`,
it would act like `np.random.shuffle(a.flat)`, e.g.

In [247]: a
Out[247]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [248]: np.random.shuffle(a.flat)

In [249]: a
Out[249]:
array([[ 0, 14,  9,  1, 13],
       [ 2,  8,  5,  3,  4],
       [ 6, 10,  7, 12, 11]])
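
The `axis=None` case can also be sketched without `a.flat`, by permuting a
flattened copy and writing it back. The helper name `shuffle_flat` and the
`rng` parameter are hypothetical; the sketch assumes NumPy 1.17 or later for
`np.random.default_rng`.

```python
import numpy as np

def shuffle_flat(a, rng=None):
    """Shuffle all elements of `a` in place, ignoring its shape."""
    rng = np.random.default_rng() if rng is None else rng
    # permutation returns a shuffled copy of the flattened data,
    # which is reshaped and written back into `a`.
    a[...] = rng.permutation(a.ravel()).reshape(a.shape)

a = np.arange(15).reshape(3, 5)
shuffle_flat(a, rng=np.random.default_rng(0))
# `a` keeps its (3, 5) shape; its 15 values are spread over all positions.
```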


A small wart in this API is the meaning of

  shuffle(a, independent=False, axis=None)

It could be argued that the correct behavior is to leave the
array unchanged. (The current behavior can be interpreted as
shuffling a 1-d sequence of monolithic blobs; the axis argument
specifies which axis of the array corresponds to the
sequence index.  Then `axis=None` means the argument is
a single monolithic blob, so there is nothing to shuffle.)
Or an error could be raised.

What do you think?

Warren


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-11 Thread John Zwinck
On Sun, Oct 12, 2014 at 6:51 AM, Warren Weckesser
warren.weckes...@gmail.com wrote:
 I created an issue on github for an enhancement
 to numpy.random.shuffle:
 https://github.com/numpy/numpy/issues/5173

I like this idea.  I was a bit surprised there wasn't something like
this already.

 A small wart in this API is the meaning of

   shuffle(a, independent=False, axis=None)

 It could be argued that the correct behavior is to leave the
 array unchanged. (The current behavior can be interpreted as
 shuffling a 1-d sequence of monolithic blobs; the axis argument
 specifies which axis of the array corresponds to the
 sequence index.  Then `axis=None` means the argument is
 a single monolithic blob, so there is nothing to shuffle.)
 Or an error could be raised.

Let's think about it from the other direction: if a user wants to
shuffle all the elements as if it were 1-d, as you point out they
could do this:

  shuffle(a, axis=None, independent=True)

But that's a lot of typing.  Maybe we should just let this do the same thing:

  shuffle(a, axis=None)

That seems to be in keeping with the other APIs taking axis as you
mentioned.  To me, independent has no relevance when the array is
1-d, it can simply be ignored.
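
A rough sketch of how the proposed signature could dispatch, with
`independent` ignored when `axis=None`, is shown below. The name
`shuffle_proposed`, the `rng` parameter and the whole body are assumptions
for illustration, not an agreed design; the sketch assumes NumPy 1.17 or
later for `np.random.default_rng`.

```python
import numpy as np

def shuffle_proposed(a, independent=False, axis=0, rng=None):
    """In-place shuffle following the signature discussed in this thread."""
    rng = np.random.default_rng() if rng is None else rng
    if axis is None:
        # `independent` has no relevance here: treat `a` as 1-d.
        a[...] = rng.permutation(a.ravel()).reshape(a.shape)
    elif independent:
        # Shuffle every 1-d slice along `axis` separately.
        b = np.moveaxis(a, axis, -1)
        for idx in np.ndindex(b.shape[:-1]):
            rng.shuffle(b[idx])
    else:
        # Current behaviour, generalized to any axis via a view.
        rng.shuffle(np.swapaxes(a, axis, 0))

a = np.arange(15).reshape(3, 5)
shuffle_proposed(a, axis=None, rng=np.random.default_rng(0))
# All 15 elements are shuffled, whatever the value of `independent`.
```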

John Zwinck