Re: [Numpy-discussion] Indexing bug

2013-03-31 Thread David Cournapeau
On Sun, Mar 31, 2013 at 6:14 AM, Ivan Oseledets
ivan.oseled...@gmail.com wrote:
 Message: 2
 Date: Sat, 30 Mar 2013 11:13:35 -0700
 From: Jaime Fernández del Río jaime.f...@gmail.com
 Subject: Re: [Numpy-discussion] Indexing bug?
 To: Discussion of Numerical Python numpy-discussion@scipy.org

 On Sat, Mar 30, 2013 at 11:01 AM, Ivan Oseledets
 ivan.oseled...@gmail.com wrote:

 I am using numpy 1.6.1,
 and encountered a weird fancy indexing bug:

 import numpy as np
 c = np.random.randn(10, 200, 10)

 In [29]: print(c[[0, 1], :200, :2].shape)
 (2, 200, 2)

 In [30]: print(c[[0, 1], :200, [0, 1]].shape)
 (2, 200)

 This means that fancy indexing is not working right here for a 3D array.


 --
 It is working fine, review the docs:

 http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing

 In your return, item [0, :] is c[0, :, 0] and item [1, :] is c[1, :, 1].

 If you want a return of shape (2, 200, 2) where item [i, :, j] is c[i, :,
 j] you could use slicing:

  c[:2, :200, :2]

 or something more elaborate like:

 c[np.arange(2)[:, None, None], np.arange(200)[:, None], np.arange(2)]
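
A minimal sketch of the difference, assuming a reasonably recent NumPy. The names `paired`, `block` and `block_ix` are only illustrative, and `np.ix_` is used as a shorthand for the hand-written index arrays above:

```python
import numpy as np

c = np.random.randn(10, 200, 10)

# The two index lists are broadcast together and paired element-wise,
# so this selects c[0, :, 0] and c[1, :, 1].
paired = c[[0, 1], :200, [0, 1]]
print(paired.shape)  # (2, 200)

# Outer-product style selection: every combination of i in [0, 1]
# and j in [0, 1], written out with broadcasting index arrays.
block = c[np.arange(2)[:, None, None], np.arange(200)[:, None], np.arange(2)]
print(block.shape)  # (2, 200, 2)

# np.ix_ builds the same open-mesh index arrays.
block_ix = c[np.ix_([0, 1], np.arange(200), [0, 1])]
print(np.array_equal(block, block_ix))  # True
```

Both `block` and `block_ix` should match the plain slice `c[:2, :200, :2]`; the index-array form only becomes necessary when the wanted rows or columns are not a regular range.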

 Jaime
 ---


 Oh!  So it is not a bug, it is a feature, one that is completely
 incompatible with other array-based languages (MATLAB and Fortran). I
 cannot find a single explanation of why it is so in numpy.
 Taking submatrices from a matrix is a common operation, and the syntax
 above is a very natural way to take submatrices, not weird diagonal stuff.

It is not weird diagonal stuff, but a well-defined operation: when
you use fancy indexing, the indexing numbers become coordinates (

 i.e.,

 c = np.random.randn(100,100)
 d = c[[0,3],[2,3]]

 should NOT produce two numbers! (and you can not do it using slices!)

 In MATLAB and Fortran
 c(indi,indj)
 will produce a 2 x 2 matrix.
 How can it be done in numpy (and why the complications)?

In your example, it is simple enough:

c[[0, 3], 2:4] (returns the first row limited to columns 3 and 4, and the
4th row limited to columns 3 and 4, counting from 1).

Numpy's syntax is 'biased' toward fancy indexing, and you need more
typing if you want to extract 'irregular' submatrices. Matlab has a
different tradeoff (extracting irregular sub-matrices is slightly
easier, but selecting a few points is harder, as you need sub2ind to
use linear indexing).
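
As a rough illustration of that tradeoff, the sketch below contrasts selecting points with selecting a submatrix. It uses `np.arange` instead of random data so the values are easy to read, and the names `points`, `sub` and `sub_manual` are hypothetical:

```python
import numpy as np

c = np.arange(100 * 100).reshape(100, 100)
rows, cols = [0, 3], [2, 3]

# Fancy indexing pairs the indices up: the points (0, 2) and (3, 3).
points = c[rows, cols]
print(points)  # [  2 303]

# np.ix_ gives the MATLAB-style c(indi, indj) result: a 2 x 2 block.
sub = c[np.ix_(rows, cols)]
print(sub)
# [[  2   3]
#  [302 303]]

# The same block, written with an explicit column of row indices.
sub_manual = c[np.array(rows)[:, None], cols]
print(np.array_equal(sub, sub_manual))  # True
```

Because the columns here happen to be contiguous, `c[[0, 3], 2:4]` gives the same block; `np.ix_` is the general form when they are not.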

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

2013-03-31 Thread Matthew Brett
Hi,

On Sat, Mar 30, 2013 at 10:38 PM,  josef.p...@gmail.com wrote:
 On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett matthew.br...@gmail.com 
 wrote:
 Hi,

 On Sat, Mar 30, 2013 at 9:37 PM,  josef.p...@gmail.com wrote:
 On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett matthew.br...@gmail.com 
 wrote:
 Hi,

 On Sat, Mar 30, 2013 at 7:02 PM,  josef.p...@gmail.com wrote:
 On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett matthew.br...@gmail.com 
 wrote:
 Hi,

 On Sat, Mar 30, 2013 at 7:50 PM,  josef.p...@gmail.com wrote:
 On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle
 brad.froe...@gmail.com wrote:
 On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett 
 matthew.br...@gmail.com
 wrote:

 On Sat, Mar 30, 2013 at 2:20 PM,  josef.p...@gmail.com wrote:
  On Sat, Mar 30, 2013 at 4:57 PM,  josef.p...@gmail.com wrote:
  On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett
  matthew.br...@gmail.com wrote:
  On Sat, Mar 30, 2013 at 4:14 AM,  josef.p...@gmail.com wrote:
  On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett
  matthew.br...@gmail.com wrote:
 
  Ravel and reshape use the terms 'C' and 'F' in the sense of index
  ordering.
 
  This is very confusing.  We think the index ordering and memory
  ordering ideas need to be separated, and specifically, we should
  avoid using C and F to refer to index ordering.
 
  Proposal
  --------
 
  * Deprecate the use of C and F meaning backwards and forwards
    index ordering for ravel, reshape
  * Prefer Z and N, being graphical representations of unraveling in
    2 dimensions, axis1 first and axis0 first respectively (excellent
    naming idea by Paul Ivanov)
 
  What do y'all think?
 
  I always thought F and C are easy to understand, I always thought
  about the content and never about the memory when using it.
 
  changing the names doesn't make it easier to understand.
  I think the confusion is because the new A and K refer to existing
  memory
 

 I disagree, I think it's confusing, but I have evidence, and that is
 that four out of four of us tested ourselves and got it wrong.

 Perhaps we are particularly dumb or poorly informed, but I think it's
 rash to assert there is no problem here.

 I think you are overcomplicating things or phrased it as a trick question

 I don't know what you mean by trick question - was there something
 over-complicated in the example?  I deliberately didn't include
 various much more confusing examples in reshape.

 I meant making the candidates think about memory instead of just
 column versus row stacking.

 To be specific, we were teaching about reshaping a (I, J, K, N) 4D
 array, it was an image, with time as the 4th dimension (N time
 points).   Raveling and reshaping 3D and 4D arrays is a common thing
 to do in neuroimaging, as you can imagine.

 A student asked what he would get back from raveling this array, a
 concatenated time series, or something spatial?

 We showed (I'd worked it out by this time) that the first N values
 were the time series given by [0, 0, 0, :].

 He said - 'Oh - I see - so the data is stored as a whole lot of time
 series one by one, I thought it would be stored as a series of
 images'.

 Ironically, this was a Fortran-ordered array in memory, and he was wrong.
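
The situation described above can be reproduced with a small sketch. The dimensions below are made up and much smaller than a real image, and `np.asfortranarray` stands in for however the original array came to be Fortran-ordered in memory:

```python
import numpy as np

I, J, K, N = 2, 3, 4, 5  # hypothetical image and time dimensions
data = np.arange(I * J * K * N).reshape(I, J, K, N)

# Same values and same indexing, but stored in Fortran (column-major) memory.
f_arr = np.asfortranarray(data)
print(f_arr.flags.f_contiguous, f_arr.flags.c_contiguous)  # True False

# The default order='C' of ravel is an index order: the last index
# runs fastest, whatever the layout in memory.
flat = f_arr.ravel()
print(np.array_equal(flat[:N], f_arr[0, 0, 0, :]))  # True
```

So the raveled result starts with a time series even though, in memory, the array is laid out the other way round.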

 So, I think the idea of memory ordering and index ordering is very
 easy to confuse, and comes up naturally.

 I would like, as a teacher, to be able to say something like:

 * This is what C memory layout is (it's the memory layout that gives
   arr.flags.c_contiguous == True)
 * This is what F memory layout is (it's the memory layout that gives
   arr.flags.f_contiguous == True)
 * It's rather easy to get something that is neither C nor F memory layout
 * Numpy does many memory layouts.
 * Ravel and reshape and numpy in general do not care (normally) about C
   or F layouts, they only care about index ordering.
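
A short sketch of those statements, under the assumption that a strided view is an acceptable example of a "neither" layout. The array names are illustrative only:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # C-contiguous by construction
f = np.asfortranarray(a)         # same values, F-contiguous memory
s = a[:, ::2]                    # strided view: neither C nor F

for name, arr in [("a", a), ("f", f), ("s", s)]:
    print(name, arr.flags.c_contiguous, arr.flags.f_contiguous)
# a True False
# f False True
# s False False

# Ravel with 'C' or 'F' is about index order, so the memory layout
# of the input does not change the result.
print(np.array_equal(a.ravel(order="C"), f.ravel(order="C")))  # True

# 'K', by contrast, follows the order of the elements in memory.
print(f.ravel(order="K"))  # [ 0  4  8  1  5  9  2  6 10  3  7 11]
```

The last line is where memory layout does leak into ravel, which is the sense in which 'A' and 'K' refer to existing memory while 'C' and 'F' do not.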

 My point, that I'm repeating, is that my job is made harder by
 'arr.ravel('F')'.

 But once you know that ravel and reshape don't care about memory, the
 ravel is easy to predict (maybe not easy to visualize in 4-D):

 But this assumes that you already know that there's such a thing as
 memory layout, and there's such a thing as index ordering, and that
 'C' and 'F' in ravel refer to index ordering.  Once you have that,
 you're golden.  I'm arguing it's markedly harder to get this
 distinction, and keep it in mind, and teach it, if we are using the
 'C' and 'F' names for both things.

 No, I think you are still missing my point.
 I think explaining ravel and reshape F and C is easy (kind of) because the
 students don't need to know at that stage about memory layouts.

 All they need to know is that we look at n-dimensional objects in
 C-order or in F-order
 (whichever index runs fastest)
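
For reference, "whichever index runs fastest" looks like this on a small 2-D array (a sketch only; the array is arbitrary):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

# 'C' order: the last index runs fastest (row by row).
print(a.ravel(order="C"))  # [1 2 3 4 5 6]

# 'F' order: the first index runs fastest (column by column).
print(a.ravel(order="F"))  # [1 4 2 5 3 6]

# reshape uses the same convention when filling the new shape.
print(np.arange(6).reshape((2, 3), order="C"))
# [[0 1 2]
#  [3 4 5]]
print(np.arange(6).reshape((2, 3), order="F"))
# [[0 2 4]
#  [1 3 5]]
```

Nothing in this example depends on how `a` is stored in memory, which is the point both sides of the discussion agree on.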

Would you accept that it may or may not be true that it is desirable
or practical not to mention memory layouts when teaching numpy?

You believe it is desirable, I believe that it is not - that teaching
numpy naturally involves some discussion of memory layout.


Re: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

2013-03-31 Thread josef . pktd
On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett matthew.br...@gmail.com wrote:
 Would you accept that it may or may not be true that it is desirable
 or practical not to mention memory layouts when teaching numpy?

I think they should be in two different 

Re: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

2013-03-31 Thread Ralf Gommers
On Sun, Mar 31, 2013 at 10:43 PM, josef.p...@gmail.com wrote:


Re: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

2013-03-31 Thread Matthew Brett
Hi,

On Sun, Mar 31, 2013 at 1:43 PM,  josef.p...@gmail.com wrote:
 On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett matthew.br...@gmail.com 
 wrote:
 Would you accept that it may or may not be true that it is desirable
 or practical not to mention