Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?

2010-06-09 Thread David Cournapeau
On Wed, Jun 9, 2010 at 4:16 PM, Francesc Alted fal...@pytables.org wrote:
 On Tuesday 08 June 2010 23:34:09 Anne Archibald wrote:
  But the issue isn't one of efficiency, it's merely an arbitrarily chosen
  convention.  (Does anyone know the history of the choices for FORTRAN and
  C, esp. why K&R chose the opposite of what was already in common usage in
  FORTRAN?  Just curious?)

 This is speculation, not knowledge, but it's worth pointing out that
 there are actually two ways to represent a multidimensional array in
 C: as a block of memory with appropriate type definitions, or as an
 array of pointers to subarrays. This latter approach is generally not
 used for numerical work, but is potentially useful for other
 applications. More relevantly, it already has a natural syntax;
 a[2][3][5] naturally follows the chain of pointers and gives you what
 you want; it also forces your last index to change most rapidly as you
 walk through memory. So it would be very odd if multidimensional
 arrays defined without pointers but using the same syntax were indexed
 the other way around. (Let's ignore abominations like 5[3[2[a]]].)

 Hey, maybe it is only speculation, but this is the most convincing argument
 for breaking Fortran convention that I've ever heard

I think that "arrays are just syntax on pointers" is indeed the key
reason for how C works here. Since a[b] really means *(a + b) (which is
why 5[a] and a[5] are the same), I don't see how to do it differently.

 (although I'm not sure if
 C was really breaking Fortran convention, as both languages should have been
 born at more or less the same time, although I'd say that Fortran is a bit older).

Fortran is the oldest language I am aware of - certainly the oldest
still widely in use. It is even older than Lisp: the first version is
from 1956-57, and it was proposed by Backus to IBM in 1953 according to
Wikipedia. It was created at a time when many people thought the very
idea of a compiler did not make any sense and was impossible. So yes,
Fortran is *much* older than C.

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?

2010-06-09 Thread Benjamin Root

 I think that "arrays are just syntax on pointers" is indeed the key
 reason for how C works here. Since a[b] really means *(a + b) (which is
 why 5[a] and a[5] are the same), I don't see how to do it differently.


Holy crap!  You can do that in C?!


Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?

2010-06-09 Thread David Cournapeau
On Thu, Jun 10, 2010 at 12:09 AM, Benjamin Root ben.r...@ou.edu wrote:
 Holy crap!  You can do that in C?!

Yes:

#include <stdio.h>

int main()
{
    float a[2] = {1.0, 2.0};

    /* a[1], *(a + 1) and 1[a] all name the same element */
    printf("%f %f %f\n", a[1], *(a + 1), 1[a]);
    return 0;
}


Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?

2010-06-09 Thread David Goldsmith
On Wed, Jun 9, 2010 at 9:00 AM, David Cournapeau courn...@gmail.com wrote:

 Yes:

 #include <stdio.h>

 int main()
 {
     float a[2] = {1.0, 2.0};

     /* a[1], *(a + 1) and 1[a] all name the same element */
     printf("%f %f %f\n", a[1], *(a + 1), 1[a]);
     return 0;
 }


This is all _very_ educational (and I mean that sincerely), but can we
please get back to the topic at hand ( :-) ).  A specific proposal is on the
table: we remove discussion of the whole C/Fortran ordering issue from
basics.indexing.rst and promote it to a more advanced document TBD.

DG


Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?

2010-06-08 Thread David Goldsmith
On Mon, Jun 7, 2010 at 4:52 AM, Pavel Bazant maxpla...@seznam.cz wrote:

 Correct me if I am wrong, but the paragraph

 Note to those used to IDL or Fortran memory order as it relates to
 indexing. Numpy uses C-order indexing. That means that the last index
 usually (see xxx for exceptions) represents the most rapidly changing memory
 location, unlike Fortran or IDL, where the first index represents the most
 rapidly changing location in memory. This difference represents a great
 potential for confusion.

 in

 http://docs.scipy.org/doc/numpy/user/basics.indexing.html

 is quite misleading, as C-order means that the last index changes rapidly,
 not the memory location.

 Pavel


Sounds correct (your criticism, that is) but I'm no expert, so I'm going to
wait another 12 hours or so - to give others a chance to chime in - before
correcting it.

DG
-- 
Mathematician: noun, someone who disavows certainty when their uncertainty
set is non-empty, even if that set has measure zero.

Hope: noun, that delusive spirit which escaped Pandora's jar and, with her
lies, prevents mankind from committing a general suicide.  (As interpreted
by Robert Graves)


Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?

2010-06-08 Thread Charles R Harris
On Mon, Jun 7, 2010 at 5:52 AM, Pavel Bazant maxpla...@seznam.cz wrote:

 Correct me if I am wrong, but the paragraph

 Note to those used to IDL or Fortran memory order as it relates to
 indexing. Numpy uses C-order indexing. That means that the last index
 usually (see xxx for exceptions) represents the most rapidly changing memory
 location, unlike Fortran or IDL, where the first index represents the most
 rapidly changing location in memory. This difference represents a great
 potential for confusion.

 in

 http://docs.scipy.org/doc/numpy/user/basics.indexing.html

 is quite misleading, as C-order means that the last index changes rapidly,
 not the memory location.


Any index can change rapidly, depending on whether it is in an inner loop or
not. The important distinction between C and Fortran order is how indices
translate to memory locations. The documentation seems correct to me,
although it might make more sense to say the last index addresses a
contiguous range of memory. Of course, with modern processors, actual
physical memory can be mapped all over the place.

Chuck
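
The mapping from indices to memory locations described above can be
sketched for a two-dimensional array in a few lines of plain Python. This
is only an illustration: the helper names c_offset and fortran_offset are
made up here, and offsets are counted in elements rather than bytes.

```python
def c_offset(i, j, nrows, ncols):
    """Element offset of (i, j) when the rows are stored one after another."""
    return i * ncols + j


def fortran_offset(i, j, nrows, ncols):
    """Element offset of (i, j) when the columns are stored one after another."""
    return i + j * nrows


nrows, ncols = 3, 4

# How far does the memory location move when one index is incremented by one?
c_last_step = c_offset(1, 2, nrows, ncols) - c_offset(1, 1, nrows, ncols)
c_first_step = c_offset(2, 1, nrows, ncols) - c_offset(1, 1, nrows, ncols)
f_last_step = fortran_offset(1, 2, nrows, ncols) - fortran_offset(1, 1, nrows, ncols)
f_first_step = fortran_offset(2, 1, nrows, ncols) - fortran_offset(1, 1, nrows, ncols)

print(c_last_step, c_first_step)  # 1 4
print(f_last_step, f_first_step)  # 3 1
```

In C order the last index moves to the neighbouring element and the first
index jumps a whole row; in Fortran order the roles are swapped.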


Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?

2010-06-08 Thread Charles R Harris
On Tue, Jun 8, 2010 at 9:27 AM, Pavel Bazant maxpla...@seznam.cz wrote:


 To me, saying that the last index represents the most rapidly changing
 memory location means that if I change the last index, the memory location
 changes a lot, which is not true for C-order. So for C-order, supposing one
 scans the memory linearly (the desired scenario), it is the last *index*
 that changes most rapidly.

 The inverted picture looks like this: For C-order, changing the first
 index leads to the most rapid jump in *memory*.


Good point, I can see that that could be a source of potential confusion.
Perhaps something along the lines that 1) the memory is in one contiguous
slab, and 2) it is accessed in order by changing the rightmost indices
fastest, would be better.

Chuck
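
The two-part description (one contiguous slab, walked in order by varying
the rightmost indices fastest) can be checked with a small pure-Python
sketch. The shape (2, 3, 4) and the helper name c_offset are arbitrary
choices for this illustration.

```python
from itertools import product

shape = (2, 3, 4)

# product() varies its rightmost argument fastest, just like nested
# for-loops with the last index in the innermost loop.
indices = list(product(*(range(n) for n in shape)))


def c_offset(index, shape):
    """Row-major (C order) element offset of a multi-dimensional index."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


offsets = [c_offset(ix, shape) for ix in indices]

print(indices[:3])  # [(0, 0, 0), (0, 0, 1), (0, 0, 2)]
print(offsets[:3])  # [0, 1, 2]
```

Visiting the indices this way touches the slab at offsets 0, 1, 2, ...
without any gaps or jumps backwards.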


Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?

2010-06-08 Thread David Goldsmith
On Tue, Jun 8, 2010 at 8:27 AM, Pavel Bazant maxpla...@seznam.cz wrote:


 To me, saying that the last index represents the most rapidly changing
 memory location means that if I change the last index, the memory location
 changes a lot, which is not true for C-order. So for C-order, supposing one
 scans the memory linearly (the desired scenario), it is the last *index*
 that changes most rapidly.

 The inverted picture looks like this: For C-order, changing the first
 index leads to the most rapid jump in *memory*.

 Still have the feeling the doc is very misleading at this important issue.

 Pavel


The distinction between your two perspectives is that one is using for-loop
traversal of indices, the other is using pointer-increment traversal of
memory; from each of your perspectives, your conclusions are correct, but
my inclination is that the pointer-increment traversal of memory perspective
is closer to the spirit of the docstring, no?

DG


Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?

2010-06-08 Thread Eric Firing
On 06/08/2010 05:50 AM, Charles R Harris wrote:



 I think the confusion is in "most rapidly changing memory location",
 which is kind of ambiguous because a change in the indices is always a
 change in memory location if one hasn't used index tricks and such. So
 from a time perspective it means nothing, while from a memory
 perspective the largest address changes come from the leftmost indices.

Exactly.  Rate of change with respect to what, or as you do what?

I suggest something like the following wording, if you don't mind the 
verbosity as a means of conjuring up an image (although putting in 
diagrams would make it even clearer--undoubtedly there are already good 
illustrations somewhere on the web):



Note to those used to Matlab, IDL, or Fortran memory order as it relates 
to indexing. Numpy uses C-order indexing by default, although a numpy 
array can be designated as using Fortran order. [With C-order, 
sequential memory locations are accessed by incrementing the last 
index.]  For a two-dimensional array, think of it as a table.  With
C-order indexing the table is stored as a series of rows, so that one is 
reading from left to right, incrementing the column (last) index, and 
jumping ahead in memory to the next row by incrementing the row (first) 
index. With Fortran order, the table is stored as a series of columns, 
so one reads memory sequentially from top to bottom, incrementing the 
first index, and jumps ahead in memory to the next column by 
incrementing the last index.

One more difference to be aware of: numpy, like Python and C, uses
zero-based indexing (as does IDL); Matlab and Fortran start from one.

-

If you want to keep it short, the key wording is in the sentence in 
brackets, and you can chop out the table illustration.

Eric
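
The table picture in the proposed wording can be made concrete with a
small pure-Python sketch. The 2x3 table and the names c_memory and
f_memory are invented for this illustration; they only model the two
storage orders and do not use numpy itself.

```python
# A table with 2 rows and 3 columns, indexed as table[row][col], zero-based.
table = [[11, 12, 13],
         [21, 22, 23]]
nrows, ncols = 2, 3

# C order: the table is stored row after row.
c_memory = [table[r][c] for r in range(nrows) for c in range(ncols)]

# Fortran order: the table is stored column after column.
f_memory = [table[r][c] for c in range(ncols) for r in range(nrows)]

print(c_memory)  # [11, 12, 13, 21, 22, 23]
print(f_memory)  # [11, 21, 12, 22, 13, 23]
```

The element table[1][0] (value 21) sits three places into the C-order
memory but right after the first element in the Fortran-order memory.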





Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?

2010-06-08 Thread Eric Firing
On 06/08/2010 08:16 AM, Eric Firing wrote:
 Note to those used to Matlab, IDL, or Fortran memory order as it relates
 to indexing. Numpy uses C-order indexing by default, although a numpy
 array can be designated as using Fortran order. [With C-order,
 sequential memory locations are accessed by incrementing the last

Maybe change "sequential" to "contiguous".


 index.]

Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?

2010-06-08 Thread Anne Archibald
On 8 June 2010 14:16, Eric Firing efir...@hawaii.edu wrote:
 If you want to keep it short, the key wording is in the sentence in
 brackets, and you can chop out the table illustration.

I'd just like to point out a few warnings to keep in mind while
rewriting this section:

Numpy arrays can have any configuration of memory strides, including
some that are zero; C and Fortran contiguous arrays are simply those
that have special arrangements of the strides. The actual stride
values are normally almost irrelevant to python code.
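
As a rough illustration of the stride configurations mentioned above, the
following sketch inspects ndarray.strides for a few arrays. It assumes a
numpy recent enough to provide np.broadcast_to, and it fixes the dtype to
int64 so that every element is 8 bytes; the variable names are arbitrary.

```python
import numpy as np

c = np.zeros((3, 4), dtype=np.int64)             # C order is the default
f = np.zeros((3, 4), dtype=np.int64, order="F")  # Fortran order on request
row = np.arange(4, dtype=np.int64)
b = np.broadcast_to(row, (3, 4))                 # one row repeated, no copy

c_strides = c.strides    # bytes to step along each axis
f_strides = f.strides
b_strides = b.strides    # a zero stride: every "row" is the same memory
t_strides = c.T.strides  # transposing swaps the strides, no data is moved

print(c_strides, f_strides, b_strides, t_strides)
# (32, 8) (8, 24) (0, 8) (8, 32)
```

Indexing works the same way on all four arrays, which is why the stride
values rarely matter to code that only indexes.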

There is a second meaning of C and Fortran order: when you are
reshaping an array, you can specify one order or the 

Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?

2010-06-08 Thread David Goldsmith
On Tue, Jun 8, 2010 at 12:05 PM, Anne Archibald
aarch...@physics.mcgill.ca wrote:

 I'd just like to point out a few warnings to keep in mind while
 rewriting this section:

 Numpy arrays can have any configuration of memory strides, including
 some that are zero; C and Fortran contiguous arrays are simply those
 that have special arrangements of 

Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?

2010-06-08 Thread Anne Archibald
On 8 June 2010 17:17, David Goldsmith d.l.goldsm...@gmail.com wrote:
 On Tue, Jun 8, 2010 at 1:56 PM, Benjamin Root ben.r...@ou.edu wrote:

 On Tue, Jun 8, 2010 at 1:36 PM, Eric Firing efir...@hawaii.edu wrote:

  Note to those used to Matlab, IDL, or Fortran memory order as it
  relates
  to indexing. Numpy uses C-order indexing by default, although a numpy
  array can be designated as using Fortran order. [With C-order,
  sequential memory locations are accessed by incrementing the last

 Maybe change "sequential" to "contiguous".

 I was thinking maybe "subsequent" might be a better word.

 IMV, "contiguous" has more of a physical connotation.  (That just isn't
 valid in Numpy, correct?)  So I'd prefer "subsequent" as an alternative to
 "sequential".

 In the end, we need to communicate this clearly.  No matter which
 language, I have always found it difficult to get new programmers to
 understand the importance of knowing the difference between row-major and
 column-major.  A thick paragraph isn't going to help to get the idea
 across to a person who doesn't even know that a problem exists.

 Maybe a car analogy would be good here...

 Maybe if one imagines city streets (where many of the streets are one-way),
 and needs to drop off mail at each address.  Would it be more efficient to go
 up and back a street or to drop off mail at the 

Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?

2010-06-08 Thread Friedrich Romstedt
2010/6/8 Anne Archibald aarch...@physics.mcgill.ca:
 Numpy arrays can have any configuration of memory strides, including
 some that are zero; C and Fortran contiguous arrays are simply those
 that have special arrangements of the strides. The actual stride
 values are normally almost irrelevant to python code.

First, I don't see why this text made its way into this doc
page at all - it's all abstract Python numpy indexing on that page as
far as I can see - I don't know why a beginner should worry about
strides and how the linear memory is actually organised - from my
point of view I never did that.

To resolve the problem, and to avoid the confusion about "fast"
and "slow", why not use the concept of strides directly, as Anne
pointed out?  Simply saying that:

When an array is indexed with indices (i1, i2, i3, ..., in) and the
array has strides (s1, s2, s3, ..., sn), the memory location addressed
is:
i1 * s1 + i2 * s2 + ... + in * sn
relative to the base point (and up to dtype).  I hope I'm not wrong here.
For C order,
sn = 1 is the smallest stride (the last index is the contiguous one),
for Fortran order the other way round (s1 = 1).

Friedrich


In fact it holds:
s(k-1) = sk * Nk for C order,
s(k+1) = sk * Nk for fortran order,
where Nk is the length of dimension k.
If I'm not mistaken, it's late at night.
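
The two recurrences can be sketched in plain Python, with strides counted
in elements (numpy's ndarray.strides reports bytes, i.e. these values
times the item size). The function names c_strides, fortran_strides and
offset are made up for this illustration, and indices are zero-based
here, so s(k-1) = sk * Nk becomes strides[k] = strides[k+1] * shape[k+1].

```python
def c_strides(shape):
    """Strides in elements for C order: the last stride is 1."""
    strides = [1] * len(shape)
    for k in range(len(shape) - 2, -1, -1):
        strides[k] = strides[k + 1] * shape[k + 1]
    return tuple(strides)


def fortran_strides(shape):
    """Strides in elements for Fortran order: the first stride is 1."""
    strides = [1] * len(shape)
    for k in range(1, len(shape)):
        strides[k] = strides[k - 1] * shape[k - 1]
    return tuple(strides)


def offset(index, strides):
    """Memory location relative to the base: i1*s1 + i2*s2 + ... + in*sn."""
    return sum(i * s for i, s in zip(index, strides))


shape = (2, 3, 4)
print(c_strides(shape))        # (12, 4, 1)
print(fortran_strides(shape))  # (1, 2, 6)
print(offset((1, 0, 2), c_strides(shape)))        # 14
print(offset((1, 0, 2), fortran_strides(shape)))  # 13
```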