Re: [Numpy-discussion] SWIGed function does not like my boolean array

2007-11-08 Thread Sebastian Haase
On Nov 7, 2007 6:46 PM, Timothy Hochberg [EMAIL PROTECTED] wrote:




 On Nov 7, 2007 10:35 AM, Sebastian Haase [EMAIL PROTECTED] wrote:
 
  On Nov 7, 2007 5:23 PM, Matthieu Brucher [EMAIL PROTECTED]
 wrote:
  
I don't understand.  I'm thinking of most math functions in the
C-library. In C a boolean is just an integer of 0 or 1 (quasi, by
definition).
   
Could you explain what you mean ?
   
  
   In C++, bool is a new type that has two values, true and false. If you
 add
   true and true, it is still true, and not 2. In C, everything that is not
 0
   is true, not in C++.
 
  Yes, I know this. But my situation is the other way around. Let's say
  I want to count foreground pixels in an image: I would want to sum
  all the true values, i.e. a true *is* a 1 and a false *is* a 0.
 
  In other words, I'm really thinking of (older kind of) C, where there
  *was* no bool.
  I assume this thinking still applies to the internal arithmetic of CPUs
 today.
  Also the bit-values of a boolean array (in memory) are set this way
  already anyway !
 
  How can I simply call my functions looking at these bit values ?
  (essentially interpreting a boolean true as 1 and false as 0)

 I'm not sure how well this would work, but could you change the dtype before
 passing the array to your function? If you wanted a copy, you could just do
 the equivalent of a.astype(np.uint8). However, if you didn't want a copy, you
 could set the dtype to np.uint8, operate on the array and then reset it to
 bool:
 >>> a = np.array([True, True, False, True])
 >>> a
 array([ True,  True, False,  True], dtype=bool)
 >>> a.dtype = np.uint8
 >>> a
 array([1, 1, 0, 1], dtype=uint8)
 >>> # do something with 'a' here
 >>> a.dtype = bool
 >>> a
 array([ True,  True, False,  True], dtype=bool)
 This assumes everything is single threaded. If you have multiple threads
 accessing 'a', this could be a problem... And, you probably want to do this
 in C, so translate as appropriate.

 -tim

Thanks Tim, this sounds like a good idea.  How about creating an
a = a.view()
before changing dtype. This should make the proposed solution thread-safe again.
How expensive is the creation of a view (performance wise, e.g.
compared to calling a trivial C-function) ?

Thanks,
Sebastian
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread Hans Meine
Hi!

I wonder why simple elementwise operations like a * 2 or a + 1 are not 
performed in order of increasing memory addresses in order to exploit CPU 
caches etc. - as it is now, their speed drops by a factor of around 3 simply 
by transpose()ing.  Similarly (but even less logical), copy() and even the 
constructor are affected (yes, I understand that copy() creates contiguous 
arrays, but shouldn't it respect/retain the order nevertheless?):

### constructor ###
In [89]: %timeit -r 10 -n 100 numpy.ndarray((3,3,3))
100 loops, best of 10: 1.19 µs per loop

In [90]: %timeit -r 10 -n 100 numpy.ndarray((3,3,3), order='f')
100 loops, best of 10: 2.19 µs per loop

### copy 3x3x3 array ###
In [85]: a = numpy.ndarray((3,3,3))

In [86]: %timeit -r 10 a.copy()
100 loops, best of 10: 1.14 µs per loop

In [87]: a = numpy.ndarray((3,3,3), order='f')

In [88]: %timeit -r 10 -n 100 a.copy()
100 loops, best of 10: 3.39 µs per loop

### copy 256x256x256 array ###
In [74]: a = numpy.ndarray((256,256,256))

In [75]: %timeit -r 10 a.copy()
10 loops, best of 10: 119 ms per loop

In [76]: a = numpy.ndarray((256,256,256), order='f')

In [77]: %timeit -r 10 a.copy()
10 loops, best of 10: 274 ms per loop

### fill ###
In [79]: a = numpy.ndarray((256,256,256))

In [80]: %timeit -r 10 a.fill(0)
10 loops, best of 10: 60.2 ms per loop

In [81]: a = numpy.ndarray((256,256,256), order='f')

In [82]: %timeit -r 10 a.fill(0)
10 loops, best of 10: 60.2 ms per loop

### power ###
In [151]: a = numpy.ndarray((256,256,256))

In [152]: %timeit -r 10 a ** 2
10 loops, best of 10: 124 ms per loop

In [153]: a = numpy.asfortranarray(a)

In [154]: %timeit -r 10 a ** 2
10 loops, best of 10: 458 ms per loop

### addition ###
In [160]: a = numpy.ndarray((256,256,256))

In [161]: %timeit -r 10 a + 1
10 loops, best of 10: 139 ms per loop

In [162]: a = numpy.asfortranarray(a)

In [163]: %timeit -r 10 a + 1
10 loops, best of 10: 465 ms per loop

### fft ###
In [146]: %timeit -r 10 numpy.fft.fft(vol, axis=0)
10 loops, best of 10: 1.16 s per loop

In [148]: %timeit -r 10 numpy.fft.fft(vol0, axis=2)
10 loops, best of 10: 1.16 s per loop

In [149]: vol.flags
Out[149]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [150]: vol0.flags
Out[150]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [9]: %timeit -r 10 numpy.fft.fft(vol0, axis=0)
10 loops, best of 10: 939 ms per loop

### mean ###
In [173]: %timeit -r 10 vol.mean()
10 loops, best of 10: 272 ms per loop

In [174]: %timeit -r 10 vol0.mean()
10 loops, best of 10: 683 ms per loop

### max ###
In [175]: %timeit -r 10 vol.max()
10 loops, best of 10: 63.8 ms per loop

In [176]: %timeit -r 10 vol0.max()
10 loops, best of 10: 475 ms per loop

### min ###
In [177]: %timeit -r 10 vol.min()
10 loops, best of 10: 63.8 ms per loop

In [178]: %timeit -r 10 vol0.min()
10 loops, best of 10: 476 ms per loop

### rot90 ###
In [10]: %timeit -r 10 numpy.rot90(vol)
10 loops, best of 10: 6.97 µs per loop

In [12]: %timeit -r 10 numpy.rot90(vol0)
10 loops, best of 10: 6.92 µs per loop
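For what it's worth, transpose() does not move any data; it returns a view with permuted strides, so a C-order loop over the transposed array no longer visits the buffer in memory order.  A small sketch (toy sizes, not the benchmark arrays above):

```python
import numpy as np

a = np.ones((4, 3, 2))   # C-contiguous by default
b = a.T                  # transpose: a view, no data copied

# Only the strides are permuted (reversed here) ...
assert b.strides == a.strides[::-1]
# ... so b is Fortran-contiguous, not C-contiguous:
assert b.flags['F_CONTIGUOUS'] and not b.flags['C_CONTIGUOUS']
# Both arrays share the same underlying buffer:
assert b.base is a
```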

-- 
Ciao, /  /
 /--/
/  / ANS


Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread David Cournapeau
Hans Meine wrote:
 Hi!

 I wonder why simple elementwise operations like a * 2 or a + 1 are not 
 performed in order of increasing memory addresses in order to exploit CPU 
 caches etc. - as it is now, their speed drops by a factor of around 3 simply 
 by transpose()ing. 
Because it is not trivial to do so in all cases, I guess. It is a 
problem which comes up from time to time on the ML, but AFAIK, nobody has a 
fix for it. Fundamentally, for many element-wise operations, either you 
have to implement the thing for every possible case, or you abstract it 
through an iterator, which gives you a decrease in performance in some 
cases. There are also cases where the current implementation is far from 
optimal, for lack of man power I guess (taking a look at PyArray_Mean, 
for example, shows that it uses PyArray_GenericReduceFunction, which is 
really slow compared to a straight C implementation).
  Similarly (but even less logical), copy() and even the 
 constructor are affected (yes, I understand that copy() creates contiguous 
 arrays, but shouldn't it respect/retain the order nevertheless?):
   
I don't see why it is illogical: when you do a copy, you don't preserve 
memory layout, and so a simple memcpy of the whole buffer is not possible.

cheers,

David
 [benchmarks snipped]


Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread Hans Meine
Am Donnerstag, 08. November 2007 13:44:59 schrieb David Cournapeau:
 Hans Meine wrote:
  I wonder why simple elementwise operations like a * 2 or a + 1 are
  not performed in order of increasing memory addresses in order to exploit
  CPU caches etc. - as it is now, their speed drops by a factor of around 3
  simply by transpose()ing.

 Because it is not trivial to do so in all cases, I guess. It is a
 problem which comes back time to time on the ML, but AFAIK, nobody had a
 fix for it. Fundamentally, for many element-wise operations, either you
 have to implement the thing for every possible case, or you abstract it
 through an iterator, which gives you a decrease of performances in some
 cases.

OK, so far I have not looked at the NumPy internals, but I would expect something 
like the following:

[elementwise operation on array 'a']
1) ii = [i for i, s in sorted(enumerate(a.strides), key=lambda t: -t[1])]
2) aprime = a.transpose(ii) # aprime will be as C-contiguous as it gets
3) bprime = perform_operation(aprime) # fast elementwise operation
4) jj = [ii.index(i) for i in range(bprime.ndim)] # inverse ii
5) b = bprime.transpose(jj) # let result have the same order as the input

Apart from the fact that this is more or less pseudo-code and that the last 
step brings a behaviour change (which is intended, but probably not very 
welcome), am I overlooking a problem?

   Similarly (but even less logical), copy() and even the
  constructor are affected (yes, I understand that copy() creates
  contiguous arrays, but shouldn't it respect/retain the order
  nevertheless?):

 I don't see why it is illogical: when you do a copy, you don't preserve
 memory layout, and so a simple memcpy of the whole buffer is not possible.

As I wrote above, I don't think this is good.  A fortran-order-contiguous 
array is still contiguous, and not inferior in any way to C-order arrays.
So I actually expect copy() to return an array of unchanged order.  The numpy 
book only says copy() -- "Return a copy of the array (which is always 
single-segment, and ALIGNED)." -- which would also be fulfilled if the memory 
order was preserved as by my proposed method from above.
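For what it's worth, a later NumPy added an order argument to copy() that does exactly this; a sketch of the requested behaviour, assuming a NumPy recent enough to accept copy(order='A') ('A' meaning: keep Fortran order for Fortran-contiguous input):

```python
import numpy as np

a = np.zeros((3, 4), order='F')

c_default = a.copy()          # historical default: C order, layout not kept
c_keep = a.copy(order='A')    # 'A': preserve the input's memory order

assert c_default.flags['C_CONTIGUOUS']
assert c_keep.flags['F_CONTIGUOUS']
```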

-- 
Ciao, /  /
 /--/
/  / ANS


Re: [Numpy-discussion] SWIGed function does not like my boolean array

2007-11-08 Thread Timothy Hochberg
On Nov 8, 2007 3:28 AM, Sebastian Haase [EMAIL PROTECTED] wrote:

 [snip]
 Thanks Tim, this sounds like a good idea.  How about creating an
 a = a.view()
 before changing dtype.


Yeah. That's a better idea. You should be able to just use a.view(np.uint8).


 This should make the proposed solution thread safe again.
 How expensive is the creation of a view (performance wise, e.g.
 compared to calling a trivial C-function) ?


It should be pretty cheap in theory, but I have no idea really.
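A view is only a new array header over the same buffer, so nothing is copied; a sketch of the view-based variant (np.shares_memory assumes a newer NumPy and is used here only to show that the buffer is shared):

```python
import numpy as np

a = np.array([True, True, False, True])
b = a.view(np.uint8)     # reinterpret the same bytes as uint8

assert b.tolist() == [1, 1, 0, 1]
assert a.dtype == np.bool_           # the original array is untouched
assert np.shares_memory(a, b)        # no data was copied

n_foreground = int(b.sum())          # counting the True pixels: 3
```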

-- 
[EMAIL PROTECTED]


Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread Hans Meine
Am Donnerstag, 08. November 2007 16:37:06 schrieb David Cournapeau:
 The problem is not F vs C storage: for element-wise operation, it does
 not matter at all; you just apply the same function
 (perform_operation) over and over on every element of the array. The
 order does not matter at all.

Yet Fortran order leads to several times slower operations, see the figures in 
my original post. :-(

 But what if you have segmented buffers, non aligned, etc... arrays ?

The code I posted should deal with it - by sorting the indices by decreasing 
stride, I simply ensure that all (source and target) segments are traversed 
in order of increasing memory addresses.  It does not affect segments or 
alignment.

 All this has to be taken care of,

Right - my perform_operation(aprime) step would need to apply the operation 
on every memory segment.

 and this means handling reference 
 count and other things which are always delicate to handle well...

I am not sure that I understand where refcounting comes into play here.

 Or 
 you just use the current situation, which let python handle it
 (through PyObject_Call and a callable, at a C level).

I need to look at the code to see what you mean here.  Probably, I have a 
wrong picture of where the slowness comes from (I thought that the order of 
the loops was wrong).

  As I wrote above, I don't think this is good.  A fortran-order-contiguous
  array is still contiguous, and not inferior in any way to C-order arrays.
  So I actually expect copy() to return an array of unchanged order.

 Maybe this behaviour was kept for compatibility with numarray? If you
 look at the docstring, it is said that copy may not return the same
 order as its input. It kind of makes sense to me: C order is the
 default of many numpy operations.

That is very sad, because Fortran order is much more natural for handling 
images, where you're absolutely used to indexing with [x,y], with x being the 
faster-changing index in memory.
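Concretely, with order='F' the first index is the fastest-varying one in memory, which matches the image[x, y] convention; a small sketch:

```python
import numpy as np

w, h = 4, 3
img = np.zeros((w, h), order='F')   # index as img[x, y]

# In Fortran order the first axis has the smallest stride, i.e.
# stepping x by one moves to the adjacent element in memory:
assert img.strides == (img.itemsize, img.itemsize * w)
```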

-- 
Ciao, /  /
 /--/
/  / ANS


Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread David Cournapeau
On Nov 8, 2007 10:13 PM, Hans Meine [EMAIL PROTECTED] wrote:
 [snip]
 OK, so far I have not looked at the NumPy internals, but I would expect something
 like the following:

 [elementwise operation on array 'a']
 1) ii = [i for i, s in sorted(enumerate(a.strides), key=lambda t: -t[1])]
 2) aprime = a.transpose(ii) # aprime will be as C-contiguous as it gets
 3) bprime = perform_operation(aprime) # fast elementwise operation
 4) jj = [ii.index(i) for i in range(bprime.ndim)] # inverse ii
 5) b = bprime.transpose(jj) # let result have the same order as the input
The problem is not F vs C storage: for element-wise operation, it does
not matter at all; you just apply the same function
(perform_operation) over and over on every element of the array. The
order does not matter at all.

But what if you have segmented buffers, non aligned, etc... arrays ?
All this has to be taken care of, and this means handling reference
counts and other things which are always delicate to handle well... Or
you just use the current situation, which lets Python handle it
(through PyObject_Call and a callable, at a C level).

I agree that in some cases, it would be useful to have better
performance: in particular, having fast code paths for common cases
(aligned, and single-segment) would be nice. One example is the
PyArray_Clip function (in multiarray.c), whose speed was too slow for
my taste in my use cases: the general idea is to get to the point
where you have a single C buffer, at which point you can call a trivial
C function (fast_clip). As you can see, this is not trivial (~150
lines of C code): since the point is to go faster, you really want to
avoid copying things, which means you have to be pretty careful.

Speed-wise, it is definitely worth it, though: I don't remember the
exact numbers, but in some cases (which are very common), it gave a
severalfold speedup.
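At the Python level, the analogous trick is to normalize the input to one contiguous buffer before the hot loop and operate there; a toy sketch (clipped_square is an illustrative name, not a NumPy internal):

```python
import numpy as np

def clipped_square(a, lo, hi):
    """Toy 'fast path': work on a single C-contiguous buffer."""
    # ascontiguousarray copies only if 'a' is not already C-contiguous.
    buf = np.ascontiguousarray(a)
    out = np.empty_like(buf)
    np.clip(buf * buf, lo, hi, out=out)   # one pass over a flat buffer
    return out

f = np.asfortranarray(np.arange(6.0).reshape(2, 3))
r = clipped_square(f, 0.0, 10.0)
assert r.tolist() == [[0.0, 1.0, 4.0], [9.0, 10.0, 10.0]]
```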

 Apart from the fact that this is more or less pseudo-code and that the last
 step brings a behaviour change (which is intended, but probably not very
 welcome), am I overlooking a problem?

Similarly (but even less logical), copy() and even the
   constructor are affected (yes, I understand that copy() creates
   contiguous arrays, but shouldn't it respect/retain the order
   nevertheless?):
 
  I don't see why it is illogical: when you do a copy, you don't preserve
  memory layout, and so a simple memcpy of the whole buffer is not possible.

 As I wrote above, I don't think this is good.  A fortran-order-contiguous
 array is still contiguous, and not inferior in any way to C-order arrays.
 So I actually expect copy() to return an array of unchanged order.
Maybe this behaviour was kept for compatibility with numarray? If you
look at the docstring, it is said that copy may not return the same
order as its input. It kind of makes sense to me: C order is the
default of many numpy operations.

David


[Numpy-discussion] saving and loading as PNG

2007-11-08 Thread Martin Teichmann
Hi all,

i am working with numpy and mostly 16 bit matrices, and so I was looking
for a standard way of saving them. I found that PNG actually supports
that, but I could not find a way to use this feature from within
python. I thought that it actually is a cool way of storing matrices
and thus I wrote a little pyrex program that does the job. I figured
this might be interesting to a broader audience, so I posted it here,
maybe for inclusion in numpy?

Some more thoughts:
* Other implementations: other people have done such a thing:
the PIL knows how to read and write PNG, but only 8 bit. The same holds
for matplotlib.

* Problems: until now, numpy does not link against libpng. This should not
be a problem on Linux-like systems, as those normally all have libpng 
installed. Windows, however, I don't know. Same thing with Pyrex: 
numpy doesn't use Pyrex until now.

* Other places to put it: numpy could be considered too basic to have such
functionality; one could put it in scipy. Well, but PNG is not really
a _scientific_ format (even though I personally use it to do science).
Matplotlib would be another nice place for this feature; a big advantage
is that matplotlib already links against libpng.

So, now I shared my thoughts with you, comments?

Greetings,

Martin
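As an aside: if the goal is merely persisting 16-bit matrices, not PNG interoperability, a plain NumPy round-trip also works; a sketch using numpy.save/numpy.load from current NumPy, which write NumPy's own .npy format rather than PNG:

```python
import os
import tempfile

import numpy as np

a = (np.arange(12, dtype=np.uint16) * 1000).reshape(3, 4)

path = os.path.join(tempfile.mkdtemp(), "matrix.npy")
np.save(path, a)          # stores dtype, shape and the raw data
b = np.load(path)

assert b.dtype == np.uint16
assert (a == b).all()
```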

And here the code (188 lines are hopefully ok for a mailing list?)

cimport c_numpy
import numpy

cdef extern from "stdlib.h":
    cdef void *malloc(int)
    cdef void free(void *)

cdef extern from "stdio.h":
    cdef struct file:
        pass
    ctypedef file FILE
    cdef FILE *fopen(char *, char *)
    cdef int fclose(FILE *)

cdef extern from "errno.h":
    cdef int errno

cdef extern from "png.h":
    cdef struct jmp_buf:
        pass
    cdef int setjmp(jmp_buf)
    cdef struct png_struct:
        jmp_buf jmpbuf
    ctypedef png_struct *png_structp

    cdef struct png_info:
        int width
        int height
        int bit_depth
        int color_type
        int channels
    ctypedef png_info *png_infop

    cdef struct png_text:
        int compression
        char *key
        char *text
        int text_length
    ctypedef png_text *png_textp

    cdef png_structp png_create_read_struct(char *, void *, void *, void*)
    cdef void png_init_io(png_structp, FILE *)
    cdef void png_set_sig_bytes(png_structp, int)
    cdef png_infop png_create_info_struct(png_structp)
    cdef void png_read_info(png_structp, png_infop)
    cdef int png_set_interlace_handling(png_structp)
    cdef void png_read_update_info(png_structp, png_infop)
    cdef void png_read_image(png_structp, unsigned char **)
    cdef void png_read_png(png_structp, png_infop, int, void *)
    cdef void png_read_end(png_structp, png_infop)
    cdef void png_destroy_read_struct(png_structp *, png_infop *, void *)
    cdef png_structp png_create_write_struct(char *, void *, void *, void*)
    cdef void png_set_IHDR(png_structp, png_infop, unsigned int,
                           unsigned int, int, int, int, int, int)
    cdef void png_write_info(png_structp, png_infop)
    cdef void png_write_image(png_structp, unsigned char **)
    cdef void png_write_end(png_structp, png_infop)
    cdef void png_destroy_write_struct(png_structp *, png_infop *)
    cdef void png_set_swap(png_structp)
    cdef void png_set_rows(png_structp, png_infop, unsigned char **)
    cdef void png_write_png(png_structp, png_infop, int, void *)
    cdef void png_set_text(png_structp, png_infop, png_textp, int)
    cdef int PNG_COLOR_TYPE_GRAY
    cdef int PNG_COLOR_TYPE_PALETTE
    cdef int PNG_COLOR_TYPE_RGB
    cdef int PNG_COLOR_TYPE_RGB_ALPHA
    cdef int PNG_COLOR_TYPE_GRAY_ALPHA
    cdef int PNG_INTERLACE_NONE
    cdef int PNG_COMPRESSION_TYPE_BASE
    cdef int PNG_FILTER_TYPE_BASE
    cdef int PNG_TRANSFORM_IDENTITY
    cdef int PNG_TRANSFORM_SWAP_ENDIAN
    cdef char *PNG_LIBPNG_VER_STRING

def load(fn):
    """Load a PNG file into a NumPy array.

    This function can read all kinds of PNG images in a sensible way:
    depending on the number of bits per pixel, we choose numpy.bool,
    numpy.uint8 or numpy.uint16 as base type. For gray scale images or
    palettes, this returns a 2-D array; for everything else a 3-D array,
    where the third dimension has a size of 2 for gray+alpha images, 3 for
    RGB and 4 for RGBA images.
    """
    cdef FILE *f
    f = fopen(fn, "rb")
    if f == NULL:
        raise IOError(errno)
    cdef png_structp p
    p = png_create_read_struct(PNG_LIBPNG_VER_STRING, NULL, NULL, NULL)
    if setjmp(p.jmpbuf):
        fclose(f)
        raise RuntimeError("libpng reported error")
    cdef png_infop i
    i = png_create_info_struct(p)
    png_init_io(p, f)
    png_read_info(p, i)

    if i.color_type == PNG_COLOR_TYPE_GRAY:
        shape = (i.height, i.width)
    elif i.color_type == PNG_COLOR_TYPE_GRAY_ALPHA:
        shape = (i.height, i.width, 2)
    elif i.color_type == PNG_COLOR_TYPE_PALETTE:
        shape = (i.height, i.width)
    elif i.color_type == PNG_COLOR_TYPE_RGB:

Re: [Numpy-discussion] saving and loading as PNG

2007-11-08 Thread Hans Meine
Am Donnerstag, 08. November 2007 16:25:57 schrieb Martin Teichmann:
 Some more thoughts:
 * Other implementations: other people have done such a thing:
 the PIL knows how to read and write PNG, but only 8 bit. The same holds
 for matplotlib.

Our VIGRA imaging library can read and save PNGs with 8/16 bit data, as well 
as several kinds of TIFF files (including float data).  We are currently 
trying to re-model our python bindings (which have not yet been officially 
released) using NumPy containers for the images.

Unfortunately, we seem to get a performance penalty from the common convention 
that the order of the indices is x,y, e.g. colorImage[x,y] == [r,g,b].  See 
the other current thread.

-- 
Ciao, /  /
 /--/
/  / ANS


[Numpy-discussion] numpy 1.04 numpy.test() hang

2007-11-08 Thread Geoffrey Zhu
Good morning.

I just installed the Windows binary of numpy 1.04. When I ran
numpy.test() in IDLE (the Python shell that comes with Python), the
program hung (or at least has been running for half an hour). I am using
Windows XP with a dual-core Intel CPU.

Does anyone know what is going on?

Thanks,
Geoffrey


Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread David Cournapeau
On Nov 9, 2007 1:55 AM, Hans Meine [EMAIL PROTECTED] wrote:
 Am Donnerstag, 08. November 2007 17:31:40 schrieb David Cournapeau:
  This is because the current implementation for at least some of the
  operations you are talking about are using PyArray_GenericReduce and
  other similar functions, which are really high level (they use Python
  callables, etc.). This is easier, because you don't have to care about
  anything (type, etc.), but this means that the Python runtime is
  handling everything.

 I suspected that after your last post, but that's really bad for pointwise
 operations on a contiguous, aligned array.  A simple transpose() should
 really not make any difference here.
It should not, but in practice, it is not so easy to do. AFAIK, even
Matlab has the same problem, with less difference, though. And they
have many more resources than numpy. Not to say that we cannot do
better, but this takes time.

  Instead, you should use a pure C
  implementation (by pure C, I mean a C function totally independent of
  Python, only dealing with standard C types). This would already lead to a
  significant performance leap.

 AFAICS, it would be much more elegant and easier to implement this using C++
 templates.  We have a lot of experience with such a design from our VIGRA
 library ( http://kogs-www.informatik.uni-hamburg.de/~koethe/vigra/ ), which
 is an imaging library based on the STL concepts (and some necessary and
 convenient extensions for higher-dimensional arrays and a more flexible API).

 I am not very keen on writing hundreds of lines of C code for things that can
 easily be handled with C++ functors.  But I don't think that I am the first
 to propose this, and I know that C has some advantages (faster compilation;
 are there more? ;-) )
The advantages of C over C++ are much more than compilation speed: it
is actually portable, it is simple, and it is easily callable from other
languages: three significant points that C++ does not meet at all.
Generally, I don't see much benefit in using low-level C++ in
numerical code; I consider most numerics-oriented containers a total
failure (the fact that after 20 years, there is still nothing close to
a standard for even a matrix concept in C++ is for me quite striking).
I think the aspect of C++ which would be most useful in numpy is RAII (I
must confess that I personally don't like C++ very much, in
particular when templates are involved). This is only my opinion; I
don't know the opinion of other people on this.


 Yes, I think it does.  It probably depends on the sizes of the segments
 though.  If you have a multi-segment box-sub-range of a large dataset (3D
 volume or even only 2D), processing each contiguous row (column/...) at
 once within the inner loop definitely makes a difference.  I.e. as long as
 one dimension is not strided (and the data's extent in this dimension is not
 too small), it should be handled in the inner loop.  The other loops
 probably don't make a big difference.

If you have contiguous segments in subranges, then this is already
handled through the ufunc mechanism, but I don't know anything about
the actual implementation. Other people much more knowledgeable than
I am can give you more info here.

David


Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread David Cournapeau
On Nov 9, 2007 12:50 AM, Hans Meine [EMAIL PROTECTED] wrote:
 Am Donnerstag, 08. November 2007 16:37:06 schrieb David Cournapeau:
  The problem is not F vs C storage: for element-wise operation, it does
  not matter at all; you just apply the same function
  (perform_operation) over and over on every element of the array. The
  order does not matter at all.

 Yet Fortran order leads to several times slower operations, see the figures in
 my original post. :-(
This is because the current implementation for at least some of the
operations you are talking about are using PyArray_GenericReduce and
other similar functions, which are really high level (they use Python
callables, etc.). This is easier, because you don't have to care about
anything (type, etc.), but this means that the Python runtime is
handling everything. Instead, you should use a pure C
implementation (by pure C, I mean a C function totally independent of
Python, only dealing with standard C types). This would already lead to a
significant performance leap.

Once you do that, you should not see difference between F and C
storage, normally.
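To make the point concrete, here is a rough Python-level sketch (the real fix would live in C; `scale_contiguous` is a hypothetical name): an element-wise operation applied over the flat memory of a single-segment array is order-independent.

```python
import numpy as np

def scale_contiguous(a, factor):
    # Sketch of an order-independent element-wise operation: for any
    # single-segment array, C- or Fortran-ordered, just walk the raw
    # buffer linearly -- the logical order does not matter.
    assert a.flags['C_CONTIGUOUS'] or a.flags['F_CONTIGUOUS']
    flat = a.reshape(-1, order='A')    # view of the buffer in memory order
    out = np.empty_like(a)             # preserves a's memory layout
    out.reshape(-1, order='A')[:] = flat * factor
    return out

c = np.arange(12.0).reshape(3, 4)      # C-ordered
f = np.asfortranarray(c)               # same values, Fortran-ordered
assert np.array_equal(scale_contiguous(c, 2), c * 2)
assert np.array_equal(scale_contiguous(f, 2), c * 2)
assert scale_contiguous(f, 2).flags['F_CONTIGUOUS']
```

Both calls do the same linear sweep over memory; only the output allocation needs to respect the input's layout.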

  But what if you have segmented buffers, non aligned, etc... arrays ?

 The code I posted should deal with it - by sorting the indices by decreasing
 stride, I simply ensure that all (source and target) segments are traversed
 in order of increasing memory addresses.  It does not affect segments or
 alignment.
If you have segmented addresses, I don't think the ordering matters
much anymore, for memory access, no ? For example,

  All this has to be taken care of,

 Right - my perform_operation(aprime) step would need to apply the operation
 on every memory segment.

  and this means handling reference
  count and other things which are always delicate to handle well...

 I am not sure that I understand where refcounting comes into play here.

  Or
  you just use the current situation, which let python handle it
  (through PyObject_Call and a callable, at a C level).

 I need to look at the code to see what you mean here.  Probably, I have a
 wrong picture of where the slowness comes from (I thought that the order of
 the loops was wrong).

   As I wrote above, I don't think this is good.  A fortran-order-contiguous
   array is still contiguous, and not inferior in any way to C-order arrays.
   So I actually expect copy() to return an array of unchanged order.
 
  Maybe this behaviour was kept for compatibility with numarray ? If you
  look at the docstring, it is said that copy may not return the same
  order as its input. It kind of makes sense to me: C order is the
  default of many numpy operations.

 That is very sad, because Fortran order is much more natural for handling
 images, where you're absolutely used to index with [x,y], and x being the
 faster-changing index in memory.


 --
 Ciao, /  /
  /--/
 /  / ANS



Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread Christopher Barker
This discussion makes me wonder if the basic element-wise operations
could (should?) be special-cased for contiguous arrays, reducing them to
simple pointer incrementing from the start to the finish of the data
block. The same code would work for C and Fortran order arrays, and be
pretty simple.

This would address Hans' issue, no?

It's a special case but a common one.

-Chris




-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

[EMAIL PROTECTED]


Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread Hans Meine
On Thursday, 08 November 2007 17:31:40, David Cournapeau wrote:
 This is because the current implementation for at least some of the
 operations you are talking about are using PyArray_GenericReduce and
 other similar functions, which are really high level (they use python
 callable, etc..). This is easier, because you don't have to care about
 anything (type, etc...), but this means that python runtime is
 handling all the thing.

I suspected that after your last post, but that's really bad for pointwise 
operations on a contiguous, aligned array.  A simple transpose() should 
really not make any difference here.

 Instead, you should use a pure C 
 implementation (by pure C, I mean a C function totally independant of
 python, only dealing with standard C types). This would already lead a
 significant performance leap.

AFAICS, it would be much more elegant and easier to implement this using C++ 
templates.  We have a lot of experience with such a design from our VIGRA 
library ( http://kogs-www.informatik.uni-hamburg.de/~koethe/vigra/ ), which 
is an imaging library based on the STL concepts (and some necessary and 
convenient extensions for higher-dimensional arrays and a more flexible API).

I am not very keen on writing hundreds of lines of C code for things that can 
easily be handled with C++ functors.  But I don't think that I am the first 
to propose this, and I know that C has some advantages (faster compilation; 
are there more? ;-) ) - what is the opinion on this in the SciPy community?

 If you have segmented addresses, I don't think the ordering matters
 much anymore, for memory access, no ?

Yes, I think it does.  It probably depends on the sizes of the segments 
though.  If you have a multi-segment box-sub-range of a large dataset (3D 
volume or even only 2D), processing each contiguous row (column/...) at 
once within the inner loop definitely makes a difference.  I.e. as long as 
one dimension is not strided (and the data's extent in this dimension is not 
too small), it should be handled in the inner loop.  The other loops  
probably don't make a big difference.
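Hans's scheme can be sketched at the Python level (hypothetical helper; a C implementation would use the same loop nest): keep the unstrided axis in the innermost loop so each contiguous row is swept in one go.

```python
import numpy as np

def add_inplace_rowwise(dst, src):
    # Sketch: for a box sub-range of a C-ordered volume, the last axis is
    # unstrided, so process one contiguous row per inner-loop iteration.
    assert dst.shape == src.shape
    for idx in np.ndindex(*dst.shape[:-1]):  # outer loops: strided axes
        dst[idx] += src[idx]                 # inner sweep: contiguous row

vol = np.zeros((4, 5, 6))
box = vol[1:3, 1:4, :]                       # multi-segment view, rows contiguous
add_inplace_rowwise(box, np.ones_like(box))
assert vol.sum() == 2 * 3 * 6                # only the box was touched
```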

-- 
Ciao, /  /
 /--/
/  / ANS


[Numpy-discussion] Warning or error on conversion from complex to float.

2007-11-08 Thread Michael McNeil Forbes
Hi,

Is it possible or easy to add a warning and/or error when array  
assignments are made that lose information?  I just got caught with  
the following type of code:

def f():
    return numpy.array([1j,2.0])

x = numpy.empty((2,),dtype=float)
x[:] = f()

I am pre-allocating arrays for speed, but made a change to f()
resulting in complex results.  My code now silently ignores the
imaginary part.  It would have been very helpful if the assignment
had checked to make sure that the conversion was not going to lose
information (presumably with an assert, so that performance is not
affected outside of debugging).

Any simple suggestions on how to avoid this problem in the future?   
How hard would it be to add such a check in numpy?
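For illustration, a sketch against a modern NumPy (which by now emits a ComplexWarning on exactly this assignment) of both the silent truncation and a debug-time guard of the kind requested; `checked_assign` is a hypothetical helper, and `numpy.can_cast`'s `casting` argument postdates this thread:

```python
import warnings
import numpy as np

def f():
    return np.array([1j, 2.0])

x = np.empty((2,), dtype=float)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # modern numpy warns (ComplexWarning) here
    x[:] = f()                       # imaginary parts are silently dropped
assert x[0] == 0.0 and x[1] == 2.0   # 1j became 0.0

def checked_assign(dst, src):
    # hypothetical debug-time guard: refuse kind-changing (lossy) casts
    src = np.asarray(src)
    assert np.can_cast(src.dtype, dst.dtype, casting="same_kind"), \
        "assignment would lose information (%s -> %s)" % (src.dtype, dst.dtype)
    dst[...] = src

failed = False
try:
    checked_assign(x, f())           # complex128 -> float64 is rejected
except AssertionError:
    failed = True
assert failed
```

Note that `same_kind` still allows float64-to-float32 downcasts, matching the convenience case discussed later in this thread.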

Thanks,
Michael.


Re: [Numpy-discussion] numpy 1.04 numpy.test() hang

2007-11-08 Thread Robert Kern
Geoffrey Zhu wrote:
 Good morning.
 
 I just installed the Windows binary of numpy 1.04. When I ran
 numpy.test() in IDLE (the Python shell that comes with Python), the
 program hangs (or at least has been running for half an hour). I am using
 Windows XP, dual-core Intel CPU.
 
 Does anyone know what is going on?

No. Run numpy.test(verbosity=2) so it will print out each test name before
running the test. Then we might have some idea about where the hang is coming 
from.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread David Cournapeau
On Nov 9, 2007 3:28 AM, Christopher Barker [EMAIL PROTECTED] wrote:
 This discussion makes me wonder if the basic element-wise operations
 could (should?) be special cased for contiguous arrays, reducing them to
   simple pointer incrementing from the start to the finish of the data
 block.
I don't see why it could not. I think that most of the implementation
could be common to most functions, because they share a lot (handling
the out argument, the axis argument, and sometimes the dtype argument).
The only different part would be the actual computation by a C function,
which is trivial, as you pointed out.

cheers,

David


Re: [Numpy-discussion] numpy 1.04 numpy.test() hang

2007-11-08 Thread Nils Wagner
On Thu, 08 Nov 2007 12:12:42 -0600, Robert Kern [EMAIL PROTECTED] wrote:
 Geoffrey Zhu wrote:
  Good morning.
 
  I just installed the Windows binary of numpy 1.04. When I ran
  numpy.test() in IDLE (the Python shell that comes with Python), the
  program hangs (or at least has been running for half an hour). I am using
  Windows XP, dual-core Intel CPU.
 
  Does anyone know what is going on?
 
 No. Run numpy.test(verbosity=2) so it will print out each test name before
 running the test. Then we might have some idea about where the hang is
 coming from.

I can confirm the problem on an AMD x86_64 with python2.5.1.
The test hangs at check_cdouble (numpy.tests.test_linalg.test_det).

Nils


Re: [Numpy-discussion] numpy 1.04 numpy.test() hang

2007-11-08 Thread Geoffrey Zhu
On Nov 8, 2007 12:12 PM, Robert Kern [EMAIL PROTECTED] wrote:

 Geoffrey Zhu wrote:
  Good morning.
 
  I just installed the Windows binary of numpy 1.04. When I ran
  numpy.test() in IDLE (the Python shell that comes with Python), the
  program hangs (or at least has been running for half an hour). I am using
  Windows XP, dual-core Intel CPU.
 
  Does anyone know what is going on?

 No. Run numpy.test(verbosity=2) so it will print out each test name before
 running the test. Then we might have some idea about where the hang is coming 
 from.

 --
 Robert Kern



Thanks for the hint, Robert.

It hangs on  check_cdouble (numpy.tests.test_linalg.test_det).


Also testip_not_allclose() had three warnings. I guess that's probably okay.

testip_not_allclose (numpy.core.tests.test_numeric.test_allclose_inf) ...
Warning: invalid value encountered in absolute
Warning: invalid value encountered in absolute
Warning: invalid value encountered in less_equal
ok

Thanks,
Geoffrey


Re: [Numpy-discussion] Warning or error on conversion from complex to float.

2007-11-08 Thread Charles R Harris
On Nov 8, 2007 11:32 AM, Michael McNeil Forbes [EMAIL PROTECTED]
wrote:

 Hi,

 Is it possible or easy to add a warning and/or error when array
 assignments are made that lose information?  I just got caught with
 the following type of code:

 def f():
     return numpy.array([1j,2.0])

 x = numpy.empty((2,),dtype=float)
 x[:] = f()

 I am pre-allocating arrays for speed, but made a change to f()
 resulting in complex results.  My code now silently ignores the
 imaginary part.  It would have been very helpful if the assignment
 had checked to make sure that the conversion was not going to lose
 information (presumably with an assert, so that performance is not
 affected outside of debugging).

 Any simple suggestions on how to avoid this problem in the future?
 How hard would it be to add such a check in numpy?


This seems to be a generic problem.

In [1]: a = empty((10,), dtype=float32)

In [2]: a[:] = 10.0

In [3]: a[:] = float32(1)

In [4]: b = empty((10,), dtype=float64)

In [5]: a[:] = b[:]

In the case of the scalar assignments downcasting is a convenience because
having to explicitly use the correct data type in an assignment gets to be a
hassle when the type isn't native to Python. It might have been more correct
to require matching data types, but convenience won out in that case. That
said, it might be useful to warn on such an unexpected conversion as complex
to float, and maybe for float to integer conversions also.

Chuck


Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread Travis E. Oliphant
Hans Meine wrote:
 Hi!

 I wonder why simple elementwise operations like a * 2 or a + 1 are not 
 performed in order of increasing memory addresses in order to exploit CPU 
 caches etc.
C-order is special in NumPy due to the history.  I agree that it 
doesn't need to be and we have taken significant steps in that 
direction.   Right now, the fundamental problem is probably due to the 
fact that the output arrays are being created as C-order arrays when the 
input is a Fortran order array.  Once that is changed then we can cause 
Fortran-order inputs to use the simplest path through the code.

Fortran order arrays can be preserved but it takes a little extra work 
because backward compatible expectations had to be met.  See for example 
the order argument to the copy method of arrays.

  - as it is now, their speed drops by a factor of around 3 simply 
 by transpose()ing.  Similarly (but even less logical), copy() and even the 
 constructor are affected (yes, I understand that copy() creates contiguous 
 arrays, but shouldn't it respect/retain the order nevertheless?):
   
As mentioned, it can preserve order with the 'order' argument

a.copy('A')
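For example (flags checked against a modern NumPy, where copy() still defaults to C order):

```python
import numpy as np

f = np.asfortranarray(np.arange(6.0).reshape(2, 3))
assert f.copy().flags['C_CONTIGUOUS']     # plain copy() defaults to C order
assert f.copy('A').flags['F_CONTIGUOUS']  # 'A' preserves the Fortran order
```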

 ### constructor ###
 In [89]: %timeit -r 10 -n 100 numpy.ndarray((3,3,3))
 100 loops, best of 10: 1.19 µs per loop

 In [90]: %timeit -r 10 -n 100 numpy.ndarray((3,3,3), order='f')
 100 loops, best of 10: 2.19 µs per loop

   

I bet what you are seeing here is simply the overhead of processing the 
order argument. 

Try the first one with order='c'  to see what I mean.


 ### copy 3x3x3 array ###
 In [85]: a = numpy.ndarray((3,3,3))

 In [86]: %timeit -r 10 a.copy()
 100 loops, best of 10: 1.14 µs per loop

 In [87]: a = numpy.ndarray((3,3,3), order='f')

 In [88]: %timeit -r 10 -n 100 a.copy()
 100 loops, best of 10: 3.39 µs per loop
   

Use the 'a' argument to allow copying in fortran order.
 ### copy 256x256x256 array ###
 In [74]: a = numpy.ndarray((256,256,256))

 In [75]: %timeit -r 10 a.copy()
 10 loops, best of 10: 119 ms per loop

 In [76]: a = numpy.ndarray((256,256,256), order='f')

 In [77]: %timeit -r 10 a.copy()
 10 loops, best of 10: 274 ms per loop
   
Same thing here.   Nobody is opposed to having faster code as long as we
don't break code bases in the process.   There is also the matter of
implementation.

-Travis O.



[Numpy-discussion] Improving numpy.interp

2007-11-08 Thread LB
I often need to make a linear interpolation for a single scalar value,
but this is not very simple with numpy.interp :
>>> import numpy as n
>>> n.__version__
'1.0.5.dev4420'
>>> xp = n.arange(10)
>>> yp = 2.5 + xp**2 -x
>>> x = 3.2
>>> n.interp(x, xp, yp)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: object of too small depth for desired array

So I use the following trick (which is really ugly) :
>>> n.interp([x], xp, yp)[0]
7.7011

Did I miss an obvious way to do it ?

If not, I'd be tempted to patch interp to let it accept scalar values
as first argument, as follows:
>>> def interp(x, *args, **kwds):
...     if type(x) in (float, int):
...         return n.interp([x], *args, **kwds).item()
...     else:
...         return n.interp(x, *args, **kwds)
...

>>> interp(x, xp, yp)
7.7011
>>> interp([x, 2*x], xp, yp)
array([  7.7,  38.5])
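A slightly more robust variant of the proposed wrapper can be sketched with numpy.isscalar, so that any scalar (not just float/int) takes the fast path; note that later NumPy versions accept a scalar x natively, which makes the patch unnecessary there (`interp_scalar_ok` is a hypothetical name):

```python
import numpy as np

def interp_scalar_ok(x, xp, fp, **kwds):
    # hypothetical wrapper: handle any scalar, not just float/int
    if np.isscalar(x):
        return np.interp([x], xp, fp, **kwds).item()
    return np.interp(x, xp, fp, **kwds)

xp = np.arange(10.0)
yp = 2.5 + xp**2
assert interp_scalar_ok(3.2, xp, yp) == np.interp([3.2], xp, yp).item()
assert interp_scalar_ok([3.2, 6.4], xp, yp).shape == (2,)
```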

--
LB



Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread Hans Meine
On Thursday, 08 November 2007, Christopher Barker wrote:
 This discussion makes me wonder if the basic element-wise operations
 could (should?) be special cased for contiguous arrays, reducing them to
   simple pointer incrementing from the start to the finish of the data
 block. The same code would work for C and Fortran order arrays, and be
 pretty simple.

This is exactly what I am thinking and talking about.  I did not want to imply 
that NumPy is bad because it does not yet do so - it's an impressive piece of 
well-designed software AFAICS - but I do think that this is a really common 
and important use case that should not stay as inefficiently handled as it is 
now.

Ciao, /  /.o.
 /--/ ..o
/  / ANS  ooo




Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread Travis E. Oliphant
Christopher Barker wrote:
 This discussion makes me wonder if the basic element-wise operations 
 could (should?) be special cased for contiguous arrays, reducing them to 
   simple pointer incrementing from the start to the finish of the data 
 block. The same code would work for C and Fortran order arrays, and be 
 pretty simple.

 This would address Hans' issue, no?

 It's a special case but a common one.

   
There is a special case for this already.  It's just that the specific 
operations he is addressing require the creation of output arrays that by 
default are in C-order.  This would need to change in order to take 
advantage of the special case.

-Travis O.



Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread David Cournapeau
Travis E. Oliphant wrote:
 Christopher Barker wrote:
 This discussion makes me wonder if the basic element-wise operations 
 could (should?) be special cased for contiguous arrays, reducing them to 
   simple pointer incrementing from the start to the finish of the data 
 block. The same code would work for C and Fortran order arrays, and be 
 pretty simple.

 This would address Hans' issue, no?

 It's a special case but a common one.

   
 There is a special case for this already.  It's just that the specific 
 operations he is addressing requires creation of output arrays that by 
 default are in C-order.This would need to change in order to take 
 advantage of the special case.
For copy and array creation, I understand this, but for element-wise 
operations (mean, min, and max), this is not enough to explain the 
difference, no ? For example, I can understand a 50 % or 100 % time 
increase for simple operations (by simple, I mean one-element operations 
taking only a few CPU cycles), because of copies, but a 5-fold time 
increase seems too big, no (maybe a cache problem, though) ? Also, the 
fact that mean is slower than min/max for both cases (F vs C) seems a 
bit counterintuitive (maybe cache effects are involved somehow ?).

Again, I see huge differences between my Xeon P4 @ 3.2 GHz and my 
Pentium M @ 1.2 GHz for those operations: the Pentium M gives more 
intuitive results (and is almost as fast, and sometimes even faster, 
than my Xeon for arrays which can stay in cache).

cheers,

David



Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread Travis E. Oliphant
David Cournapeau wrote:
 Travis E. Oliphant wrote:
   
 I wasn't talking about the min, mean, and max methods specifically.  
 These are all implemented with the reduce method of a ufunc. 
   
 
 Ah, my mistake, I wrongly understood that only some of them were implemented 
 through ufuncs. But the ufunc machinery has nothing to do with output 
 array creation, right ? So is the 5-fold speed difference mainly happening 
 inside the ufuncs ?
   

I'm not sure.   Ufuncs don't do everything in NumPy as you have noted.  
There are less well publicized (and less structured) array functions for 
each data-type that also implement things.   Not all of these places 
have optimized paths, but there was some thought given to it in 
several places of the code.

One area that could be improved that may be exposed in some of the 
timing tests, is that iterator-based algorithms  always use a 
C-contiguous iterator.  There is no Fortran-contiguous iterator 
(although there could be).
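Later NumPy versions added exactly this: np.nditer can traverse an array in its actual memory order via order='K', which covers the Fortran-contiguous case. A quick check against a modern NumPy:

```python
import numpy as np

f = np.asfortranarray(np.arange(6).reshape(2, 3))
visited = [int(v) for v in np.nditer(f, order='K')]
assert visited == [0, 3, 1, 4, 2, 5]  # the F array's actual memory order
```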

-Travis O.




Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread Anne Archibald
On 08/11/2007, David Cournapeau [EMAIL PROTECTED] wrote:

 For copy and array creation, I understand this, but for element-wise
 operations (mean, min, and max), this is not enough to explain the
 difference, no ? For example, I can understand a 50 % or 100 % time
 increase for simple operations (by simple, I mean one element operation
 taking only a few CPU cycles), because of copies, but a 5 fold time
 increase seems too big, no (maybe a cache problem, though) ? Also, the
 fact that mean is slower than min/max for both cases (F vs C) seems a
 bit counterintuitive (maybe cache effects are involved somehow ?).

I have no doubt at all that cache effects are involved: for an int
array, each data element is four bytes, but typical CPUs need to load
64 bytes at a time into cache. If you read (or write) the rest of
those bytes in the next iterations through the loop, the (large) cost
of a memory read is amortized. If you jump to the next row of the
array, some large number of bytes away, those 64 bytes basically need
to be purged to make room for another 64 bytes, of which you'll use 4.
If you're reading from a FORTRAN-order array and writing to a C-order
one, there's no way around doing this on one end or another: you're
effectively doing a transpose, which is pretty much always expensive.

Is there any reason not to let ufuncs pick whichever order for
newly-allocated arrays they want? The natural choice would be the same
as the bigger input array, if it's a contiguous block of memory
(whether or not the contiguous flags are set). Failing that, the same
as the other input array (if any); failing that, C order's as good a
default as any. How difficult would this be to implement?

Anne


Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread David Cournapeau
Anne Archibald wrote:
 On 08/11/2007, David Cournapeau [EMAIL PROTECTED] wrote:

 For copy and array creation, I understand this, but for element-wise
 operations (mean, min, and max), this is not enough to explain the
 difference, no ? For example, I can understand a 50 % or 100 % time
 increase for simple operations (by simple, I mean one element operation
 taking only a few CPU cycles), because of copies, but a 5 fold time
 increase seems too big, no (maybe a cache problem, though) ? Also, the
 fact that mean is slower than min/max for both cases (F vs C) seems a
 bit counterintuitive (maybe cache effects are involved somehow ?).

 I have no doubt at all that cache effects are involved: for an int
 array, each data element is four bytes, but typical CPUs need to load
 64 bytes at a time into cache. If you read (or write) the rest of
 those bytes in the next iterations through the loop, the (large) cost
 of a memory read is amortized. If you jump to the next row of the
 array, some large number of bytes away, those 64 bytes basically need
 to be purged to make room for another 64 bytes, of which you'll use 4.
 If you're reading from a FORTRAN-order array and writing to a C-order
 one, there's no way around doing this on one end or another: you're
 effectively doing a transpose, which is pretty much always expensive.
This is obviously part of the picture, and there is indeed no way around 
the simple fact that non-sequential memory access is extremely slow on 
modern hardware. The fact that the array iterator does not (yet) support 
Fortran order would largely explain this (I wrongly assumed the 
iterator already did that).

 Is there any reason not to let ufuncs pick whichever order for
 newly-allocated arrays they want? The natural choice would be the same
 as the bigger input array, if it's a contiguous block of memory
 (whether or not the contiguous flags are set). Failing that, the same
 as the other input array (if any); failing that, C order's as good a
 default as any. How difficult would this be to implement?
I think part of the problem is that it is difficult to know a priori 
which is faster (is copying multi-segmented parts into a single buffer 
for processing faster than direct processing ?). As mentioned 
previously, on the same platform (same OS/compiler combination), there 
are already huge differences between two CPUs (Pentium 4 vs Pentium M, the 
former being noticeably pig-slow when the SSE FPU is not used).

Do ufuncs use buffers for segmented memory, or do they only use them for 
misbehaved arrays ? Also, from my very limited experience, I found array 
iterators to be significantly slower than simple array indexing on 
contiguous, single-segment arrays. Do ufuncs always use array iterators, 
or do they use simple array indexing when possible ? When I implemented 
the fast clip function (which does not use ufuncs), I noticed that the 
following things matter:
- avoiding unnecessary copies.
- quickly determining which cases can be optimized wrt the 
operations you are using (memmove vs memcpy, etc...)
- fast memory copying (numpy does not have this yet: by using 
optimized memory copying, you can often gain a 50 % speed increase on 
recent archs; also, having information about alignment could help, too, 
and we go back to the aligned allocators discussion :) ).
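The memmove-vs-memcpy distinction can be illustrated at the Python level (`numpy.shares_memory` postdates this thread; `copy_into` is a hypothetical helper):

```python
import numpy as np

def copy_into(dst, src):
    # Sketch: take an overlap-safe path (memmove-like) only when needed,
    # otherwise take the straight copy path (memcpy-like).
    if np.shares_memory(dst, src):
        dst[...] = src.copy()   # buffer first: safe for overlapping views
    else:
        dst[...] = src          # disjoint memory: direct copy

a = np.arange(10)
copy_into(a[1:], a[:-1])        # overlapping source and destination
assert a.tolist() == [0, 0, 1, 2, 3, 4, 5, 6, 7, 8]
```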

The problem is to find a good API to put those optimizations in one 
place. I unfortunately do not know much about ufuncs; maybe they already 
implement most of those.

cheers,

David