Re: [Numpy-discussion] Solving a memory leak in a numpy extension; PyArray_ContiguousFromObject

2009-04-20 Thread Dan S
Thanks all for responses. Continuation below:

2009/4/18 Charles R Harris charlesr.har...@gmail.com:


 On Fri, Apr 17, 2009 at 9:25 AM, Dan S dan.s.towell+nu...@gmail.com wrote:

 Hi -

 I have written a numpy extension which works fine but has a memory
 leak. It takes a single array argument and returns a single scalar.
 After reducing the code down in order to chase the problem, I have the
 following:

 static PyObject * kdpee_pycall(PyObject *self, PyObject *args)
 {
        PyObject *input;
        PyArrayObject *array;
        int n, numdims;

        if (!PyArg_ParseTuple(args, "O", &input))
                return NULL;

        // Ensure we have contiguous, 2D, floating-point data:
        array = (PyArrayObject*) PyArray_ContiguousFromObject(input,
 PyArray_DOUBLE, 2, 2);

        if(array==NULL){
                printf("kdpee_py: nullness!\n");
                return NULL;
        }
        PyArray_XDECREF(array);
        Py_DECREF(array); // destroy the contig array
        Py_DECREF(input); // is this needed? doc says no, but it seems to fix the leak


 Shouldn't be, but there might be a bug somewhere which causes the reference
 count of input to be double incremented. Does the reference count in the
 test script increase without this line?

Yes - if I comment out Py_DECREF(input), then sys.getrefcount(a) goes
from 2, to 5002, to 10002, every time I run that little 5000-fold test
iteration.

(Py_DECREF(input) is not suggested in the docs - I tried it in
desperation and for some reason it seems to reduce the problem, since
it gets rid of the refleak, although it doesn't solve the memleak.)


        return PyFloat_FromDouble(3.1415927); // temporary
 }


 The test code is

      from numpy import *
      from kdpee import *
      import sys
      a = array( [[1,2,3,4,5,1,2,3,4,5], [6,7,8,9,0,6,7,8,9,0]])
      sys.getrefcount(a)
      for i in range(5000): tmp = kdpee(a)

      sys.getrefcount(a)

 Every time I run this code I get 2 back from both calls to
 sys.getrefcount(a), but I'm still getting a memory leak of about 0.4
 MB. What am I doing wrong?

 So about 80 bytes/iteration? How accurate is that .4 MB? does it change with
 the size of the input array?

OK, pinning it down more precisely I get around 86--90 bytes per
iteration (using ps, which gives me Kb resolution). It does not change
if I double or quadruple the size of the input array.
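For what it's worth, a small harness along these lines can automate that
measurement from the test script itself (just a sketch: Linux-only, since it
reads VmRSS from /proc/self/status, and it assumes the module is imported as
in the test above):

from numpy import array
from kdpee import *

def rss_kb():
    # current resident set size, in kB (Linux-only)
    for line in open("/proc/self/status"):
        if line.startswith("VmRSS:"):
            return int(line.split()[1])

a = array([[1,2,3,4,5,1,2,3,4,5], [6,7,8,9,0,6,7,8,9,0]])
before = rss_kb()
for i in range(5000):
    tmp = kdpee(a)
after = rss_kb()
print "grew by roughly %.1f bytes per call" % ((after - before) * 1024.0 / 5000)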

Today I tried using heapy: using the heap() and heapu() methods, I
find no evidence of anything suspicious - there is no garbage left
lying on the heap of python objects. So it's presumably something
deeper. But as you can see, my C code doesn't perform any malloc() or
suchlike, so I'm stumped.

I'd be grateful for any further thoughts.

Thanks
Dan
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Solving a memory leak in a numpy extension; PyArray_ContiguousFromObject

2009-04-20 Thread V. Armando Solé
Dan S wrote:
 But as you can see, my C code doesn't perform any malloc() or
 suchlike, so I'm stumped.

 I'd be grateful for any further thoughts
Could it be that your memory leak is in:

return PyFloat_FromDouble(3.1415927); // temporary


You are creating a python float object from something. What if you  
return Py_None instead of your float?:

Py_INCREF(Py_None);
return Py_None;

I do not know if it will help you but I guess it falls in the "any
further thought" category  :-)

Best regards,

Armando

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Solving a memory leak in a numpy extension; PyArray_ContiguousFromObject

2009-04-20 Thread Dan S
2009/4/20 V. Armando Solé s...@esrf.fr:
 Dan S wrote:
 But as you can see, my C code doesn't perform any malloc() or
 suchlike, so I'm stumped.

 I'd be grateful for any further thoughts
 Could it be your memory leak is in:

 return PyFloat_FromDouble(3.1415927); // temporary


 You are creating a python float object from something. What if you
 return Py_None instead of your float?:

 Py_INCREF(Py_None);
 return Py_None;

 I do not know if it will help you but I guess it falls in the any
 further thought category  :-)

Thanks :) but that doesn't alter things. In fact:

I found the real source of the problem. Namely: I was doing

  #include <Numeric/arrayobject.h>

when I should have been doing

  #include <numpy/arrayobject.h>

Therefore I was compiling against the old deprecated version of the
lib, which presumably had a memory leak inside itself which got fixed
in the intervening years. Using the up-to-date lib, no leak.

Thanks all, anyway!

Dan
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Linker script, smaller source files and symbol visibility

2009-04-20 Thread David Cournapeau
Hi,

For quite a long time I have been bothered by the very large files
needed for python extensions. In particular numpy.core consists of a few
files of ~1 MB each, which I find a pretty high barrier to entry for
newcomers, and it has quite a big impact on the code organization. I think
I have found a way to split things on common platforms (this includes at
least Windows, Mac OS X, Linux and Solaris), without impacting other,
potentially less capable platforms, or static linking of numpy.

Assuming my idea is technically sound and that I can demonstrate it
works on say Linux without impacting other platforms (see example
below), would that be considered useful ?

cheers,

David

Technical details
==

The rationale for doing things as they are is a C limitation related
to symbol visibility being limited to file scope, i.e. if you want to
share a function into several files without making it public in the
binary, you have to tag the function static, and include all .c files
which use this function into one giant .c file. That's how we do it in
numpy. Many binary formats (ELF, COFF and Mach-O) have a mechanism to
limit symbol visibility, so that we can explicitly set the functions
we do want to export. With a couple of defines, we could either include
every file and tag the implementation functions as static, or link
every file together and limit symbol visibility with some linker magic.

Example
---

I use the spam example from the official python doc, with one function
PySpam_System which is exported in a C API, and the actual
implementation is _pyspam_system.

* spammodule.c: define the interface available from python interpreter:

#include <Python.h>
#include <stdio.h>

#define SPAM_MODULE

#include "spammodule.h"
#include "spammodule_imp.h"

/* if we don't know how to deal with symbol visibility on the platform,
just include everything in one file */
#ifdef SYMBOL_SCRIPT_UNSUPPORTED
#include "spammodule_imp.c"
#endif

/* C API for spam module */
static int
PySpam_System(const char *command)
{
    _pyspam_implementation(command);
    return 0;
}

* spammodule_imp.h: declares the implementation, should only be included
by spammodule.c and spammodule_imp.c which implements the actual function

#ifndef _IMP_H_
#define _IMP_H_

#ifndef SPAM_MODULE
#error this should not be included unless you really know what you are doing
#endif

#ifdef SYMBOL_SCRIPT_UNSUPPORTED
#define SPAM_PRIVATE static
#else
#define SPAM_PRIVATE
#endif

SPAM_PRIVATE int
_pyspam_implementation(const char *command);

#endif

For supported platforms (where SYMBOL_SCRIPT_UNSUPPORTED is not
defined), _pyspam_implementation would not be visible because we would
have a list of functions to export (only initspam in this case).

Advantages
--

This has several advantages on platforms where this is supported
- more approachable code: source files which are thousands of lines long
are difficult to follow
- faster compilation times: in my experience, compilation time
doesn't scale linearly with the amount of code
- compilation can be better parallelized
- changing one file does not force a whole multiarray/ufunc module
recompilation (which can be pretty long when chasing bugs in it)

Another advantage is related to namespace pollution. Since library
extensions are static libraries for now, any symbol from those
libraries used by any extension is publicly available. For example, now
that multiarray.so uses the npy_math library, every symbol in npy_math
is in the public namespace. That's also true for every scipy extension
(for example, _fftpack.so exports the whole dfftpack public API). If we
want to go further down the road of making core computational code
publicly available, I think we should improve this first.
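As a quick way to see the pollution on an installed numpy, something along
these lines lists the dynamic symbols multiarray.so exports (just a sketch;
Linux/ELF only, and it assumes the binutils nm tool is on the PATH):

import subprocess
import numpy.core.multiarray as m

# Dump the defined dynamic symbols of the extension shared object.
p = subprocess.Popen(["nm", "-D", "--defined-only", m.__file__],
                     stdout=subprocess.PIPE)
out = p.communicate()[0]
symbols = [line.split()[-1] for line in out.splitlines() if line.strip()]
print "%d exported symbols, e.g. %s" % (len(symbols), symbols[:5])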

Disadvantage


We need to code it. There are two parts:
- numpy.distutils support: I already have something working for
Linux. Once we have one platform working, adding others should not be a
problem
- changing the C code: we could at first split things into .c
files but still include everything, and then start the conversion.



[Numpy-discussion] Numpy float precision vs Python list float issue

2009-04-20 Thread Ruben Salvador
Hi everybody!

First of all I should say I am a newbie with Python/Scipy. Have been
searching a little bit (google and lists) and haven't found a helpful
answer...so I'm posting.

I'm using Scipy/Numpy to do image wavelet transforms via the lifting scheme.
I grabbed some code implementing the transforms with Python lists (float
type). This code works perfectly, but slow for my needs (I'll be doing some
genetic algorithms to evolve coefficients of the filters and the forward and
inverse transform will be done many times). It's just implemented by looping
in the lists and making computations this way. The reconstructed image after
doing a forward and inverse transform is perfect, that is, the difference
between the original and reconstructed images is 0.

With Scipy/Numpy float arrays and slicing this code is much faster, as you
know. But the reconstructed image is not perfect. The image difference
maximum and minimum values are:
maximum difference = 3.5527136788e-15
minimum difference = -3.5527136788e-15

Is this behavior expected? Because it seems sooo weird to me. If expected,
is there any way to override it?

I include some test code for you to reproduce. It's part of a transform over
an 8x8 2D signal (for simplicity). It's not complete (again for simplicity),
but enough to reproduce. It does part of a forward and inverse transform
both with lists and arrays, printing the differences (there is commented
code showing some plots with the results as I used when transforming real
images, but for the purpose, is enough with the return results I think).
Code is simple (maybe long, but it's always the same). Instead of using the
faster array slicing as mentioned above, I am here using array looping, so
that the math code is exactly the same as in the list case.

This happens on the following three systems/platforms.

* System 1 (laptop):
---
64 bit processor running Kubuntu 8.04 32 bits
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52
Numpy version: 1:1.0.4-6ubuntu3
Scipy version: 0.6.0-8ubuntu1

* System 2 (PC):
--
Windows Xp on 64 bit processor
Enthought Python distribution  (EPD Py25 v4.1.30101). This is a Python 2.5.2
with Numpy 1.1.1 and Scipy 0.6.0

* System 3 (same PC as 2):
--
Debian Lenny 64 bit on 64 bit processor
Not sure about versions here, but it doesn't matter because behavior is pretty
much the same on the 3 systems

Thanks everybody in advance for the help!

Rubén Salvador
PhD Student
Industrial Electronics Center
Universidad Politécnica de Madrid
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import numpy
import scipy
import pylab as plt


# The test shows precision issues with numpy floats against
# perfect behavior with python lists
#
# The computations are based on image wavelet transforms
# via the lifting scheme, which allows computations in place

# Note that this is just one step of the transform, not the
# whole of it (just first lifting stage of row processing), but
# enough to show weird behavior of floating point operations
#
# Inverse transformed image should be exact to the original image



# Test with numpy arrays

marr = [[ 1,  5,  6,  2,  9,  7,  1,  4],[3, 6, 2, 7, 4, 8, 5, 9],[ 8, 2,  9, 1, 3, 4, 1, 5],[4, 1, 5, 8, 3, 1, 4, 7],[6, 2, 1, 3, 8, 2, 4, 3],[8, 5, 9, 5, 4, 2, 1, 5],[4, 8, 5, 9, 6, 3, 2, 7],[5, 6, 5, 1, 8, 2, 9, 3]]
m = scipy.array(marr, dtype=scipy.float64)
morig = m.copy()

##
# Forward transform (9/7)
##
a1 = -1.586134342
a2 = -0.05298011854
k1 = 0.81289306611596146 # 1/1.230174104914
k2 = 0.6150870524572 # 1.230174104914/2
width = len(m[0])
height = len(m)

# transform
for col in range(width):
    for row in range(1, height-1, 2):
        m[row][col] += a1 * (m[row-1][col] + m[row+1][col])
    m[height-1][col] += 2 * a1 * m[height-2][col]
    for row in range(2, height, 2):
        m[row][col] += a2 * (m[row-1][col] + m[row+1][col])
    m[0][col] += 2 * a2 * m[1][col]
temp_bank = scipy.zeros((height,width),scipy.float64)

# transpose/interleave 2D signal and scale
for row in range(height):
    for col in range(width):
        if row % 2 == 0:  # even
            temp_bank[col][row/2] = k1 * m[row][col]
        else:             # odd
            temp_bank[col][row/2 + height/2] = k2 * m[row][col]
for row in range(width):
    for col in range(height):
        m[row][col] = temp_bank[row][col]

##
# Inverse transform (9/7)
##
w = len(m[0])
h = len(m)
a1 = 1.586134342
a2 = 0.05298011854
# Inverse scale coeffs:
k1 = 1.230174104914
k2 = 1.6257861322319229
s = m.copy()
temp_bank = scipy.zeros((height,width),scipy.float64)

# transpose/interleave 2D signal and scale back
for col in range(width/2):
    for row in range(height):
        temp_bank[col * 2][row] = k1 * s[row][col]
        temp_bank[col * 2 + 1][row] = k2 * s[row][col + 

Re: [Numpy-discussion] Numpy float precision vs Python list float issue

2009-04-20 Thread David Cournapeau
Hi Ruben,

Ruben Salvador wrote:
 Hi everybody!

 First of all I should say I am a newbie with Python/Scipy. Have been
 searching a little bit (google and lists) and haven't found a helpful
 answer...so I'm posting.

 I'm using Scipy/Numpy to do image wavelet transforms via the lifting
 scheme. I grabbed some code implementing the transforms with Python
 lists (float type). This code works perfectly, but slow for my needs
 (I'll be doing some genetic algorithms to evolve coefficients of the
 filters and the forward and inverse transform will be done many
 times). It's just implemented by looping in the lists and making
 computations this way. Reconstructed image after doing a forward and
 inverse transform is perfect, this is, original and reconstructed
 images difference is 0.

 With Scipy/Numpy float arrays slicing this code is much faster as you
 know. But the reconstructed image is not perfect. The image difference
 maximum and minimum values returns:
 maximum difference = 3.5527136788e-15
 minimum difference = -3.5527136788e-15

 Is this behavior expected?

Yes, it is expected; it is inherent to how floating point works. By
default, the precision for a floating point array is double precision, for
which, in normal settings, a == a + 1e-17.
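For instance (standard IEEE-754 double behaviour, nothing numpy-specific):

import numpy as np

a = np.float64(1.0)
print a == a + 1e-17              # True: 1e-17 is below the spacing of doubles near 1.0
print np.finfo(np.float64).eps    # ~2.22e-16, the relative spacing of doubles
print 0.1 + 0.2 == 0.3            # False, for the same reason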

 Because it seems sooo weird to me.

It shouldn't :) The usual answer is that you should read this:

http://docs.sun.com/app/docs/doc/800-7895

Floating point is a useful abstraction, but has some strange properties,
which do not matter much in most cases, but can catch you off guard.

 If expected, anyway to override it?

The only way to mitigate it is to use higher precision (depending on
your OS/CPU combination, the long double type can help), or to use a type
with arbitrary precision. But in the latter case, it will be much, much
slower (as many computations are done in software), and you get a pretty
limited subset of numpy features (enough to implement basic wavelet with
Haar bases, though).

Using extended precision:

import numpy as np
a = np.array([1, 2, 3], dtype=np.longdouble) # this gives you more precision

You can also use the decimal module for (almost) arbitrary precision:

from decimal import Decimal
import numpy as np
a = np.array([1, 2, 3], dtype=Decimal)

But again, please keep in mind that many operations normally available
cannot be done for arbitrary objects like Decimal instances. Generally,
numerical computations are done with limited precision, and it is
rarely a problem if enough care is taken. If you need theoretically exact
results, then you need a symbolic computation package, which numpy alone
is not.

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy float precision vs Python list float issue

2009-04-20 Thread David Cournapeau
David Cournapeau wrote:
 Hi Ruben,

 Ruben Salvador wrote:
   
 Hi everybody!

 First of all I should say I am a newbie with Python/Scipy. Have been
 searching a little bit (google and lists) and haven't found a helpful
 answer...so I'm posting.

 I'm using Scipy/Numpy to do image wavelet transforms via the lifting
 scheme. I grabbed some code implementing the transforms with Python
 lists (float type). This code works perfectly, but slow for my needs
 (I'll be doing some genetic algorithms to evolve coefficients of the
 filters and the forward and inverse transform will be done many
 times). It's just implemented by looping in the lists and making
 computations this way. Reconstructed image after doing a forward and
 inverse transform is perfect, this is, original and reconstructed
 images difference is 0.

 With Scipy/Numpy float arrays slicing this code is much faster as you
 know. But the reconstructed image is not perfect. The image difference
 maximum and minimum values returns:
 maximum difference = 3.5527136788e-15
 minimum difference = -3.5527136788e-15

 Is this behavior expected?
 

 Yes, it is expected, it is inherent to how floating point works. By
 default, the precision for floating point array is double precision, for
 which, in normal settings, a == a + 1e-17.

   
 Because it seems sooo weird to me.
 

 It shouldn't :) The usual answer is that you should read this:

 http://docs.sun.com/app/docs/doc/800-7895

 Floating point is a useful abstraction, but has some strange properties,
 which do not matter much in most cases, but can catch you off guard.

   
 If expected, anyway to override it?
 

 The only way to mitigate it is to use higher precision (depending on
 your OS/CPU combination, the long double type can help), or using a type
 with arbitrary precision.
^^^ this is inexact, I should have written: the only way to avoid the
problem is to use arbitrary precision, and you can mitigate it with higher
precision and/or a better implementation. If double precision (the
default) is not enough, it is often because there is something wrong in
your implementation (computing the exponential of big numbers instead of
working in the log domain, etc...)

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy float precision vs Python list float issue

2009-04-20 Thread Rob Clewley
David,

I'm confused about your reply. I don't think Ruben was only asking why
you'd ever get non-zero error after the forward and inverse transform,
but why his implementation using lists gives zero error but using
arrays he gets something of order 1e-15.

On Mon, Apr 20, 2009 at 9:47 AM, David Cournapeau
da...@ar.media.kyoto-u.ac.jp wrote:
 I'm using Scipy/Numpy to do image wavelet transforms via the lifting
 scheme. I grabbed some code implementing the transforms with Python
 lists (float type). This code works perfectly, but slow for my needs
 (I'll be doing some genetic algorithms to evolve coefficients of the
 filters and the forward and inverse transform will be done many
 times). It's just implemented by looping in the lists and making
 computations this way. Reconstructed image after doing a forward and
 inverse transform is perfect, this is, original and reconstructed
 images difference is 0.

 With Scipy/Numpy float arrays slicing this code is much faster as you
 know. But the reconstructed image is not perfect. The image difference
 maximum and minimum values returns:
 maximum difference = 3.5527136788e-15
 minimum difference = -3.5527136788e-15

 Is this behavior expected?

 Yes, it is expected, it is inherent to how floating point works. By
 default, the precision for floating point array is double precision, for
 which, in normal settings, a = a + 1e-17.

I don't think it's expected in this sense. The question is why the
exact same sequence of arithmetic ops on lists yields zero error but
on arrays yields 3.6e-15 error. This doesn't seem to be about lists
not showing full precision of the values, because the differences are
even zero when extracted from the lists.

 Because it seems sooo weird to me.

 It shouldn't :) The usual answer is that you should read this:

 http://docs.sun.com/app/docs/doc/800-7895

This doesn't help! This is a python question, methinks.

Best,
Rob
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] NaN as dictionary key?

2009-04-20 Thread Wes McKinney
I assume that, because NaN != NaN, even though both have the same hash value
(hash(NaN) == -32768), that Python treats any NaN double as a distinct key
in a dictionary.

In [76]: a = np.repeat(nan, 10)

In [77]: d = {}

In [78]: for i, v in enumerate(a):
   : d[v] = i
   :
   :

In [79]: d
Out[79]:
{nan: 0,
 nan: 1,
 nan: 6,
 nan: 4,
 nan: 3,
 nan: 9,
 nan: 7,
 nan: 2,
 nan: 8,
 nan: 5}

I'm not sure if this ever worked in a past version of NumPy, however, I have
code which does a group by value and occasionally in the real world those
values are NaN. Any ideas or a way around this problem?

Thanks,
Wes
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy float precision vs Python list float issue

2009-04-20 Thread David Cournapeau
Rob Clewley wrote:
 David,

 I'm confused about your reply. I don't think Ruben was only asking why
 you'd ever get non-zero error after the forward and inverse transform,
 but why his implementation using lists gives zero error but using
 arrays he gets something of order 1e-15.
   

That's more likely just an accident. Forward + inverse = id is the
surprising thing, actually. In any numerical package, if you do
ifft(fft(a)), you will not recover a exactly for any non trivial size.
For example, with floating point numbers, the order in which you do
operations matters, so:

a = 1e10
b = 1e-20

c = a
c -= a + b

d = a
d -= a
d -= b

Will give you different values for d and c, even if, on paper,
those are exactly the same. For those reasons, it is virtually
impossible to have exactly the same values for two different
implementations of the same algorithm. As long as the difference is
small (if the reconstruction error falls in the 1e-15 range, it is
most likely the case), it should not matter.
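With plain doubles, a + b rounds back to a, so the snippet above gives
c == 0.0 exactly while d == -1e-20; for example:

print c, d        # prints: 0.0 -1e-20
print c == d      # prints: False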

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Linker script, smaller source files and symbol visibility

2009-04-20 Thread Charles R Harris
Hi David

On Mon, Apr 20, 2009 at 6:51 AM, David Cournapeau 
da...@ar.media.kyoto-u.ac.jp wrote:

 Hi,

For quite a long time I have been bothered by the very large files
 needed for python extensions. In particular for numpy.core, which
 consists in a few files which are ~ 1 Mb, I find this a pretty high
 barrier of entry for newcomers, and it has quite a big impact on the
 code organization. I think I have found a way to split things on common
 platforms (this includes at least windows, mac os x, linux and solaris),
 without impacting other  potentially less capable platforms, or static
 linking of numpy.


There was a discussion of this a couple of years ago. I was in favor of many
small files maybe in subdirectories. Robert, IIRC, thought too many small
files could become confusing, so there is a fine line in there somewhere.  I
am generally in favor of breaking the files up into their functional
components and maybe rewriting some of the upper level interface files in
cython. But it does need some agreement and we should probably start by just
breaking up a few files. I don't have a problem with big files that are just
collections of small routines all of the same type, umath_loops.inc.src for
instance.



 Assuming my idea is technically sound and that I can demonstrate it
 works on say Linux without impacting other platforms (see example
 below), would that be considered useful ?


Definitely worth consideration.



 cheers,

 David

 Technical details
 ==

The rationale for doing things as they are is a C limitation related
 to symbol visibility being limited to file scope, i.e. if you want to
 share a function into several files without making it public in the
 binary, you have to tag the function static, and include all .c files
 which use this function into one giant .c file. That's how we do it in
 numpy. Many binary format (elf, coff and Mach-O) have a mechanism to
 limit the symbol visibility, so that we can explicitly set the functions
 we do want to export. With a couple of defines, we could either include
 every files and tag the implementation functions as static, or link
 every file together and limit symbol visibility with some linker magic.


Maybe just not worry about symbol visibility on other platforms. It is one
of those warts that only becomes apparent when you go looking for it. For
instance, the current *.so has some extraneous symbols but I don't hear
folks complaining.



 Example
 ---

 I use the spam example from the official python doc, with one function
 PySpam_System which is exported in a C API, and the actual
 implementation is _pyspam_system.

 * spammodule.c: define the interface available from python interpreter:

 #include <Python.h>
 #include <stdio.h>

 #define SPAM_MODULE

 #include "spammodule.h"
 #include "spammodule_imp.h"

 /* if we don't know how to deal with symbol visibility on the platform,
 just include everything in one file */
 #ifdef SYMBOL_SCRIPT_UNSUPPORTED
 #include "spammodule_imp.c"
 #endif


 /* C API for spam module */


 static int
 PySpam_System(const char *command)
 {
_pyspam_implementation(command);
return 0;
 }

 * spammodule_imp.h: declares the implementation, should only be included
 by spammodule.c and spammodule_imp.c which implements the actual function

 #ifndef _IMP_H_
 #define _IMP_H_

 #ifndef SPAM_MODULE
 #error this should not be included unless you really know what you are
 doing
 #endif

 #ifdef SYMBOL_SCRIPT_UNSUPPORTED
 #define SPAM_PRIVATE static
 #else
 #define SPAM_PRIVATE
 #endif

 SPAM_PRIVATE int
 _pyspam_implementation(const char *command);

 #endif

 For supported platforms (where SYMBOL_SCRIPT_UNSUPPORTED is not
 defined), _pyspam_implementation would not be visible because we would
 have a list of functions to export (only initspam in this case).

 Advantages
 --

 This has several advantages on platforms where this is supported
- code more amenable: source code which are thousand of lines are
 difficult to follow
- faster compilation times: in my experience, compilation time
 doesn't scale linearly with the amount of code.
- compilation can be better parallelized
- changing one file does not force a whole multiarray/ufunc module
 recompilation (which can be pretty long when you chase bugs in it)

 Another advantage is related to namespace pollution. Since library
 extensions are static libraries for now, any symbol frome those
 libraries used by any extension is publicly available. For example, now
 that multiarray.so uses the npy_math library, every symbol in npy_math
 is in the public namespace. That's also true for every scipy extensions
 (for example, _fftpack.so exports the whole dfftpack public API). If we
 want to go further down the road of making core computational code
 publicly available, I think we should improve this first.

 Disadvantage
 

 We need to code it. There are two parts:
- numpy.distutils support: I have 

Re: [Numpy-discussion] Numpy float precision vs Python list float issue

2009-04-20 Thread Rob Clewley
On Mon, Apr 20, 2009 at 10:48 AM, David Cournapeau
da...@ar.media.kyoto-u.ac.jp wrote:
 Rob Clewley wrote:
 David,

 I'm confused about your reply. I don't think Ruben was only asking why
 you'd ever get non-zero error after the forward and inverse transform,
 but why his implementation using lists gives zero error but using
 arrays he gets something of order 1e-15.


 That's more likely just an accident. Forward + inverse = id is the
 surprising thing, actually. In any numerical package, if you do
 ifft(fft(a)), you will not recover a exactly for any non trivial size.
 For example, with floating point numbers, the order in which you do
 operations matters, so:
SNIP ARITHMETIC
 Will give you different values for d and c, even if you on paper,
 those are exactly the same. For those reasons, it is virtually
 impossible to have exactly the same values for two different
 implementations of the same algorithm. As long as the difference is
 small (if the reconstruction error falls in the 1e-15 range, it is
 mostly likely the case), it should not matter,

I understand the numerical mathematics behind this very well but my
point is that his two algorithms appear to be identical (same
operations, same order), he simply uses lists in one and arrays in the
other. It's not like he used vectorization or other array-related
operations - he uses for loops in both cases. Of course I agree that
1e-15 error should be acceptable, but that's not the point. I think
there is legitimate curiosity in wondering why there is any difference
between using the two data types in exactly the same algorithm.

Maybe Ruben could re-code the example using a single function that
accepts either a list or an array to really demonstrate that there is
no mistake in any of the decimal literals or any subtle difference in
the ordering of operations that I didn't spot. Would that convince
you, David, that there is something interesting that remains to be
explained in this example?!

Best,
Rob
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy float precision vs Python list float issue

2009-04-20 Thread Charles R Harris
On Mon, Apr 20, 2009 at 9:49 AM, Rob Clewley rob.clew...@gmail.com wrote:

 On Mon, Apr 20, 2009 at 10:48 AM, David Cournapeau
 da...@ar.media.kyoto-u.ac.jp wrote:
  Rob Clewley wrote:
  David,
 
  I'm confused about your reply. I don't think Ruben was only asking why
  you'd ever get non-zero error after the forward and inverse transform,
  but why his implementation using lists gives zero error but using
  arrays he gets something of order 1e-15.
 
 
  That's more likely just an accident. Forward + inverse = id is the
  surprising thing, actually. In any numerical package, if you do
  ifft(fft(a)), you will not recover a exactly for any non trivial size.
  For example, with floating point numbers, the order in which you do
  operations matters, so:
 SNIP ARITHMETIC
  Will give you different values for d and c, even if you on paper,
  those are exactly the same. For those reasons, it is virtually
  impossible to have exactly the same values for two different
  implementations of the same algorithm. As long as the difference is
  small (if the reconstruction error falls in the 1e-15 range, it is
  mostly likely the case), it should not matter,

 I understand the numerical mathematics behind this very well but my
 point is that his two algorithms appear to be identical (same
 operations, same order), he simply uses lists in one and arrays in the
 other. It's not like he used vectorization or other array-related
 operations - he uses for loops in both cases. Of course I agree that
 1e-15 error should be acceptable, but that's not the point. I think
 there is legitimate curiosity in wondering why there is any difference
 between using the two data types in exactly the same algorithm.


Well, without an example it is hard to tell. Maybe the print formats are
different precisions and the list values are just getting rounded.

Chuck

snip
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy float precision vs Python list float issue

2009-04-20 Thread David Cournapeau
On Tue, Apr 21, 2009 at 12:49 AM, Rob Clewley rob.clew...@gmail.com wrote:
 On Mon, Apr 20, 2009 at 10:48 AM, David Cournapeau
 da...@ar.media.kyoto-u.ac.jp wrote:
 Rob Clewley wrote:
 David,

 I'm confused about your reply. I don't think Ruben was only asking why
 you'd ever get non-zero error after the forward and inverse transform,
 but why his implementation using lists gives zero error but using
 arrays he gets something of order 1e-15.


 That's more likely just an accident. Forward + inverse = id is the
 surprising thing, actually. In any numerical package, if you do
 ifft(fft(a)), you will not recover a exactly for any non trivial size.
 For example, with floating point numbers, the order in which you do
 operations matters, so:
 SNIP ARITHMETIC
 Will give you different values for d and c, even if you on paper,
 those are exactly the same. For those reasons, it is virtually
 impossible to have exactly the same values for two different
 implementations of the same algorithm. As long as the difference is
 small (if the reconstruction error falls in the 1e-15 range, it is
 mostly likely the case), it should not matter,

 I understand the numerical mathematics behind this very well but my
 point is that his two algorithms appear to be identical (same
 operations, same order), he simply uses lists in one and arrays in the
 other. It's not like he used vectorization or other array-related
 operations - he uses for loops in both cases. Of course I agree that
 1e-15 error should be acceptable, but that's not the point. I think
 there is legitimate curiosity in wondering why there is any difference
 between using the two data types in exactly the same algorithm.

Yes, it is legitimate and healthy to worry about the difference - but
the surprising thing really is the list behavior when you are used to
numerical computation :) And I maintain that the algorithms are not
the same in both operations. For once, the operation of using arrays
on the data do not give the same data in both cases, you can see right
away that m and ml are not the same, e.g.

print ml - morig

shows that the internal representation is not exactly the same.

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NaN as dictionary key?

2009-04-20 Thread David Cournapeau
On Mon, Apr 20, 2009 at 11:42 PM, Wes McKinney wesmck...@gmail.com wrote:
 I assume that, because NaN != NaN, even though both have the same hash value
 (hash(NaN) == -32768), that Python treats any NaN double as a distinct key
 in a dictionary.

I think that strictly speaking, nan should not be hashable because of
nan != nan. But since that's not an error in python, I am not sure we
should do something about it.
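To make the mechanism explicit: dict lookup checks identity before equality,
so the *same* nan object reuses one slot, while distinct nan objects (as
produced by iterating over the array) each get their own. One possible
workaround for the group-by case is to funnel every nan through a single
shared key (just a sketch, not a numpy feature):

import numpy as np

a = np.repeat(np.nan, 10)

d = {}
for i, v in enumerate(a):
    d[v] = i
print len(d)            # 10: each element is a distinct float64 nan object

nan_key = np.nan        # one shared key object for all nans
d2 = {}
for i, v in enumerate(a):
    d2[nan_key if np.isnan(v) else v] = i
print len(d2)           # 1: lookups find nan_key by identity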

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy float precision vs Python list float issue

2009-04-20 Thread josef . pktd
On Mon, Apr 20, 2009 at 12:04 PM, David Cournapeau courn...@gmail.comwrote:

 On Tue, Apr 21, 2009 at 12:49 AM, Rob Clewley rob.clew...@gmail.com
 wrote:
  On Mon, Apr 20, 2009 at 10:48 AM, David Cournapeau
  da...@ar.media.kyoto-u.ac.jp wrote:
  Rob Clewley wrote:
  David,
 
  I'm confused about your reply. I don't think Ruben was only asking why
  you'd ever get non-zero error after the forward and inverse transform,
  but why his implementation using lists gives zero error but using
  arrays he gets something of order 1e-15.
 
 
  That's more likely just an accident. Forward + inverse = id is the
  surprising thing, actually. In any numerical package, if you do
  ifft(fft(a)), you will not recover a exactly for any non trivial size.
  For example, with floating point numbers, the order in which you do
  operations matters, so:
  SNIP ARITHMETIC
  Will give you different values for d and c, even if you on paper,
  those are exactly the same. For those reasons, it is virtually
  impossible to have exactly the same values for two different
  implementations of the same algorithm. As long as the difference is
  small (if the reconstruction error falls in the 1e-15 range, it is
  mostly likely the case), it should not matter,
 
  I understand the numerical mathematics behind this very well but my
  point is that his two algorithms appear to be identical (same
  operations, same order), he simply uses lists in one and arrays in the
  other. It's not like he used vectorization or other array-related
  operations - he uses for loops in both cases. Of course I agree that
  1e-15 error should be acceptable, but that's not the point. I think
  there is legitimate curiosity in wondering why there is any difference
  between using the two data types in exactly the same algorithm.

 Yes, it is legitimate and healthy to worry about the difference - but
 the surprising thing really is the list behavior when you are used to
 numerical computation :) And I maintain that the algorithms are not
 the same in both operations. For once, the operation of using arrays
 on the data do not give the same data in both cases, you can see right
 away that m and ml are not the same, e.g.

 print ml - morig

 shows that the internal representation is not exactly the same.


I think you are copying your result into your original list.

Instead of

morigl = ml[:]

use:

from copy import deepcopy
morigl = deepcopy(ml)


this is morigl after running your script

 morigl
[[1.0002, 5.0, 6.0, 1.9998, 9.0, 7.0, 1.0, 4.0],
[3.0, 6.0, 2.0, 6.9982, 4.0, 8.0, 5.0, 9.0], [8.0, 2.0, 9.0,
0.99989, 3.0004, 4.0009, 1.0,
4.9991], [3.9964, 1.0, 5.0, 7.9991, 3.0,
1.0036, 4.0, 6.9991], [5.9982, 2.0,
0.99989, 2.9996, 8.0, 2.0, 4.0, 3.0],
[7.9973, 5.0, 9.0, 5.0, 4.0, 2.0009,
1.0018, 5.0], [4.0, 8.0, 5.0, 9.0, 5.9991, 3.0, 2.0,
7.0009], [5.0, 6.0, 5.0, 1.0, 7.9964, 2.0, 9.0,
3.0036]]

Josef
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Getting the only object in a zero dimensional array

2009-04-20 Thread Fadhley Salim
I have a 0d array that looks like this:
 
myarray = array( 0.1234 )

This generates a TypeError:
myarray[0]

I can get its value using a hack like this... but it looks kind of
wrong:
myval = myarray + 0.0

There must be a better way to do it, right? All I want is the correct
way to return the value of the only item in a 0d array.

Thanks!

Sal 

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Getting the only object in a zero dimensional array

2009-04-20 Thread Robert Kern
On Mon, Apr 20, 2009 at 12:39, Fadhley Salim
fadhley.sa...@uk.calyon.com wrote:
 I have a 0d array that looks like this:

 myarray = array( 0.1234 )

 This generates a TypeError:
 myarray[0]

myarray[()] or myarray.item()
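For example:

import numpy as np

myarray = np.array(0.1234)
print myarray[()]              # 0.1234 (indexing with an empty tuple)
print myarray.item()           # 0.1234, as a plain Python float
print type(myarray.item())     # <type 'float'>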

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy float precision vs Python list float issue

2009-04-20 Thread Matthieu Brucher
 I understand the numerical mathematics behind this very well but my
 point is that his two algorithms appear to be identical (same
 operations, same order), he simply uses lists in one and arrays in the
 other. It's not like he used vectorization or other array-related
 operations - he uses for loops in both cases. Of course I agree that
 1e-15 error should be acceptable, but that's not the point. I think
 there is legitimate curiosity in wondering why there is any difference
 between using the two data types in exactly the same algorithm.

Hi,
There are cases where the same algorithm implementation outputs
different results in optimized mode, depending on data alignment. As
David said, this is something that is known or at least better known.

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy float precision vs Python list float issue

2009-04-20 Thread josef . pktd
On Mon, Apr 20, 2009 at 1:47 PM, Matthieu Brucher 
matthieu.bruc...@gmail.com wrote:

  I understand the numerical mathematics behind this very well but my
  point is that his two algorithms appear to be identical (same
  operations, same order), he simply uses lists in one and arrays in the
  other. It's not like he used vectorization or other array-related
  operations - he uses for loops in both cases. Of course I agree that
  1e-15 error should be acceptable, but that's not the point. I think
  there is legitimate curiosity in wondering why there is any difference
  between using the two data types in exactly the same algorithm.

 Hi,
 There are cases where the same algorithm implementation outputs
 different results in optimized mode, depending on data alignment. As
 David said, this is something that is known or at least better known.


in this example, there is no difference between the numpy and python list
versions if you deepcopy instead of mutating your original lists, see:

diff_npl = [[0.0]*width for i in range(height)]
for row in range(height):
    for col in range(width):
        diff_npl[row][col] = s[row][col] - sl[row][col]
print "\n* Difference numpy list:"
print diff_npl
print "maximum difference =", max(diff_npl)
print "minimum diff =", min(diff_npl)

* Difference numpy list:
[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
maximum difference = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
minimum diff = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy float precision vs Python list float issue

2009-04-20 Thread John Gleeson

On 2009-04-20, at 10:04 AM, David Cournapeau wrote:


 Yes, it is legitimate and healthy to worry about the difference - but
 the surprising thing really is the list behavior when you are used to
 numerical computation :) And I maintain that the algorithms are not
 the same in both operations. For once, the operation of using arrays
 on the data do not give the same data in both cases, you can see right
 away that m and ml are not the same, e.g.

 print ml - morig

 shows that the internal representation is not exactly the same.

 David
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

The discrepancy David found in ml - morig vanishes if you change line 118

morigl = ml[:]

to

import copy
morigl = copy.deepcopy(ml)

There is still an issue with disagreement in minimum diff, but I'd bet  
it is not a floating point precision problem.

John
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Linker script, smaller source files and symbol visibility

2009-04-20 Thread Charles R Harris
On Mon, Apr 20, 2009 at 9:48 AM, Charles R Harris charlesr.har...@gmail.com
 wrote:

 Hi David

 On Mon, Apr 20, 2009 at 6:51 AM, David Cournapeau 
 da...@ar.media.kyoto-u.ac.jp wrote:

 Hi,

For quite a long time I have been bothered by the very large files
 needed for python extensions. In particular for numpy.core, which
 consists in a few files which are ~ 1 Mb, I find this a pretty high
 barrier of entry for newcomers, and it has quite a big impact on the
 code organization. I think I have found a way to split things on common
 platforms (this includes at least windows, mac os x, linux and solaris),
 without impacting other  potentially less capable platforms, or static
 linking of numpy.


 There was a discussion of this a couple of years ago. I was in favor of
 many small files maybe in subdirectories. Robert, IIRC, thought too many
 small files could become confusing, so there is a fine line in there
 somewhere.  I am generally in favor of breaking the files up into their
 functional components and maybe rewriting some of the upper level interface
 files in cython. But it does need some agreement and we should probably
 start by just breaking up a few files. I don't have a problem with big files
 that are just collections of small routines all of the same type,
 umath_loops.inc.src for instance.



Here is a link to the start of the old discussion:
http://article.gmane.org/gmane.comp.python.numeric.general/12974/match=exported+symbols+code+reorganization
You took part in it also.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy float precision vs Python list float issue

2009-04-20 Thread Ruben Salvador
Well, thanks everybody for such quick help!! I just couldn't imagine what
could give rise to this difference.

1e-15 is enough precision for what I will be doing, but was just curious.

Anyway, this 'deepcopy' really surprised me. I have now looked for it and I
think I get an idea of it, though I wouldn't expect this difference in
behavior to happen. From my logical point of view a copy is a copy, and if
I find a method called copy() I can only expect it to...copy (whether I am
copying compound or simple objects!!). This comment is only for the sake
of curiosity. I guess maybe it is just my lack of knowledge of programming
in general and this is such a needed difference in copy behavior.

On Mon, Apr 20, 2009 at 7:21 PM, John Gleeson jdglee...@mac.com wrote:


 On 2009-04-20, at 10:04 AM, David Cournapeau wrote:

 
  Yes, it is legitimate and healthy to worry about the difference - but
  the surprising thing really is the list behavior when you are used to
  numerical computation :) And I maintain that the algorithms are not
  the same in both operations.


It's not? Would you please mind commenting this a little bit?


 For once, the operation of using arrays
  on the data do not give the same data in both cases, you can see right
  away that m and ml are not the same, e.g.


I don't get what you mean



 
  print ml - morig
 
  shows that the internal representation is not exactly the same.
 
  David

 The discrepancy David found in ml - morig vanishes if you change line
 118

 morigl = ml[:]


Sorry for this mistake :S


 to

 import copy
 morigl = copy.deepcopy(ml)

 There is still an issue with disagreement in minimum diff, but I'd bet
 it is not a floating point precision problem.

 John
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy float precision vs Python list float issue

2009-04-20 Thread Christopher Barker
Ruben Salvador wrote:
 Anyway, this 'deepcopy' really surprised me. I have now looked for it 
 and I think I get an idea of it, though I wouldn't expect this 
 difference in behavior to happen. From my logical point of view a copy 
 is a copy, and if I find a method called copy() I can only expect it 
 to...copy (whether I am copying compound or simple objects!!).

I think this also highlights an advantage of numpy - a 2-d array is a 
2-d array, NOT a 1-d array of 1-d arrays. When you used lists, you were 
using a list-of-lists, so copying copied the main list, but only 
referenced the internal lists.
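A small sketch of the difference:

import numpy as np

lst = [[1.0, 2.0], [3.0, 4.0]]
cpy = lst[:]                 # new outer list, but the two inner rows are shared
cpy[0][0] = 99.0
print lst[0][0]              # 99.0 -- the "copy" changed the original

arr = np.array([[1.0, 2.0], [3.0, 4.0]])
arr2 = arr.copy()            # a 2-d array owns one block of data; copy() duplicates it
arr2[0, 0] = 99.0
print arr[0, 0]              # 1.0 -- the original is untouched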

One other note: if you do a complex computation with floats, then 
reverse it, and get back EXACTLY the same numbers, you can be pretty 
sure something is wrong!

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy Mac binary for Python 2.6

2009-04-20 Thread Christopher Barker
Russell E. Owen wrote:
 http://www.pymvpa.org/devguide.html
 
 The patch at the end of this document worked.

Has anyone submitted these patches so they'll get into bdist_mpkg? I'm 
guessing Ronald Oussoren would be the person to accept them, but you can 
post to the MacPython list to be sure.

-Chris



-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] help getting started

2009-04-20 Thread Christopher Barker
David Cournapeau wrote:
 Christopher, would you mind trying the following binary ?
 
 http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy/scipy-0.7.0-py2.5-macosx10.5.mpkg.tar

That binary seems to be working OK on my  PPC 10.4 mac, with no gfortran 
installed.

Is it up on the main site yet?

Thanks for all  your work on this,

-CHB


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy float precision vs Python list float issue

2009-04-20 Thread David Cournapeau
Ruben Salvador wrote:
 Anyway, this 'deepcopy' really surprised me. I have now looked for it
 and I think I get an idea of it, though I wouldn't expect this
 difference in behavior to happen. From my logical point of view a
 copy is a copy, and if I find a method called copy() I can only expect
 it to...copy (whether I am copying compound or simple objects!!).
 This comment is only for the shake of curiosity. I guess maybe it is
 just my lack of knowledge in programming in general and this is such a
 needed difference in copy behavior.

That's something which is related to python. Python lists are mutable,
meaning that when you modify one variable, you modify it in place.
Simply said, copying a list means creating a new container with the
*same* items, and a deepcopy means creating a new container with a
(potentially recursive) copy of every item.
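A minimal illustration:

import copy

ml = [[1.0, 2.0], [3.0, 4.0]]
shallow = ml[:]                  # new outer list, same inner lists
deep = copy.deepcopy(ml)         # inner lists are duplicated as well

ml[0][0] += 0.5                  # mutate in place, as the lifting code does
print shallow[0][0]              # 1.5 -- the shallow copy follows the change
print deep[0][0]                 # 1.0 -- the deep copy kept the original value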


 On Mon, Apr 20, 2009 at 7:21 PM, John Gleeson jdglee...@mac.com
 mailto:jdglee...@mac.com wrote:


 On 2009-04-20, at 10:04 AM, David Cournapeau wrote:

 
  Yes, it is legitimate and healthy to worry about the difference
 - but
  the surprising thing really is the list behavior when you are
 used to
  numerical computation :) And I maintain that the algorithms are not
  the same in both operations.


 It's not? Would you please mind commenting this a little bit?

they are not because of the conversion to numpy array. And maybe other
implementation details. The thing which matters is which operations are
done in the hardware, and that's really difficult to control exactly
once you have two implementations. The same code could give different
results on different OS, different platforms, etc... The main point is
to remember that you can't expect exact results with floating point
units, and that for all practical purpose, you can't guarantee two
implementations to be the same. For example, python float behavior and
numpy array behavior may be different w.r.t. FPU exceptions, depending on
the environment and your settings. They *could* be equal, but that
would most likely be an accident.

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Linker script, smaller source files and symbol visibility

2009-04-20 Thread David Cournapeau
Charles R Harris wrote:
 

 Here is a link to the start of the old discussion
 http://article.gmane.org/gmane.comp.python.numeric.general/12974/match=exported+symbols+code+reorganization.
 You took part in it also.

Thanks, I remembered we had the discussion, but could not find it. The
difference is that I am much more familiar with the technical details and
numpy codebase now :) I know how to control exported symbols on most
platforms which matter (I can't test for AIX or HP-UX unfortunately - but
I am perfectly fine with ignoring namespace pollution on those anyway),
and I would guess that the only platforms which do not support symbol
visibility in one way or the other do not support shared library anyway
(some CRAY stuff, for example).

Concerning the file size, I don't think anyone would disagree that they
are too big, but we don't need to go the java-way of one
file/class-function either. One first split which I personally like is
API/implementation. For example, for multiarray.c, we would only keep
the public PyArray_* functions, and put everything else in another file.
The other very big file is arrayobject.c, and this one is already mostly
organized in independent parts (buffer protocol, number protocol, etc...)

Another thing I would like to do is to make the global C API array
pointer a 'true' global variable instead of a static one. It took me a
while when I was working on the hashing protocol for dtype to understand
why it was crashing (the array pointer being static, every file has its
own copy, so it was never initialized in the hashdescr.c file). I think
a true global variable, hidden through a symbol map, is easier to
understand and more reliable.

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Linker script, smaller source files and symbol visibility

2009-04-20 Thread Charles R Harris
On Mon, Apr 20, 2009 at 10:13 PM, David Cournapeau 
da...@ar.media.kyoto-u.ac.jp wrote:

 Charles R Harris wrote:

 
  Here is a link to the start of the old discussion
  
 http://article.gmane.org/gmane.comp.python.numeric.general/12974/match=exported+symbols+code+reorganization
 .
  You took part in it also.

 Thanks, I remembered we had the discussion, but could not find it. The
 different is that I am much more familiar with the technical details and
 numpy codebase now :) I know how to control exported symbols on most
 platform which matter (I can't test for AIX or HP-UX unfortunately - but
 I am perfectly fine with ignoring namespace pollution on those anyway),
 and I would guess that the only platforms which do not support symbol
 visibility in one way or the other do not support shared library anyway
 (some CRAY stuff, for example).

 Concerning the file size, I don't think anyone would disagree that they
 are too big, but we don't need to go the java-way of one
 file/class-function either. One first split which I personally like is
 API/implementation. For example, for multiarray.c, we would only keep
 the public PyArray_* functions, and put everything else in another file.
 The other very big file is arrayobject.c, and this one is already mostly
 organized in independent parts (buffer protocol, number protocol, etc...)

 Another thing I would like to do it to make the global C API array
 pointer a 'true' global variable instead of a static one. It took me a
 while when I was working on the hashing protocol for dtype to understand
 why it was crashing (the array pointer being static, every file has its
 own copy, so it was never initialized in the hashdescr.c file). I think
 a true global variable, hidden through a symbol map, is easier to
 understand and more reliable.


I made an experiment along those lines a couple of years ago. There were
compilation problems because the needed include files weren't available. No
doubt that could be fixed in the build, but at some point I would like to
have real include files, not the generated variety. Generated include files
are kind of bogus IMHO, as they don't define an interface but rather reflect
whatever the function definition happens to be. So as part of any split I
would also suggest writing the associated include files. That would also
make separate compilation possible, which would make it easier to do test
compilations while doing development.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion