[Numpy-discussion] What should be the value of nansum of nan's?

2010-04-28 Thread Charles R Harris
Hi All,

We need to make a decision for ticket
#1123 (http://projects.scipy.org/numpy/ticket/1123#comment:11) regarding
what nansum should return when all values are nan. At some earlier
point it was zero, but currently it is nan, in fact it is nan whatever the
operation is. That is consistent, simple and serves to mark the array or
axis as containing all nans. I would like to close the ticket and am a bit
inclined to go with the current behaviour although there is an argument to
be made for returning 0 for the nansum case. Thoughts?

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] What should be the value of nansum of nan's?

2010-04-28 Thread Charles R Harris
On Mon, Apr 26, 2010 at 10:55 AM, Charles R Harris 
charlesr.har...@gmail.com wrote:

 Hi All,

 We need to make a decision for ticket 
 #1123http://projects.scipy.org/numpy/ticket/1123#comment:11regarding what 
 nansum should return when all values are nan. At some earlier
 point it was zero, but currently it is nan, in fact it is nan whatever the
 operation is. That is consistent, simple and serves to mark the array or
 axis as containing all nans. I would like to close the ticket and am a bit
 inclined to go with the current behaviour although there is an argument to
 be made for returning 0 for the nansum case. Thoughts?


To add a bit of context, one could argue that the results should be
consistent with the equivalent operations on empty arrays and always be
non-nan.

In [1]: nansum([])
Out[1]: nan

In [2]: sum([])
Out[2]: 0.0
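The two conventions under discussion can be sketched in a few lines (nansum_zero and nansum_propagate are made-up helper names, not NumPy APIs; the first matches the sum([]) == 0.0 behavior, the second the then-current nansum behavior):

```python
import numpy as np

def nansum_zero(a, axis=None):
    # Treat every NaN as 0, so an all-NaN (or empty) slice sums to 0.0,
    # consistent with sum([]) == 0.0.
    a = np.asarray(a, dtype=float)
    return np.where(np.isnan(a), 0.0, a).sum(axis=axis)

def nansum_propagate(a, axis=None):
    # The behavior described above: mark an all-NaN slice by returning NaN.
    a = np.asarray(a, dtype=float)
    mask = np.isnan(a)
    total = np.where(mask, 0.0, a).sum(axis=axis)
    return np.where(mask.all(axis=axis), np.nan, total)

print(nansum_zero([np.nan, np.nan]))       # 0.0
print(nansum_propagate([np.nan, np.nan]))  # nan
```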

Chuck


Re: [Numpy-discussion] Memory profiling NumPy code?

2010-04-28 Thread Joe Kington
I know you're looking for something with much more fine-grained control,
(which I can't help much with) but I often find it useful to just plot the
overall memory of the program over time.

There may be a slicker way to do it, but here's the script I use, anyway...
(saved as ~/bin/quick_profile, usage: quick_profile <whatever>, e.g.
quick_profile python script.py)

#!/bin/sh

# Setup
datfile=$(mktemp)
echo "ElapsedTime MemUsed" > $datfile

starttime=$(date +%s.%N)

# Run the specified command in the background
$@ &

# While the last process is still going
while [ -n "$(ps --no-headers $!)" ]
do
    bytes=$(ps -o rss -C $1 --no-headers | awk '{SUM += $1} END {print SUM}')
    elapsed=$(echo "$(date +%s.%N) - $starttime" | bc)
    echo $elapsed $bytes >> $datfile
    sleep 0.1
done

# Plot up the results with matplotlib
cat <<EOF | python
import pylab, sys, numpy
infile = file("$datfile")
infile.readline() # skip first line
data = [[float(dat) for dat in line.strip().split()] for line in infile]
data = numpy.array(data)
time, mem = data[:,0], data[:,1]/1024
pylab.plot(time, mem)
pylab.title("Profile of: \"%s\"" % "$@")
pylab.xlabel('Elapsed Time (s): Total %0.5f s' % time.max())
pylab.ylabel('Memory Used (MB): Peak %0.2f MB' % mem.max())
pylab.show()
EOF

rm $datfile

Hope that helps a bit, anyway...
-Joe

On Mon, Apr 26, 2010 at 6:16 AM, David Cournapeau courn...@gmail.comwrote:

 On Mon, Apr 26, 2010 at 7:57 PM, Dag Sverre Seljebotn
 da...@student.matnat.uio.no wrote:
  I'd like to profile the memory usage of my application using tools like
  e.g. Heapy. However since NumPy arrays are allocated in C they are not
  tracked by Python memory profiling.
 
  Does anybody have a workaround to share? I really just want to track a
  few arrays in a friendly way from within Python (I am aware of the
  existance of C-level profilers).

 I think heapy has some hooks so that you can add support for
 extensions. Maybe we could provide a C API in numpy to make this easy,

 David


[Numpy-discussion] proposing a beware of [as]matrix() warning

2010-04-28 Thread David Warde-Farley
Trying to debug code written by an undergrad working for a colleague of 
mine who ported code over from MATLAB, I am seeing an ugly melange of 
matrix objects and ndarrays that are interacting poorly with each other 
and various functions in SciPy/other libraries. In particular there was 
a custom minimizer function that contained a line a * b, that was 
receiving an Nx1 matrix and a N-length array and computing an outer 
product. Hence the unexpected 6 GB of memory usage and weird results...
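The failure mode described here is easy to reproduce in miniature (small shapes chosen for illustration; the real case was Nx1 with large N):

```python
import numpy as np

a = np.matrix(np.arange(3.0)).T   # 3x1 matrix
b = np.arange(3.0)                # length-3 ndarray

# matrix.__mul__ coerces b to a 1x3 matrix, so '*' becomes a matrix
# product (3x1)*(1x3) -- an NxN outer product, not an elementwise one.
print((a * b).shape)                      # (3, 3)
print((np.asarray(a).ravel() * b).shape)  # (3,)
```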

We've had this discussion before and it seems that the matrix class 
isn't going anywhere (I *really* wish it would at least be banished from 
the top-level namespace), but it has its adherents for pedagogical 
reasons. Could we at least consider putting a gigantic warning on all 
the functions for creating matrix objects (matrix, mat, asmatrix, etc.) 
that they may not behave quite so predictably in some situations and 
should be avoided when writing nontrivial code?

There are already such warnings scattered about on SciPy.org but the 
situation is so bad, in my opinion (bad from a programming perspective 
and bad from a new user perspective, asking why doesn't this work? why 
doesn't that work? why is this language/library/etc. so stupid, 
inconsistent, etc.?) that the situation warrants steering people still 
further away from the matrix object.

I apologize for ranting, but it pains me when people give up on 
Python/NumPy because they can't figure out inconsistencies that aren't 
really there for a good reason. IMHO, of course.

David

David


[Numpy-discussion] ndimage.label - howto force SWIG to use int32 - even on 64bit Linux ?

2010-04-28 Thread Sebastian Haase
Hi,
I wanted to write some C code to accept labels as they come from ndimage.label.
For some reason ndimage.label produces its output as an int32 array -
even on my 64bit system .

BTW, could this be considered a bug ?

Now, if I use the typemaps of numpy.i I can choose between NPY_LONG and NPY_INT.
But those are sometimes 32 sometimes 64 bit, depending on the system.

Any ideas ... ?

Thanks,
Sebastian Haase


Re: [Numpy-discussion] Speeding up loadtxt / savetxt

2010-04-28 Thread Stéfan van der Walt
Hi Andreas

On 23 April 2010 10:16, Andreas li...@hilboll.de wrote:
 I was wondering if there's a way to speedup loadtxt/savetxt for big
 arrays? So far, I'm plainly using something like this::

Do you specifically need to store text files?  NumPy's binary storage
functions (numpy.load and save) are faster.

Also, an efficient reader for very simply formatted text is provided
by numpy.fromfile.

Regards
Stéfan


Re: [Numpy-discussion] ma.std(ddof=1) bug?

2010-04-28 Thread Pierre GM
On Apr 23, 2010, at 12:45 PM, josef.p...@gmail.com wrote:
 Is there a reason why ma.std(ddof=1)  does not calculated the std if
 there are 2 valid values?


Bug! Good call... Should be fixed in SVN r8370.


[Numpy-discussion] [ANN] la 0.2, the labeled array

2010-04-28 Thread Keith Goodman
I am pleased to announce the second release of the la package, version 0.2.

The main class of the la package is a labeled array, larry. A larry
consists of a data array and a label list. The data array is stored as
a NumPy array and the label list as a list of lists.

larry has built-in methods such as movingsum, ranking, merge, shuffle,
zscore, demean, lag as well as typical Numpy methods like sum, max,
std, sign, clip. NaNs are treated as missing data.

Alignment by label is automatic when you add (or subtract, multiply,
divide) two larrys.
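A rough pure-NumPy illustration of what alignment by label means (aligned_add is invented here for illustration; larry does this automatically on binary operations):

```python
import numpy as np

def aligned_add(labels1, data1, labels2, data2):
    # Keep only labels present in both inputs, then add the matching rows.
    common = [lab for lab in labels1 if lab in labels2]
    idx1 = [labels1.index(lab) for lab in common]
    idx2 = [labels2.index(lab) for lab in common]
    return common, np.asarray(data1)[idx1] + np.asarray(data2)[idx2]

labels, total = aligned_add(['a', 'b', 'c'], [1, 2, 3],
                            ['b', 'c', 'd'], [10, 20, 30])
print(labels, total)   # ['b', 'c'] [12 23]
```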

larry adds the convenience of labels, provides many built-in methods,
and lets you use many of your existing array functions.

Download: https://launchpad.net/larry/+download
docs  http://larry.sourceforge.net
code  https://launchpad.net/larry
list  http://groups.google.ca/group/pystatsmodels

=
Release Notes
=

la 0.2 (avocado)


*Release date: 2010-04-27*

New larry methods
-
- lix : Index into a larry using labels or index numbers or both
- swapaxes : Swap the two specified axes
- sortaxis : Sort data (and label) according to label along specified axis
- flipaxis : Reverse the order of the elements along the specified axis
- tocsv : Save larry to a csv file
- fromcsv : Load a larry from a csv file
- insertaxis : Insert a new axis at the specified position
- invert : Element by element inverting of True to False and False to True

Enhancements

- All larry methods can now take nd input arrays (some previously 2d only)
- Added ability to save larrys with datetime.date labels to HDF5
- New function (panel) to convert larry of shape (n, m, k) to shape (m*k, n)
- Expanded documentation
- Over 280 new unit tests; testing easier with new assert_larry_equal function

Bug fixes
-
- #517912: larry([]) == larry([]) raised IndexError
- #518096: larry.fromdict failed due to missing import
- #518106: la.larry.fromdict({}) failed
- #518114: fromlist([]) and fromtuples([]) failed
- #518135: keep_label crashed when there was nothing to keep
- #518210: sum, std, var returned NaN for empty larrys; now return 0.0
- #518215: unflatten crashed on an empty larry
- #518442: sum, std, var returned NaN for shapes that contain zero: (2, 0, 3)
- #568175: larry.std(axis=-1) and var crashed on negative axis input
- #569622: Negative axis input gave wrong output for several larry methods


[Numpy-discussion] Recommended way to add Cython extension using numpy.distutils?

2010-04-28 Thread Matthew Brett
Hi,

We (neuroimaging.scipy.org) are using numpy.distutils, and we have
.pyx files that we build with Cython.

I wanted to add these in our current setup.py scripts, with something like:

def configuration(parent_package='',top_path=None):
from numpy.distutils.misc_util import Configuration
config = Configuration('statistics', parent_package, top_path)
config.add_extension('intvol',
 ['intvol.pyx'], include_dirs = [np.get_include()])
return config

but of course numpy only knows about Pyrex, and returns:

error: Pyrex required for compiling
'nipy/algorithms/statistics/intvol.pyx' but not available

Is there a recommended way to plumb Cython into the numpy build
machinery?  Should I try and patch numpy distutils to use Cython if
present?

Best,

Matthew


Re: [Numpy-discussion] passing non-flat array to interpolator

2010-04-28 Thread josef . pktd
On Mon, Apr 26, 2010 at 12:04 PM, Thomas temesgen...@gmail.com wrote:


 I have some problem with interpolators in Scipy
 does anyone knows if there is a way to pass a non-flat
 array variables to Rbf, or other Scipy interpolator
 eg. for my case of 17 x 1 problems of 500 data size

 x1.shape = (500,)
 x2.shape = (500,)
 ...
 X17.shape =(500,)
 b = Rbf(x1,x2,x3,...,x17,y)

 i would rather create a non-flat variables
 x.shape =(500,17) and pass it to Rbf, or even for all the
 interpolator in Scipy as
  bf = Rbf(X, Y)
 How can i do this ?  Thank you for your time.
 Thomas

Rbf(*np.c_[X,Y].T)

or

Rbf(*(list(X.T)+[Y]))

I think the second version does not make a copy of the data when
building the list.

It would be easier if the xs and y were reversed in the signature, Rbf(y, *X.T)
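Spelled out with a stand-in for Rbf (the scipy call itself is left commented out; the shapes are taken from the question):

```python
import numpy as np

n, k = 500, 17
X = np.random.rand(n, k)   # 17 coordinate columns
y = np.random.rand(n)

# list(X.T) yields 17 views of shape (n,), one per column, without copying
# the data; appending y reproduces the Rbf(x1, ..., x17, y) convention.
args = list(X.T) + [y]
# b = Rbf(*args)   # with scipy.interpolate.Rbf
print(len(args), args[0].shape)   # 18 (500,)
```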

Josef






[Numpy-discussion] ndimage.label - howto force SWIG to use int32 - even on 64bit Linux ?

2010-04-28 Thread Sebastian Haase
(2nd try to get this post into the mailing list archive...)

Hi,
I wanted to write some C code to accept labels as they come from ndimage.label.
For some reason ndimage.label produces its output as an int32 array -
even on my 64bit system .

BTW, could this be considered a bug ?

Now, if I use the typemaps of numpy.i I can choose between NPY_LONG and NPY_INT.
But those are sometimes 32 sometimes 64 bit, depending on the system.

Any ideas ... ?

Thanks,
Sebastian Haase


Re: [Numpy-discussion] Incomplete uninstall of 1.4.0 superpack

2010-04-28 Thread David
On 04/27/2010 01:08 AM, threexk threexk wrote:
 David Cournapeau wrote:
   On Mon, Apr 26, 2010 at 2:42 AM, threexk threexk
 thre...@hotmail.com wrote:
Hello,
   
I recently uninstalled the NumPy 1.4.0 superpack for Python 2.6 on
 Windows
7, and afterward a dialog popped up that said 1 file or directory
 could not
be removed. Does anyone have any idea which file/directory this is? The
dialog gave no indication. Is an uninstall log with details generated
anywhere?
  
   There should be one in C:\Python*, something like numpy-*-wininst.log

 Looks like that log gets deleted after uninstallation (as it probably
 should), so I still could not figure out which file/directory was not
 deleted. I found that \Python26\Lib\site-packages\numpy and many
 files/directories under it have remained after uninstall. So, I tried
 reinstalling 1.4.0 and uninstalling again. This time, the uninstaller
 did not report not being able to remove files/directories, but it still
 did not delete the aforementioned numpy directory. I believe this is a
 bug with the uninstaller?

Could you maybe post the log (before uninstalling) and list the 
remaining files ?

Note though that we most likely won't be able to do much - we do not 
have much control over the generated installers,

cheers,

David


Re: [Numpy-discussion] floats as axis

2010-04-28 Thread Travis Oliphant

On Apr 25, 2010, at 8:16 AM, josef.p...@gmail.com wrote:

 (some) numpy functions take floats as valid axis argument. Is this a  
 feature?

 np.ones((2,3)).sum(1.2)
 array([ 3.,  3.])
 np.ones((2,3)).sum(1.99)
 array([ 3.,  3.])

 np.mean((1.5,0.5))
 1.0
 np.mean(1.5,0.5)
 1.5




 Keith pointed out that scipy.stats.nanmean has a different behavior


I think we should make float inputs raise an error for NumPy 2.0
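A sketch of what that stricter check could look like (checked_sum is a made-up wrapper, not a NumPy API; operator.index rejects anything that is not an integer):

```python
import numpy as np
import operator

def checked_sum(a, axis=None):
    # Reject non-integral axis values instead of silently truncating them.
    if axis is not None:
        axis = operator.index(axis)   # TypeError for 1.2, 1.99, ...
    return np.asarray(a).sum(axis=axis)

res = checked_sum(np.ones((2, 3)), 1)   # -> array([3., 3.])
```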


-Travis






Re: [Numpy-discussion] proposing a beware of [as]matrix() warning

2010-04-28 Thread Travis Oliphant

On Apr 26, 2010, at 7:19 PM, David Warde-Farley wrote:

 Trying to debug code written by an undergrad working for a colleague  
 of
 mine who ported code over from MATLAB, I am seeing an ugly melange of
 matrix objects and ndarrays that are interacting poorly with each  
 other
 and various functions in SciPy/other libraries. In particular there  
 was
 a custom minimizer function that contained a line a * b, that was
 receiving an Nx1 matrix and a N-length array and computing an outer
 product. Hence the unexpected 6 GB of memory usage and weird  
 results...

Overloading '*' and '**' while convenient does have consequences.   It  
would be nice if we could have a few more infix operators in Python to  
allow separation of  element-by-element calculations and dot-product  
calculations.

A proposal was made to allow calling a NumPy array to infer dot  
product:

a(b) is equivalent to dot(a,b)

a(b)(c) would be equivalent to dot(dot(a,b),c)

This seems rather reasonable.
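The idea can be mocked up today with a small subclass (DotArray is illustrative only; the proposal was for ndarray itself):

```python
import numpy as np

class DotArray(np.ndarray):
    # Calling the array performs a dot product, so a(b)(c) chains
    # as dot(dot(a, b), c).
    def __call__(self, other):
        return np.dot(self, np.asarray(other)).view(DotArray)

a = np.eye(2).view(DotArray)
b = np.arange(4.0).reshape(2, 2).view(DotArray)
c = np.ones((2, 2)).view(DotArray)

print(np.allclose(a(b)(c), np.dot(np.dot(a, b), c)))   # True
```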


While I don't have any spare cycles to push it forward and we are  
already far along on the NumPy port to 3.0, I had wondered if we couldn't  
use the leverage of Python core developers wanting NumPy to be ported  
to Python 3 to actually add a few more infix operators to the language.

One of the problems of moving to Python 3.0 for many people is that  
there are not  new features to outweigh the hassle of moving. 
Having a few more infix operators would be a huge incentive to the  
NumPy community to move to Python 3.

Anybody willing to lead the charge with the Python developers?

-Travis





Re: [Numpy-discussion] Speeding up loadtxt / savetxt

2010-04-28 Thread Andreas Hilboll
Hi Stéfan,

 Do you specifically need to store text files?  NumPy's binary storage
 functions (numpy.load and save) are faster.

Yes, I know. But the files I create must be readable by an application
developed in-house at our institute, and that only supports a) ASCII files
or b) some home-grown binary format, which I hate.

 Also, an efficient reader for very simply formatted text is provided
 by numpy.fromfile.

Yes, I heard about it. But the files I have to read have comments in them,
and I didn't find a way to exclude these easily.
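One middle ground is to strip the comment lines first and then hand the remaining text to a fast whitespace parser (a sketch; the '#' comment character and the whitespace layout are assumptions here):

```python
import numpy as np

raw = "# header\n1 2 3\n4 5 6\n# trailing comment\n"

# Drop comment lines, then parse the rest as whitespace-separated floats.
cleaned = " ".join(line for line in raw.splitlines()
                   if not line.lstrip().startswith("#"))
data = np.array(cleaned.split(), dtype=float).reshape(-1, 3)
print(data)
```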

Time needed to read a 100M file is ~13 seconds, and to write ~5 seconds.
Which is not too bad, but also still too much ...

Thanks,

Andreas



Re: [Numpy-discussion] proposing a beware of [as]matrix() warning

2010-04-28 Thread Dag Sverre Seljebotn
David Warde-Farley wrote:
 Trying to debug code written by an undergrad working for a colleague of 
 mine who ported code over from MATLAB, I am seeing an ugly melange of 
 matrix objects and ndarrays that are interacting poorly with each other 
 and various functions in SciPy/other libraries. In particular there was 
 a custom minimizer function that contained a line a * b, that was 
 receiving an Nx1 matrix and a N-length array and computing an outer 
 product. Hence the unexpected 6 GB of memory usage and weird results...

If this was in a library function of some sort, I think they should 
always call np.asarray on the input arguments. That converts matrices to 
normal arrays.

It could have been Python lists-of-lists, other PEP 3118 objects -- in 
Python an object can be everything in general, and I think it is very 
proper for most reusable functions to either validate the type of their 
arguments or take some steps to convert.
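A minimal sketch of that defensive coercion (minimize_step is a hypothetical library function; the ravel is an extra step beyond asarray, because an Nx1 array would still broadcast against a length-N array into an NxN result):

```python
import numpy as np

def minimize_step(a, b):
    # Accept matrix, list, or ndarray callers uniformly: coerce to plain
    # 1-d arrays before doing elementwise arithmetic.
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()
    return a * b

out = minimize_step(np.matrix(np.arange(3.0)).T, np.arange(3.0))
print(out.shape)   # (3,) -- elementwise, not a 3x3 outer product
```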

That said, I second that it would be good to deprecate the matrix class 
from NumPy. The problem for me is not the existance of a matrix class as 
such, but the fact that it subclasses np.ndarray and is so similar with 
it, breaking a lot of rules for OO programming in the process.

(Example: I happen to have my own oomatrix.py which allows me to do

P, L = (A * A.H).cholesky()
y = L.solve_right(x)

This works fine because the matrices don't support any NumPy operations, 
and so I don't confuse them. But it helps to have to habit to do 
np.asarray in reusable functions so that errors are caught early.

I do this so that A above can be either sparse, dense, triangular, 
diagonal, etc. -- i.e. polymorphic linear algebra. On the other hand, 
they don't even support single-element lookups, although that's just 
because I've been to lazy to implement it. Iteration is out of the 
question, it's just not the level of abstraction I'd like a matrix to 
work at.)

Dag Sverre

 
 We've had this discussion before and it seems that the matrix class 
 isn't going anywhere (I *really* wish it would at least be banished from 
 the top-level namespace), but it has its adherents for pedagogical 
 reasons. Could we at least consider putting a gigantic warning on all 
 the functions for creating matrix objects (matrix, mat, asmatrix, etc.) 
 that they may not behave quite so predictably in some situations and 
 should be avoided when writing nontrivial code?
 
 There are already such warnings scattered about on SciPy.org but the 
 situation is so bad, in my opinion (bad from a programming perspective 
 and bad from a new user perspective, asking why doesn't this work? why 
 doesn't that work? why is this language/library/etc. so stupid, 
 inconsistent, etc.?) that the situation warrants steering people still 
 further away from the matrix object.
 
 I apologize for ranting, but it pains me when people give up on 
 Python/NumPy because they can't figure out inconsistencies that aren't 
 really there for a good reason. IMHO, of course.
 
 David
 
 David


-- 
Dag Sverre


Re: [Numpy-discussion] What should be the value of nansum of nan's?

2010-04-28 Thread Travis Oliphant


On Apr 26, 2010, at 12:03 PM, Charles R Harris wrote:




On Mon, Apr 26, 2010 at 10:55 AM, Charles R Harris charlesr.har...@gmail.com 
 wrote:

Hi All,

We need to make a decision for ticket #1123 regarding what nansum  
should return when all values are nan. At some earlier point it was  
zero, but currently it is nan, in fact it is nan whatever the  
operation is. That is consistent, simple and serves to mark the  
array or axis as containing all nans. I would like to close the  
ticket and am a bit inclined to go with the current behaviour  
although there is an argument to be made for returning 0 for the  
nansum case. Thoughts?



To add a bit of context, one could argue that the results should be  
consistent with the equivalent operations on empty arrays and always  
be non-nan.


In [1]: nansum([])
Out[1]: nan

In [2]: sum([])
Out[2]: 0.0



I favor nansum([])  returning 0.0 which implies returning 0.0 when all  
the elements are nan.


-Travis







Re: [Numpy-discussion] proposing a beware of [as]matrix() warning

2010-04-28 Thread Robert Kern
On Wed, Apr 28, 2010 at 11:05, Travis Oliphant oliph...@enthought.com wrote:

 On Apr 26, 2010, at 7:19 PM, David Warde-Farley wrote:

 Trying to debug code written by an undergrad working for a colleague
 of
 mine who ported code over from MATLAB, I am seeing an ugly melange of
 matrix objects and ndarrays that are interacting poorly with each
 other
 and various functions in SciPy/other libraries. In particular there
 was
 a custom minimizer function that contained a line a * b, that was
 receiving an Nx1 matrix and a N-length array and computing an outer
 product. Hence the unexpected 6 GB of memory usage and weird
 results...

 Overloading '*' and '**' while convenient does have consequences.   It
 would be nice if we could have a few more infix operators in Python to
 allow separation of  element-by-element calculations and dot-product
 calculations.

 A proposal was made to allow calling a NumPy array to infer dot
 product:

 a(b) is equivalent to dot(a,b)

 a(b)(c) would be equivalent to dot(dot(a,b),c)

 This seems rather reasonable.


 While I don't have any spare cycles to push it forward and we are
 already far along on the NumPy port to 3.0, I had wondered if we couldn't
 use the leverage of Python core developers wanting NumPy to be ported
 to Python 3 to actually add a few more infix operators to the language.

 One of the problems of moving to Python 3.0 for many people is that
 there are not  new features to outweigh the hassle of moving.
 Having a few more infix operators would be a huge incentive to the
 NumPy community to move to Python 3.

 Anybody willing to lead the charge with the Python developers?

There is currently a moratorium on language changes. This will have to wait.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] What should be the value of nansum of nan's?

2010-04-28 Thread Warren Weckesser
Travis Oliphant wrote:

 On Apr 26, 2010, at 12:03 PM, Charles R Harris wrote:



 On Mon, Apr 26, 2010 at 10:55 AM, Charles R Harris 
 charlesr.har...@gmail.com wrote:

 Hi All,

 We need to make a decision for ticket #1123
 http://projects.scipy.org/numpy/ticket/1123#comment:11
 regarding what nansum should return when all values are nan. At
 some earlier point it was zero, but currently it is nan, in fact
 it is nan whatever the operation is. That is consistent, simple
 and serves to mark the array or axis as containing all nans. I
 would like to close the ticket and am a bit inclined to go with
 the current behaviour although there is an argument to be made
 for returning 0 for the nansum case. Thoughts?


 To add a bit of context, one could argue that the results should be 
 consistent with the equivalent operations on empty arrays and always 
 be non-nan.

 In [1]: nansum([])
 Out[1]: nan

 In [2]: sum([])
 Out[2]: 0.0


 I favor nansum([])  returning 0.0 which implies returning 0.0 when all 
 the elements are nan.


+1

 -Travis





 



Re: [Numpy-discussion] What should be the value of nansum of nan's?

2010-04-28 Thread Keith Goodman
On Mon, Apr 26, 2010 at 9:55 AM, Charles R Harris
charlesr.har...@gmail.com wrote:
 Hi All,

 We need to make a decision for ticket #1123 regarding what nansum should
 return when all values are nan. At some earlier point it was zero, but
 currently it is nan, in fact it is nan whatever the operation is. That is
 consistent, simple and serves to mark the array or axis as containing all
 nans. I would like to close the ticket and am a bit inclined to go with the
 current behaviour although there is an argument to be made for returning 0
 for the nansum case. Thoughts?

I use nansum a lot because I treat NaNs as missing data. I think a lot
of people use NaNs as missing data but few admit it. My packages have
grown to depend on nansum([nan, nan]) returning NaN. I vote to keep
the current behavior. Changing nansum([]) to return zero, however, has
no impact on me.


[Numpy-discussion] Bug in MaskedArray min/max functions

2010-04-28 Thread Philip Cooper

If a masked array is created and has no masked values, it can have a mask that is 
just False -- a scalar False, that is.

This causes an error when getting the max or min on axis=1
I don't have a fix but do offer a workaround.
If you reset the mask to the expandedmask all works ok (but it's a bug that 
pops up all over my code)

Below, a and m1 work but m5 only works on axis=0. Anyway, you can look 
at:   (see http://www.openvest.com/trac/wiki/MaskedArrayMinMax if the 
formatting doesn't survive the mail posting process)

#
>>> import numpy as np
>>> a = np.array([np.arange(5)])
>>> a
array([[0, 1, 2, 3, 4]])
>>> m1 = np.ma.masked_values(a,1)
>>> m5 = np.ma.masked_values(a,5)
>>> m1
masked_array(data =
 [[0 -- 2 3 4]],
 mask =
 [[False  True False False False]],
   fill_value = 1)

>>> m5
masked_array(data =
 [[0 1 2 3 4]],
 mask =
 False,
   fill_value = 5)

>>> a.min(axis=0)
array([0, 1, 2, 3, 4])
>>> m5.min(axis=0)
masked_array(data = [0 1 2 3 4],
 mask = False,
   fill_value = 99)

>>> m1.min(axis=0)
masked_array(data = [0 -- 2 3 4],
 mask = [False  True False False False],
   fill_value = 99)

>>> a.min(axis=1)
array([0])
>>> m1.min(axis=1)
masked_array(data = [0],
 mask = [False],
   fill_value = 99)
>>> m5.min(axis=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.6/site-packages/numpy/ma/core.py", line 5020, in min
    newmask = _mask.all(axis=axis)
ValueError: axis(=1) out of bounds
### workaround
>>> m5.mask = np.ma.getmaskarray(m5)
>>> m5.min(axis=1)
masked_array(data = [0],
 mask = [False],
   fill_value = 99)
-- 
Philip J. Cooper (CFA)


Re: [Numpy-discussion] Speeding up loadtxt / savetxt

2010-04-28 Thread Chris Barker
Andreas Hilboll wrote:
 Yes, I know. But the files I create must be readable by an application
 developed in-house at our institute, and that only supports a) ASCII files
 or b) some home-grown binary format, which I hate.
 
 Also, an efficient reader for very simply formatted text is provided
 by numpy.fromfile.
 
 Yes, I heard about it. But the files I have to read have comments in them,
 and I didn't find a way to exclude these easily.

you can't do it with fromfile -- I think it would be very useful to have 
fromfile()-like functionality with a few more features: comment lines 
and allowing non-whitespace delimiters while reading multiple lines. See 
my posts about this in the past.

I did spend a non-trivial amount of time looking into how to add these 
features, and fix some bugs in the process -- again, see my posts in the 
past. It turns out that the fromfile code is some pretty ugly C--a 
result of supporting all numpy data types, and compatibility with 
traditional C functions--so it's a bit of a chore, at least for a lame C 
programmer like me.

I'm still not sure what I'll do when I get some time to look at this 
again -- I may simply start from scratch with Cython.

It would be great if someone wanted to take it on

 Time needed to read a 100M file is ~13 seconds, and to write ~5 seconds.
 Which is not too bad, but also still too much ...

You might try running fromfile() on a file with no comments, and you 
could see from that how much speed gain is possible -- at some point, 
you're waiting on the disk anyway.

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Recommended way to add Cython extension using numpy.distutils?

2010-04-28 Thread Charles R Harris
On Tue, Apr 27, 2010 at 6:09 PM, Matthew Brett matthew.br...@gmail.comwrote:

 Hi,

 We (neuroimaging.scipy.org) are using numpy.distutils, and we have
 .pyx files that we build with Cython.

 I wanted to add these in our current setup.py scripts, with something like:

 def configuration(parent_package='',top_path=None):
from numpy.distutils.misc_util import Configuration
config = Configuration('statistics', parent_package, top_path)
config.add_extension('intvol',
 ['intvol.pyx'], include_dirs = [np.get_include()])
return config

 but of course numpy only knows about Pyrex, and returns:

 error: Pyrex required for compiling
 'nipy/algorithms/statistics/intvol.pyx' but not available

 Is there a recommended way to plumb Cython into the numpy build
 machinery?  Should I try and patch numpy distutils to use Cython if
 present?


Patching distutils might be the way to go. We use Cython for the random
build now because Pyrex couldn't handle long strings in a way suitable for
Windows.

Chuck


Re: [Numpy-discussion] ndimage.label - howto force SWIG to use int32 - even on 64bit Linux ?

2010-04-28 Thread Charles R Harris
On Tue, Apr 27, 2010 at 2:27 AM, Sebastian Haase seb.ha...@gmail.comwrote:

 Hi,
 I wanted to write some C code to accept labels as they come from
 ndimage.label.
 For some reason ndimage.label produces its output as an int32 array -
 even on my 64bit system .

 BTW, could this be considered a bug ?


Likely.


 Now, if I use the typemaps of numpy.i I can choose between NPY_LONG and
 NPY_INT.
 But those are sometimes 32 sometimes 64 bit, depending on the system.

 Any ideas ... ?


npy_intp.

Chuck
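On the Python side, the platform-sized counterpart of C's npy_intp is np.intp; a quick sketch (mine, not from the thread):

```python
import numpy as np

# np.intp is the Python-side twin of C's npy_intp: 32-bit on 32-bit
# platforms, 64-bit on 64-bit platforms, so a label array declared
# with it always matches pointer-sized C code without a cast.
labels = np.zeros((4, 4), dtype=np.intp)
print(labels.dtype.itemsize)  # 4 or 8, depending on the platform
```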


Re: [Numpy-discussion] floats as axis

2010-04-28 Thread Charles R Harris
On Wed, Apr 28, 2010 at 8:44 AM, Travis Oliphant oliph...@enthought.comwrote:


 On Apr 25, 2010, at 8:16 AM, josef.p...@gmail.com wrote:

  (some) numpy functions take floats as valid axis argument. Is this a
  feature?
 
  >>> np.ones((2,3)).sum(1.2)
  array([ 3.,  3.])
  >>> np.ones((2,3)).sum(1.99)
  array([ 3.,  3.])

  >>> np.mean((1.5,0.5))
  1.0
  >>> np.mean(1.5,0.5)
  1.5



 
  Keith pointed out that scipy.stats.nanmean has a different behavior


 I think we should make float inputs raise an error for NumPy 2.0


Agree... Chuck
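The stricter behaviour being agreed to here amounts to checking that the axis argument is integral; a minimal sketch of such a check (a hypothetical helper, not NumPy's actual code):

```python
import operator

def validate_axis(axis):
    # operator.index() accepts true integers (int, NumPy integer
    # scalars) and raises TypeError for floats such as 1.2 or 1.99.
    try:
        return operator.index(axis)
    except TypeError:
        raise TypeError("axis must be an integer, got %r" % (axis,))

print(validate_axis(1))    # 1
try:
    validate_axis(1.99)
except TypeError as e:
    print(e)               # axis must be an integer, got 1.99
```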


Re: [Numpy-discussion] proposing a beware of [as]matrix() warning

2010-04-28 Thread Charles R Harris
On Wed, Apr 28, 2010 at 10:08 AM, Dag Sverre Seljebotn 
da...@student.matnat.uio.no wrote:

 David Warde-Farley wrote:
  Trying to debug code written by an undergrad working for a colleague of
  mine who ported code over from MATLAB, I am seeing an ugly melange of
  matrix objects and ndarrays that are interacting poorly with each other
  and various functions in SciPy/other libraries. In particular there was
  a custom minimizer function that contained a line a * b, that was
  receiving an Nx1 matrix and a N-length array and computing an outer
  product. Hence the unexpected 6 GB of memory usage and weird results...

 If this was in a library function of some sort, I think they should
 always call np.asarray on the input arguments. That converts matrices to
 normal arrays.

 It could have been Python lists-of-lists, other PEP 3118 objects -- in
 Python an object can be everything in general, and I think it is very
 proper for most reusable functions to either validate the type of their
 arguments or take some steps to convert.

 That said, I second that it would be good to deprecate the matrix class
 from NumPy. The problem for me is not the existance of a matrix class as
 such, but the fact that it subclasses np.ndarray and is so similar with
 it, breaking a lot of rules for OO programming in the process.


Yeah. Masked arrays have similar problems. Pierre has done so much work to
have masked versions of the various functions that it might as well be a
standalone class.

snip

Chuck


Re: [Numpy-discussion] proposing a beware of [as]matrix() warning

2010-04-28 Thread Charles R Harris
On Wed, Apr 28, 2010 at 10:05 AM, Travis Oliphant oliph...@enthought.comwrote:


 On Apr 26, 2010, at 7:19 PM, David Warde-Farley wrote:

  Trying to debug code written by an undergrad working for a colleague
  of
  mine who ported code over from MATLAB, I am seeing an ugly melange of
  matrix objects and ndarrays that are interacting poorly with each
  other
  and various functions in SciPy/other libraries. In particular there
  was
  a custom minimizer function that contained a line a * b, that was
  receiving an Nx1 matrix and a N-length array and computing an outer
  product. Hence the unexpected 6 GB of memory usage and weird
  results...

 Overloading '*' and '**' while convenient does have consequences.   It
 would be nice if we could have a few more infix operators in Python to
 allow separation of  element-by-element calculations and dot-product
 calculations.

 A proposal was made to allow calling a NumPy array to infer dot
 product:

 a(b) is equivalent to dot(a,b)

 a(b)(c) would be equivalent to dot(dot(a,b),c)

 This seems rather reasonable.


I like this too. A similar proposal that recently showed up on the list was
to add a dot method to ndarrays so that a(b)(c) would be written
a.dot(b).dot(c).
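For concreteness, the two spellings side by side (the `dot` method shown here is the proposal under discussion; NumPy did later gain exactly this method):

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)
b = np.arange(12.0).reshape(3, 4)
c = np.arange(4.0).reshape(4, 1)

nested = np.dot(np.dot(a, b), c)   # dot(dot(a, b), c)
chained = a.dot(b).dot(c)          # the proposed method chaining

assert np.array_equal(nested, chained)
print(chained.shape)  # (2, 1)
```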


 While I don't have any spare cycles to push it forward and we are
 already far along on the NumPy to 3.0, I had wondered if we couldn't
 use the leverage of Python core developers wanting NumPy to be ported
 to Python 3 to actually add a few more infix operators to the language.

 One of the problems of moving to Python 3.0 for many people is that
 there are not  new features to outweigh the hassle of moving.
 Having a few more infix operators would be a huge incentive to the
 NumPy community to move to Python 3.

 Anybody willing to lead the charge with the Python developers?


Problem is that we couldn't decide on an appropriate operator. Adding a
keyword operator that functioned like `and` would likely break all sorts of
code, so it needs to be something that is not currently seen in the wild.

Chuck


Re: [Numpy-discussion] Recommended way to add Cython extension using numpy.distutils?

2010-04-28 Thread Kevin Jacobs jac...@bioinformed.com
On Tue, Apr 27, 2010 at 8:09 PM, Matthew Brett matthew.br...@gmail.comwrote:

 Hi,

 We (neuroimaging.scipy.org) are using numpy.distutils, and we have
 .pyx files that we build with Cython.

 I wanted to add these in our current setup.py scripts, with something like:

 def configuration(parent_package='',top_path=None):
from numpy.distutils.misc_util import Configuration
config = Configuration('statistics', parent_package, top_path)
config.add_extension('intvol',
 ['intvol.pyx'], include_dirs = [np.get_include()])
return config

 but of course numpy only knows about Pyrex, and returns:

 error: Pyrex required for compiling
 'nipy/algorithms/statistics/intvol.pyx' but not available

 Is there a recommended way to plumb Cython into the numpy build
 machinery?  Should I try and patch numpy distutils to use Cython if
 present?


Here is the monkey-patch I'm using in my project:

def evil_numpy_monkey_patch():
  from   numpy.distutils.command import build_src
  import Cython
  import Cython.Compiler.Main
  build_src.Pyrex = Cython
  build_src.have_pyrex = True


Re: [Numpy-discussion] proposing a beware of [as]matrix() warning

2010-04-28 Thread Nikolaus Rath
Robert Kern robert.k...@gmail.com writes:
 Overloading '*' and '**' while convenient does have consequences.   It
 would be nice if we could have a few more infix operators in Python to
 allow separation of  element-by-element calculations and dot-product
 calculations.

http://www.python.org/dev/peps/pep-0225/ was considered and rejected.
But that was in 2000...

 While I don't have any spare cycles to push it forward and we are
 already far along on the NumPy to 3.0, I had wondered if we couldn't
 use the leverage of Python core developers wanting NumPy to be ported
 to Python 3 to actually add a few more infix operators to the language.

I don't think that stands a chance: http://www.python.org/dev/peps/pep-3003/


Best,

   -Nikolaus

-- 
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C



Re: [Numpy-discussion] proposing a beware of [as]matrix() warning

2010-04-28 Thread David Warde-Farley

On 2010-04-28, at 12:05 PM, Travis Oliphant wrote:

 a(b) is equivalent to dot(a,b)
 
 a(b)(c) would be equivalent to dot(dot(a,b),c)
 
 This seems rather reasonable.

Indeed, and it leads to a rather pleasant way of permuting syntax to change the 
order of operations, i.e. a(b(c)) vs. a(b)(c).

David


Re: [Numpy-discussion] proposing a beware of [as]matrix() warning

2010-04-28 Thread josef . pktd
On Wed, Apr 28, 2010 at 1:30 PM, David Warde-Farley d...@cs.toronto.edu wrote:

 On 2010-04-28, at 12:05 PM, Travis Oliphant wrote:

 a(b) is equivalent to dot(a,b)

 a(b)(c) would be equivalent to dot(dot(a,b),c)

 This seems rather reasonable.

 Indeed, and it leads to a rather pleasant way of permuting syntax to change 
 the order of operations, i.e. a(b(c)) vs. a(b)(c).

I like the explicit dot method much better, __call__ (parentheses) can
mean anything, and reading the code will be more difficult.
(especially when switching from matlab)

Josef


 David



Re: [Numpy-discussion] What should be the value of nansum of nan's?

2010-04-28 Thread T J
On Mon, Apr 26, 2010 at 10:03 AM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Mon, Apr 26, 2010 at 10:55 AM, Charles R Harris
 charlesr.har...@gmail.com wrote:

 Hi All,

 We need to make a decision for ticket #1123 regarding what nansum should
 return when all values are nan. At some earlier point it was zero, but
 currently it is nan, in fact it is nan whatever the operation is. That is
 consistent, simple and serves to mark the array or axis as containing all
 nans. I would like to close the ticket and am a bit inclined to go with the
 current behaviour although there is an argument to be made for returning 0
 for the nansum case. Thoughts?


 To add a bit of context, one could argue that the results should be
 consistent with the equivalent operations on empty arrays and always be
 non-nan.

 In [1]: nansum([])
 Out[1]: nan

 In [2]: sum([])
 Out[2]: 0.0


This seems like an obvious one to me.  What is the spirit of nansum?


Return the sum of array elements over a given axis treating
Not a Numbers (NaNs) as zero.


Okay.  So NaNs in an array are treated as zeros and the sum is
performed as one normally would perform it starting with an initial
sum of zero.  So if all values are NaN, then we add nothing to our
original sum and still return 0.

I'm not sure I understand the argument that it should return NaN.  It
is counter to the *purpose* of nansum.   Also, if one wants to
determine if all values in an array are NaN, isn't there another way?
Let's keep (or make) those distinct operations, as they are definitely
distinct concepts.
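The zero-returning behaviour argued for here is easy to express; a hedged sketch (hypothetical helper name, not the nansum that shipped at the time):

```python
import numpy as np

def nansum_zero(a, axis=None):
    # Treat NaNs strictly as zero, so an all-NaN (or empty) input
    # sums to 0.0, consistent with sum([]) == 0.0.
    a = np.asarray(a, dtype=float)
    return np.where(np.isnan(a), 0.0, a).sum(axis=axis)

print(nansum_zero([1.0, np.nan, 2.0]))   # 3.0
print(nansum_zero([np.nan, np.nan]))     # 0.0
print(nansum_zero([]))                   # 0.0
```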


Re: [Numpy-discussion] proposing a beware of [as]matrix() warning

2010-04-28 Thread Alan G Isaac
On 4/28/2010 12:05 PM, Travis Oliphant wrote:
 A proposal was made to allow calling a NumPy array to infer dot
 product:

 a(b) is equivalent to dot(a,b)

 a(b)(c) would be equivalent to dot(dot(a,b),c)


Here is a related ticket that proposes a more
explicit alternative: adding a ``dot`` method
to ndarray.
http://projects.scipy.org/numpy/ticket/1456

fwiw,
Alan


Re: [Numpy-discussion] Recommended way to add Cython extension using numpy.distutils?

2010-04-28 Thread Matthew Brett
Hi,

Thanks a lot for the suggestion - I appreciate it.

 Is there a recommended way to plumb Cython into the numpy build
 machinery?  Should I try and patch numpy distutils to use Cython if
 present?


 Here is the monkey-patch I'm using in my project:
 def evil_numpy_monkey_patch():
   from   numpy.distutils.command import build_src
   import Cython
   import Cython.Compiler.Main
   build_src.Pyrex = Cython
   build_src.have_pyrex = True

I think this patch does not work for current numpy trunk;  I've put a
minimal test case here:

http://github.com/matthew-brett/du-cy-numpy

If you run the setup.py there (python setup.py build) then all works
fine for - say - numpy 1.1.  For current trunk you get an error ending
in:

  File "/Users/mb312/usr/local/lib/python2.6/site-packages/numpy/distutils/command/build_src.py", line 466, in generate_a_pyrex_source
    if self.inplace or not have_pyrex():
TypeError: 'bool' object is not callable

which is easily fixable of course ('build_src.have_pyrex = lambda :
True') - leading to:

  File "/Users/mb312/usr/local/lib/python2.6/site-packages/numpy/distutils/command/build_src.py", line 474, in generate_a_pyrex_source
    import Pyrex.Compiler.Main
ImportError: No module named Pyrex.Compiler.Main

I'm afraid I did a rather crude monkey-patch to replace the
'generate_a_pyrex_source' function.  It seems to work for numpy 1.1
and current trunk.  The patching process is here:

http://github.com/matthew-brett/du-cy-numpy/blob/master/matthew_monkey.py

Best,

Matthew


Re: [Numpy-discussion] proposing a beware of [as]matrix() warning

2010-04-28 Thread Travis Oliphant

On Apr 28, 2010, at 11:19 AM, Robert Kern wrote:

 On Wed, Apr 28, 2010 at 11:05, Travis Oliphant  
 oliph...@enthought.com wrote:

 On Apr 26, 2010, at 7:19 PM, David Warde-Farley wrote:

 Trying to debug code written by an undergrad working for a colleague
 of
 mine who ported code over from MATLAB, I am seeing an ugly melange  
 of
 matrix objects and ndarrays that are interacting poorly with each
 other
 and various functions in SciPy/other libraries. In particular there
 was
 a custom minimizer function that contained a line a * b, that was
 receiving an Nx1 matrix and a N-length array and computing an  
 outer
 product. Hence the unexpected 6 GB of memory usage and weird
 results...

 Overloading '*' and '**' while convenient does have consequences.
 It
 would be nice if we could have a few more infix operators in Python  
 to
 allow separation of  element-by-element calculations and dot- 
 product
 calculations.

 A proposal was made to allow calling a NumPy array to infer dot
 product:

 a(b) is equivalent to dot(a,b)

 a(b)(c) would be equivalent to dot(dot(a,b),c)

 This seems rather reasonable.


 While I don't have any spare cycles to push it forward and we are
 already far along on the NumPy to 3.0, I had wondered if we couldn't
 use the leverage of Python core developers wanting NumPy to be ported
 to Python 3 to actually add a few more infix operators to the  
 language.

 One of the problems of moving to Python 3.0 for many people is that
 there are not  new features to outweigh the hassle of moving.
 Having a few more infix operators would be a huge incentive to the
 NumPy community to move to Python 3.

 Anybody willing to lead the charge with the Python developers?

 There is currently a moratorium on language changes. This will have  
 to wait.

Exceptions can always be made for the right reasons.  I don't think
this particular question has received sufficient audience with Python
core developers.  The reason they want the moratorium is for
stability, but they also want Python 3k to be adopted.

-Travis



Re: [Numpy-discussion] proposing a beware of [as]matrix() warning

2010-04-28 Thread Travis Oliphant

On Apr 28, 2010, at 11:50 AM, Nikolaus Rath wrote:

 Robert Kern robert.k...@gmail.com writes:
 Overloading '*' and '**' while convenient does have  
 consequences.   It
 would be nice if we could have a few more infix operators in  
 Python to
 allow separation of  element-by-element calculations and dot- 
 product
 calculations.

 http://www.python.org/dev/peps/pep-0225/ was considered and rejected.
 But that was in 2000...

 While I don't have any spare cycles to push it forward and we are
 already far along on the NumPy to 3.0, I had wondered if we couldn't
 use the leverage of Python core developers wanting NumPy to be  
 ported
 to Python 3 to actually add a few more infix operators to the  
 language.

 I don't think that stands a chance: http://www.python.org/dev/peps/pep-3003/

Frankly, I still think we should move forward.  It will take us as  
long as the moratorium is in effect to figure out what operators we  
want anyway and we can do things like put attributes on arrays in the  
meantime to implement the infix operators we think we need.

It's too bad we don't have more of a voice with the Python core
team.  This is our fault of course (we don't have people with spare
cycles to spend the time interfacing), but it's still too bad.

-Travis



Re: [Numpy-discussion] proposing a beware of [as]matrix() warning

2010-04-28 Thread Alan G Isaac
On 4/28/2010 12:08 PM, Dag Sverre Seljebotn wrote:
 it would be good to deprecate the matrix class
 from NumPy


Please let us not have this discussion all over again.

The matrix class is very useful for teaching.
In economics for example, the use of matrix algebra
is widespread, while algebra with arrays that are
not matrices is very rare.  I can (and do) use NumPy
matrices even in undergraduate courses.

If you do not like them, do not use them.

If you want `matrix` replaced with a better matrix
object, offer a replacement for community consideration.

Thank you,
Alan Isaac

PS There is one change I would not mind: let
A * M be undefined if A is an ndarray and
M is a NumPy matrix.


[Numpy-discussion] Fwd: Advice for grouping recarrays

2010-04-28 Thread Tom Denniston
Someone inquired about this one today and I wanted to clarify there is
now a better way to do this that I didn't know about when I posted the
original:

>>> ind = numpy.array([0,0,0,0,1,1,1,2,2,2,])
>>> data = numpy.arange(10)
>>> borders = numpy.arange(len(ind)).compress(numpy.hstack([[1], ind[1:]!=ind[:-1]]))
>>> numpy.add.reduceat(data, borders)
array([ 6, 15, 24])
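When the keys are not already contiguous, a stable sort first makes the same reduceat trick behave like the SQL GROUP BY asked about in the original question; a sketch with made-up key/value columns:

```python
import numpy as np

# Made-up columns standing in for the real HDF5 fields
key = np.array([2, 0, 1, 0, 2, 1, 0])
f1 = np.arange(7.0)

order = np.argsort(key, kind='stable')       # make each group contiguous
k, v = key[order], f1[order]
borders = np.flatnonzero(np.r_[True, k[1:] != k[:-1]])

print(k[borders])                   # [0 1 2]  -> the GROUP BY labels
print(np.add.reduceat(v, borders))  # [10.  7.  4.]  -> sum(f1) per group
```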





On Tue, Jul 18, 2006 at 8:49 AM, Tom Denniston
tom.dennis...@alum.dartmouth.org wrote:
 I suggest

 lexsort
 itertools.groupby of the indices
 take

 I think it would be really great if numpy had the first two as a
 function or something like that.  It is really useful to be able to
 take an array and bucket it and apply further numpy operations like
 accumulation functions.

 On 7/18/06, Stephen Simmons m...@stevesimmons.com wrote:

 Hi,

 Does anyone have any suggestions for summarising data in numpy?

 The quick description is that I want to do something like the SQL
 statement:
   SELECT sum(field1), sum(field2) FROM  table GROUP BY field3;

 The more accurate description is that my data is stored in PyTables HDF
 format, with 24 monthly files, each with 4m records describing how
 customers performed that month. Each record looks something like this:
 ('200604', 65140450L, '800', 'K', 12L, 162.0, 2000.0, 0.054581, 0.0,
 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0,
 8.80, 0.86, 7.80, 17.46, 0.0, 70.0, 0.0, 70.0, -142.93, 0.0, 2000.0,
 2063.93, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -9.71, 7.75,
 87.46, 77.75, -3.45, 0.22, -0.45, -0.57, 73.95)
 The first 5 fields are status fields (month_string, account_number,
 product_code, account_status, months_since_customer_joined). The
 remaining 48 fields represent different aspects of the customer's
 performance during that month. I read 100,000 of these records at a time
 and turn them into a numpy recarray with:
   dat = hdf_table.read(start=pos, stop=pos+block_size)
   dat = numpy.asarray(dat._flatArray, dtype=dat.array_descr)

 I'd like to reduce these 96m records x 53 fields down to monthly
 averages for each tuple (month_string, months_since_customer_joined)
 which in the case above is ('200604', 12L). This will let me compare the
 performance of newly acquired customers at the same point in their
 lifecycle as customers acquired 1 or 2 years ago.

 The end result should be a dataset something like
   res[month_index, months_since_customer_joined]
   = array([ num_records, sum_field_5, sum_field_6, sum_field_7, ...
 sum_field_52 ])
 with a shape of (24, 24, 49).

 I've played around with lexsort(), take(), sum(), etc, but get very
 confused and end up feeling that I'm making things more complicated than
 they need to be. So any advice from numpy veterans on how best to
 proceed would be very welcome!

 Cheers

 Stephen

 ___
 Numpy-discussion mailing list
 numpy-discuss...@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/numpy-discussion




Re: [Numpy-discussion] proposing a beware of [as]matrix() warning

2010-04-28 Thread Anne Archibald
On 28 April 2010 14:30, Alan G Isaac ais...@american.edu wrote:
 On 4/28/2010 12:08 PM, Dag Sverre Seljebotn wrote:
 it would be good to deprecate the matrix class
 from NumPy

 Please let us not have this discussion all over again.

I think you may be too late on this, but it's worth a try.

 The matrix class is very useful for teaching.
 In economics for example, the use of matrix algebra
 is widespread, while algebra with arrays that are
 not matrices is very rare.  I can (and do) use NumPy
 matrices even in undergraduate courses.

 If you do not like them, do not use them.

This is the problem: lots of people start using numpy and think hmm,
I want to store two-dimensional data so I'll use a matrix, and have
no idea that matrix means anything different from two-dimensional
array. It was this that inspired David's original post, and it's this
that we're trying to find a solution for.

 If you want `matrix` replaced with a better matrix
 object, offer a replacement for community consideration.

 Thank you,
 Alan Isaac

 PS There is one change I would not mind: let
 A * M be undefined if A is an ndarray and
 M is a NumPy matrix.

I can definitely vote for this, in the interest of catching as many
inadvertent matrix users as possible.

Anne


Re: [Numpy-discussion] proposing a beware of [as]matrix() warning

2010-04-28 Thread Robert Kern
On Wed, Apr 28, 2010 at 15:50, Travis Oliphant oliph...@enthought.com wrote:

 On Apr 28, 2010, at 11:19 AM, Robert Kern wrote:

 On Wed, Apr 28, 2010 at 11:05, Travis Oliphant
 oliph...@enthought.com wrote:

 On Apr 26, 2010, at 7:19 PM, David Warde-Farley wrote:

 Trying to debug code written by an undergrad working for a colleague
 of
 mine who ported code over from MATLAB, I am seeing an ugly melange
 of
 matrix objects and ndarrays that are interacting poorly with each
 other
 and various functions in SciPy/other libraries. In particular there
 was
 a custom minimizer function that contained a line a * b, that was
 receiving an Nx1 matrix and a N-length array and computing an
 outer
 product. Hence the unexpected 6 GB of memory usage and weird
 results...

 Overloading '*' and '**' while convenient does have consequences.
 It
 would be nice if we could have a few more infix operators in Python
 to
 allow separation of  element-by-element calculations and dot-
 product
 calculations.

 A proposal was made to allow calling a NumPy array to infer dot
 product:

 a(b) is equivalent to dot(a,b)

 a(b)(c) would be equivalent to dot(dot(a,b),c)

 This seems rather reasonable.


 While I don't have any spare cycles to push it forward and we are
 already far along on the NumPy to 3.0, I had wondered if we couldn't
 use the leverage of Python core developers wanting NumPy to be ported
 to Python 3 to actually add a few more infix operators to the
 language.

 One of the problems of moving to Python 3.0 for many people is that
 there are not  new features to outweigh the hassle of moving.
 Having a few more infix operators would be a huge incentive to the
 NumPy community to move to Python 3.

 Anybody willing to lead the charge with the Python developers?

 There is currently a moratorium on language changes. This will have
 to wait.

 Exceptions can always be made for the right reasons.    I don't think
 this particular question has received sufficient audience with Python
 core developers.

It received plenty of audience on python-dev in 2008. But no one from
our community cared enough to actually implement it.

  http://fperez.org/py4science/numpy-pep225/numpy-pep225.html

 The reason they want the moratorium is for
 stability, but they also want Python 3k to be adopted.

This is not something that will justify an exception. Things like oh
crap, this old feature has a lurking flaw that we've never noticed
before and needs a language change to fix are possible exceptions to
the moratorium, not something like this. PEP 3003 quite clearly lays
out the possible exceptions:


Case-by-Case Exemptions

  New methods on built-ins

The case for adding a method to a built-in object can be made.

  Incorrect language semantics

If the language semantics turn out to be ambiguous or improperly
implemented based on the intention of the original design then the
semantics may change.

  Language semantics that are difficult to implement

Because other VMs have not begun implementing Python 3.x semantics
there is a possibility that certain semantics are too difficult to
replicate. In those cases they can be changed to ease adoption of
Python 3.x by the other VMs.


This feature falls into none of these categories. It does fall into this one:


Cannot Change
  ...
  Language syntax

The grammar file essentially becomes immutable apart from ambiguity fixes.


Guido is taking a hard line on this.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] proposing a beware of [as]matrix() warning

2010-04-28 Thread Skipper Seabold
On Wed, Apr 28, 2010 at 2:12 PM, Alan G Isaac ais...@american.edu wrote:
 On 4/28/2010 12:05 PM, Travis Oliphant wrote:
 A proposal was made to allow calling a NumPy array to infer dot
 product:

 a(b) is equivalent to dot(a,b)

 a(b)(c) would be equivalent to dot(dot(a,b),c)


 Here is a related ticket that proposes a more
 explicit alternative: adding a ``dot`` method
 to ndarray.
 http://projects.scipy.org/numpy/ticket/1456


FWIW, I have borrowed a convenience function chain_dot originally from
pandas that works for me as a stop gap for more readable code.

def chain_dot(*arrs):
    """
    Returns the dot product of the given matrices.

    Parameters
    ----------
    arrs: argument list of ndarrays

    Returns
    -------
    Dot product of all arguments.

    Example
    -------
    >>> import numpy as np
    >>> from scikits.statsmodels.tools import chain_dot
    >>> A = np.arange(1,13).reshape(3,4)
    >>> B = np.arange(3,15).reshape(4,3)
    >>> C = np.arange(5,8).reshape(3,1)
    >>> chain_dot(A,B,C)
    array([[1820],
           [4300],
           [6780]])
    """
    return reduce(lambda x, y: np.dot(y, x), arrs[::-1])

Skipper


Re: [Numpy-discussion] proposing a beware of [as]matrix() warning

2010-04-28 Thread David Warde-Farley
On 2010-04-28, at 2:30 PM, Alan G Isaac wrote:

 Please let us not have this discussion all over again.

Agreed. See my preface to this discussion.

My main objection is that it's not easy to explain to a newcomer what the 
difference precisely is, how they interact, why two of them exist, how they are 
sort-of-compatible-but-not...

 The matrix class is very useful for teaching.
 In economics for example, the use of matrix algebra
 is widespread, while algebra with arrays that are
 not matrices is very rare.  I can (and do) use NumPy
 matrices even in undergraduate courses.

Would it be acceptable to retain the matrix class but not have it imported in 
the default namespace, and have to import e.g. numpy.matlib to get at them?

 If you do not like them, do not use them.

The problem isn't really with seasoned users of NumPy not liking them, but 
rather new users being confused by the presence of (what seems to be) two 
primitives, array and matrix.

Three things tend to happen:

a) Example code that expects arrays instead receives matrices. If these aren't 
cast with asarray(), mayhem ensues at the first sight of *.

b) Users of the matrix class call a function that properly coerces its input 
to ndarray, but then returns an ndarray. Users are thus confused that, thinking 
of the function as a black box, putting matrices 'in' doesn't result in getting 
matrices 'out'. It doesn't take long to get the hang of if you really sit down 
and work it through, but it also doesn't take long to go back to MATLAB or 
whatever else. My interest is in having as few conceptual stumbling stones as 
possible.

c) Complicating the situation further, people try to use functions e.g. from 
scipy.optimize which expect a 1d array by passing in column or row matrices. 
Even when coerced to array, these have the wrong rank and you get unexpected 
results (you could argue that we should instead use asarray(..).squeeze() on 
all incoming arguments, but this may not generalize well).
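The pitfall in (a) is easy to reproduce; a small sketch of the Nx1-matrix-times-array trap:

```python
import numpy as np

m = np.matrix(np.ones((3, 1)))   # Nx1 matrix, e.g. from ported MATLAB code
a = np.arange(3.0)               # N-length ndarray

# matrix.__mul__ coerces `a` to a 1x3 matrix, so this is a matrix
# product (3,1) x (1,3): an N x N outer product, not elementwise.
print((m * a).shape)             # (3, 3)

# Coercing to plain arrays first gives the elementwise product intended.
print((np.asarray(m).ravel() * a).shape)  # (3,)
```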

 PS There is one change I would not mind: let
 A * M be undefined if A is an ndarray and
 M is a NumPy matrix.

What about the other binary ops? I would say, matrix goes with matrix, array 
with array, never the two shall meet unless you explicitly coerce. The ability 
to mix the two in a single expression does more harm than good, IMHO. 

David


Re: [Numpy-discussion] ndimage.label - howto force SWIG to use int32 - even on 64bit Linux ?

2010-04-28 Thread Bill Spotz
Both types of typemaps are enabled, so you just need to write your %apply  
directives correctly:

   %apply (npy_intp* IN_ARRAY1, int DIM1) {(npy_intp* seq, int n)};
   etc

SWIG should be able to figure it out from there.

On Apr 28, 2010, at 12:58 PM, Charles R Harris wrote:

 On Tue, Apr 27, 2010 at 2:27 AM, Sebastian Haase  
 seb.ha...@gmail.com wrote:
 Hi,
 I wanted to write some C code to accept labels as they come from  
 ndimage.label.
 For some reason ndimage.label produces its output as an int32 array -
 even on my 64bit system .

 BTW, could this be considered a bug ?


 Likely.

 Now, if I use the typemaps of numpy.i I can choose between NPY_LONG  
 and NPY_INT.
 But those are sometimes 32 sometimes 64 bit, depending on the system.

 Any ideas ... ?

 npy_intp.

 Chuck

** Bill Spotz  **
** Sandia National Laboratories  Voice: (505)845-0170  **
** P.O. Box 5800 Fax:   (505)284-0154  **
** Albuquerque, NM 87185-0370Email: wfsp...@sandia.gov **





