[Numpy-discussion] What should be the value of nansum of nan's?
Hi All, We need to make a decision for ticket #1123 (http://projects.scipy.org/numpy/ticket/1123#comment:11) regarding what nansum should return when all values are nan. At some earlier point it was zero, but currently it is nan; in fact it is nan whatever the operation is. That is consistent, simple, and serves to mark the array or axis as containing all nans. I would like to close the ticket and am a bit inclined to go with the current behaviour, although there is an argument to be made for returning 0 for the nansum case. Thoughts? Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] What should be the value of nansum of nan's?
On Mon, Apr 26, 2010 at 10:55 AM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, We need to make a decision for ticket #1123 (http://projects.scipy.org/numpy/ticket/1123#comment:11) regarding what nansum should return when all values are nan. At some earlier point it was zero, but currently it is nan; in fact it is nan whatever the operation is. That is consistent, simple, and serves to mark the array or axis as containing all nans. I would like to close the ticket and am a bit inclined to go with the current behaviour, although there is an argument to be made for returning 0 for the nansum case. Thoughts? To add a bit of context, one could argue that the results should be consistent with the equivalent operations on empty arrays and always be non-nan. In [1]: nansum([]) Out[1]: nan In [2]: sum([]) Out[2]: 0.0 Chuck
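The two candidate behaviors can be sketched side by side in plain NumPy (a minimal illustration of the conventions under discussion, not the nansum implementation itself):

```python
import numpy as np

def nansum_as_zero(a):
    """Convention A: treat NaNs as missing; an all-NaN (or empty) input
    sums to 0.0, matching sum([]) == 0.0."""
    a = np.asarray(a, dtype=float)
    return a[~np.isnan(a)].sum()

def nansum_as_nan(a):
    """Convention B: an all-NaN input stays NaN, marking the array or
    axis as entirely missing."""
    a = np.asarray(a, dtype=float)
    valid = a[~np.isnan(a)]
    return valid.sum() if valid.size else np.nan

print(nansum_as_zero([np.nan, np.nan]))    # 0.0
print(nansum_as_nan([np.nan, np.nan]))     # nan
print(nansum_as_zero([1.0, np.nan, 2.0]))  # 3.0 under either convention
```

Both conventions agree whenever at least one value is non-NaN; the ticket is only about the all-NaN (and empty) case.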
Re: [Numpy-discussion] Memory profiling NumPy code?
I know you're looking for something with much more fine-grained control (which I can't help much with), but I often find it useful to just plot the overall memory use of the program over time. There may be a slicker way to do it, but here's the script I use, anyway (saved as ~/bin/quick_profile; usage: quick_profile whatever, e.g. quick_profile python script.py):

#!/bin/sh
# Setup
datfile=$(mktemp)
echo "ElapsedTime MemUsed" > $datfile
starttime=$(date +%s.%N)
# Run the specified command in the background
$@ &
# While the last process is still going
while [ -n "$(ps --no-headers $!)" ]
do
    bytes=$(ps -o rss -C $1 --no-headers | awk '{SUM += $1} END {print SUM}')
    elapsed=$(echo $(date +%s.%N) - $starttime | bc)
    echo $elapsed $bytes >> $datfile
    sleep 0.1
done
# Plot up the results with matplotlib
cat << EOF | python
import pylab, numpy
infile = open("$datfile")
infile.readline()  # skip the header line
data = [[float(dat) for dat in line.strip().split()] for line in infile]
data = numpy.array(data)
time, mem = data[:, 0], data[:, 1] / 1024
pylab.plot(time, mem)
pylab.title('Profile of: "$@"')
pylab.xlabel('Elapsed Time (s): Total %0.5f s' % time.max())
pylab.ylabel('Memory Used (MB): Peak %0.2f MB' % mem.max())
pylab.show()
EOF
rm $datfile

Hope that helps a bit, anyway... -Joe

On Mon, Apr 26, 2010 at 6:16 AM, David Cournapeau courn...@gmail.com wrote: On Mon, Apr 26, 2010 at 7:57 PM, Dag Sverre Seljebotn da...@student.matnat.uio.no wrote: I'd like to profile the memory usage of my application using tools like e.g. Heapy. However since NumPy arrays are allocated in C they are not tracked by Python memory profiling. Does anybody have a workaround to share? I really just want to track a few arrays in a friendly way from within Python (I am aware of the existence of C-level profilers). I think heapy has some hooks so that you can add support for extensions.
Maybe we could provide a C API in numpy to make this easy. David
[Numpy-discussion] proposing a beware of [as]matrix() warning
Trying to debug code written by an undergrad working for a colleague of mine who ported code over from MATLAB, I am seeing an ugly melange of matrix objects and ndarrays that are interacting poorly with each other and various functions in SciPy/other libraries. In particular there was a custom minimizer function that contained a line a * b, that was receiving an Nx1 matrix and an N-length array and computing an outer product. Hence the unexpected 6 GB of memory usage and weird results... We've had this discussion before and it seems that the matrix class isn't going anywhere (I *really* wish it would at least be banished from the top-level namespace), but it has its adherents for pedagogical reasons. Could we at least consider putting a gigantic warning on all the functions for creating matrix objects (matrix, mat, asmatrix, etc.) that they may not behave quite so predictably in some situations and should be avoided when writing nontrivial code? There are already such warnings scattered about on SciPy.org but the situation is so bad, in my opinion (bad from a programming perspective and bad from a new user perspective, asking why doesn't this work? why doesn't that work? why is this language/library/etc. so stupid, inconsistent, etc.?) that it warrants steering people still further away from the matrix object. I apologize for ranting, but it pains me when people give up on Python/NumPy because they can't figure out inconsistencies that aren't really there for a good reason. IMHO, of course. David
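The pitfall described above is easy to reproduce (a small sketch; n is a tiny stand-in for the large N of the anecdote):

```python
import numpy as np

n = 4
col = np.matrix(np.ones((n, 1)))  # N x 1 matrix
vec = np.ones(n)                  # length-N ndarray

# matrix.__mul__ is matrix multiplication; the 1-d array is coerced to a
# 1 x N matrix, so the product is an N x N outer product rather than the
# elementwise result the MATLAB-style code expected.
prod = col * vec
print(prod.shape)  # (4, 4) -- for a large N this is the surprise multi-GB allocation
```

With two plain ndarrays of shapes (n, 1) and (n,), `*` broadcasts elementwise, which also yields an (n, n) result; the real fix is to keep shapes consistent (e.g. ravel to 1-d) rather than to mix matrix and array types.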
[Numpy-discussion] ndimage.label - howto force SWIG to use int32 - even on 64bit Linux ?
Hi, I wanted to write some C code to accept labels as they come from ndimage.label. For some reason ndimage.label produces its output as an int32 array -- even on my 64-bit system. BTW, could this be considered a bug? Now, if I use the typemaps of numpy.i I can choose between NPY_LONG and NPY_INT. But those are sometimes 32, sometimes 64 bit, depending on the system. Any ideas...? Thanks, Sebastian Haase
Re: [Numpy-discussion] Speeding up loadtxt / savetxt
Hi Andreas, On 23 April 2010 10:16, Andreas li...@hilboll.de wrote: I was wondering if there's a way to speed up loadtxt/savetxt for big arrays? So far, I'm plainly using something like this:: Do you specifically need to store text files? NumPy's binary storage functions (numpy.load and numpy.save) are faster. Also, an efficient reader for very simply formatted text is provided by numpy.fromfile. Regards, Stéfan
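A minimal sketch of the text-versus-binary round trip Stéfan is pointing at (file names are illustrative):

```python
import os
import tempfile

import numpy as np

a = np.random.rand(1000, 5)

with tempfile.TemporaryDirectory() as d:
    txt = os.path.join(d, "a.txt")
    npy = os.path.join(d, "a.npy")
    np.savetxt(txt, a)  # formats every float as text, one row per line
    np.save(npy, a)     # writes the raw binary buffer plus a small header
    b = np.loadtxt(txt)
    c = np.load(npy)

# The binary round trip is bit-exact; the text round trip is close
# (the default '%.18e' format keeps enough digits).
print(np.array_equal(a, c), np.allclose(a, b))
```

On large arrays the binary path avoids both the float formatting/parsing cost and the per-line Python overhead, which is where most of the savetxt/loadtxt time goes.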
Re: [Numpy-discussion] ma.std(ddof=1) bug?
On Apr 23, 2010, at 12:45 PM, josef.p...@gmail.com wrote: Is there a reason why ma.std(ddof=1) does not calculate the std if there are 2 valid values? Bug! Good call... Should be fixed in SVN r8370.
[Numpy-discussion] [ANN] la 0.2, the labeled array
I am pleased to announce the second release of the la package, version 0.2. The main class of the la package is a labeled array, larry. A larry consists of a data array and a label list. The data array is stored as a NumPy array and the label list as a list of lists. larry has built-in methods such as movingsum, ranking, merge, shuffle, zscore, demean, and lag, as well as typical NumPy methods like sum, max, std, sign, and clip. NaNs are treated as missing data. Alignment by label is automatic when you add (or subtract, multiply, divide) two larrys. larry adds the convenience of labels, provides many built-in methods, and lets you use many of your existing array functions.

Download: https://launchpad.net/larry/+download
docs: http://larry.sourceforge.net
code: https://launchpad.net/larry
list: http://groups.google.ca/group/pystatsmodels

= Release Notes =

la 0.2 (avocado)
*Release date: 2010-04-27*

New larry methods
- lix : Index into a larry using labels or index numbers or both
- swapaxes : Swap the two specified axes
- sortaxis : Sort data (and label) according to label along specified axis
- flipaxis : Reverse the order of the elements along the specified axis
- tocsv : Save larry to a csv file
- fromcsv : Load a larry from a csv file
- insertaxis : Insert a new axis at the specified position
- invert : Element by element inverting of True to False and False to True

Enhancements
- All larry methods can now take nd input arrays (some previously 2d only)
- Added ability to save larrys with datetime.date labels to HDF5
- New function (panel) to convert larry of shape (n, m, k) to shape (m*k, n)
- Expanded documentation
- Over 280 new unit tests; testing easier with new assert_larry_equal function

Bug fixes
- #517912: larry([]) == larry([]) raised IndexError
- #518096: larry.fromdict failed due to missing import
- #518106: la.larry.fromdict({}) failed
- #518114: fromlist([]) and fromtuples([]) failed
- #518135: keep_label crashed when there was nothing to keep
- #518210: sum, std, var returned NaN for empty larrys; now return 0.0
- #518215: unflatten crashed on an empty larry
- #518442: sum, std, var returned NaN for shapes that contain zero: (2, 0, 3)
- #568175: larry.std(axis=-1) and var crashed on negative axis input
- #569622: Negative axis input gave wrong output for several larry methods
[Numpy-discussion] Recommended way to add Cython extension using numpy.distutils?
Hi, We (neuroimaging.scipy.org) are using numpy.distutils, and we have .pyx files that we build with Cython. I wanted to add these in our current setup.py scripts, with something like:

def configuration(parent_package='', top_path=None):
    from numpy.distutils.misc_util import Configuration
    config = Configuration('statistics', parent_package, top_path)
    config.add_extension('intvol', ['intvol.pyx'],
                         include_dirs=[np.get_include()])
    return config

but of course numpy only knows about Pyrex, and returns:

error: Pyrex required for compiling 'nipy/algorithms/statistics/intvol.pyx' but not available

Is there a recommended way to plumb Cython into the numpy build machinery? Should I try and patch numpy distutils to use Cython if present? Best, Matthew
Re: [Numpy-discussion] passing non-flat array to interpolator
On Mon, Apr 26, 2010 at 12:04 PM, Thomas temesgen...@gmail.com wrote: I have some problem with the interpolators in SciPy. Does anyone know if there is a way to pass non-flat array variables to Rbf, or to other SciPy interpolators? E.g. for my case of 17 variables with 500 data points each: x1.shape = (500,), x2.shape = (500,), ..., x17.shape = (500,), b = Rbf(x1, x2, x3, ..., x17, y). I would rather create a non-flat variable x.shape = (500, 17) and pass it to Rbf (or any of the interpolators in SciPy) as bf = Rbf(X, Y). How can I do this? Thank you for your time. Thomas

Rbf(*np.c_[X, Y].T) or Rbf(*(list(X.T) + [Y])). I think the second version does not make a copy of the data when building the list. It would be easier if the xs and y were reversed in the signature: Rbf(y, *X.T). Josef
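The splat idiom Josef suggests can be checked with a stand-in that only mimics Rbf's calling convention (the real scipy.interpolate.Rbf takes x1, x2, ..., xn, y as separate 1-d positional arguments the same way):

```python
import numpy as np

def rbf_like(*args):
    """Stand-in with Rbf's calling convention: n coordinate arrays
    followed by the values array. Returns what it received, for checking."""
    *coords, y = args
    return len(coords), y.shape

X = np.random.rand(500, 17)  # one row per sample, one column per coordinate
y = np.random.rand(500)

# Unpack the 17 columns into 17 positional arguments, then append y:
ncoords, yshape = rbf_like(*X.T, y)
print(ncoords, yshape)  # 17 (500,)
```

Iterating over X.T yields views of the columns of X, so no data is copied in building the argument list.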
Re: [Numpy-discussion] Incomplete uninstall of 1.4.0 superpack
On 04/27/2010 01:08 AM, threexk threexk wrote: David Cournapeau wrote: On Mon, Apr 26, 2010 at 2:42 AM, threexk threexk thre...@hotmail.com wrote: Hello, I recently uninstalled the NumPy 1.4.0 superpack for Python 2.6 on Windows 7, and afterward a dialog popped up that said 1 file or directory could not be removed. Does anyone have any idea which file/directory this is? The dialog gave no indication. Is an uninstall log with details generated anywhere? There should be one in C:\Python*, something like numpy-*-wininst.log Looks like that log gets deleted after uninstallation (as it probably should), so I still could not figure out which file/directory was not deleted. I found that \Python26\Lib\site-packages\numpy and many files/directories under it have remained after uninstall. So, I tried reinstalling 1.4.0 and uninstalling again. This time, the uninstaller did not report not being able to remove files/directories, but it still did not delete the aforementioned numpy directory. I believe this is a bug with the uninstaller? Could you maybe post the log (before uninstalling) and list the remaining files? Note though that we most likely won't be able to do much -- we do not have much control over the generated installers. cheers, David
Re: [Numpy-discussion] floats as axis
On Apr 25, 2010, at 8:16 AM, josef.p...@gmail.com wrote: (some) numpy functions take floats as valid axis argument. Is this a feature?

>>> np.ones((2,3)).sum(1.2)
array([ 3., 3.])
>>> np.ones((2,3)).sum(1.99)
array([ 3., 3.])
>>> np.mean((1.5,0.5))
1.0
>>> np.mean(1.5,0.5)
1.5

Keith pointed out that scipy.stats.nanmean has a different behavior. I think we should make float inputs raise an error for NumPy 2.0. -Travis
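A defensive wrapper showing the behavior Travis proposes -- rejecting non-integral axis values instead of silently truncating them -- might look like this (checked_sum is a hypothetical helper, not a NumPy API):

```python
import operator

import numpy as np

def checked_sum(a, axis):
    # operator.index() accepts true integers (1, np.int64(1), ...) and
    # raises TypeError for floats like 1.2 or 1.99, instead of letting
    # them be truncated to an int axis.
    axis = operator.index(axis)
    return np.asarray(a).sum(axis=axis)

print(checked_sum(np.ones((2, 3)), 1))  # [ 3.  3.]
try:
    checked_sum(np.ones((2, 3)), 1.2)
except TypeError:
    print("float axis rejected")
```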
Re: [Numpy-discussion] proposing a beware of [as]matrix() warning
On Apr 26, 2010, at 7:19 PM, David Warde-Farley wrote: Trying to debug code written by an undergrad working for a colleague of mine who ported code over from MATLAB, I am seeing an ugly melange of matrix objects and ndarrays that are interacting poorly with each other and various functions in SciPy/other libraries. In particular there was a custom minimizer function that contained a line a * b, that was receiving an Nx1 matrix and an N-length array and computing an outer product. Hence the unexpected 6 GB of memory usage and weird results... Overloading '*' and '**', while convenient, does have consequences. It would be nice if we could have a few more infix operators in Python to allow separation of element-by-element calculations and dot-product calculations. A proposal was made to allow calling a NumPy array to infer dot product: a(b) is equivalent to dot(a,b); a(b)(c) would be equivalent to dot(dot(a,b),c). This seems rather reasonable. While I don't have any spare cycles to push it forward and we are already far along on the NumPy port to 3.0, I had wondered if we couldn't use the leverage of Python core developers wanting NumPy to be ported to Python 3 to actually add a few more infix operators to the language. One of the problems of moving to Python 3.0 for many people is that there are not new features to outweigh the hassle of moving. Having a few more infix operators would be a huge incentive to the NumPy community to move to Python 3. Anybody willing to lead the charge with the Python developers? -Travis
Re: [Numpy-discussion] Speeding up loadtxt / savetxt
Hi Stéfan, Do you specifically need to store text files? NumPy's binary storage functions (numpy.load and save) are faster. Yes, I know. But the files I create must be readable by an application developed in-house at our institute, and that only supports a) ASCII files or b) some home-grown binary format, which I hate. Also, an efficient reader for very simply formatted text is provided by numpy.fromfile. Yes, I heard about it. But the files I have to read have comments in them, and I didn't find a way to exclude these easily. Time needed to read a 100M file is ~13 seconds, and to write ~5 seconds. Which is not too bad, but also still too much... Thanks, Andreas
Re: [Numpy-discussion] proposing a beware of [as]matrix() warning
David Warde-Farley wrote: Trying to debug code written by an undergrad working for a colleague of mine who ported code over from MATLAB, I am seeing an ugly melange of matrix objects and ndarrays that are interacting poorly with each other and various functions in SciPy/other libraries. In particular there was a custom minimizer function that contained a line a * b, that was receiving an Nx1 matrix and an N-length array and computing an outer product. Hence the unexpected 6 GB of memory usage and weird results... If this was in a library function of some sort, I think they should always call np.asarray on the input arguments. That converts matrices to normal arrays. It could have been Python lists-of-lists, other PEP 3118 objects -- in Python an object can be anything in general, and I think it is very proper for most reusable functions to either validate the type of their arguments or take some steps to convert them. That said, I second that it would be good to deprecate the matrix class from NumPy. The problem for me is not the existence of a matrix class as such, but the fact that it subclasses np.ndarray and is so similar to it, breaking a lot of rules of OO programming in the process. (Example: I happen to have my own oomatrix.py which allows me to do P, L = (A * A.H).cholesky(); y = L.solve_right(x). This works fine because the matrices don't support any NumPy operations, and so I don't confuse them. But it helps to have the habit of doing np.asarray in reusable functions so that errors are caught early. I do this so that A above can be either sparse, dense, triangular, diagonal, etc. -- i.e. polymorphic linear algebra. On the other hand, they don't even support single-element lookups, although that's just because I've been too lazy to implement it. Iteration is out of the question; it's just not the level of abstraction I'd like a matrix to work at.) -- Dag Sverre
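The convert-at-the-boundary habit Dag describes, in miniature (demean here is a hypothetical library function, not a NumPy or la API):

```python
import numpy as np

def demean(x):
    """Library-style entry point: accept a matrix, a list, or an ndarray,
    normalize to a flat plain ndarray first, then do the real work."""
    # asarray strips the matrix subclass; ravel flattens an Nx1 column so
    # downstream arithmetic can't broadcast it into an N x N surprise.
    x = np.ravel(np.asarray(x, dtype=float))
    return x - x.mean()

# All three call styles yield the same 1-d result:
print(demean([1.0, 2.0, 3.0]))                    # [-1.  0.  1.]
print(demean(np.array([1.0, 2.0, 3.0])))          # [-1.  0.  1.]
print(demean(np.matrix([[1.0, 2.0, 3.0]]).T))     # [-1.  0.  1.]
```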
Re: [Numpy-discussion] What should be the value of nansum of nan's?
On Apr 26, 2010, at 12:03 PM, Charles R Harris wrote: On Mon, Apr 26, 2010 at 10:55 AM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, We need to make a decision for ticket #1123 regarding what nansum should return when all values are nan. At some earlier point it was zero, but currently it is nan, in fact it is nan whatever the operation is. That is consistent, simple and serves to mark the array or axis as containing all nans. I would like to close the ticket and am a bit inclined to go with the current behaviour although there is an argument to be made for returning 0 for the nansum case. Thoughts? To add a bit of context, one could argue that the results should be consistent with the equivalent operations on empty arrays and always be non-nan. In [1]: nansum([]) Out[1]: nan In [2]: sum([]) Out[2]: 0.0 I favor nansum([]) returning 0.0, which implies returning 0.0 when all the elements are nan. -Travis
Re: [Numpy-discussion] proposing a beware of [as]matrix() warning
On Wed, Apr 28, 2010 at 11:05, Travis Oliphant oliph...@enthought.com wrote: On Apr 26, 2010, at 7:19 PM, David Warde-Farley wrote: Trying to debug code written by an undergrad working for a colleague of mine who ported code over from MATLAB, I am seeing an ugly melange of matrix objects and ndarrays that are interacting poorly with each other and various functions in SciPy/other libraries. In particular there was a custom minimizer function that contained a line a * b, that was receiving an Nx1 matrix and a N-length array and computing an outer product. Hence the unexpected 6 GB of memory usage and weird results... Overloading '*' and '**' while convenient does have consequences. It would be nice if we could have a few more infix operators in Python to allow separation of element-by-element calculations and dot-product calculations. A proposal was made to allow calling a NumPy array to infer dot product: a(b) is equivalent to dot(a,b) a(b)(c) would be equivalent to dot(dot(a,b),c) This seems rather reasonable. While I don't have any spare cycles to push it forward and we are already far along on the NumPy to 3.0, I had wondered if we couldn't use the leverage of Python core developers wanting NumPy to be ported to Python 3 to actually add a few more infix operators to the language. One of the problems of moving to Python 3.0 for many people is that there are not new features to outweigh the hassle of moving. Having a few more infix operators would be a huge incentive to the NumPy community to move to Python 3. Anybody willing to lead the charge with the Python developers? There is currently a moratorium on language changes. This will have to wait. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
Re: [Numpy-discussion] What should be the value of nansum of nan's?
Travis Oliphant wrote: On Apr 26, 2010, at 12:03 PM, Charles R Harris wrote: On Mon, Apr 26, 2010 at 10:55 AM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, We need to make a decision for ticket #1123 (http://projects.scipy.org/numpy/ticket/1123#comment:11) regarding what nansum should return when all values are nan. At some earlier point it was zero, but currently it is nan, in fact it is nan whatever the operation is. That is consistent, simple and serves to mark the array or axis as containing all nans. I would like to close the ticket and am a bit inclined to go with the current behaviour although there is an argument to be made for returning 0 for the nansum case. Thoughts? To add a bit of context, one could argue that the results should be consistent with the equivalent operations on empty arrays and always be non-nan. In [1]: nansum([]) Out[1]: nan In [2]: sum([]) Out[2]: 0.0 I favor nansum([]) returning 0.0 which implies returning 0.0 when all the elements are nan. -Travis +1
Re: [Numpy-discussion] What should be the value of nansum of nan's?
On Mon, Apr 26, 2010 at 9:55 AM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, We need to make a decision for ticket #1123 regarding what nansum should return when all values are nan. At some earlier point it was zero, but currently it is nan, in fact it is nan whatever the operation is. That is consistent, simple and serves to mark the array or axis as containing all nans. I would like to close the ticket and am a bit inclined to go with the current behaviour although there is an argument to be made for returning 0 for the nansum case. Thoughts? I use nansum a lot because I treat NaNs as missing data. I think a lot of people use NaNs as missing data but few admit it. My packages have grown to depend on nansum([nan, nan]) returning NaN. I vote to keep the current behavior. Changing nansum([]) to return zero, however, has no impact on me.
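For code that depends on NaN propagation, the all-NaN-is-NaN behavior can be recovered on top of either convention (nansum_keepnan is a hypothetical wrapper, not part of NumPy):

```python
import numpy as np

def nansum_keepnan(a, axis=None):
    """Sum ignoring NaNs, but return NaN wherever *all* inputs along the
    axis are NaN, regardless of which convention np.nansum itself uses."""
    a = np.asarray(a, dtype=float)
    total = np.nansum(a, axis=axis)
    all_nan = np.isnan(a).all(axis=axis)
    return np.where(all_nan, np.nan, total)

print(nansum_keepnan([np.nan, np.nan]))                            # nan
print(nansum_keepnan([[1.0, np.nan], [np.nan, np.nan]], axis=1))   # [ 1. nan]
```

This is the usual escape hatch in such debates: whichever default the library picks, the other behavior is a few lines away.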
[Numpy-discussion] Bug in MaskedArray min/max functions
If a masked array is created and has no masked values, it can have a mask of just False -- a scalar False, that is. This causes an error when getting the max or min with axis=1. I don't have a fix but do offer a workaround: if you reset the mask to the expanded mask, all works OK (but it's a bug that pops up all over my code). Below, a and m1 work but m5 only works on axis=0. Anyway, you can look at: (see http://www.openvest.com/trac/wiki/MaskedArrayMinMax if the formatting doesn't survive the mail posting process)

>>> import numpy as np
>>> a = np.array([np.arange(5)])
>>> a
array([[0, 1, 2, 3, 4]])
>>> m1 = np.ma.masked_values(a, 1)
>>> m5 = np.ma.masked_values(a, 5)
>>> m1
masked_array(data = [[0 -- 2 3 4]], mask = [[False True False False False]], fill_value = 1)
>>> m5
masked_array(data = [[0 1 2 3 4]], mask = False, fill_value = 5)
>>> a.min(axis=0)
array([0, 1, 2, 3, 4])
>>> m5.min(axis=0)
masked_array(data = [0 1 2 3 4], mask = False, fill_value = 99)
>>> m1.min(axis=0)
masked_array(data = [0 -- 2 3 4], mask = [False True False False False], fill_value = 99)
>>> a.min(axis=1)
array([0])
>>> m1.min(axis=1)
masked_array(data = [0], mask = [False], fill_value = 99)
>>> m5.min(axis=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.6/site-packages/numpy/ma/core.py", line 5020, in min
    newmask = _mask.all(axis=axis)
ValueError: axis(=1) out of bounds

### workaround
>>> m5.mask = np.ma.getmaskarray(m5)
>>> m5.min(axis=1)
masked_array(data = [0], mask = [False], fill_value = 99)

-- Philip J. Cooper (CFA)
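The workaround in condensed form (on NumPy versions where this bug has been fixed, min works even before expanding the mask, and the expansion is harmless either way):

```python
import numpy as np

a = np.array([np.arange(5)])
m5 = np.ma.masked_values(a, 5)  # no value matches, so the mask shrinks to scalar False

# The scalar mask has no axes, which is what tripped up min(axis=1) in the
# report above.
print(np.ndim(m5.mask))  # 0

# Workaround from the post: expand the scalar mask to a full boolean array
# of the same shape as the data.
m5.mask = np.ma.getmaskarray(m5)
print(m5.min(axis=1))  # [0]
```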
Re: [Numpy-discussion] Speeding up loadtxt / savetxt
Andreas Hilboll wrote: Yes, I know. But the files I create must be readable by an application developed in-house at our institute, and that only supports a) ASCII files or b) some home-grown binary format, which I hate. Also, an efficient reader for very simply formatted text is provided by numpy.fromfile. Yes, I heard about it. But the files I have to read have comments in them, and I didn't find a way to exclude these easily. You can't do it with fromfile -- I think it would be very useful to have a fromfile() like functionality with a few more features: comment lines, and allowing non-whitespace delimiters while reading multiple lines. See my posts about this in the past. I did spend a non-trivial amount of time looking into how to add these features, and fix some bugs in the process -- again, see my posts in the past. It turns out that the fromfile code is some pretty ugly C -- a result of supporting all numpy data types, and compatibility with traditional C functions -- so it's a bit of a chore, at least for a lame C programmer like me. I'm still not sure what I'll do when I get some time to look at this again -- I may simply start from scratch with Cython. It would be great if someone wanted to take it on. Time needed to read a 100M file is ~13 seconds, and to write ~5 seconds. Which is not too bad, but also still too much... You might try running fromfile() on a file with no comments, and you could see from that how much speed gain is possible -- at some point, you're waiting on the disk anyway. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov
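One way to get comment handling on top of a simple, fast numeric parser is to pre-filter the text before parsing (a sketch; strip_comments is a hypothetical helper, and the raw string stands in for a data file):

```python
import numpy as np

raw = """# simulation output
1.0 2.0 3.0
4.0 5.0 6.0  # trailing comment
"""

def strip_comments(text, marker="#"):
    """Drop everything from the comment marker to the end of each line."""
    lines = (line.split(marker, 1)[0] for line in text.splitlines())
    return " ".join(lines)

# After filtering, the data is plain whitespace-separated numbers, which a
# minimal parser (here np.fromiter over split tokens) can handle directly.
data = np.fromiter(strip_comments(raw).split(), dtype=float).reshape(-1, 3)
print(data.shape)  # (2, 3)
```

For files too large to hold in memory, the same filter can be applied line by line; the point is only that comment stripping can live outside the hot numeric-parsing loop.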
Re: [Numpy-discussion] Recommended way to add Cython extension using numpy.distutils?
On Tue, Apr 27, 2010 at 6:09 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, We (neuroimaging.scipy.org) are using numpy.distutils, and we have .pyx files that we build with Cython. I wanted to add these in our current setup.py scripts, with something like: def configuration(parent_package='', top_path=None): from numpy.distutils.misc_util import Configuration config = Configuration('statistics', parent_package, top_path) config.add_extension('intvol', ['intvol.pyx'], include_dirs=[np.get_include()]) return config but of course numpy only knows about Pyrex, and returns: error: Pyrex required for compiling 'nipy/algorithms/statistics/intvol.pyx' but not available Is there a recommended way to plumb Cython into the numpy build machinery? Should I try and patch numpy distutils to use Cython if present? Patching distutils might be the way to go. We use Cython for the random build now because Pyrex couldn't handle long strings in a way suitable for Windows. Chuck
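The patch Chuck alludes to would roughly amount to replacing the Pyrex hook on numpy.distutils' build_src command before calling setup(). The following is an untested sketch against the distutils internals of that era; the hook name, method signature, and attribute layout are assumptions, and the Cython invocation is via its command-line tool:

```python
# Untested sketch: swap numpy.distutils' Pyrex hook for one that runs Cython.
import os
import subprocess

from numpy.distutils.command import build_src

def generate_a_cython_source(self, base, ext_name, source, extension):
    # Plays the same role as the generate_a_pyrex_source hook it replaces,
    # but shells out to the cython CLI to produce the C file.
    target = os.path.join(self.build_src, base + '.c')
    subprocess.check_call(['cython', '-o', target, source])
    return target

# Install the override before setup() runs:
build_src.build_src.generate_a_pyrex_source = generate_a_cython_source
```

This is the monkeypatch approach rather than a proper numpy.distutils change; a real patch would also check that Cython is importable and fall back gracefully.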
Re: [Numpy-discussion] ndimage.label - howto force SWIG to use int32 - even on 64bit Linux ?
On Tue, Apr 27, 2010 at 2:27 AM, Sebastian Haase seb.ha...@gmail.com wrote: Hi, I wanted to write some C code to accept labels as they come from ndimage.label. For some reason ndimage.label produces its output as an int32 array -- even on my 64-bit system. BTW, could this be considered a bug? Likely. Now, if I use the typemaps of numpy.i I can choose between NPY_LONG and NPY_INT. But those are sometimes 32, sometimes 64 bit, depending on the system. Any ideas...? npy_intp. Chuck
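On the Python side of such a wrapper, the matching move is to convert the labels to the platform pointer-sized integer before handing them to the wrapped C function; np.intp is the Python-level counterpart of npy_intp (the int32 array below is a stand-in for ndimage.label output):

```python
import numpy as np

# Stand-in for what ndimage.label hands back: an int32 label array.
labels = np.zeros((4, 4), dtype=np.int32)

# np.intp is 32-bit on 32-bit systems and 64-bit on 64-bit ones, so a C
# function typemapped for npy_intp sees the right width on every platform.
labels_intp = np.ascontiguousarray(labels, dtype=np.intp)
print(labels_intp.dtype == np.intp)  # True
```

ascontiguousarray also guarantees a C-contiguous buffer, which the SWIG typemaps generally expect; it only copies when the dtype or layout actually differs.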
Re: [Numpy-discussion] floats as axis
On Wed, Apr 28, 2010 at 8:44 AM, Travis Oliphant oliph...@enthought.com wrote: On Apr 25, 2010, at 8:16 AM, josef.p...@gmail.com wrote: (some) numpy functions take floats as valid axis argument. Is this a feature? np.ones((2,3)).sum(1.2) array([ 3., 3.]) np.ones((2,3)).sum(1.99) array([ 3., 3.]) np.mean((1.5,0.5)) 1.0 np.mean(1.5,0.5) 1.5 Keith pointed out that scipy.stats.nanmean has a different behavior I think we should make float inputs raise an error for NumPy 2.0 Agree... Chuck
Re: [Numpy-discussion] proposing a beware of [as]matrix() warning
On Wed, Apr 28, 2010 at 10:08 AM, Dag Sverre Seljebotn da...@student.matnat.uio.no wrote: David Warde-Farley wrote: Trying to debug code written by an undergrad working for a colleague of mine who ported code over from MATLAB, I am seeing an ugly melange of matrix objects and ndarrays that are interacting poorly with each other and various functions in SciPy/other libraries. In particular there was a custom minimizer function that contained a line a * b, that was receiving an Nx1 matrix and a N-length array and computing an outer product. Hence the unexpected 6 GB of memory usage and weird results... If this was in a library function of some sort, I think they should always call np.asarray on the input arguments. That converts matrices to normal arrays. It could have been Python lists-of-lists, other PEP 3118 objects -- in Python an object can be everything in general, and I think it is very proper for most reusable functions to either validate the type of their arguments or take some steps to convert. That said, I second that it would be good to deprecate the matrix class from NumPy. The problem for me is not the existance of a matrix class as such, but the fact that it subclasses np.ndarray and is so similar with it, breaking a lot of rules for OO programming in the process. Yeah. Masked arrays have similar problems. Pierre has done so much work to have masked versions of the various functions that it might as well be a standalone class. <snip> Chuck
Re: [Numpy-discussion] proposing a beware of [as]matrix() warning
On Wed, Apr 28, 2010 at 10:05 AM, Travis Oliphant oliph...@enthought.com wrote: On Apr 26, 2010, at 7:19 PM, David Warde-Farley wrote: Trying to debug code written by an undergrad working for a colleague of mine who ported code over from MATLAB, I am seeing an ugly melange of matrix objects and ndarrays that are interacting poorly with each other and various functions in SciPy/other libraries. In particular there was a custom minimizer function that contained a line a * b, that was receiving an Nx1 matrix and an N-length array and computing an outer product. Hence the unexpected 6 GB of memory usage and weird results... Overloading '*' and '**' while convenient does have consequences. It would be nice if we could have a few more infix operators in Python to allow separation of element-by-element calculations and dot-product calculations. A proposal was made to allow calling a NumPy array to infer dot product: a(b) is equivalent to dot(a,b) a(b)(c) would be equivalent to dot(dot(a,b),c) This seems rather reasonable. I like this too. A similar proposal that recently showed up on the list was to add a dot method to ndarrays so that a(b)(c) would be written a.dot(b).dot(c). While I don't have any spare cycles to push it forward and we are already far along on the NumPy port to 3.0, I had wondered if we couldn't use the leverage of Python core developers wanting NumPy to be ported to Python 3 to actually add a few more infix operators to the language. One of the problems of moving to Python 3.0 for many people is that there are not new features to outweigh the hassle of moving. Having a few more infix operators would be a huge incentive to the NumPy community to move to Python 3. Anybody willing to lead the charge with the Python developers? Problem is that we couldn't decide on an appropriate operator. Adding a keyword that functioned like 'and' would likely break all sorts of code, so it needs to be something that is not currently seen in the wild.
Chuck
Re: [Numpy-discussion] Recommended way to add Cython extension using numpy.distutils?
On Tue, Apr 27, 2010 at 8:09 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, We (neuroimaging.scipy.org) are using numpy.distutils, and we have .pyx files that we build with Cython. I wanted to add these in our current setup.py scripts, with something like:

def configuration(parent_package='', top_path=None):
    from numpy.distutils.misc_util import Configuration
    config = Configuration('statistics', parent_package, top_path)
    config.add_extension('intvol', ['intvol.pyx'],
                         include_dirs=[np.get_include()])
    return config

but of course numpy only knows about Pyrex, and returns: error: Pyrex required for compiling 'nipy/algorithms/statistics/intvol.pyx' but not available Is there a recommended way to plumb Cython into the numpy build machinery? Should I try and patch numpy distutils to use Cython if present? Here is the monkey-patch I'm using in my project:

def evil_numpy_monkey_patch():
    from numpy.distutils.command import build_src
    import Cython
    import Cython.Compiler.Main
    build_src.Pyrex = Cython
    build_src.have_pyrex = True
Re: [Numpy-discussion] proposing a beware of [as]matrix() warning
Robert Kern robert.k...@gmail.com writes: Overloading '*' and '**' while convenient does have consequences. It would be nice if we could have a few more infix operators in Python to allow separation of element-by-element calculations and dot-product calculations. http://www.python.org/dev/peps/pep-0225/ was considered and rejected. But that was in 2000... While I don't have any spare cycles to push it forward and we are already far along on the NumPy port to 3.0, I had wondered if we couldn't use the leverage of Python core developers wanting NumPy to be ported to Python 3 to actually add a few more infix operators to the language. I don't think that stands a chance: http://www.python.org/dev/peps/pep-3003/ Best, -Nikolaus -- »Time flies like an arrow, fruit flies like a Banana.« PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C
Re: [Numpy-discussion] proposing a beware of [as]matrix() warning
On 2010-04-28, at 12:05 PM, Travis Oliphant wrote: a(b) is equivalent to dot(a,b) a(b)(c) would be equivalent to dot(dot(a,b),c) This seems rather reasonable. Indeed, and it leads to a rather pleasant way of permuting syntax to change the order of operations, i.e. a(b(c)) vs. a(b)(c). David
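The proposed callable-array semantics can be sketched with a toy ndarray subclass — `DotCallable` is purely illustrative, not an implementation anyone proposed committing:

```python
import numpy as np

class DotCallable(np.ndarray):
    # Toy subclass: calling the array performs a dot product, so
    # a(b)(c) is equivalent to dot(dot(a, b), c), and a(b(c)) to
    # dot(a, dot(b, c)).
    def __call__(self, other):
        return np.dot(self, other).view(DotCallable)

a = np.arange(6.0).reshape(2, 3).view(DotCallable)
b = np.arange(12.0).reshape(3, 4)
c = np.arange(4.0)

print(np.allclose(a(b)(c), np.dot(np.dot(a, b), c)))  # True
```

Note how the parenthesization alone picks the association order, which is the "pleasant permuting" described above.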
Re: [Numpy-discussion] proposing a beware of [as]matrix() warning
On Wed, Apr 28, 2010 at 1:30 PM, David Warde-Farley d...@cs.toronto.edu wrote: On 2010-04-28, at 12:05 PM, Travis Oliphant wrote: a(b) is equivalent to dot(a,b) a(b)(c) would be equivalent to dot(dot(a,b),c) This seems rather reasonable. Indeed, and it leads to a rather pleasant way of permuting syntax to change the order of operations, i.e. a(b(c)) vs. a(b)(c). I like the explicit dot method much better, __call__ (parentheses) can mean anything, and reading the code will be more difficult. (especially when switching from matlab) Josef David
Re: [Numpy-discussion] What should be the value of nansum of nan's?
On Mon, Apr 26, 2010 at 10:03 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Apr 26, 2010 at 10:55 AM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, We need to make a decision for ticket #1123 regarding what nansum should return when all values are nan. At some earlier point it was zero, but currently it is nan, in fact it is nan whatever the operation is. That is consistent, simple and serves to mark the array or axis as containing all nans. I would like to close the ticket and am a bit inclined to go with the current behaviour although there is an argument to be made for returning 0 for the nansum case. Thoughts? To add a bit of context, one could argue that the results should be consistent with the equivalent operations on empty arrays and always be non-nan. In [1]: nansum([]) Out[1]: nan In [2]: sum([]) Out[2]: 0.0 This seems like an obvious one to me. What is the spirit of nansum? Return the sum of array elements over a given axis treating Not a Numbers (NaNs) as zero. Okay. So NaNs in an array are treated as zeros and the sum is performed as one normally would perform it starting with an initial sum of zero. So if all values are NaN, then we add nothing to our original sum and still return 0. I'm not sure I understand the argument that it should return NaN. It is counter to the *purpose* of nansum. Also, if one wants to determine if all values in an array are NaN, isn't there another way? Let's keep (or make) those distinct operations, as they are definitely distinct concepts. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
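The "treat NaNs as zero" reading argued for above can be sketched as a hypothetical helper (not the nansum implementation under discussion):

```python
import numpy as np

def nansum_zero(a, axis=None):
    # Hypothetical variant: NaNs contribute zero, so an all-NaN (or
    # empty) input sums to 0.0, consistent with np.sum([]) == 0.0.
    a = np.asarray(a, dtype=float)
    return np.where(np.isnan(a), 0.0, a).sum(axis=axis)

print(nansum_zero([np.nan, np.nan]))  # 0.0
print(nansum_zero([]))                # 0.0
```

Under this definition nansum of an all-NaN axis agrees with the sum of an empty array, which is exactly the consistency argument made in the thread.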
Re: [Numpy-discussion] proposing a beware of [as]matrix() warning
On 4/28/2010 12:05 PM, Travis Oliphant wrote: A proposal was made to allow calling a NumPy array to infer dot product: a(b) is equivalent to dot(a,b) a(b)(c) would be equivalent to dot(dot(a,b),c) Here is a related ticket that proposes a more explicit alternative: adding a ``dot`` method to ndarray. http://projects.scipy.org/numpy/ticket/1456 fwiw, Alan
Re: [Numpy-discussion] Recommended way to add Cython extension using numpy.distutils?
Hi, Thanks a lot for the suggestion - I appreciate it. Is there a recommended way to plumb Cython into the numpy build machinery? Should I try and patch numpy distutils to use Cython if present? Here is the monkey-patch I'm using in my project: def evil_numpy_monkey_patch(): from numpy.distutils.command import build_src import Cython import Cython.Compiler.Main build_src.Pyrex = Cython build_src.have_pyrex = True I think this patch does not work for current numpy trunk; I've put a minimal test case here: http://github.com/matthew-brett/du-cy-numpy If you run the setup.py there (python setup.py build) then all works fine for - say - numpy 1.1. For current trunk you get an error ending in: File /Users/mb312/usr/local/lib/python2.6/site-packages/numpy/distutils/command/build_src.py, line 466, in generate_a_pyrex_source if self.inplace or not have_pyrex(): TypeError: 'bool' object is not callable which is easily fixable of course ('build_src.have_pyrex = lambda : True') - leading to: File /Users/mb312/usr/local/lib/python2.6/site-packages/numpy/distutils/command/build_src.py, line 474, in generate_a_pyrex_source import Pyrex.Compiler.Main ImportError: No module named Pyrex.Compiler.Main I'm afraid I did a rather crude monkey-patch to replace the 'generate_a_pyrex_source' function. It seems to work for numpy 1.1 and current trunk. The patching process is here: http://github.com/matthew-brett/du-cy-numpy/blob/master/matthew_monkey.py Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] proposing a beware of [as]matrix() warning
On Apr 28, 2010, at 11:19 AM, Robert Kern wrote: On Wed, Apr 28, 2010 at 11:05, Travis Oliphant oliph...@enthought.com wrote: On Apr 26, 2010, at 7:19 PM, David Warde-Farley wrote: Trying to debug code written by an undergrad working for a colleague of mine who ported code over from MATLAB, I am seeing an ugly melange of matrix objects and ndarrays that are interacting poorly with each other and various functions in SciPy/other libraries. In particular there was a custom minimizer function that contained a line a * b, that was receiving an Nx1 matrix and an N-length array and computing an outer product. Hence the unexpected 6 GB of memory usage and weird results... Overloading '*' and '**' while convenient does have consequences. It would be nice if we could have a few more infix operators in Python to allow separation of element-by-element calculations and dot-product calculations. A proposal was made to allow calling a NumPy array to infer dot product: a(b) is equivalent to dot(a,b) a(b)(c) would be equivalent to dot(dot(a,b),c) This seems rather reasonable. While I don't have any spare cycles to push it forward and we are already far along on the NumPy port to 3.0, I had wondered if we couldn't use the leverage of Python core developers wanting NumPy to be ported to Python 3 to actually add a few more infix operators to the language. One of the problems of moving to Python 3.0 for many people is that there are not new features to outweigh the hassle of moving. Having a few more infix operators would be a huge incentive to the NumPy community to move to Python 3. Anybody willing to lead the charge with the Python developers? There is currently a moratorium on language changes. This will have to wait. Exceptions can always be made for the right reasons. I don't think this particular question has received sufficient audience with Python core developers. The reason they want the moratorium is for stability, but they also want Python 3k to be adopted.
-Travis
Re: [Numpy-discussion] proposing a beware of [as]matrix() warning
On Apr 28, 2010, at 11:50 AM, Nikolaus Rath wrote: Robert Kern robert.k...@gmail.com writes: Overloading '*' and '**' while convenient does have consequences. It would be nice if we could have a few more infix operators in Python to allow separation of element-by-element calculations and dot-product calculations. http://www.python.org/dev/peps/pep-0225/ was considered and rejected. But that was in 2000... While I don't have any spare cycles to push it forward and we are already far along on the NumPy port to 3.0, I had wondered if we couldn't use the leverage of Python core developers wanting NumPy to be ported to Python 3 to actually add a few more infix operators to the language. I don't think that stands a chance: http://www.python.org/dev/peps/pep-3003/ Frankly, I still think we should move forward. It will take us as long as the moratorium is in effect to figure out what operators we want anyway, and we can do things like put attributes on arrays in the meantime to implement the infix operators we think we need. It's too bad we don't have more of a voice with the Python core team. This is our fault of course (we don't have people with spare cycles to spend the time interfacing), but it's still too bad. -Travis
Re: [Numpy-discussion] proposing a beware of [as]matrix() warning
On 4/28/2010 12:08 PM, Dag Sverre Seljebotn wrote: it would be good to deprecate the matrix class from NumPy Please let us not have this discussion all over again. The matrix class is very useful for teaching. In economics for example, the use of matrix algebra is widespread, while algebra with arrays that are not matrices is very rare. I can (and do) use NumPy matrices even in undergraduate courses. If you do not like them, do not use them. If you want `matrix` replaced with a better matrix object, offer a replacement for community consideration. Thank you, Alan Isaac PS There is one change I would not mind: let A * M be undefined if A is an ndarray and M is a NumPy matrix.
[Numpy-discussion] Fwd: Advice for grouping recarrays
Someone inquired about this one today and I wanted to clarify there is now a better way to do this that I didn't know about when I posted the original: ind = numpy.array([0,0,0,0,1,1,1,2,2,2,]) data = numpy.arange(10) borders = numpy.arange(len(ind)).compress(numpy.hstack([[1], ind[1:]!=ind[:-1]])) numpy.add.reduceat(data, borders) array([ 6, 15, 24]) On Tue, Jul 18, 2006 at 8:49 AM, Tom Denniston tom.dennis...@alum.dartmouth.org wrote: I suggest lexsort itertools.groupby of the indices take I think it would be really great if numpy had the first two as a function or something like that. It is really useful to be able to take an array and bucket it and apply further numpy operations like accumulation functions. On 7/18/06, Stephen Simmons m...@stevesimmons.com wrote: Hi, Does anyone have any suggestions for summarising data in numpy? The quick description is that I want to do something like the SQL statement: SELECT sum(field1), sum(field2) FROM table GROUP BY field3; The more accurate description is that my data is stored in PyTables HDF format, with 24 monthly files, each with 4m records describing how customers performed that month. Each record looks something like this: ('200604', 65140450L, '800', 'K', 12L, 162.0, 2000.0, 0.054581, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 8.80, 0.86, 7.80 17.46, 0.0, 70.0, 0.0, 70.0, -142.93, 0.0, 2000.0, 2063.93, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -9.71, 7.75, 87.46, 77.75, -3.45, 0.22, -0.45, -0.57, 73.95) The first 5 fields are status fields (month_string, account_number, product_code, account_status, months_since_customer_joined). The remaining 48 fields represent different aspects of the customer's performance during that month. 
I read 100,000 of these records at a time and turn them into a numpy recarray with: dat = hdf_table.read(start=pos, stop=pos+block_size) dat = numpy.asarray(dat._flatArray, dtype=dat.array_descr) I'd like to reduce these 96m records x 53 fields down to monthly averages for each tuple (month_string, months_since_customer_joined) which in the case above is ('200604', 12L). This will let me compare the performance of newly acquired customers at the same point in their lifecycle as customers acquired 1 or 2 years ago. The end result should be a dataset something like res[month_index, months_since_customer_joined] = array([ num_records, sum_field_5, sum_field_6, sum_field_7, ... sum_field_52 ]) with a shape of (24, 24, 49). I've played around with lexsort(), take(), sum(), etc, but get very confused and end up feeling that I'm making things more complicated than they need to be. So any advice from numpy veterans on how best to proceed would be very welcome! Cheers Stephen - Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT business topics through brief surveys -- and earn cash http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV ___ Numpy-discussion mailing list numpy-discuss...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
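The reduceat recipe given at the top of this thread, restated as a self-contained sketch (np.flatnonzero on np.r_ replaces the original compress/hstack idiom, with the same result):

```python
import numpy as np

# Group-wise sums over sorted group labels, without a Python loop.
ind = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
data = np.arange(10)

# Indices where a new group starts: position 0, plus each label change.
borders = np.flatnonzero(np.r_[True, ind[1:] != ind[:-1]])
sums = np.add.reduceat(data, borders)
print(sums)  # [ 6 15 24]
```

For the full use case above, data would be the (100000, 53) block read from PyTables and the same borders array would be passed to reduceat along axis 0.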
Re: [Numpy-discussion] proposing a beware of [as]matrix() warning
On 28 April 2010 14:30, Alan G Isaac ais...@american.edu wrote: On 4/28/2010 12:08 PM, Dag Sverre Seljebotn wrote: it would be good to deprecate the matrix class from NumPy Please let us not have this discussion all over again. I think you may be too late on this, but it's worth a try. The matrix class is very useful for teaching. In economics for example, the use of matrix algebra is widespread, while algebra with arrays that are not matrices is very rare. I can (and do) use NumPy matrices even in undergraduate courses. If you do not like them, do not use them. This is the problem: lots of people start using numpy and think hmm, I want to store two-dimensional data so I'll use a matrix, and have no idea that matrix means anything different from two-dimensional array. It was this that inspired David's original post, and it's this that we're trying to find a solution for. If you want `matrix` replaced with a better matrix object, offer a replacement for community consideration. Thank you, Alan Isaac PS There is one change I would not mind: let A * M be undefined if A is an ndarray and M is a NumPy matrix. I can definitely vote for this, in the interest of catching as many inadvertent matrix users as possible. Anne ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] proposing a beware of [as]matrix() warning
On Wed, Apr 28, 2010 at 15:50, Travis Oliphant oliph...@enthought.com wrote: On Apr 28, 2010, at 11:19 AM, Robert Kern wrote: On Wed, Apr 28, 2010 at 11:05, Travis Oliphant oliph...@enthought.com wrote: On Apr 26, 2010, at 7:19 PM, David Warde-Farley wrote: Trying to debug code written by an undergrad working for a colleague of mine who ported code over from MATLAB, I am seeing an ugly melange of matrix objects and ndarrays that are interacting poorly with each other and various functions in SciPy/other libraries. In particular there was a custom minimizer function that contained a line a * b, that was receiving an Nx1 matrix and a N-length array and computing an outer product. Hence the unexpected 6 GB of memory usage and weird results... Overloading '*' and '**' while convenient does have consequences. It would be nice if we could have a few more infix operators in Python to allow separation of element-by-element calculations and dot- product calculations. A proposal was made to allow calling a NumPy array to infer dot product: a(b) is equivalent to dot(a,b) a(b)(c) would be equivalent to dot(dot(a,b),c) This seems rather reasonable. While I don't have any spare cycles to push it forward and we are already far along on the NumPy to 3.0, I had wondered if we couldn't use the leverage of Python core developers wanting NumPy to be ported to Python 3 to actually add a few more infix operators to the language. One of the problems of moving to Python 3.0 for many people is that there are not new features to outweigh the hassle of moving. Having a few more infix operators would be a huge incentive to the NumPy community to move to Python 3. Anybody willing to lead the charge with the Python developers? There is currently a moratorium on language changes. This will have to wait. Exceptions can always be made for the right reasons. I don't think this particular question has received sufficient audience with Python core developers. 
It received plenty of audience on python-dev in 2008. But no one from our community cared enough to actually implement it. http://fperez.org/py4science/numpy-pep225/numpy-pep225.html The reason they want the moratorium is for stability, but they also want Python 3k to be adopted. This is not something that will justify an exception. Things like oh crap, this old feature has a lurking flaw that we've never noticed before and needs a language change to fix are possible exceptions to the moratorium, not something like this. PEP 3003 quite clearly lays out the possible exceptions: Case-by-Case Exemptions New methods on built-ins The case for adding a method to a built-in object can be made. Incorrect language semantics If the language semantics turn out to be ambiguous or improperly implemented based on the intention of the original design then the semantics may change. Language semantics that are difficult to implement Because other VMs have not begun implementing Python 3.x semantics there is a possibility that certain semantics are too difficult to replicate. In those cases they can be changed to ease adoption of Python 3.x by the other VMs. This feature falls into none of these categories. It does fall into this one: Cannot Change ... Language syntax The grammar file essentially becomes immutable apart from ambiguity fixes. Guido is taking a hard line on this. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] proposing a beware of [as]matrix() warning
On Wed, Apr 28, 2010 at 2:12 PM, Alan G Isaac ais...@american.edu wrote: On 4/28/2010 12:05 PM, Travis Oliphant wrote: A proposal was made to allow calling a NumPy array to infer dot product: a(b) is equivalent to dot(a,b) a(b)(c) would be equivalent to dot(dot(a,b),c) Here is a related ticket that proposes a more explicit alternative: adding a ``dot`` method to ndarray. http://projects.scipy.org/numpy/ticket/1456 FWIW, I have borrowed a convenience function chain_dot originally from pandas that works for me as a stop gap for more readable code.

def chain_dot(*arrs):
    """Returns the dot product of the given matrices.

    Parameters
    ----------
    arrs : argument list of ndarrays

    Returns
    -------
    Dot product of all arguments.

    Example
    -------
    >>> import numpy as np
    >>> from scikits.statsmodels.tools import chain_dot
    >>> A = np.arange(1, 13).reshape(3, 4)
    >>> B = np.arange(3, 15).reshape(4, 3)
    >>> C = np.arange(5, 8).reshape(3, 1)
    >>> chain_dot(A, B, C)
    array([[1820],
           [4300],
           [6780]])
    """
    return reduce(lambda x, y: np.dot(y, x), arrs[::-1])

Skipper
Re: [Numpy-discussion] proposing a beware of [as]matrix() warning
On 2010-04-28, at 2:30 PM, Alan G Isaac wrote: Please let us not have this discussion all over again. Agreed. See my preface to this discussion. My main objection is that it's not easy to explain to a newcomer what the difference precisely is, how they interact, why two of them exist, how they are sort-of-compatible-but-not... The matrix class is very useful for teaching. In economics for example, the use of matrix algebra is widespread, while algebra with arrays that are not matrices is very rare. I can (and do) use NumPy matrices even in undergraduate courses. Would it be acceptable to retain the matrix class but not have it imported in the default namespace, and have to import e.g. numpy.matlib to get at them? If you do not like them, do not use them. The problem isn't really with seasoned users of NumPy not liking them, but rather new users being confused by the presence of (what seems to be) two primitives, array and matrix. Several things tend to happen:
a) Example code that expects arrays instead receives matrices. If these aren't cast with asarray(), mayhem ensues at the first sight of *.
b) Users of class matrix call a function that correctly coerces input to ndarray, but returns an ndarray. Users are thus confused that, thinking of the function as a black box, putting matrices 'in' doesn't result in getting matrices 'out'. It doesn't take long to get the hang of it if you really sit down and work it through, but it also doesn't take long to go back to MATLAB or whatever else. My interest is in having as few conceptual stumbling stones as possible.
c) Complicating the situation further, people try to use functions e.g. from scipy.optimize which expect a 1d array by passing in column or row matrices. Even when coerced to array, these have the wrong rank and you get unexpected results (you could argue that we should instead use asarray(..).squeeze() on all incoming arguments, but this may not generalize well).
PS There is one change I would not mind: let A * M be undefined if A is an ndarray and M is a NumPy matrix. What about the other binary ops? I would say, matrix goes with matrix, array with array, never the two shall meet unless you explicitly coerce. The ability to mix the two in a single expression does more harm than good, IMHO. David
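The mixed-type surprise that started this thread is easy to reproduce — a column matrix times a plain 1-d array silently becomes an outer product (sketch; np.matrix may emit a deprecation warning on newer NumPy):

```python
import numpy as np

m = np.matrix(np.ones((3, 1)))  # Nx1 matrix
a = np.ones(3)                  # N-length plain ndarray

# matrix.__mul__ coerces the 1-d array to a 1x3 matrix, so the product
# is dot((3,1), (1,3)) -> a 3x3 outer product, not an elementwise one.
r = m * a
print(r.shape)  # (3, 3)
```

Scale N up to the tens of thousands and this is exactly the unexpected multi-GB allocation described in the original post.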
Re: [Numpy-discussion] ndimage.label - howto force SWIG to use int32 - even on 64bit Linux ?
Both types of typemaps are enabled, so you just need to do your %apply directives correctly: %apply (npy_intp* IN_ARRAY1, int DIM1) {(npy_intp* seq, int n)}; etc. SWIG should be able to figure it out from there. On Apr 28, 2010, at 12:58 PM, Charles R Harris wrote: On Tue, Apr 27, 2010 at 2:27 AM, Sebastian Haase seb.ha...@gmail.com wrote: Hi, I wanted to write some C code to accept labels as they come from ndimage.label. For some reason ndimage.label produces its output as an int32 array - even on my 64-bit system. BTW, could this be considered a bug? Likely. Now, if I use the typemaps of numpy.i I can choose between NPY_LONG and NPY_INT. But those are sometimes 32, sometimes 64 bit, depending on the system. Any ideas ... ? npy_intp. Chuck ** Bill Spotz ** ** Sandia National Laboratories Voice: (505)845-0170 ** ** P.O. Box 5800 Fax: (505)284-0154 ** ** Albuquerque, NM 87185-0370 Email: wfsp...@sandia.gov **