[Numpy-discussion] interleaved indexing
A very beginner question about indexing: let x be an array where n = len(x). I would like to create a view y of x such that y[i] = x[i:i+m,...] for each i and a fixed m < n, so I can do things like numpy.cov(y). With n large, allocating y is a problem for me. Currently, I either write for loops in cython or translate operations into correlate(), but am hoping there is an easier way, maybe using fancy indexing or broadcasting. Memory usage is secondary to speed, though. Thanks. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Type checking of arrays containing strings
Hi, I need to check if my array (a) is of type `string`. That is, I don't know the number of characters beforehand, so I can't do a.dtype == '|S*' (* = (max) number of characters). Looking at my options, I see that either a.dtype.kind == 'S' or a.dtype.type == np.string_ might be ok. Are these any of the preferred ways, or is there some other way? Thanks, Arnar
Re: [Numpy-discussion] interleaved indexing
Hi Amir

2008/7/18 Amir [EMAIL PROTECTED]:
> A very beginner question about indexing: let x be an array where n = len(x).
> I would like to create a view y of x such that y[i] = x[i:i+m,...] for each
> i and a fixed m < n, so I can do things like numpy.cov(y). With n large,
> allocating y is a problem for me. Currently, I either do for loops in cython
> or translate operations into correlate() but am hoping there is an easier
> way, maybe using fancy indexing or broadcasting. Memory usage is secondary
> to speed, though.

Robert Kern's recently added numpy.lib.stride_tricks should help:

In [84]: x = np.arange(100).reshape(10,-1)

In [85]: x
Out[85]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

In [86]: x.strides
Out[86]: (40, 4)

In [87]: xx = np.lib.stride_tricks.as_strided(x, shape=(8, 3, 10), strides=(40, 40, 4))

In [88]: xx
Out[88]:
array([[[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]],

       [[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]],

       [[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]],

       [...]

Cheers
Stéfan
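The as_strided call above hard-codes the strides for 4-byte ints. A small helper that derives the strides from the array itself is a sketch of the same trick; the name sliding_windows is made up here (much later NumPy versions ship numpy.lib.stride_tricks.sliding_window_view for exactly this):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def sliding_windows(x, m):
    """Return a view whose i-th row is x[i:i+m] -- no copy is made."""
    n = len(x)
    s = x.strides[0]
    # prepend a window axis that reuses the original row stride
    return as_strided(x, shape=(n - m + 1, m) + x.shape[1:],
                      strides=(s, s) + x.strides[1:])

x = np.arange(10)
y = sliding_windows(x, 3)

# y[i] equals x[i:i+3] for every valid i, without allocating n*m elements
assert y.shape == (8, 3)
assert (y[2] == x[2:5]).all()

# the view can be fed straight into functions like np.cov
c = np.cov(y)
assert c.shape == (8, 8)
```

Note the usual as_strided caveat: the view aliases the original memory, so writing to it can touch the same element through several windows.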
Re: [Numpy-discussion] Type checking of arrays containing strings
2008/7/18 Arnar Flatberg [EMAIL PROTECTED]:
> I need to check if my array (a) is of type `string`. That is, I don't know
> the number of characters beforehand, so I can't do a.dtype == '|S*'
> (* = (max) number of characters). Looking at my options, I see that either
> a.dtype.kind == 'S' or a.dtype.type == np.string_ might be ok. Are these
> any of the preferred ways, or is there some other way?

Maybe np.issubdtype(x.dtype, str)

Stéfan
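The candidate checks from the question can be compared side by side. A small sketch against a current NumPy (where np.string_ has since been renamed np.bytes_, which is used below):

```python
import numpy as np

# Byte-string array whose itemsize is not known in advance.
a = np.array([b'foo', b'barbaz'])

# dtype.kind is 'S' for byte strings, whatever the width.
assert a.dtype.kind == 'S'

# dtype.type gives the scalar type, also independent of the width.
# (np.string_ in 2008-era NumPy; np.bytes_ in current releases.)
assert a.dtype.type is np.bytes_

# Stéfan's suggestion: test against the abstract string type.
assert np.issubdtype(a.dtype, np.bytes_)

# A numeric array fails all of these checks.
b = np.arange(3)
assert b.dtype.kind != 'S'
assert not np.issubdtype(b.dtype, np.bytes_)
```

All three tests ignore the itemsize, which is exactly what the question asks for; dtype.kind is the cheapest of them.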
Re: [Numpy-discussion] Type checking of arrays containing strings
On Fri, Jul 18, 2008 at 11:40 AM, Stéfan van der Walt [EMAIL PROTECTED] wrote:
> Maybe np.issubdtype(x.dtype, str)

Yes, silly of me. I didn't look at the documentation (source) and tried np.issubdtype(x, str) as my first try. That not working, I got lost. That said, I think some parameter names other than arg1, arg2 would be nice for an undocumented function.

Arnar
Re: [Numpy-discussion] Type checking of arrays containing strings
On Fri, Jul 18, 2008 at 12:38 PM, Arnar Flatberg [EMAIL PROTECTED] wrote: snip

... and instead of just complaining, I can do something about it :-) Could you add permissions for me for the documentation editor, username: ArnarFlatberg.

Thanks, Arnar
Re: [Numpy-discussion] Type checking of arrays containing strings
2008/7/18 Arnar Flatberg [EMAIL PROTECTED]:
> ... and instead of just complaining, I can do something about it :-)
> Could you add permissions for me for the documentation editor, username:

That's the spirit! You are now added. But I'm worried: I had a bet with Joe that we wouldn't get more than 30 people to sign up. Looks like I might have to concede defeat!

Cheers
Stéfan
[Numpy-discussion] chararray __mod__ behavior
This seems odd to me:

>>> A = np.array([['%.3f','%d'],['%s','%r']]).view(np.chararray)
>>> A % np.array([[1,2],[3,4]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/lib/python2.5/site-packages/numpy/core/defchararray.py", line 126, in __mod__
    newarr[:] = res
ValueError: shape mismatch: objects cannot be broadcast to a single shape

Is this expected behavior? The % gets broadcast as I'd expect for 1D arrays, but more dimensions fail as above. Changing the offending line in defchararray.py to newarr.flat = res makes it behave properly.
Re: [Numpy-discussion] chararray __mod__ behavior
2008/7/18 Alan McIntyre [EMAIL PROTECTED]:
> This seems odd to me:
>
> >>> A = np.array([['%.3f','%d'],['%s','%r']]).view(np.chararray)
> >>> A % np.array([[1,2],[3,4]])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/local/lib/python2.5/site-packages/numpy/core/defchararray.py", line 126, in __mod__
>     newarr[:] = res
> ValueError: shape mismatch: objects cannot be broadcast to a single shape
>
> Is this expected behavior? The % gets broadcast as I'd expect for 1D
> arrays, but more dimensions fail as above. Changing the offending line in
> defchararray.py to newarr.flat = res makes it behave properly.

That looks like a bug to me. I would have expected at least one of the following to work:

A % [[1, 2], [3, 4]]
A % 1
A % (1, 2, 3, 4)

and none of them do.

Stéfan
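For comparison, the free function np.char.mod performs the same elementwise % and does broadcast across multiple dimensions, so it serves as a workaround. A sketch against a current NumPy (the behavior of chararray.__mod__ itself has varied by version):

```python
import numpy as np

# Elementwise string formatting with full broadcasting.
a = np.array([['%.3f', '%d'], ['%s', '%r']])
out = np.char.mod(a, np.array([[1, 2], [3, 4]]))
assert out[0, 0] == '1.000'
assert out[0, 1] == '2'
assert out[1, 0] == '3'

# Broadcasting a scalar works too, which was one of the failing cases above.
out2 = np.char.mod(np.array(['%d', '%d']), 7)
assert (out2 == ['7', '7']).all()
```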
Re: [Numpy-discussion] RFC: A (second) proposal for implementing some date/time types in NumPy
Francesc Alted (on 2008-07-16 at 18:44:36 +0200) wrote:
> After tons of excellent feedback received for our first proposal about the
> date/time types in NumPy, Ivan and I have had another brainstorming session
> and ended up with a new proposal for your consideration.

After re-reading the proposal, Francesc and I found some points that needed small corrections and some clarifications or enhancements. Here you have a new version of the proposal. The changes aren't fundamental:

* Reference to POSIX-like treatment of leap seconds.
* Notes on default resolutions.
* Meaning of the stored values.
* Usage examples for scalar constructor.
* Using an ISO 8601 string as a date value.
* Fixed str() and repr() representations.
* Note on operations with mixed resolutions.
* Other small corrections.

Thanks for the feedback!

A (second) proposal for implementing some date/time types in NumPy
==================================================================

:Author: Francesc Alted i Abad
:Contact: [EMAIL PROTECTED]
:Author: Ivan Vilata i Balaguer
:Contact: [EMAIL PROTECTED]
:Date: 2008-07-18

Executive summary
=================

A date/time mark is something very handy to have in many fields where one has to deal with data sets. While Python has several modules that define a date/time type (like the integrated ``datetime`` [1]_ or ``mx.DateTime`` [2]_), NumPy lacks one. In this document we propose the addition of a series of date/time types to fill this gap. The requirements for the proposed types are twofold: 1) they have to be fast to operate with, and 2) they have to be as compatible as possible with the existing ``datetime`` module that comes with Python.

Types proposed
==============

To start with, it is virtually impossible to come up with a single date/time type that fills the needs of every use case. So, after pondering different possibilities, we have settled on *two* different types, namely ``datetime64`` and ``timedelta64`` (these names are preliminary and can be changed), which can have different resolutions so as to cover different needs.
.. Important:: the resolution is conceived here as metadata that *complements* a date/time dtype, *without changing the base type*. It provides information about the *meaning* of the stored numbers, not about their *structure*.

Now follows a detailed description of the proposed types.

``datetime64``
--------------

It represents a time that is absolute (i.e. not relative). It is implemented internally as an ``int64`` type. The internal epoch is the POSIX epoch (see [3]_). Like POSIX, the representation of a date doesn't take leap seconds into account.

Resolution
~~~~~~~~~~

It accepts different resolutions, each of them implying a different time span. The table below describes the supported resolutions with their corresponding time spans.

======  ===========  ========================
Code    Meaning      Time span (years)
======  ===========  ========================
Y       year         [9.2e18 BC, 9.2e18 AD]
Q       quarter      [3.0e18 BC, 3.0e18 AD]
M       month        [7.6e17 BC, 7.6e17 AD]
W       week         [1.7e17 BC, 1.7e17 AD]
d       day          [2.5e16 BC, 2.5e16 AD]
h       hour         [1.0e15 BC, 1.0e15 AD]
m       minute       [1.7e13 BC, 1.7e13 AD]
s       second       [2.9e9 BC, 2.9e9 AD]
ms      millisecond  [2.9e6 BC, 2.9e6 AD]
us      microsecond  [290301 BC, 294241 AD]
ns      nanosecond   [1678 AD, 2262 AD]
======  ===========  ========================

When a resolution is not provided, the default resolution of microseconds is used. The value of an absolute date is thus *an integer number of units of the chosen resolution* passed since the internal epoch.

Building a ``datetime64`` dtype
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The proposed way to specify the resolution in the dtype constructor is:

Using parameters in the constructor::

  dtype('datetime64', res='us')  # the default res. is microseconds

Using the long string notation::

  dtype('datetime64[us]')  # equivalent to dtype('datetime64')

Using the short string notation::

  dtype('T8[us]')  # equivalent to dtype('T8')

Compatibility issues
~~~~~~~~~~~~~~~~~~~~

This will be fully compatible with the ``datetime`` class of the ``datetime`` module of Python only when using a resolution of microseconds. For other resolutions, the conversion process will lose precision or overflow as needed.
The conversion from/to a ``datetime`` object doesn't take leap seconds into account.

``timedelta64``
---------------

It represents a time that is relative (i.e. not absolute). It is implemented internally as an ``int64`` type.

Resolution
~~~~~~~~~~

It accepts different resolutions,
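For context, the interface that eventually shipped in NumPy differs in detail from this proposal: the resolution is written in brackets as part of the dtype string, and there is no ``res=`` keyword. A small sketch of the implemented behavior:

```python
import numpy as np

# An absolute date; the unit (here day resolution) is part of the dtype.
d = np.datetime64('2008-07-18')
assert str(d.dtype) == 'datetime64[D]'

# Subtracting two absolute dates yields a relative timedelta64.
delta = np.datetime64('2008-07-19') - d
assert delta == np.timedelta64(1, 'D')

# Mixed-resolution arithmetic coerces to the finer unit,
# as the proposal's note on mixed resolutions anticipates.
t = np.datetime64('2008-07-18T12:00') - np.datetime64('2008-07-18')
assert t == np.timedelta64(720, 'm')
```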
Re: [Numpy-discussion] Ticket review: #848, leak in PyArray_DescrFromType
On Fri, Jul 18, 2008 at 5:15 AM, Michael Abbott [EMAIL PROTECTED] wrote: I'm afraid this is going to be a long one, and I don't see any good way to cut down the quoted text either... Charles, I'm going to plead with you to read what I've just written and think about it. I'm trying to make the case as clear as I can. I think the case is actually extremely simple: the existing @[EMAIL PROTECTED] code is broken. snip Heh. As the macro is undefined after the code is generated, it should probably be moved into the code. I would actually like to get rid of the ifdef's (almost everywhere), but that is a later stage of cleanup. 3. Here's the reference count we're responsible for. Yep. 4. If obj is NULL we use the typecode 5. otherwise we pass it to PyArray_FromAny. 6. The first early return 7. All paths (apart from 6) come together here. So at this point let's take stock. typecode is in one of three states: NULL (path 2, or if creation failed), allocated with a single reference count (path 4), or lost (path 5). This is not good. It still has a single reference after 5 if PyArray_FromAny succeeded; that reference is held by arr and transferred to robj. If the transfer fails, the reference to arr is decremented and NULL returned by PyArray_Return. When arr is garbage collected the reference to typecode will be decremented. LET ME EMPHASISE THIS: the state of the code at the finish label is dangerous and simply broken. The original state at the finish label is indeterminate: typecode has either been lost by passing it to PyArray_FromAny (in which case we're not allowed to touch it again), or else it has a reference count that we're still responsible for. There seems to be a fantasy expressed in a comment in a recent update to this routine that PyArray_Scalar has stolen a reference, but fortunately a quick inspection of arrayobject.c refutes this frightening possibility. No, no, PyArray_Scalar doesn't do anything to the reference count. Where did you see otherwise?
So, the only way to fix the problem at (7) is to unify the two non-NULL cases. One answer is to add a DECREF at (4), but we see at (11) that we still need typecode at (7) -- so the only solution is to add an extra ADDREF just before (5). This then of course sadly means that we also need an extra DECREF at (6). PLEASE don't suggest moving the ADDREF until after (6) -- at this point typecode is lost and may have been destroyed, and relying on any possibility to the contrary is a recipe for continued screw-ups. The rest is easy. Once we've established the invariant that typecode is either NULL or has a single reference count at (7), then the two early returns at (8) and (9) unfortunately need to be augmented with DECREFs. And we're done. Responses to your original comments: By the time we hit finish, robj is NULL or holds a reference to typecode and the NULL case is taken care of up front. robj has nothing to do with the lifetime management of typecode; the only issue is the early return. After the finish label typecode is either NULL (no problem) or else has a single reference count that needs to be accounted for. Later on, the reference to typecode might be decremented, That *might* is at the heart of the problem. You can't be so cavalier about managing references. perhaps leaving robj crippled, but in that case robj itself is marked for deletion upon exit. Please ignore robj in this discussion; it's beside the point. If the garbage collector can handle zero reference counts I think we are alright. No, no, no. This is nothing to do with the garbage collector. If we screw up our reference counts here then the garbage collector isn't going to dig us out of the hole. The garbage collector destroys the object and should decrement all the references it holds. If that is not the case then there are bigger problems afoot. The finalizer for the object should hold the knowledge of what needs to be decremented.
I admit I haven't quite followed all the subroutines and macros, which descend into the hazy depths without the slightest bit of documentation, but at this point I'm inclined to leave things alone unless you have a test that shows a leak from this source. Part of my point is that proper reference count discipline should not require any descent into subroutines (except for the very nasty case of reference theft, which I think is generally agreed to be a bad thing). Agreed. But that is not the code we are looking at. My personal schedule for this sort of cleanup/refactoring looks like this:

1) Format the code into readable C. (ongoing)
2) Document the functions so we know what they do.
3) Understand the code.
4) Fix up functions starting from the bottom layers.
5) Flatten the code -- the calls go too deep for my taste and make understanding difficult.

My attempts to generate a call graph have
Re: [Numpy-discussion] Ticket review: #848, leak in PyArray_DescrFromType
On Fri, Jul 18, 2008 at 12:03 PM, Charles R Harris [EMAIL PROTECTED] wrote: snip

Simpler test case:

import sys, gc
import numpy as np

def main():
    t = np.dtype(np.float32)
    print sys.getrefcount(t)
    for i in range(100):
        np.float32()
    gc.collect()
    print sys.getrefcount(t)

if __name__ == "__main__":
    main()

Result:

[EMAIL PROTECTED] ~]$ python debug.py
5
105

So there is a leak. The question is the proper fix. I want to take a closer look at PyArray_Return and also float32() and relations.

The reference leak seems specific to the float32 and complex64 types called with default arguments.

In [1]: import sys, gc

In [2]: t = float32

In [3]: sys.getrefcount(dtype(t))
Out[3]: 4

In [4]: for i in range(10) : t();

In [5]: sys.getrefcount(dtype(t))
Out[5]: 14

In [6]: for i in range(10) : t(0);

In [7]: sys.getrefcount(dtype(t))
Out[7]: 14

In [8]: t = complex64

In [9]: sys.getrefcount(dtype(t))
Out[9]: 4

In [10]: for i in range(10) : t();

In [11]: sys.getrefcount(dtype(t))
Out[11]: 14

In [12]: t = float64

In [13]: sys.getrefcount(dtype(t))
Out[13]: 19

In [14]: for i in range(10) : t();

In [15]: sys.getrefcount(dtype(t))
Out[15]: 19

This shouldn't actually leak any memory as these types are singletons, but it points up a logic flaw somewhere.

Chuck
[Numpy-discussion] building a better OSX install for 1.1.1
Sorry Russell, I was a bit brief before. Installer.app checks for a system requirement when the user installs numpy. I build numpy using bdist_mpkg against the python.org version of python (MacPython). If a user tries to install numpy and they _do not_ have this version of python installed, Installer.app issues a warning: "numpy requires System Python 2.5 to install." The phrase "System Python" is misleading; it's reasonable to assume that refers to the system version of python. So I'd like to change it. This string is stored in an Info.plist buried in the .mpkg that bdist_mpkg builds. I'd like to be able to override that string from the command line, but there do not seem to be any options for changing the requirements from the command line. The hack solution is to modify the string in the Info.plist after the package is built. But I'm hoping there's a proper solution that I'm missing. Thanks! Chris On Fri, Jul 18, 2008 at 11:59 AM, Russell E. Owen [EMAIL PROTECTED] wrote: In article [EMAIL PROTECTED], Christopher Burns [EMAIL PROTECTED] wrote: I've been using bdist_mpkg to build the OSX Installer. I'd like to update the requirement documentation for the 1.1.1 release candidate to say "MacPython from python.org" instead of "System Python". bdist_mpkg specifies this, does anyone know how to override it? I suspect I am misunderstanding your question, but... If you are asking how to make bdist_mpkg actually use MacPython, then surely you simply have to have MacPython installed for that to happen? That was certainly true for MacPython and bdist_mpkg on 10.4.x. Or are you asking how to make the installer fail if the user's system is missing MacPython? That I don't know. I usually rely on the .mpkg's ReadMe and the user being intelligent enough to read it, but of course that is a bit risky. If you are asking how to modify the ReadMe file then that is trivial -- just look through the .mpkg package and you'll find it right away.
I often replace the default ReadMe with my own when creating .mpkg installer for others. -- Russell

-- Christopher Burns Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/
[Numpy-discussion] integer array creation oddity
Hi, Can someone please explain this oddity to me?

In [1]: import numpy as n

In [8]: a = n.array((1,2,3), 'i')

In [9]: type(a[0])
Out[9]: <type 'numpy.int32'>

In [10]: type(a[0]) == n.int32
Out[10]: False

When I create an array with 'int', 'int32' etc. it works fine. What is the type of 'i', and what is n.int0?

Thanks, Suchindra
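The confusion here is that, on some platforms and NumPy versions, the C int ('i') and C long ('l') type codes map to two distinct scalar types of the same 32-bit width, so comparing the scalar type directly can fail even though the widths match. Comparing dtypes, or testing against the abstract np.integer, sidesteps the problem; a sketch:

```python
import numpy as np

a = np.array((1, 2, 3), 'i')

# Comparing dtypes is reliable: dtypes compare by kind and itemsize,
# not by the identity of the underlying scalar type.
assert a.dtype == np.dtype('i')

# Testing against the abstract integer type works for any int width.
assert np.issubdtype(a.dtype, np.integer)
assert isinstance(a[0], np.integer)
```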
Re: [Numpy-discussion] Ticket review: #848, leak in PyArray_DescrFromType
On Fri, Jul 18, 2008 at 2:41 PM, Charles R Harris [EMAIL PROTECTED] wrote: snip

There are actually two bugs here which is confusing things.

Bug 1) Deleting a numpy scalar doesn't decrement the descr reference count.

Bug 2) PyArray_Return decrements the reference to arr, which in turn correctly decrements the descr on gc.

So calls that go through the default (obj == NULL) correctly leave typecode with a reference, it just never gets decremented when the return object is deleted. On the other hand, if the function is called with something like float32(0), then arr is allocated, stealing a reference to typecode. PyArray_Return then deletes arr (which decrements the typecode reference), but that doesn't matter since typecode is a singleton. In this case there is no follow-on stack dump when the returned object is deleted because the descr reference is not correctly decremented. BTW, both cases get returned in the first if statement after the finish label, i.e., robj is returned. Oy, what a mess.

Here is a short program to follow all the reference counts:

import sys, gc
import numpy as np

def main():
    typecodes = "?bBhHiIlLqQfdgFDG"
    for typecode in typecodes:
        t = np.dtype(typecode)
        refcnt = sys.getrefcount(t)
        t.type()
        gc.collect()
        print typecode, t.name, sys.getrefcount(t) - refcnt

if __name__ == "__main__":
    main()

Which gives the following output:

[EMAIL PROTECTED] ~]$ python debug.py
? bool 0
b int8 1
B uint8 1
h int16 1
H uint16 1
i int32 0
I uint32 1
l int32 0
L uint32 1
q int64 1
Q uint64 1
f float32 1
d float64 0
g float96 1
F complex64 1
D complex128 1
G complex192 1

Note that the python types, which the macro handles, don't have the reference leak problem.

Chuck
[Numpy-discussion] numpy.loadtext() fails with dtype + usecols
Hi, I was trying to use loadtxt() today to read in some text data, and I had a problem when I specified a dtype that only contained as many elements as there are columns in usecols. The example below shows the problem:

import numpy as np
import StringIO
data = '''STID RELH TAIR
JOE 70.1 25.3
BOB 60.5 27.9
'''
f = StringIO.StringIO(data)
names = ['stid', 'temp']
dtypes = ['S4', 'f8']
arr = np.loadtxt(f, usecols=(0,2), dtype=zip(names,dtypes), skiprows=1)

With current 1.1 (and SVN head), this yields:

IndexError                 Traceback (most recent call last)
/home/rmay/<ipython console> in <module>()
/usr/lib64/python2.5/site-packages/numpy/lib/io.pyc in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack)
    309                        for j in xrange(len(vals))]
    310         if usecols is not None:
--> 311             row = [converterseq[j](vals[j]) for j in usecols]
    312         else:
    313             row = [converterseq[j](val) for j,val in enumerate(vals)]
IndexError: list index out of range

I've added a patch that checks for usecols, and if present, correctly creates the converters dictionary to map each specified column to the converter for the corresponding field in the dtype. With the attached patch, this works fine:

>>> arr
array([('JOE', 25.301), ('BOB', 27.899)],
      dtype=[('stid', '|S4'), ('temp', '<f8')])

Comments? Can I get this in for 1.1.1?
Thanks, Ryan

-- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma

--- io.py.bak	2008-07-18 18:12:17.0 -0400
+++ io.py	2008-07-16 22:49:13.0 -0400
@@ -292,8 +292,13 @@
     if converters is None:
         converters = {}
     if dtype.names is not None:
-        converterseq = [_getconv(dtype.fields[name][0]) \
-                        for name in dtype.names]
+        if usecols is None:
+            converterseq = [_getconv(dtype.fields[name][0]) \
+                            for name in dtype.names]
+        else:
+            converters.update([(col, _getconv(dtype.fields[name][0])) \
+                               for col, name in zip(usecols, dtype.names)])
+
     for i, line in enumerate(fh):
         if i < skiprows: continue
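For what it's worth, this combination of a structured dtype with usecols works in current NumPy releases; a Python 3 sketch of the same call (io.StringIO replaces the old StringIO module, and a unicode field replaces 'S4'):

```python
import numpy as np
from io import StringIO

data = '''STID RELH TAIR
JOE 70.1 25.3
BOB 60.5 27.9
'''

# One dtype field per column listed in usecols.
arr = np.loadtxt(StringIO(data), usecols=(0, 2),
                 dtype=[('stid', 'U4'), ('temp', 'f8')], skiprows=1)
assert arr['stid'].tolist() == ['JOE', 'BOB']
assert abs(arr['temp'][0] - 25.3) < 1e-6
```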
[Numpy-discussion] f2py - how to use .pyf files?
Howdy, again f2py... Since I can't seem to figure out how to pass the --fcompiler option to f2py via setup.py/distutils, I decided to just do things for now via a plain makefile. But I'm struggling here too. The problem is this: the call

f2py -c --fcompiler=gfortran -m text2 Text2.f90

works perfectly, and at some point in the output I see

customize Gnu95FCompiler
Found executable /usr/bin/gfortran

and the result is a nice text2.so module. But I'd like to clean up a bit the python interface to the fortran routines, so I did the usual

f2py -h text2.pyf Text2.f90

to create the .pyf, edited the pyf to adjust and 'pythonize' the interface, and then when I try to build using this pyf, I get a crash and the *same* gfortran option is now not recognized:

maqroll[felipe_fortran]> f2py -c --fcompiler=gfortran text2.pyf
Unknown vendor: "gfortran"
Traceback (most recent call last):
  File "/usr/bin/f2py", line 26, in <module>
    main()
  File "/home/fperez/usr/opt/lib/python2.5/site-packages/numpy/f2py/f2py2e.py", line 560, in main
    run_compile()
  File "/home/fperez/usr/opt/lib/python2.5/site-packages/numpy/f2py/f2py2e.py", line 536, in run_compile
    ext = Extension(**ext_args)
  File "/home/fperez/usr/opt/lib/python2.5/site-packages/numpy/distutils/extension.py", line 45, in __init__
    export_symbols)
  File "/usr/lib/python2.5/distutils/extension.py", line 106, in __init__
    assert type(name) is StringType, "'name' must be a string"
AssertionError: 'name' must be a string

Note that it doesn't matter if I add Text2.f90 or not to the above call, it still fails. I could swear I'd done similar things in the past without any problems (albeit with f77 sources), and the user guide http://cens.ioc.ee/projects/f2py2e/usersguide/index.html#three-ways-to-wrap-getting-started gives instructions very much along the lines of what I'm doing. Are these changes since the integration into numpy, regressions, or mistakes in how I'm calling it?
Thanks for any help, f
Re: [Numpy-discussion] numpy.loadtext() fails with dtype + usecols
On Fri, Jul 18, 2008 at 4:16 PM, Ryan May [EMAIL PROTECTED] wrote: snip
> Comments? Can I get this in for 1.1.1?

Can someone familiar with loadtxt comment on this patch?

Chuck
Re: [Numpy-discussion] f2py - how to use .pyf files?
On Fri, Jul 18, 2008 at 17:39, Fernando Perez [EMAIL PROTECTED] wrote: Howdy, again f2py... Since I can't seem to figure out how to pass the --fcompiler option to f2py via setup.py/distutils, I decided to just do things for now via a plain makefile. But I'm struggling here too. The problem is this: the call f2py -c --fcompiler=gfortran -m text2 Text2.f90 works perfectly, and at some point in the output I see customize Gnu95FCompiler Found executable /usr/bin/gfortran and the result is a nice text2.so module. But I'd like to clean up a bit the python interface to the fortran routines, so I did the usual f2py -h text2.pyf Text2.f90 You still need -m text2, I believe. to create the .pyf, edited the pyf to adjust and 'pythonize' the interface, and then when I try to build using this pyf, I get a crash and the *same* gfortran option is now not recognized: maqroll[felipe_fortran] f2py -c --fcompiler=gfortran text2.pyf Unknown vendor: gfortran It's --fcompiler=gnu95, not --fcompiler=gfortran -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
Re: [Numpy-discussion] f2py - how to use .pyf files?
On Fri, Jul 18, 2008 at 18:19, Fernando Perez [EMAIL PROTECTED] wrote: On Fri, Jul 18, 2008 at 3:53 PM, Robert Kern [EMAIL PROTECTED] wrote: You still need -m text2, I believe. Right, thanks. But it still doesn't quite work. Consider a makefile with

lib: seglib.so

seglib.so: Text2.f90
	f2py -c --fcompiler=gnu95 -m seglib Text2.f90

pyf: Text2.f90
	f2py -h seglib.pyf -m seglib Text2.f90 --overwrite-signature

lib2: Text2.f90
	f2py -c --fcompiler=gnu95 seglib.pyf

You still need to have Text2.f90 on the line.

lib2: Text2.f90
	f2py -c --fcompiler=gnu95 seglib.pyf Text2.f90

-- Robert Kern
Re: [Numpy-discussion] f2py - how to use .pyf files?
On Fri, Jul 18, 2008 at 4:26 PM, Robert Kern [EMAIL PROTECTED] wrote: On Fri, Jul 18, 2008 at 18:19, Fernando Perez [EMAIL PROTECTED] wrote: On Fri, Jul 18, 2008 at 3:53 PM, Robert Kern [EMAIL PROTECTED] wrote: You still need -m text2, I believe. Right, thanks. But it still doesn't quite work. Consider a makefile with lib: seglib.so seglib.so: Text2.f90 f2py -c --fcompiler=gnu95 -m seglib Text2.f90 pyf: Text2.f90 f2py -h seglib.pyf -m seglib Text2.f90 --overwrite-signature lib2: Text2.f90 f2py -c --fcompiler=gnu95 seglib.pyf You still need to have Text2.f90 on the line.

Ahah! I was going on this:

-h filename    Write signatures of the fortran routines to file filename and exit. You can then edit filename and use it ***instead*** of fortran files.

[emphasis mine]. The "instead" there led me to think that I should NOT list the fortran files. Should that docstring be fixed, or am I just misreading something? And do you have any ideas on why the f2py_options in setup.py don't correctly pass the --fcompiler flag down to f2py? It does work if one calls setup.py via

./setup.py config_fc --fcompiler=gnu95 build_ext --inplace

but it seems it would be good to be able to set all f2py options inside the setup.py file (without resorting to sys.argv hacks). Or does this go against the grain of numpy.distutils?

Cheers, f
Re: [Numpy-discussion] [matplotlib-devel] Matplotlib and latest numpy
On Sat, Jul 19, 2008 at 01:25:51AM +0200, Gael Varoquaux wrote: Am I wrong, or does matplotlib not build with current numpy svn? OK, Fernando told me that matplotlib builds fine with latest numpy on his box so I enquired a bit more. The problem is that the build of matplotlib tries to include a header file that is generated automatically during the build of numpy (__multiarray_api.h). If you install numpy using python setup.py install, this header file is included in the install. However I used python setupegg.py develop to install numpy. As a result the header file does not get put in the include directory. I guess this is thus a bug in the setupegg.py of numpy, but my knowledge of setuptools is way too low to be able to fix that. Cheers, Gaël
Re: [Numpy-discussion] [matplotlib-devel] Matplotlib and latest numpy
On Fri, Jul 18, 2008 at 18:40, Gael Varoquaux [EMAIL PROTECTED] wrote: On Sat, Jul 19, 2008 at 01:25:51AM +0200, Gael Varoquaux wrote: Am I wrong, or does matplotlib not build with current numpy svn? OK, Fernando told me that matplotlib builds fine with latest numpy on his box so I enquired a bit more. The problem is that the build of matplotlib tries to include a header file that is generated automatically during the build of numpy (__multiarray_api.h). If you install numpy using python setup.py install, this header file is included in the install. However I used python setupegg.py develop to install numpy. As a result the header file does not get put in the include directory. I guess this is thus a bug in the setupegg.py of numpy, but my knowledge of setuptools is way too low to be able to fix that. It's not a setuptools issue at all. build_ext --inplace just doesn't install header files. -- Robert Kern
Re: [Numpy-discussion] [matplotlib-devel] Matplotlib and latest numpy
On Fri, Jul 18, 2008 at 5:44 PM, Robert Kern [EMAIL PROTECTED] wrote: On Fri, Jul 18, 2008 at 18:40, Gael Varoquaux [EMAIL PROTECTED] wrote: On Sat, Jul 19, 2008 at 01:25:51AM +0200, Gael Varoquaux wrote: Am I wrong, or does matplotlib not build with current numpy svn? OK, Fernando told me that matplotlib builds fine with latest numpy on his box so I enquired a bit more. The problem is that the build of matplotlib tries to include a header file that is generated automatically during the build of numpy (__multiarray_api.h). If you install numpy using python setup.py install, this header file is included in the install. However I used python setupegg.py develop to install numpy. As a result the header file does not get put in the include directory. I guess this is thus a bug in the setupegg.py of numpy, but my knowledge of setuptools is way too low to be able to fix that. It's not a setuptools issue at all. build_ext --inplace just doesn't install header files. So is there a fix for this problem? Chuck
Re: [Numpy-discussion] f2py - how to use .pyf files?
On Fri, Jul 18, 2008 at 18:35, Fernando Perez [EMAIL PROTECTED] wrote: On Fri, Jul 18, 2008 at 4:26 PM, Robert Kern [EMAIL PROTECTED] wrote: On Fri, Jul 18, 2008 at 18:19, Fernando Perez [EMAIL PROTECTED] wrote: On Fri, Jul 18, 2008 at 3:53 PM, Robert Kern [EMAIL PROTECTED] wrote: You still need -m text2, I believe. Right, thanks. But it still doesn't quite work. Consider a makefile with lib: seglib.so seglib.so: Text2.f90 f2py -c --fcompiler=gnu95 -m seglib Text2.f90 pyf: Text2.f90 f2py -h seglib.pyf -m seglib Text2.f90 --overwrite-signature lib2: Text2.f90 f2py -c --fcompiler=gnu95 seglib.pyf You still need to have Text2.f90 on the line. Ahah! I went on this: -h filename: Write signatures of the fortran routines to file filename and exit. You can then edit filename and use it ***instead*** of fortran files. [emphasis mine]. The "instead" there led me to think that I should NOT list the fortran files. Should that docstring be fixed, or am I just misreading something? And do you have any ideas on why the f2py_options in setup.py don't correctly pass the --fcompiler flag down to f2py? It does work if one calls setup.py via ./setup.py config_fc --fcompiler=gnu95 build_ext --inplace but it seems it would be good to be able to set all f2py options inside the setup.py file (without resorting to sys.argv hacks). Or does this go against the grain of numpy.distutils? For publicly distributed packages, it does go against the grain. Never hard-code the compiler! Use a setup.cfg file, instead. -- Robert Kern
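[Editor's note: the setup.cfg Robert suggests is an ordinary distutils config file sitting next to setup.py; a minimal sketch might look like the following, where config_fc is numpy.distutils' Fortran-compiler section and gnu95 selects the GNU Fortran 95 compiler:]

```ini
# setup.cfg -- read automatically by distutils/numpy.distutils
[config_fc]
fcompiler = gnu95
```

Note, however, that later in this thread Fernando reports exactly this file not being honored unless config_fc is invoked explicitly, so treat this as the intended usage rather than a verified recipe.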
Re: [Numpy-discussion] [matplotlib-devel] Matplotlib and latest numpy
On Fri, Jul 18, 2008 at 06:44:21PM -0500, Robert Kern wrote: It's not a setuptools issue at all. build_ext --inplace just doesn't install header files. Does that mean that python setup.py develop should be banned for numpy? If so I consider it a setuptools issue: one more caveat to learn about setuptools, i.e. the fact that setuptools develop is not reliable and can lead to interesting bugs without complaining at all. I am just frustrated at losing a significant amount of my time discovering the changes in behavior introduced by setuptools, which keep popping up all the time. Gaël
Re: [Numpy-discussion] [matplotlib-devel] Matplotlib and latest numpy
On Fri, Jul 18, 2008 at 18:49, Gael Varoquaux [EMAIL PROTECTED] wrote: On Fri, Jul 18, 2008 at 06:44:21PM -0500, Robert Kern wrote: It's not a setuptools issue at all. build_ext --inplace just doesn't install header files. Does that mean that python setup.py develop should be banned for numpy? No. If so I consider it a setuptools issue: one more caveat to learn about setuptools, i.e. the fact that setuptools develop is not reliable and can lead to interesting bugs without complaining at all. IT'S NOT A SETUPTOOLS ISSUE. If you had done a python setup.py build_ext --inplace and then set your PYTHONPATH manually, you would have the same problem. Stop blaming setuptools for every little problem. Building inplace is not a setuptools feature. It's a distutils feature. The fact that we have header files in the package is a numpy feature. -- Robert Kern
Re: [Numpy-discussion] [matplotlib-devel] Matplotlib and latest numpy
On Fri, Jul 18, 2008 at 5:53 PM, Robert Kern [EMAIL PROTECTED] wrote: On Fri, Jul 18, 2008 at 18:49, Gael Varoquaux [EMAIL PROTECTED] wrote: On Fri, Jul 18, 2008 at 06:44:21PM -0500, Robert Kern wrote: It's not a setuptools issue at all. build_ext --inplace just doesn't install header files. Does that mean that python setup.py develop should be banned for numpy? No. If so I consider it a setuptools issue: one more caveat to learn about setuptools, i.e. the fact that setuptools develop is not reliable and can lead to interesting bugs without complaining at all. IT'S NOT A SETUPTOOLS ISSUE. If you had done a python setup.py build_ext --inplace and then set your PYTHONPATH manually, you would have the same problem. Stop blaming setuptools for every little problem. Building inplace is not a setuptools feature. It's a distutils feature. The fact that we have header files in the package is a numpy feature. So what was Gael doing wrong? Was it the develop on this line? python setupegg.py develop I'm asking because of the upcoming release. I never use setupegg.py and I don't know what is going on here. Chuck
Re: [Numpy-discussion] f2py - how to use .pyf files?
On Fri, Jul 18, 2008 at 4:47 PM, Robert Kern [EMAIL PROTECTED] wrote: For publicly distributed packages, it does go against the grain. Never hard-code the compiler! Use a setup.cfg file, instead. Quite all right. But this was for in-house code where a group has agreed to all use the same compiler. It's basically a matter of wanting ./setup.py install to work without further flags. More generically, the way that f2py_options works is kind of confusing, since it doesn't just pass options to f2py :) In any case, I'm very grateful for all your help, but I get the sense that all of this distutils/f2py stuff would greatly benefit from clearer documentation. I imagine the doc team is already working on a pretty full plate, but if anyone tackles this particular issue, they'd be making a very useful contribution. Cheers, f
Re: [Numpy-discussion] [matplotlib-devel] Matplotlib and latest numpy
On Fri, Jul 18, 2008 at 19:02, Charles R Harris [EMAIL PROTECTED] wrote: On Fri, Jul 18, 2008 at 5:53 PM, Robert Kern [EMAIL PROTECTED] wrote: On Fri, Jul 18, 2008 at 18:49, Gael Varoquaux [EMAIL PROTECTED] wrote: On Fri, Jul 18, 2008 at 06:44:21PM -0500, Robert Kern wrote: It's not a setuptools issue at all. build_ext --inplace just doesn't install header files. Does that mean that python setup.py develop should be banned for numpy? No. If so I consider it a setuptools issue: one more caveat to learn about setuptools, i.e. the fact that setuptools develop is not reliable and can lead to interesting bugs without complaining at all. IT'S NOT A SETUPTOOLS ISSUE. If you had done a python setup.py build_ext --inplace and then set your PYTHONPATH manually, you would have the same problem. Stop blaming setuptools for every little problem. Building inplace is not a setuptools feature. It's a distutils feature. The fact that we have header files in the package is a numpy feature. So what was Gael doing wrong? Was it the develop on this line? python setupegg.py develop I'm asking because of the upcoming release. I never use setupegg.py and I don't know what is going on here. The code generation logic in numpy does not know anything about build_ext --inplace (which is a distutils feature that develop invokes). That's it. This problem has always existed for all versions of numpy whether you use setuptools or not. The logic in numpy/core/setup.py that places the generated files needs to be fixed if you want to fix this issue. I'm testing out a fix right now. -- Robert Kern
Re: [Numpy-discussion] f2py - how to use .pyf files?
On Fri, Jul 18, 2008 at 19:04, Fernando Perez [EMAIL PROTECTED] wrote: On Fri, Jul 18, 2008 at 4:47 PM, Robert Kern [EMAIL PROTECTED] wrote: For publicly distributed packages, it does go against the grain. Never hard-code the compiler! Use a setup.cfg file, instead. Quite all right. But this was for in-house code where a group has agreed to all use the same compiler. It's basically a matter of wanting ./setup.py install to work without further flags. Right. setup.cfg is still the way to go. More generically, the way that f2py_options work is kind of confusing, since it doesn't just pass options to f2py :) Probably. What other options would you need, though? If everything is accessible from setup.cfg, I'd rather just remove the argument. -- Robert Kern
Re: [Numpy-discussion] f2py - how to use .pyf files?
On Fri, Jul 18, 2008 at 5:15 PM, Robert Kern [EMAIL PROTECTED] wrote: Probably. What other options would you need, though? If everything is accessible from setup.cfg, I'd rather just remove the argument. I'd be OK with that too, and simply telling people to ship a companion setup.cfg. It's just that the current situation is confusing and error-prone. One way to do it and all that... Cheers, f
Re: [Numpy-discussion] [matplotlib-devel] Matplotlib and latest numpy
On Fri, Jul 18, 2008 at 19:07, Robert Kern [EMAIL PROTECTED] wrote: The code generation logic in numpy does not know anything about build_ext --inplace (which is a distutils feature that develop invokes). That's it. This problem has always existed for all versions of numpy whether you use setuptools or not. The logic in numpy/core/setup.py that places the generated files needs to be fixed if you want to fix this issue. I'm testing out a fix right now. Fixed in r5452. -- Robert Kern
Re: [Numpy-discussion] [matplotlib-devel] Matplotlib and latest numpy
On Fri, Jul 18, 2008 at 08:14:00PM -0500, Robert Kern wrote: The code generation logic in numpy does not know anything about build_ext --inplace (which is a distutils feature that develop invokes). That's it. This problem has always existed for all versions of numpy whether you use setuptools or not. The logic in numpy/core/setup.py that places the generated files needs to be fixed if you want to fix this issue. I'm testing out a fix right now. Fixed in r5452. Thanks a lot Rob, you rock. Gaël
Re: [Numpy-discussion] [matplotlib-devel] Matplotlib and latest numpy
On Fri, Jul 18, 2008 at 7:14 PM, Robert Kern [EMAIL PROTECTED] wrote: On Fri, Jul 18, 2008 at 19:07, Robert Kern [EMAIL PROTECTED] wrote: The code generation logic in numpy does not know anything about build_ext --inplace (which is a distutils feature that develop invokes). That's it. This problem has always existed for all versions of numpy whether you use setuptools or not. The logic in numpy/core/setup.py that places the generated files needs to be fixed if you want to fix this issue. I'm testing out a fix right now. Fixed in r5452. Great... Chuck
Re: [Numpy-discussion] f2py - how to use .pyf files?
On Fri, Jul 18, 2008 at 5:20 PM, Fernando Perez [EMAIL PROTECTED] wrote: On Fri, Jul 18, 2008 at 5:15 PM, Robert Kern [EMAIL PROTECTED] wrote: Probably. What other options would you need, though? If everything is accessible from setup.cfg, I'd rather just remove the argument. I just remembered code I had with things like:

    # Add '--debug-capi' for verbose debugging of low-level
    # calls
    #f2py_options = ['--debug-capi'],

that can be very useful when in debug hell. Would this go through via setup.cfg? I'd be OK with that too, and simply telling people to ship a companion setup.cfg. It's just that the current situation is confusing and error-prone. One way to do it and all that... BTW, I would have thought this file

    maqroll[felipe_fortran] cat setup.cfg
    [config_fc]
    fcompiler = gnu95

would do the trick. But it doesn't seem to:

    maqroll[felipe_fortran] ./setup.py build_ext --inplace
    ...
    gnu: no Fortran 90 compiler found
    customize GnuFCompiler using build_ext
    building 'seglib' extension
    ...
    error: f90 not supported by GnuFCompiler needed for Text2.f90

I'm sure I'm doing something blindingly wrong, but reading the distutils docs http://docs.python.org/inst/config-syntax.html made me think that the way to override the config_fc option is precisely the above. Thanks for any help... All this would be good to include in the end in a little example we ship... Cheers, f
Re: [Numpy-discussion] Another reference count leak: ticket #848
Looking at the uses of PyArray_FromAny I can see the motivation for this design: core/include/numpy/ndarrayobject.h has a lot of calls which take a value returned by PyArray_DescrFromType as argument. This has prompted me to take a trawl through the code to see what else is going on, and I note a couple more issues with patches below. The core issue is that NumPy grew out of Numeric. In Numeric PyArray_Descr was just a C-structure, but in NumPy it is now a real Python object with reference counting. Trying to have a compatible C-API to the old one and making the transition without huge changes to the C-API is what led to the stealing strategy. I did not just decide out of the blue to do it that way. Yes, it is a bit of a pain, and yes, it isn't the way it is always done, but as you point out there are precedents, and so that's the direction I took. It is *way* too late to change that in any meaningful way. In the patch below the problem being fixed is that the first call to PyArray_FromAny can result in the erasure of dtype *before* Py_INCREF is called. Perhaps you can argue that this only occurs when NULL is returned... Indeed I would argue that because the array object holds a reference to the typecode (data-type object). Only if the call returns NULL does the data-type object lose its reference count, and in fact that works out rather nicely and avoids a host of extra Py_DECREFs. The next patch deals with an interestingly subtle memory leak in _string_richcompare where if casting to a common type fails then a reference count will be leaked. Actually this one has nothing to do with PyArray_FromAny, but I spotted it in passing. This is a good catch. Thanks! I really don't think that this design of reference count handling in PyArray_FromAny (and consequently PyArray_CheckFromAny) is a good idea. Your point is well noted, but again given the provenance of the code, I still think it was the best choice. And, yes, it is too late to change it. 
Not only is this not a good idea, it's not documented in the API documentation (I'm referring to the Guide to NumPy book) Hmmm... Are you sure? I remember writing a bit about it in the paragraphs that describe the relevant API calls. But, you could be right. I've been trying to find some documentation on stealing references. The Python C API reference (http://docs.python.org/api/refcountDetails.html) says "Few functions steal references; the two notable exceptions are PyList_SetItem() and PyTuple_SetItem()". An interesting essay on reference counting is at http://lists.blender.org/pipermail/bf-python/2005-September/003092.html Believe me, I understand reference counting pretty well. Still, it can be tricky to do correctly and it is easy to forget corner cases and error-returns. I very much appreciate your careful analysis, but I did an analysis of my own when I wrote the code, and so I will be resistant to change things if I can't see the error. I read something from Guido once that said something to the effect that nothing beats studying the code to get reference counting right. I think this is true. In conclusion, I can't find much about the role of stealing in reference count management, but it's such a source of surprise (and frankly doesn't actually work out all that well in numpy) I strongly beg to differ. This sounds very naive to me. IMHO it has worked out extremely well in converting the PyArray_Descr C-structure into the data-type objects that add so much power to NumPy. Yes, there are a few corner cases that you have done an excellent job in digging up, but they are corner cases that don't cause problems for 99.9% of the use-cases. -Travis
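[Editor's note: the INCREF/DECREF bookkeeping being debated here can be poked at from pure Python via ctypes — a hypothetical illustration of the caller-side rule, not numpy-specific: when an API steals a reference, a caller that still needs the object must take its own reference first, and release it when done.]

```python
import ctypes
import sys

# Py_IncRef/Py_DecRef are real CPython C-API entry points exposed
# through the interpreter's own shared library.
pyapi = ctypes.pythonapi
pyapi.Py_IncRef.restype = None
pyapi.Py_IncRef.argtypes = [ctypes.py_object]
pyapi.Py_DecRef.restype = None
pyapi.Py_DecRef.argtypes = [ctypes.py_object]

obj = []  # a fresh, non-immortal object whose count we can watch
before = sys.getrefcount(obj)  # includes getrefcount's own argument

# Simulate the caller-side discipline around a reference-stealing call:
# take an extra reference *before* handing the object to the callee...
pyapi.Py_IncRef(obj)
assert sys.getrefcount(obj) == before + 1

# ...and drop it once the caller no longer uses the object itself.
pyapi.Py_DecRef(obj)
assert sys.getrefcount(obj) == before
```

This is exactly the pattern Michael's patches add around PyArray_FromAny, and what the "stealing" convention lets the callee skip when the caller is done with the descriptor anyway.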
Re: [Numpy-discussion] Ticket review: #848, leak in PyArray_DescrFromType
Michael Abbott wrote: Only half of my patch for this bug has gone into trunk, and without the rest of my patch there remains a leak. Thanks for your work Michael. I've been so grateful to have you and Chuck and others looking carefully at the code to fix its problems. In this particular case, I'm not sure I see how (the rest of) your patch fixes any remaining leak. We do seem to be having a disagreement about whether or not the reference to typecode can be prematurely destroyed, but this doesn't fit what I usually call a memory leak. I think there may be some other cause for remaining leaks. I can see that there might be an argument that PyArray_FromAny has the semantics that it retains a reference to typecode unless it returns NULL ... but I don't want to go there. That would not be a good thing to rely on -- and even with those semantics the existing code still needs fixing. Good, that is the argument I'm making. Why don't you want to rely on it? Thanks for all your help. -Travis
Re: [Numpy-discussion] Ticket review: #848, leak in PyArray_DescrFromType
Charles R Harris wrote: The reference leak seems specific to the float32 and complex64 types called with default arguments.

    In [1]: import sys, gc
    In [2]: t = float32
    In [3]: sys.getrefcount(dtype(t))
    Out[3]: 4
    In [4]: for i in range(10) : t();
       ...:
    In [5]: sys.getrefcount(dtype(t))
    Out[5]: 14
    In [6]: for i in range(10) : t(0);
       ...:
    In [7]: sys.getrefcount(dtype(t))
    Out[7]: 14
    In [8]: t = complex64
    In [9]: sys.getrefcount(dtype(t))
    Out[9]: 4
    In [10]: for i in range(10) : t();
        ...:
    In [11]: sys.getrefcount(dtype(t))
    Out[11]: 14
    In [12]: t = float64
    In [13]: sys.getrefcount(dtype(t))
    Out[13]: 19
    In [14]: for i in range(10) : t();
        ...:
    In [15]: sys.getrefcount(dtype(t))
    Out[15]: 19

This shouldn't actually leak any memory as these types are singletons, but it points up a logic flaw somewhere. That is correct. There is no memory leak, but we do need to get it right. I appreciate the extra eyes on some of these intimate details. What can happen (after a lot of calls) is that the reference count can wrap around to 0 and then cause a funny dealloc (actually, the dealloc won't happen, but a warning will be printed). Fixing the reference counting issues has been the single biggest difficulty in converting PyArray_Descr from a C-structure to a regular Python object. It was a very large pain to begin with, and then there has been more code added since the original conversion some of which does not do reference counting correctly (mostly my fault). I've looked over ticket #848 quite a bit and would like to determine the true cause of the growing reference count which I don't believe is fixed by the rest of the patch that is submitted there. -Travis
Re: [Numpy-discussion] Ticket review: #848, leak in PyArray_DescrFromType
On Fri, Jul 18, 2008 at 9:15 PM, Travis E. Oliphant [EMAIL PROTECTED] wrote: Michael Abbott wrote: Only half of my patch for this bug has gone into trunk, and without the rest of my patch there remains a leak. Thanks for your work Michael. I've been so grateful to have you and Chuck and others looking carefully at the code to fix its problems. In this particular case, I'm not sure I see how (the rest of) your patch fixes any remaining leak. We do seem to be having a disagreement about whether or not the reference to typecode can be prematurely destroyed, but this doesn't fit what I usually call a memory leak. I think there may be some other cause for remaining leaks. Travis, There really is (at least) one reference counting error in PyArray_FromAny. In particular, the obj == NULL case leaves a reference to typecode, then exits through the first return after finish. In this case robj doesn't steal a reference to typecode and the result can be seen in the python program above or by printing out typecode->ob_refcnt from the code itself. So that needs fixing. I would suggest a DECREF in that section and a direct return of robj. The next section before finish is also a bit odd. The direct return of an array works fine, but if that isn't the branch taken, then PyArray_Return decrements the refcnt of arr, which in turn decrements the refcnt of typecode. I don't know if the resulting scalar holds a reference to typecode, but in any case the situation there should also be clarified. Chuck
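[Editor's note: Chuck's suggestion can be sketched in C-like pseudocode — variable and label names are the ones used in the thread; this illustrates the proposed control flow only, not the actual numpy source:]

```c
/* Sketch only: inside PyArray_FromAny, on the path where a scalar
 * robj has been created and obj == NULL.  robj did not steal the
 * typecode reference on this path, so balance it explicitly and
 * return directly instead of falling through to the shared cleanup
 * after the "finish" label. */
if (obj == NULL) {
    Py_DECREF(typecode);
    return robj;
}
```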
Re: [Numpy-discussion] Ticket review: #848, leak in PyArray_DescrFromType
Charles R Harris wrote: On Fri, Jul 18, 2008 at 9:15 PM, Travis E. Oliphant [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: Michael Abbott wrote: Only half of my patch for this bug has gone into trunk, and without the rest of my patch there remains a leak. Thanks for your work Michael. I've been so grateful to have you and Chuck and others looking carefully at the code to fix its problems. In this particular case, I'm not sure I see how (the rest of) your patch fixes any remaining leak. We do seem to be having a disagreement about whether or not the reference to typecode can be prematurely destroyed, but this doesn't fit what I usually call a memory leak. I think there may be some other cause for remaining leaks. Travis, There really is (at least) one reference counting error in PyArray_FromAny. In particular, the obj == NULL case leaves a reference to typecode, then exits through the first return after finish. In this case robj doesn't steal a reference to typecode and the result can be seen in the python program above or by printing out typecode->ob_refcnt from the code itself. So that needs fixing. I would suggest a DECREF in that section and a direct return of robj. The next section before finish is also a bit odd. The direct return of an array works fine, but if that isn't the branch taken, then PyArray_Return decrements the refcnt of arr, which in turn decrements the refcnt of typecode. I don't know if the resulting scalar holds a reference to typecode, but in any case the situation there should also be clarified. Thank you. I will direct attention there and try to clear this up tonight. I know Michael is finding problems that do need fixing. -Travis
Re: [Numpy-discussion] Ticket review: #848, leak in PyArray_DescrFromType
On Fri, Jul 18, 2008 at 9:30 PM, Travis E. Oliphant [EMAIL PROTECTED] wrote: Charles R Harris wrote: The reference leak seems specific to the float32 and complex64 types called with default arguments. In [1]: import sys, gc In [2]: t = float32 In [3]: sys.getrefcount(dtype(t)) Out[3]: 4 In [4]: for i in range(10) : t(); ...: In [5]: sys.getrefcount(dtype(t)) Out[5]: 14 In [6]: for i in range(10) : t(0); ...: In [7]: sys.getrefcount(dtype(t)) Out[7]: 14 In [8]: t = complex64 In [9]: sys.getrefcount(dtype(t)) Out[9]: 4 In [10]: for i in range(10) : t(); : In [11]: sys.getrefcount(dtype(t)) Out[11]: 14 In [12]: t = float64 In [13]: sys.getrefcount(dtype(t)) Out[13]: 19 In [14]: for i in range(10) : t(); : In [15]: sys.getrefcount(dtype(t)) Out[15]: 19 This shouldn't actually leak any memory as these types are singletons, but it points up a logic flaw somewhere. That is correct. There is no memory leak, but we do need to get it right. I appreciate the extra eyes on some of these intimate details. What can happen (after a lot of calls) is that the reference count can wrap around to 0 and then cause a funny dealloc (actually, the dealloc won't happen, but a warning will be printed). Fixing the reference counting issues has been the single biggest difficulty in converting PyArray_Descr from a C-structure to a regular Python object. It was a very large pain to begin with, and then there has been more code added since the original conversion some of which does not do reference counting correctly (mostly my fault). I've looked over ticket #848 quite a bit and would like to determine the true cause of the growing reference count which I don't believe is fixed by the rest of the patch that is submitted there. I've attached a test script. 
Chuck

    import sys, gc
    import numpy as np

    def main() :
        typecodes = "?bBhHiIlLqQfdgFDG"
        for typecode in typecodes:
            t = np.dtype(typecode)
            print typecode, t.name
            refcnt = sys.getrefcount(t)
            t.type()
            print "delta", sys.getrefcount(t) - refcnt
            print

    if __name__ == "__main__" :
        main()
[Numpy-discussion] f2py - a recap
Howdy, today's exercise with f2py left some lessons learned, mostly thanks to Robert's excellent help, for which I'm grateful. I'd like to recap here what we have, mostly to decide what changes (if any) should go into numpy to make the experience less interesting for future users:

- Remove the f2py_options flag from numpy.distutils.extension.Extension? If so, do options like '--debug_capi' get correctly passed via setup.cfg? This flag is potentially very confusing, because only *some* f2py options get actually set this way, while others need to be set in calls to config_fc.

- How to properly set the compiler options in a setup.py file? Robert suggested the setup.cfg file, but this doesn't get picked up unless config_fc is explicitly called by the user: ./setup.py config_fc etc... This is perhaps a distutils problem, but I don't know if we can solve it more cleanly. It seems to me that it should be possible to provide a setup.py file that can be used simply as ./setup.py install (with the necessary setup.cfg file sitting next to it). I'm thinking here of what we need to do when showing how 'easy' these tools are for scientists migrating from matlab, for example. Obscure, special purpose incantations tend to tarnish our message of ease :)

- Should the 'instead' word be removed from the f2py docs regarding the use of .pyf sources? It appears to be a mistake, which threw at least me for a loop for a while.

- Why does f2py in the source tree have *both* a doc/ and a docs/ directory? It's really confusing to see this.

f2py happens to be a very important tool, not just because scipy couldn't build without it, but also to position python as a credible integration language for scientific work. So I hope we can make using it as easy and robust as is technically feasible. Cheers, f
Re: [Numpy-discussion] Ticket review: #848, leak in PyArray_DescrFromType
Charles R Harris wrote: On Fri, Jul 18, 2008 at 9:15 PM, Travis E. Oliphant [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: Michael Abbott wrote: Only half of my patch for this bug has gone into trunk, and without the rest of my patch there remains a leak. Thanks for your work Michael. I've been so grateful to have you and Chuck and others looking carefully at the code to fix its problems. In this particular case, I'm not sure I see how (the rest of) your patch fixes any remaining leak. We do seem to be having a disagreement about whether or not the reference to typecode can be pre-maturely destroyed, but this doesn't fit what I usually call a memory leak. I think there may be some other cause for remaining leaks. Travis, There really is (at least) one reference counting error in PyArray_FromAny. In particular, the obj == NULL case leaves a reference to typecode, then exits through the first return after finish. In this case robj doesn't steal a reference to typecode and the result can be seen in the python program above or by printing out the typecode-ob_refcnt from the code itself. So that needs fixing. I would suggest a DECREF in that section and a direct return of robj. agreed! I'll commit the change. The next section before finish is also a bit odd. The direct return of an array works fine, but if that isn't the branch taken, then PyArray_Return decrements the refcnt of arr, which in turn decrements the refcnt of typecode. I don't know if the resulting scalar holds a reference to typecode, but in any case the situation there should also be clarified. This looks fine to me. At the PyArray_Return call, the typecode reference is held by the array. When it is decref'd the typecode is decref'd appropriately as well. The resulting scalar does *not* contain a reference to typecode. The scalar C-structure has no place to put it (it's just a PyObject_HEAD and the memory for the scalar value). 
Michael is correct that PyArray_Scalar does not change the reference count of typecode (as the comments above that function indicate). I tried to be careful to put comments near the functions that deal with PyArray_Descr objects to describe how they affect reference counting. I also thought I put that in my book.

-Travis
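The stolen-reference contract being debated here can be illustrated with a toy Python model (a sketch with a hand-rolled counter, not NumPy's actual C code; the names Obj and consume are invented for illustration). The point is the one the DECREF fix addresses: a function that steals a reference must release or hand off that reference on *every* exit path, including early returns.

```python
class Obj:
    """Toy stand-in for a PyArray_Descr with a manual reference count."""
    def __init__(self):
        self.refcnt = 1


def consume(descr, early_exit=False):
    """Mimics PyArray_FromAny's contract: this function *steals* a
    reference to descr, so each exit path must account for it once."""
    if early_exit:
        # The ticket-#848 bug: omitting this release on the early
        # return after `finish` leaked one reference per call.
        descr.refcnt -= 1
        return None
    # Normal path: in the real code the stolen reference is handed to
    # the result array and released when the array is destroyed; the
    # toy just releases it directly to stay self-contained.
    descr.refcnt -= 1
    return "result"


d = Obj()
d.refcnt += 1                 # caller takes a reference to pass in
consume(d, early_exit=True)   # the formerly leaky path
assert d.refcnt == 1          # balanced again: no leak
```

Deleting the `descr.refcnt -= 1` in the early-exit branch reproduces the symptom Chuck describes: the count grows by one per call, visible by printing typecode->ob_refcnt in the real code.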
Re: [Numpy-discussion] Ticket review: #848, leak in PyArray_DescrFromType
> I've attached a test script.

Thank you! It looks like with that added DECREF, the reference count leak is gone. While it was a minor issue (it should be noted that reference counting errors on the built-in data-types won't cause issues), it is nice to clean these things up when we can.

I agree that the arrtype_new function is hairy, and I apologize for that. The scalartypes.inc.src was written very quickly. I added a few more comments in the change to the function (and removed a hard-coded hackish multiply with one that takes into account the actual size of Py_UNICODE).

-Travis
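A minimal version of such a refcount-leak check can be sketched in Python (a hypothetical script, not the one attached to the ticket). It relies on built-in dtypes being singletons, so a per-call leak would show up as a steadily growing reference count on the shared dtype object:

```python
import sys
import numpy as np


def refcount_growth(n=1000):
    """Return how much the shared int32 dtype's refcount grows while
    creating n scalars (exercising the code path discussed above)."""
    d = np.dtype(np.int32)          # built-in dtypes are singletons
    before = sys.getrefcount(d)
    for _ in range(n):
        np.int32(0)                 # scalars hold no dtype reference
    return sys.getrefcount(d) - before


# With the leak fixed, the count stays essentially flat; with the bug,
# it would grow by roughly one per iteration.
print("refcount growth:", refcount_growth())
```

The assertion to make is simply that the growth is near zero rather than proportional to n.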
Re: [Numpy-discussion] Ticket review: #848, leak in PyArray_DescrFromType
On Fri, Jul 18, 2008 at 11:35:50PM -0500, Travis E. Oliphant wrote:
> Thank you! It looks like with that added DECREF, the reference count
> leak is gone. While it was a minor issue (it should be noted that
> reference counting errors on the built-in data-types won't cause
> issues), it is nice to clean these things up when we can.

Yes. I think it is worth thanking all of you who are currently putting a large effort into QA. This effort is very valuable to all of us: having a robust underlying library that you can unquestionably rely on is priceless.

Gaël
Re: [Numpy-discussion] Ticket review: #848, leak in PyArray_DescrFromType
On Fri, Jul 18, 2008 at 10:04 PM, Travis E. Oliphant [EMAIL PROTECTED] wrote:
> [earlier discussion of the PyArray_FromAny DECREF fix trimmed]
>
> This looks fine to me. At the PyArray_Return call, the typecode
> reference is held by the array. When it is decref'd, the typecode is
> decref'd appropriately as well. The resulting scalar does *not*
> contain a reference to typecode. The scalar C-structure has no place
> to put it (it's just a PyObject_HEAD and the memory for the scalar
> value).

I was thinking of just pulling the relevant part out of PyArray_Return and including it in the function, which would make what was going on quite explicit to anyone reading the code. Then maybe a direct return of robj, as I think it is always going to be a scalar at that point.

> Michael is correct that PyArray_Scalar does not change the reference
> count of typecode (as the comments above that function indicate). I
> tried to be careful to put comments near the functions that deal with
> PyArray_Descr objects to describe how they affect reference counting.
> I also thought I put that in my book.

Yep, it was a brain fart on my part.

Chuck