Re: [Numpy-discussion] Proposal for changing the names of inverse trigonometrical/hyperbolic functions
On Monday 24 November 2008, Jarrod Millman wrote:

On Mon, Nov 24, 2008 at 10:45 AM, Francesc Alted [EMAIL PROTECTED] wrote: So, IMHO, I think it would be better to rename the inverse trigonometric functions from the ``arc*`` to the ``a*`` prefix. Of course, in order to do that correctly, one should add the new names and add a ``DeprecationWarning`` informing people that they should start to use the new names. After two or three NumPy versions, the old function names can be removed safely. What do people think?

+1 It seems there is a fair amount of favor for adding the new names. There is some resistance to removing the old ones. I would be happy to deprecate the old ones, but leave them in until we release a new major release (i.e., NumPy 2.0.0). We could start creating a list of API/ABI clean-ups for whenever we find a compelling reason to release a new major version. In the meantime, we can leave the old names in and just add a deprecation note to the docs. Once we are ready to release 2.0, we can release a 1.x with deprecation warnings. Sounds like a plan.

+1 on this. If there are worries about portability issues, I'd even leave the old names in 2.0 (with the deprecation warning, of course), although if the 1.x series is going to live a long time (say, at least a year), I don't think this is going to be necessary. -- Francesc Alted ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
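The alias-plus-DeprecationWarning scheme being proposed can be sketched in a few lines. This is only an illustration of the idea, not what NumPy ended up shipping; math.asin stands in for the ufunc, and the helper name `_deprecate` is hypothetical:

```python
import math
import warnings

def _deprecate(func, old_name, new_name):
    """Return a wrapper for `func` that emits a DeprecationWarning
    steering users from `old_name` to `new_name`."""
    def wrapper(*args, **kwargs):
        warnings.warn("%s is deprecated; use %s instead" % (old_name, new_name),
                      DeprecationWarning, stacklevel=2)
        return func(*args, **kwargs)
    wrapper.__name__ = old_name
    return wrapper

# The short name is a plain alias; the long name keeps working but warns.
asin = math.asin
arcsin = _deprecate(math.asin, "arcsin", "asin")
```

After two or three releases the warning-emitting `arcsin` wrapper could then be dropped, which is exactly the transition period discussed above.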
Re: [Numpy-discussion] Numpy on Mac OS X python 2.6
FYI, I can't reproduce David's failures on my machine (intel core2 duo w/ 10.5.5):
* python 2.6 from macports
* numpy svn 6098
* GCC 4.0.1 (Apple Inc. build 5488)

I have only 1 failure:

FAIL: test_umath.TestComplexFunctions.test_against_cmath
Traceback (most recent call last):
  File "/opt/local/lib/python2.6/site-packages/nose-0.10.4-py2.6.egg/nose/case.py", line 182, in runTest
    self.test(*self.arg)
  File "/Users/pierregm/Computing/.pythonenvs/default26/lib/python2.6/site-packages/numpy/core/tests/test_umath.py", line 423, in test_against_cmath
    assert abs(a - b) < atol, "%s %s: %s; cmath: %s" % (fname, p, a, b)
AssertionError: arcsin 2: (1.57079632679-1.31695789692j); cmath: (1.57079632679+1.31695789692j)

(Well, there's another one in numpy.ma.min, but that's a different matter).

On Nov 25, 2008, at 2:19 AM, David Cournapeau wrote:

On Mon, 2008-11-24 at 22:06 -0700, Charles R Harris wrote: Well, it may not be that easy to figure. The (generated) pyconfig-32.h has:

/* Define to 1 if your processor stores words with the most significant byte
   first (like Motorola and SPARC, unlike Intel and VAX).

   The block below does compile-time checking for endianness on platforms
   that use GCC and therefore allows compiling fat binaries on OSX by using
   '-arch ppc -arch i386' as the compile flags. The phrasing was choosen
   such that the configure-result is used on systems that don't use GCC. */
#ifdef __BIG_ENDIAN__
#define WORDS_BIGENDIAN 1
#else
#ifndef __LITTLE_ENDIAN__
/* #undef WORDS_BIGENDIAN */
#endif
#endif

Hm, interesting: just by grepping, I do have WORDS_BIGENDIAN defined to 1 on *both* python 2.5 and python 2.6 on Mac OS X (running Intel). Looking closer, I do have the above code (conditional) in 2.5, but not in 2.6: it is unconditionally defined to BIGENDIAN on 2.6!!
That's actually part of something I have wondered for quite some time about fat binaries: how do you handle config headers, since they are generated only once for every fat binary, but they should really be generated for each arch. And I guess that __BIG_ENDIAN__ is a compiler flag, it isn't in any of the include files. In any case, this looks like a Python bug or the Python folks have switched their API on us.

Hm, actually, it is a bug in numpy as much as in python: python should NOT include any config.h in their public namespace, and we should not rely on it. But with this info, it should be relatively easy to fix (by setting the correct endianness by ourselves with some detection code) David
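The "detection code" David mentions can be sketched at the Python level: probe the byte order at run time instead of trusting a value baked into a config header at build time (which is exactly what goes wrong for universal binaries). A minimal illustration, not the header-based fix numpy actually shipped:

```python
import struct
import sys

def native_is_little_endian():
    """Detect endianness at run time: pack 1 as a native-order unsigned
    int and check whether the low byte comes first."""
    return struct.pack("=I", 1)[:1] == b"\x01"
```

On a fat binary, a check like this gives the right answer for whichever architecture is actually executing, whereas a single generated pyconfig.h can only record the answer for the build machine.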
Re: [Numpy-discussion] Numpy on Mac OS X python 2.6
Pierre GM wrote: FYI, I can't reproduce David's failures on my machine (intel core2 duo w/ 10.5.5) * python 2.6 from macports

I think that's the main difference. I feel more and more that the problem is linked to fat binaries (more exactly, multi-arch builds in one autoconf run: since only one pyconfig.h is generated for all archs, only one value is defined for CPU-specific configurations). On my machine, pyconfig.h has WORDS_BIGENDIAN defined to one, which I can only explain by the binary being built on ppc (unfortunately, I can't find this information from python itself - maybe in the release notes). And that cannot work on Intel. The general solution would be to generate different arch-specific config files, and import them conditionally in the main config file. But doing so in a platform-neutral manner is not trivial, David
Re: [Numpy-discussion] PIL.Image.fromarray bug in numpy interface
2008/11/24 Chris Barker [EMAIL PROTECTED]: Robert Kern wrote: Jim Vickroy wrote: While using the PIL interface to numpy, I rediscovered a logic error in the PIL.Image.fromarray() procedure. The problem (and a solution) was mentioned earlier at: Tell them that we approve of the change. We don't have commit access to PIL, so I believe that our approval is the only reason they could possibly send you over here.

Just for the record, it was me who sent him over here. I thought it would be good for a numpy dev to check out the patch for correctness -- it looked like a numpy API issue, and I figured Fredrik wouldn't want to look too hard at it to determine if it was correct. So if your approval means you've looked at the fix and think it's correct, great!

I also submitted an issue in 2007: http://mail.python.org/pipermail/image-sig/2007-August/004570.html I recently reminded Fredrik, who replied: "The NumPy support was somewhat broken and has been partially rewritten for PIL 1.2; I'll compare those fixes with your patch when I find the time." So, I guess we should try the latest PIL and see if the problems are still there? Cheers Stéfan
Re: [Numpy-discussion] working on multiple matrices of the same shape
2008/11/24 Sébastien Barthélemy [EMAIL PROTECTED]: Are you sure ? Here it reports ValueError: setting an array element with a sequence. probably because theta, sintheta and costheta are 1-d arrays of n1 elements.

Sorry, I missed that detail. Cheers Stéfan
Re: [Numpy-discussion] CorePy 1.0 Release (x86, Cell BE, BSD!)
Exactly what I thought this morning ;) I'm reading your PhD thesis, Chris, it's great ! Matthieu 2008/11/25 Brian Granger [EMAIL PROTECTED]: Chris, Wow, this is fantastic...both the BSD license and the x86 support. I look forward to playing with this! Cheers, Brian On Mon, Nov 24, 2008 at 7:49 PM, Chris Mueller [EMAIL PROTECTED] wrote: Announcing CorePy 1.0 - http://www.corepy.org We are pleased to announce the latest release of CorePy. CorePy is a complete system for developing machine-level programs in Python. CorePy lets developers build and execute assembly-level programs interactively from the Python command prompt, embed them directly in Python applications, or export them to standard assembly languages. CorePy's straightforward APIs enable the creation of complex, high-performance applications that take advantage of processor features usually inaccessible from high-level scripting languages, such as multi-core execution and vector instruction sets (SSE, VMX, SPU). This version addresses the two most frequently asked questions about CorePy: 1) Does CorePy support x86 processors? Yes! CorePy now has extensive support for 32/64-bit x86 and SSE ISAs on Linux and OS X*. 2) Is CorePy Open Source? Yes! CorePy now uses the standard BSD license. Of course, CorePy still supports PowerPC and Cell BE SPU processors. In fact, for this release, the Cell run-time was redesigned from the ground up to remove the dependency on IBM's libspe and now uses the system-level interfaces to work directly with the SPUs (and, CorePy is still the most fun way to program the PS3). CorePy is written almost entirely in Python. Its run-time system does not rely on any external compilers or assemblers. If you have the need to write tight, fast code from Python, want to demystify machine-level code generation, or just miss the good-old days of assembly hacking, check out CorePy! And, if you don't believe us, here's our favorite user quote: CorePy makes assembly fun again! 
__credits__ = CorePy is developed by Chris Mueller, Andrew Friedley, and Ben Martin and is supported by the Open Systems Lab at Indiana University. Chris can be reached at cmueller[underscore]dev[at]yahoo[dot]com. __footnote__ = *Any volunteers for a Windows port? :) -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher
[Numpy-discussion] More loadtxt() changes
Hi, I have a couple more changes to loadtxt() that I'd like to code up in time for 1.3, but I thought I should run them by the list before doing too much work. These are already implemented in some fashion in matplotlib.mlab.csv2rec(), but the code bases are different enough, that pretty much only the idea can be lifted. All of these changes would be done in a manner that is backwards compatible with the current API. 1) Support for setting the names of fields in the returned structured array without using dtype. This can be a passed in list of names or reading the names of fields from the first line of the file. Many files have a header line that gives a name for each column. Adding this would obviously make loadtxt much more general and allow for more generic code, IMO. My current thinking is to add a *name* keyword parameter that defaults to None, for no support for reading names. Setting it to True would tell loadtxt() to read the names from the first line (after skiprows). The other option would be to set names to a list of strings. 2) Support for automatic dtype inference. Instead of assuming all values are floats, this would try a list of options until one worked. For strings, this would keep track of the longest string within a given field before setting the dtype. This would allow reading of files containing a mixture of types much more easily, without having to go to the trouble of constructing a full dtype by hand. This would work alongside any custom converters one passes in. My current thinking of API would just be to add the option of passing the string 'auto' as the dtype parameter. 3) Better support for missing values. The docstring mentions a way of handling missing values by passing in a converter. The problem with this is that you have to pass in a converter for *every column* that will contain missing values. If you have a text file with 50 columns, writing this dictionary of converters seems like ugly and needless boilerplate. 
I'm unsure of how best to pass in both what values indicate missing values and what values to fill in their place. I'd love suggestions. Here's an example of my use case (without 50 columns):

ID,First Name,Last Name,Homework1,Homework2,Quiz1,Homework3,Final
1234,Joe,Smith,85,90,,76,
5678,Jane,Doe,65,99,,78,
9123,Joe,Plumber,45,90,,92,

Currently reading in this file requires a bit of boilerplate (declaring dtypes, converters). While it's nothing I can't write, it still would be easier to write it once within loadtxt and have it for everyone. Any support for *any* of these ideas? Any suggestions on how the user should pass in the information? Thanks, Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma
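Ideas 1 and 2 above (names from a header line, automatic dtype inference) can be sketched with the stdlib csv module. This is a toy stand-in for the proposed behaviour, not loadtxt itself; the function name `sniff_names_and_types` is made up for the example:

```python
import csv
import io

def sniff_names_and_types(fobj, delimiter=","):
    """Read field names from the first row, then infer one type per
    column by trying int, then float, falling back to str.  Empty
    fields (missing values) are skipped during inference."""
    rows = list(csv.reader(fobj, delimiter=delimiter))
    names, data = rows[0], rows[1:]
    columns = list(zip(*data))

    def infer(values):
        for typ in (int, float):
            try:
                for value in values:
                    if value != "":
                        typ(value)
                return typ
            except ValueError:
                continue
        return str

    return names, [infer(col) for col in columns]

text = "ID,First Name,Homework1\n1234,Joe,85\n5678,Jane,\n"
names, types = sniff_names_and_types(io.StringIO(text))
```

A real implementation would additionally keep track of the longest string per column (for fixed-width string dtypes) and fold the result into a structured array dtype, but the column-by-column trial conversion is the core of the idea.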
Re: [Numpy-discussion] Numpy on Mac OS X python 2.6
On Tue, Nov 25, 2008 at 10:55 PM, David Cournapeau [EMAIL PROTECTED] wrote: I used the path of least resistance: instead of using the WORDS_BIGENDIAN macro, I added a numpy header which gives the endianness every time it is included. IOW, instead of the endianness being fixed at numpy build time (which would fail for universal builds), it is set every time the numpy headers are included (which is the only way to make it work). A better solution IMO would be to avoid any endianness dependency at all in the headers, but that does not seem possible without breaking the API (because the endianness-related macros PyArray_NBO and co would need to be set as functions instead).

Hm, for reference, I came across this: http://www.mail-archive.com/[EMAIL PROTECTED]/msg14382.html So some people have thought about the same problem. David
Re: [Numpy-discussion] Numpy on Mac OS X python 2.6
On Tue, Nov 25, 2008 at 8:03 AM, David Cournapeau [EMAIL PROTECTED] wrote: On Tue, Nov 25, 2008 at 10:55 PM, David Cournapeau [EMAIL PROTECTED] wrote: I used the path of least resistance: instead of using the WORDS_BIGENDIAN macro, I added a numpy header which gives the endianness every time it is included. IOW, instead of the endianness to be fixed at numpy build time (which would fail for universal builds), it is set everytime the numpy headers are included (which is the only way to make it work). A better solution IMO would be to avoid any endianness dependency at all in the headers, but that does not seem possible without breaking the API (because the endianness-related macro PyArray_NBO and co would need to be set as functions instead). Hm, for reference, I came across this: http://www.mail-archive.com/[EMAIL PROTECTED]/msg14382.html So some people thought about the same problem.

Apart from the Mac, the ppc can be configured to run either bigendian or littleendian, so the hardware encompasses more than just the cpu, it's the whole darn board. Chuck
Re: [Numpy-discussion] Bilateral filter
Hi Nadav 2008/8/6 Nadav Horesh [EMAIL PROTECTED]: I made the following modification to the source code, I hope it is ready to be included in scipy. Added a BSD licence declaration. Small optimisation. The code is split into a cython back-end and a python front-end. All remarks are welcome,

Thanks for working on a bilateral filter implementation. Some comments:

1. Needs a setup.py file to build the Cython module (simplest possible is attached)
2. numpy.numarray.nd_image should be scipy.ndimage
3. For inclusion in SciPy, we'll need some tests and preferably some examples.
4. Docstrings should be in SciPy format.
5. ndarray.h should be numpy/ndarray.h

Thanks for writing this filter; I found it useful! Cheers Stéfan

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy

setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [Extension("bilateral_base", ["bilateral_base.pyx"],
                             include_dirs=[numpy.get_include()],
                             extra_compile_args=['-O3'])]
)
Re: [Numpy-discussion] Proposal for changing the names of inverse trigonometrical/hyperbolic functions
On Nov 24, 2008, at 5:55 PM, Jarrod Millman wrote: On Mon, Nov 24, 2008 at 10:45 AM, Francesc Alted [EMAIL PROTECTED] wrote: So, IMHO, I think it would be better to rename the inverse trigonometric functions from ``arc*`` to ``a*`` prefix. Of course, in order to do that correctly, one should add the new names and add a ``DeprecationWarning`` informing that people should start to use the new names. After two or three NumPy versions, the old function names can be removed safely. What people think? +1 It seems there is a fair amount of favor for adding the new names. There is some resistance to removing the old ones. I would be happy to deprecate the old ones, but leave them in until we release a new major release (i.e., NumPy 2.0.0). We could start creating a list of API/ABI clean-ups for whenever we find a compelling reason to release a new major version. In the meantime, we can leave the old names in and just add a deprecation note to the docs. Once we are ready to release 2.0, we can release a 1.x with deprecation warnings.

I tend to favor this approach. Perry
Re: [Numpy-discussion] Proposal for changing the names of inverse trigonometrical/hyperbolic functions
On 24 Nov 2008, at 19:45, Francesc Alted wrote: standards in computer science. For example, where Python writes: asin, acos, atan, asinh, acosh, atanh NumPy choose: arcsin, arccos, arctan, arcsinh, arccosh, arctanh So, IMHO, I think it would be better to rename the inverse trigonometric functions from ``arc*`` to ``a*`` prefix.

-1 The current slightly deviating (and in fact more clear) naming convention of Numpy is IMO not even remotely enough reason to break the API. Adding honey by introducing a transition period with a deprecation warning postpones but doesn't avoid breaking the API. Joris Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
Re: [Numpy-discussion] More loadtxt() changes
Ryan, FYI, I've been coding over the last couple of weeks an extension of loadtxt for a better support of masked data, with the option to read column names in a header. Please find an example below (I also have unittests). Most of the work is actually inspired from matplotlib's mlab.csv2rec. It might be worth not duplicating efforts. Cheers, P.

"""
:mod:`_preview`
A collection of utilities from incoming versions of numpy.ma
"""
import itertools
import numpy as np
import numpy.ma as ma

_string_like = np.lib.io._string_like

def _to_filehandle(fname, flag='r', return_opened=False):
    """
    Returns the filehandle corresponding to a string or a file.
    If the string ends in '.gz', the file is automatically unzipped.

    Parameters
    ----------
    fname : string, filehandle
        Name of the file whose filehandle must be returned.
    flag : string, optional
        Flag indicating the status of the file ('r' for read, 'w' for write).
    return_opened : boolean, optional
        Whether to return the opening status of the file.
    """
    if _string_like(fname):
        if fname.endswith('.gz'):
            import gzip
            fhd = gzip.open(fname, flag)
        else:
            fhd = file(fname, flag)
        opened = True
    elif hasattr(fname, 'seek'):
        fhd = fname
        opened = False
    else:
        raise ValueError('fname must be a string or file handle')
    if return_opened:
        return fhd, opened
    return fhd


def flatten_dtype(dtp):
    """Unpack a structured data-type."""
    if dtp.names is None:
        return [dtp]
    else:
        types = []
        for field in dtp.names:
            (typ, _) = dtp.fields[field]
            flat_dt = flatten_dtype(typ)
            types.extend(flat_dt)
        return types


class LineReader:
    """
    File reader that automatically splits each line.
    This reader behaves like an iterator.

    Parameters
    ----------
    fhd : filehandle
        File handle of the underlying file.
    comment : string, optional
        The character used to indicate the start of a comment.
    delimiter : string, optional
        The string used to separate values. By default, this is any
        whitespace.
    """
    #
    def __init__(self, fhd, comment='#', delimiter=None):
        self.fh = fhd
        self.comment = comment
        self.delimiter = delimiter
        if delimiter == ' ':
            self.delimiter = None
    #
    def close(self):
        "Close the current reader."
        self.fh.close()
    #
    def seek(self, arg):
        """
        Moves to a new position in the file.

        See Also
        --------
        file.seek
        """
        self.fh.seek(arg)
    #
    def splitter(self, line):
        """
        Splits the line at each current delimiter.
        Comments are stripped beforehand.
        """
        line = line.split(self.comment)[0].strip()
        delimiter = self.delimiter
        if line:
            return line.split(delimiter)
        else:
            return []
    #
    def next(self):
        "Moves to the next line or raises :exc:`StopIteration`."
        return self.splitter(self.fh.next())
    #
    def __iter__(self):
        for line in self.fh:
            yield self.splitter(line)

    def readline(self):
        """
        Returns the next line of the file, split at the delimiter and
        stripped of comments.
        """
        return self.splitter(self.fh.readline())

    def skiprows(self, nbrows=1):
        "Skips `nbrows` rows from the file."
        for i in range(nbrows):
            self.fh.readline()

    def get_first_valid_row(self):
        """
        Returns the values in the first valid (uncommented and not empty)
        line of the file.
        """
        first_values = None
        while not first_values:
            first_line = self.fh.readline()
            if first_line == '':
                # EOF reached
                raise IOError('End-of-file reached before encountering data.')
            first_values = self.splitter(first_line)
        return first_values


itemdictionary = {'return': 'return_',
                  'file': 'file_',
                  'print': 'print_'}

def process_header(headers):
    """
    Validates a list of strings to use as field names.
    The strings are stripped of any non-alphanumeric character, and spaces
    are replaced by `_`.
    """
    #
    # Define the characters to delete from the headers
    # (the exact character list was mangled by the list archive;
    # this reconstruction follows mlab.csv2rec)
    delete = set("""~!@#$%^&*()-=+~\|]}[{';: /?.,""")
    delete.add('"')
    names = []
    seen = dict()
    for i, item in enumerate(headers):
        item = item.strip().lower().replace(' ', '_')
        item = ''.join([c for c in item if c not in delete])
        if not len(item):
            item = 'column%d' % i
        item = itemdictionary.get(item, item)
        cnt = seen.get(item, 0)
        if cnt > 0:
            names.append(item + '_%d' % cnt)
        else:
            names.append(item)
        seen[item] = cnt + 1
    return names
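The header-sanitising step can be exercised with a compact stand-alone version (reimplemented here with the re module purely for illustration; `sanitize_names` is not Pierre's actual function):

```python
import re

# Reserved words that would shadow Python keywords as field names.
_reserved = {"return": "return_", "file": "file_", "print": "print_"}

def sanitize_names(headers):
    """Lowercase each header, replace spaces with '_', drop other
    non-word characters, dodge Python keywords, number duplicates,
    and invent a name for empty headers."""
    names, seen = [], {}
    for i, item in enumerate(headers):
        item = re.sub(r"\W", "", item.strip().lower().replace(" ", "_"))
        if not item:
            item = "column%d" % i
        item = _reserved.get(item, item)
        count = seen.get(item, 0)
        names.append(item if count == 0 else "%s_%d" % (item, count))
        seen[item] = count + 1
    return names
```

The de-duplication matters because structured-array field names must be unique, even though repeated column headers are common in real CSV files.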
[Numpy-discussion] int(np.nan) on python 2.6
All, Sorry to bump my own post, and I was kinda threadjacking anyway: Some functions of numpy.ma (e.g., ma.max, ma.min...) accept explicit outputs that may not be MaskedArrays. When such an explicit output is not a MaskedArray, a value that should have been masked is transformed into np.nan. That worked great in 2.5, with np.nan automatically transformed to 0 when the explicit output had an int dtype. With Python 2.6, a ValueError is raised instead, as np.nan can no longer be cast to int. What should be the recommended behavior in this case? Raise a ValueError or some other exception, to follow the new Python 2.6 convention, or silently replace np.nan by some value acceptable to the int dtype (0, or something else)? Thanks for any suggestion, P.
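The behavioral change Pierre describes is easy to demonstrate with plain floats (no numpy needed for the illustration):

```python
# From Python 2.6 on, converting NaN to an integer raises ValueError;
# earlier interpreters silently produced 0 via the C-level cast.
nan = float("nan")
try:
    int(nan)
    outcome = "silently converted"
except ValueError:
    outcome = "ValueError"
```

So the question above is whether numpy.ma should propagate this ValueError or restore the old silent-zero behaviour when filling an explicit int-dtype output.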
Re: [Numpy-discussion] More loadtxt() changes
Pierre GM wrote: FYI, I've been coding over the last couple of weeks an extension of loadtxt for a better support of masked data, with the option to read column names in a header. Please find an example below

great, thanks! this could be very useful to me. Two comments:

missing : string, optional A string representing a missing value, irrespective of the column where it appears (e.g., ``'missing'`` or ``'unused'``).

It might be nice if missing could be a sequence of strings, if there is more than one value for missing values, that are not clearly mapped to a particular field.

missing_values : {None, dictionary}, optional A dictionary mapping a column number to a string indicating whether the corresponding field should be masked.

would it be possible to specify a column header, rather than a number here? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception [EMAIL PROTECTED]
Re: [Numpy-discussion] Numpy on Mac OS X python 2.6
On Wed, Nov 26, 2008 at 12:59 AM, Charles R Harris [EMAIL PROTECTED] wrote: Apart from the Mac, the ppc can be configured to run either bigendian or littleendian, so the hardware encompasses more than just the cpu, it's the whole darn board.

Yep, many CPU families have double endian support (MIPS, ARM, PA-RISC, ALPHA). There is also mixed endian. Honestly, I think it is safe to assume that we don't need to care so much about those configurations for the time being. If it is a problem, we can then discuss about our headers being endian-free (which is really the best approach). David
Re: [Numpy-discussion] More loadtxt() changes
On Nov 25, 2008, at 12:30 PM, Christopher Barker wrote: missing : string, optional A string representing a missing value, irrespective of the column where it appears (e.g., ``'missing'`` or ``'unused'``. It might be nice if missing could be a sequence of strings, if there is more than one value for missing values, that are not clearly mapped to a particular field.

OK, easy enough.

missing_values : {None, dictionary}, optional A dictionary mapping a column number to a string indicating whether the corresponding field should be masked. would it possible to specify column header, rather than number here?

A la mlab.csv2rec ? It could work with a bit more tweaking, basically following John Hunter's et al. path. What happens when the column names are unknown (read from the header) or wrong ? Actually, I'd like John to comment on that, hence the CC. More generally, wouldn't be useful to push the recarray manipulating functions from matplotlib.mlab to numpy ?
Re: [Numpy-discussion] More loadtxt() changes
Pierre GM wrote: would it be possible to specify a column header, rather than a number here? A la mlab.csv2rec ?

I'll have to take a look at that.

following John Hunter's et al. path. What happens when the column names are unknown (read from the header) or wrong ?

well, my use case is that I don't know column numbers, but I do know column headers, and what missing value is associated with a given header. You have to know something! if the header is wrong, you get an error, though we may need to decide what "wrong" means. In my case, I'm dealing with data that has pre-specified headers (and I think missing values that go with them), but in any given file I don't know which of those columns is there. I want to read it in, and be able to query the result for what data it has.

Actually, I'd like John to comment on that, hence the CC.

I don't see a CC, but yes, it would be nice to get his input.

More generally, wouldn't be useful to push the recarray manipulating functions from matplotlib.mlab to numpy ?

I think so -- or scipy. I'd really like MPL to be about plotting, and only plotting. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception [EMAIL PROTECTED]
Re: [Numpy-discussion] numpy.ma.sort failing with bus error
Thx Pierre, don't worry about it, it's not a show stopper at all C.

On Nov 24, 2008, at 12:04 PM, Pierre GM wrote: Charles, Confirmed on my machine... I'm gonna have to clean ma.sort, as there are indeed some temporaries that probably don't need to be created. I must warn you however that I won't have a lot of time to spend on that in the next few days. In any case, of course, I'll keep you posted. Thx for reporting!

On Nov 24, 2008, at 12:03 PM, Charles سمير Doutriaux wrote: i mistyped the second line of the sample failing script it should obviously read: a=numpy.ma.ones((16800,60,96),'f') not numpy.ma.sort((16800,60,96),'f') C.

On Nov 24, 2008, at 8:40 AM, Charles سمير Doutriaux wrote: Hello, Using numpy 1.2.1 on a mac os 10.5 I admit the user was sort of stretching the limits but (on his machine)

import numpy
a=numpy.ones((16800,60,96),'f')
numpy.sort(a,axis=0)

works

import numpy.ma
a=numpy.ma.sort((16800,60,96),'f')
numpy.ma.sort(a,axis=0)

failed with some malloc error:

python(435) malloc: *** mmap(size=2097152) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Bus error

Since there's no mask I don't really see how much more memory it's using. Besides, changing 16800 to 15800 still fails (and now that should be using much less memory). Anyhow I would expect a nicer error than a bus error :) Thx, C
Re: [Numpy-discussion] More loadtxt() changes
Pierre GM wrote: Ryan, FYI, I've been coding over the last couple of weeks an extension of loadtxt for a better support of masked data, with the option to read column names in a header. Please find an example below (I also have unittest). Most of the work is actually inspired from matplotlib's mlab.csv2rec. It might be worth not duplicating efforts. Cheers, P.

Absolutely! Definitely don't want to duplicate effort here. What I see here meets a lot of what I was looking for. Here are some questions:

1) It looks like the function returns a structured array rather than a rec array, so that fields are obtained by doing a dictionary access. Since it's a dictionary access, is there any reason that the header needs to be munged to replace characters and reserved names? IIUC, csv2rec changes names b/c it returns a rec array, which uses attribute lookup and hence all names need to be valid python identifiers. This is not the case for a structured array.

2) Can we avoid the use of seek() in here? I just posted a patch to change the check to readline, which was the only file function used previously. This allowed the direct use of a file-like object returned by urllib2.urlopen().

3) In order to avoid breaking backwards compatibility, can we change the default for dtype to be float32, and instead use some kind of special value ('auto' ?) to use the automatic dtype determination?

I'm currently cooking up some of these changes myself, but thought I would see what you thought first. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma
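The point of question 2 can be made concrete: the file-like objects returned by urllib2.urlopen() support readline() but not seek(), so a reader that only ever calls readline() works on both files and network streams. A minimal sketch (the `ForwardOnlyStream` class is a made-up stand-in for such an object):

```python
class ForwardOnlyStream:
    """Minimal stand-in for a non-seekable source such as the file-like
    object returned by urllib2.urlopen(): readline() only, no seek()."""
    def __init__(self, text):
        self._lines = iter(text.splitlines(True))

    def readline(self):
        # Return the next line, or '' at EOF, like a real file object.
        return next(self._lines, "")

# A loop built purely on readline() consumes the stream without rewinding.
stream = ForwardOnlyStream("1 2\n3 4\n")
rows = []
line = stream.readline()
while line:
    rows.append(line.split())
    line = stream.readline()
```

This is also why Pierre's two-pass dtype inference (parse once to pick converters, then rewind) is in tension with supporting such streams: without seek(), the first pass has to buffer the lines it reads.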
Re: [Numpy-discussion] More loadtxt() changes
On Nov 25, 2008, at 2:06 PM, Ryan May wrote: 1) It looks like the function returns a structured array rather than a rec array, so that fields are obtained by doing a dictionary access. Since it's a dictionary access, is there any reason that the header needs to be munged to replace characters and reserved names? IIUC, csv2rec changes names b/c it returns a rec array, which uses attribute lookup and hence all names need to be valid python identifiers. This is not the case for a structured array.

Personally, I prefer flexible ndarrays to recarrays, hence the output. However, I still think that names should be as clean as possible to avoid bad surprises down the road.

2) Can we avoid the use of seek() in here? I just posted a patch to change the check to readline, which was the only file function used previously. This allowed the direct use of a file-like object returned by urllib2.urlopen().

I coded that a couple of weeks ago, before you posted your patch, and I didn't have time to check it. Yes, we could try getting rid of seek. However, we need to find a way to rewind to the beginning of the file if the dtypes are not given in input (as we parsed the whole file to find the best converter in that case).

3) In order to avoid breaking backwards compatibility, can we change the default for dtype to be float32, and instead use some kind of special value ('auto' ?) to use the automatic dtype determination?

I'm not especially concerned w/ backwards compatibility, because we're supporting masked values (something that np.loadtxt shouldn't have to worry about). Initially, I needed a replacement to the fromfile function in the scikits.timeseries.trecords package. I figured it'd be easier and more portable to get a function for generic masked arrays, that could be adapted afterwards to timeseries. In any case, I was more considering the functions I sent you to be part of some numpy.ma.io module than a replacement to np.loadtxt.
I tried to get the syntax as close as possible to np.loadtxt and mlab.csv2rec, but there'll always be some differences. So, yes, we could try to use a default dtype=float and yes, we could have an extra parameter 'auto'. But is it really that useful ? I'm not sure (well, no, I'm sure it's not...) I'm currently cooking up some of these changes myself, but thought I would see what you thought first. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] More loadtxt() changes
On Tue, Nov 25, 2008 at 12:16 PM, Pierre GM [EMAIL PROTECTED] wrote:
> A la mlab.csv2rec? It could work with a bit more tweaking, basically following John Hunter's et al. path. What happens when the column names are unknown (read from the header) or wrong? Actually, I'd like John to comment on that, hence the CC. More generally, wouldn't it be useful to push the recarray manipulating functions from matplotlib.mlab to numpy?

Yes, I've said on a number of occasions I'd like to see these functions in numpy, since a number of them make more sense as numpy methods than as stand alone functions.

> What happens when the column names are unknown (read from the header) or wrong?

I'm not quite sure what you are looking for here. Either the user will have to know the correct column name or the column number, or you should raise an error. I think supporting column names everywhere they make sense is critical, since this is how most people think about these CSV-like files with column headers.

One other thing that is essential for me is that date support is included. Virtually every CSV file I work with has date data in it, in a variety of formats, and I depend on csv2rec (via dateutil.parser.parse, which mpl ships) to be able to handle it w/o any extra cognitive overhead, albeit at the expense of some performance overhead, but my files aren't too big. I'm not sure how numpy would handle the date parsing aspect, but this came up in the date datatype PEP discussion, I think. For me, having to manually specify a date converter with the proper format string every time I load a CSV file is probably not viable.

Another feature that is critical to me is to be able to get an np.recarray back instead of a plain structured array. I use these all day long, and the convenience of r.date over r['date'] is too much for me to give up.

Feel free to ignore these suggestions if they are too burdensome or not appropriate for numpy -- I'm just letting you know some of the things I need to see before I personally would stop using mlab.csv2rec and use numpy.loadtxt instead.

One last thing, I consider the masked array support in csv2rec somewhat broken, because when using a masked array you cannot get at the data (eg datetime methods or string methods) directly using the same interface that regular recarrays use. Pierre, last I brought this up you asked for some example code and indicated a willingness to work on it, but I fell behind and never posted it. The code illustrating the problem is below. I'm really not sure what the right solution is, but the current implementation -- sometimes returning a plain-vanilla recarray, sometimes returning a masked record array -- with different interfaces is not good. Perhaps the best solution is to force the user to ask for masked support, and then always return a masked array whether any of the data is masked or not. csv2rec conditionally returns a masked array only if some of the data are masked, which makes it difficult to use.

JDH

Here is the problem I referred to above -- in f1 none of the rows are masked and so I can access the object attributes from the rows directly. In the 2nd example, row 3 has some missing data, so I get an mrecords recarray back, which does not allow me to directly access the valid data methods.

from StringIO import StringIO
import matplotlib.mlab as mlab

f1 = StringIO("""\
date,name,age,weight
2008-10-12,'Bill',22,125.
2008-10-13,'Tom',23,135.
2008-10-14,'Sally',23,145.
""")

r1 = mlab.csv2rec(f1)
row0 = r1[0]
print row0.date.year, row0.name.upper()

f2 = StringIO("""\
date,name,age,weight
2008-10-12,'Bill',22,125.
2008-10-13,'Tom',23,135.
2008-10-14,'',,145.
""")

r2 = mlab.csv2rec(f2)
row0 = r2[0]
print row0.date.year, row0.name.upper()
Re: [Numpy-discussion] More loadtxt() changes
Pierre GM wrote:
>> 1) It looks like the function returns a structured array rather than a rec array, so that fields are obtained by doing a dictionary access. Since it's a dictionary access, is there any reason that the header needs to be munged to replace characters and reserved names? IIUC, csv2rec changes names b/c it returns a rec array, which uses attribute lookup and hence all names need to be valid python identifiers. This is not the case for a structured array.
>
> Personally, I prefer flexible ndarrays to recarrays, hence the output. However, I still think that names should be as clean as possible to avoid bad surprises down the road.

Ok, I'm not really partial to this, I just thought it would simplify. Your point is valid.

>> 2) Can we avoid the use of seek() in here? I just posted a patch to change the check to readline, which was the only file function used previously. This allowed the direct use of a file-like object returned by urllib2.urlopen().
>
> I coded that a couple of weeks ago, before you posted your patch, and I didn't have time to check it. Yes, we could try getting rid of seek. However, we need to find a way to rewind to the beginning of the file if the dtypes are not given in input (as we parse the whole file to find the best converter in that case).

What about doing the parsing and type inference in a loop and holding onto the already split lines? Then loop through the lines with the converters that were finally chosen? In addition to making my use case work, this has the benefit of not doing the I/O twice.

>> 3) In order to avoid breaking backwards compatibility, can we change the default for dtype to be float32, and instead use some kind of special value ('auto'?) to use the automatic dtype determination?
>
> I'm not especially concerned w/ backwards compatibility, because we're supporting masked values (something that np.loadtxt shouldn't have to worry about). Initially, I needed a replacement for the fromfile function in the scikits.timeseries.trecords package. I figured it'd be easier and more portable to write a function for generic masked arrays that could be adapted afterwards to timeseries. In any case, I was more considering the functions I sent you to be part of some numpy.ma.io module than a replacement for np.loadtxt. I tried to get the syntax as close as possible to np.loadtxt and mlab.csv2rec, but there'll always be some differences. So, yes, we could try to use a default dtype=float and yes, we could have an extra parameter 'auto'. But is it really that useful? I'm not sure (well, no, I'm sure it's not...)

I understand you're not concerned with backwards compatibility, but with the exception of missing-value handling, which is probably specific to masked arrays, I was hoping to just add functionality to loadtxt(). Numpy doesn't need a separate text reader for most of this, and breaking API for any of this is likely a non-starter. So while, yes, having float be the default dtype is probably not the most useful, leaving it also doesn't break existing code.

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Re: [Numpy-discussion] More loadtxt() changes
On Nov 25, 2008, at 2:26 PM, John Hunter wrote:
> Yes, I've said on a number of occasions I'd like to see these functions in numpy, since a number of them make more sense as numpy methods than as stand alone functions.

Great. Could we think about getting that in for 1.3.x? Would you have time? Or should we wait till early Jan.?

> One other thing that is essential for me is that date support is included.

As I mentioned in an earlier post, I needed to get a replacement for a function in scikits.timeseries, where we do need dates, but I also needed something not too specific for numpy.ma. So I thought about extracting the conversion methods from the bulk of the function and creating this new object, StringConverter, that takes care of the conversion. If you need to add date support, the simplest is to extend your StringConverter to take the date/datetime functions just after you import _preview (or numpy.ma.io if we go that path):

dateparser = dateutil.parser.parse
# Update the StringConverter mapper, so that date-like columns are
# automatically converted
_preview.StringConverter.mapper.insert(-1, (dateparser, datetime.date(2000, 1, 1)))

That way, if a date is found in one of the columns, it'll be converted appropriately. Seems to work pretty well for scikits.timeseries; I'll try to post that in the next couple of weeks (once I've ironed out some of the numpy.ma bugs...)

> Another feature that is critical to me is to be able to get a np.recarray back instead of a record array. I use these all day long, and the convenience of r.date over r['date'] is too much for me to give up.

No problem: just take a view once you've got your output. I thought about adding yet another parameter that'd take care of that directly, but then we end up with far too many keywords...

> One last thing, I consider the masked array support in csv2rec somewhat broken because when using a masked array you cannot get at the data (eg datetime methods or string methods) directly using the same interface that regular recarrays use.

Well, it's more mrecords which is broken. I committed some fixes a little while back, but they might not be very robust. I need to check that w/ your example.

> Perhaps the best solution is to force the user to ask for masked support, and then always return a masked array whether any of the data is masked or not. csv2rec conditionally returns a masked array only if some of the data are masked, which makes it difficult to use.

Forcing to a flexible masked array would make quite a lot of sense if we pushed that function into numpy.ma.io. I don't think we should overload np.loadtxt too much anyway...

On Nov 25, 2008, at 2:37 PM, Ryan May wrote:
> What about doing the parsing and type inference in a loop and holding onto the already split lines? Then loop through the lines with the converters that were finally chosen? In addition to making my use case work, this has the benefit of not doing the I/O twice.

You mean, filling a list and relooping on it if we need to? Sounds like a plan, but doesn't it create some extra temporaries we may not want?

> I understand you're not concerned with backwards compatibility, but with the exception of missing-value handling, which is probably specific to masked arrays, I was hoping to just add functionality to loadtxt(). Numpy doesn't need a separate text reader for most of this and breaking API for any of this is likely a non-starter. So while, yes, having float be the default dtype is probably not the most useful, leaving it also doesn't break existing code.

Depends on how we do it. We could have a modified np.loadtxt that takes some of the ideas of the file I sent you (the StringConverter, for example), then I could have a numpy.ma.io that would take care of the missing data. And something in scikits.timeseries for the dates... The new np.loadtxt could use the defaults of the initial one, or we could create yet another function (np.loadfromtxt) that would match what I was suggesting, and np.loadtxt would be a special stripped-down case with dtype=float by default. Thoughts?
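The mapper extension Pierre describes can be sketched standalone (simplified: a plain list stands in for StringConverter.mapper, and a hypothetical ISO-only parser stands in for dateutil.parser.parse, which is an external dependency):

```python
from datetime import date

# Simplified stand-in for StringConverter.mapper: (converter, default) pairs
# tried in order, most restrictive first, string as the last resort.
mapper = [
    (int, -1),
    (float, float('nan')),
    (str, '???'),
]

def parse_date(s):
    # Hypothetical ISO-only parser standing in for dateutil.parser.parse
    y, m, d = (int(part) for part in s.split('-'))
    return date(y, m, d)

# Slot the date converter in just before the string fallback, mirroring
# the mapper.insert(-1, ...) idiom from the message above.
mapper.insert(-1, (parse_date, date(2000, 1, 1)))

def convert(value):
    """Return the first successful conversion of `value`."""
    for func, _default in mapper:
        try:
            return func(value)
        except ValueError:
            pass
    return value
```

Note this simplified convert() tries the converters per value; the real StringConverter instead upgrades once per column via its _status attribute, so a whole column ends up with a single type.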
Re: [Numpy-discussion] More loadtxt() changes
> On Nov 25, 2008, at 2:37 PM, Ryan May wrote:
>> What about doing the parsing and type inference in a loop and holding onto the already split lines? Then loop through the lines with the converters that were finally chosen? In addition to making my use case work, this has the benefit of not doing the I/O twice.
>
> You mean, filling a list and relooping on it if we need to? Sounds like a plan, but doesn't it create some extra temporaries we may not want?

It shouldn't create any *extra* temporaries, since we already make a list of lists before creating the final array. It just introduces an extra looping step. (I'd reuse the existing list of lists.)

> Depends on how we do it. We could have a modified np.loadtxt that takes some of the ideas of the file I sent you (the StringConverter, for example), then I could have a numpy.ma.io that would take care of the missing data. And something in scikits.timeseries for the dates... The new np.loadtxt could use the defaults of the initial one, or we could create yet another function (np.loadfromtxt) that would match what I was suggesting, and np.loadtxt would be a special stripped-down case with dtype=float by default. Thoughts?

My personal opinion is that, if it doesn't make loadtxt too unwieldy, we should just add a few of the options to loadtxt() itself. I'm working on tweaking loadtxt() to add the auto dtype and the names, relying heavily on your StringConverter class (nice code btw.). If my understanding of StringConverter is correct, tweaking the new loadtxt for ma or timeseries would only require passing in modified versions of StringConverter. I'll post that when I'm done and we can see if it looks like too much functionality stapled together or not.

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Re: [Numpy-discussion] More loadtxt() changes
> It shouldn't create any *extra* temporaries since we already make a list of lists before creating the final array. It just introduces an extra looping step. (I'd reuse the existing list of lists.)

Cool then, go for it.

> If my understanding of StringConverter is correct, tweaking the new loadtxt for ma or timeseries would only require passing in modified versions of StringConverter.

Nope, we still need to double check whether there's any missing data in any field of the line we process, independently of the conversion. So there must be some extra loop involved, and I'd need a special function in numpy.ma to take care of that. So our options are:
* create a new function in numpy.ma and leave np.loadtxt as it is;
* write a new np.loadtxt incorporating most of the ideas of the code I sent, but I'd still need to adapt it to support masked values.

> I'll post that when I'm done and we can see if it looks like too much functionality stapled together or not.

Sounds like a plan. Wouldn't mind getting more feedback from fellow users before we get too deep, however...
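The "extra loop" in question is cheap to sketch (illustrative only, not Pierre's implementation): before any conversion, each raw field is checked against the missing-value strings, yielding a mask that numpy.ma can consume alongside the converted data.

```python
def split_and_mask(lines, missing=('', 'N/A'), delimiter=','):
    """Record, per field, whether the raw string was a missing-value
    marker -- independently of any later type conversion."""
    rows, mask = [], []
    for line in lines:
        fields = [f.strip() for f in line.rstrip('\n').split(delimiter)]
        rows.append(fields)
        mask.append([f in missing for f in fields])
    return rows, mask
```

The rows would then go through the converters as usual, while the mask is handed to ma.array(data, mask=mask) at the end.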
Re: [Numpy-discussion] More loadtxt() changes
Pierre GM wrote:
> Nope, we still need to double check whether there's any missing data in any field of the line we process, independently of the conversion. So there must be some extra loop involved, and I'd need a special function in numpy.ma to take care of that. So our options are:
> * create a new function in numpy.ma and leave np.loadtxt as it is;
> * write a new np.loadtxt incorporating most of the ideas of the code I sent, but I'd still need to adapt it to support masked values.

Couldn't you run this loop on the array returned by np.loadtxt() (by masking on the appropriate fill value)?

> Sounds like a plan. Wouldn't mind getting more feedback from fellow users before we get too deep, however...

Agreed. Anyone?

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
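Ryan's suggestion amounts to a one-liner after the fact, assuming the converters put a known sentinel wherever data was missing (the -999. sentinel here is made up for illustration):

```python
import numpy as np
import numpy.ma as ma

# Pretend loadtxt's converters replaced missing fields with -999.
data = np.array([[0., 1.],
                 [2., -999.]])

# Mask on the fill value afterwards -- no changes to loadtxt itself needed.
masked = ma.masked_values(data, -999.)
```

The caveat Pierre raises still holds: this is yet another full pass over the data, and it only works if the sentinel can never collide with a real value.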
Re: [Numpy-discussion] More loadtxt() changes
On Nov 25, 2008, at 3:33 PM, Ryan May wrote:
> You couldn't run this loop on the array returned by np.loadtxt() (by masking on the appropriate fill value)?

Yet an extra loop... Doable, yes... But meh.
Re: [Numpy-discussion] More loadtxt() changes
On Tue, Nov 25, 2008 at 2:01 PM, Pierre GM [EMAIL PROTECTED] wrote:
> On Nov 25, 2008, at 2:26 PM, John Hunter wrote:
>> Yes, I've said on a number of occasions I'd like to see these functions in numpy, since a number of them make more sense as numpy methods than as stand alone functions.
>
> Great. Could we think about getting that in for 1.3.x? Would you have time? Or should we wait till early Jan.?

I wasn't volunteering to do it, just that I support the migration if someone else wants to do it. I'm fully committed with mpl already...

JDH
Re: [Numpy-discussion] More loadtxt() changes
OK then, I'll take care of that over the next few weeks...

On Nov 25, 2008, at 4:56 PM, John Hunter wrote:
> On Tue, Nov 25, 2008 at 2:01 PM, Pierre GM [EMAIL PROTECTED] wrote:
>> Great. Could we think about getting that in for 1.3.x? Would you have time? Or should we wait till early Jan.?
>
> I wasn't volunteering to do it, just that I support the migration if someone else wants to do it. I'm fully committed with mpl already...
>
> JDH
Re: [Numpy-discussion] More loadtxt() changes
John Hunter wrote:
> On Tue, Nov 25, 2008 at 12:16 PM, Pierre GM [EMAIL PROTECTED] wrote:
>> A la mlab.csv2rec? It could work with a bit more tweaking, basically following John Hunter's et al. path. What happens when the column names are unknown (read from the header) or wrong? Actually, I'd like John to comment on that, hence the CC. More generally, wouldn't it be useful to push the recarray manipulating functions from matplotlib.mlab to numpy?
>
> Yes, I've said on a number of occasions I'd like to see these functions in numpy, since a number of them make more sense as numpy methods than as stand alone functions.

John and I are in agreement here. The sticking point has remained somebody stepping up and doing the conversions (and fielding the questions and the resulting discussion) for the various routines that probably ought to go into NumPy. This would be a great place to get involved if there is a lurker looking for a project.

-Travis
Re: [Numpy-discussion] More loadtxt() changes
Pierre GM wrote:
> OK then, I'll take care of that over the next few weeks...

Thanks Pierre.

-Travis
Re: [Numpy-discussion] More loadtxt() changes
Oh, don't mention it... However, I'd be quite grateful if you could take a look at the problem of mixing np.scalars and 0d subclasses of ndarray: looks like it's a C problem, quite out of my league...
http://scipy.org/scipy/numpy/ticket/826
http://article.gmane.org/gmane.comp.python.numeric.general/26354/match=priority+rules
http://article.gmane.org/gmane.comp.python.numeric.general/25670/match=priority+rules

On Nov 25, 2008, at 5:24 PM, Travis E. Oliphant wrote:
> Pierre GM wrote:
>> OK then, I'll take care of that over the next few weeks...
>
> Thanks Pierre.
>
> -Travis
Re: [Numpy-discussion] More loadtxt() changes
Pierre GM wrote:
> Sounds like a plan. Wouldn't mind getting more feedback from fellow users before we get too deep, however...

Ok, I've attached, as a first cut, a diff against SVN HEAD that does (I think) what I'm looking for. It passes all of the old tests and passes my own quick test. A more rigorous test suite will follow, but I want this out the door before I need to leave for the day. What this changeset essentially does is just add support for automatic dtypes, along with supplying/reading names for flexible dtypes. It leverages StringConverter heavily, using a few tweaks so that old behavior is kept. This is by no means a final version. Probably the biggest change from what I mentioned earlier is that instead of dtype='auto', I've used dtype=None to signal the detection code, since dtype=='auto' causes problems. I welcome any and all suggestions here, both on the code and on the original idea of adding these capabilities to loadtxt().

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

Index: lib/io.py
===================================================================
--- lib/io.py (revision 6099)
+++ lib/io.py (working copy)
@@ -233,29 +233,138 @@
         for name in todel:
             os.remove(name)

-# Adapted from matplotlib
+def _string_like(obj):
+    try: obj + ''
+    except (TypeError, ValueError): return False
+    return True

-def _getconv(dtype):
-    typ = dtype.type
-    if issubclass(typ, np.bool_):
-        return lambda x: bool(int(x))
-    if issubclass(typ, np.integer):
-        return lambda x: int(float(x))
-    elif issubclass(typ, np.floating):
-        return float
-    elif issubclass(typ, np.complex):
-        return complex
+def str2bool(value):
+    """
+    Tries to transform a string supposed to represent a boolean to a boolean.
+
+    Raises
+    ------
+    ValueError
+        If the string is not 'True' or 'False' (case independent)
+    """
+    value = value.upper()
+    if value == 'TRUE':
+        return True
+    elif value == 'FALSE':
+        return False
     else:
-        return str
+        return int(bool(value))

+class StringConverter(object):
+    """
+    Factory class for function transforming a string into another object (int,
+    float).

-def _string_like(obj):
-    try: obj + ''
-    except (TypeError, ValueError): return 0
-    return 1

+    After initialization, an instance can be called to transform a string
+    into another object. If the string is recognized as representing a missing
+    value, a default value is returned.
+
+    Parameters
+    ----------
+    dtype : dtype, optional
+        Input data type, used to define a basic function and a default value
+        for missing data. For example, when `dtype` is float, the :attr:`func`
+        attribute is set to ``float`` and the default value to `np.nan`.
+    missing_values : sequence, optional
+        Sequence of strings indicating a missing value.
+
+    Attributes
+    ----------
+    func : function
+        Function used for the conversion
+    default : var
+        Default value to return when the input corresponds to a missing value.
+    mapper : sequence of tuples
+        Sequence of tuples (function, default value) to evaluate in order.
+    """
+
+    from numpy.core import nan  # To avoid circular import
+    mapper = [(str2bool, None),
+              (lambda x: int(float(x)), -1),
+              (float, nan),
+              (complex, nan+0j),
+              (str, '???')]
+
+    def __init__(self, dtype=None, missing_values=None):
+        if dtype is None:
+            self.func = str2bool
+            self.default = None
+            self._status = 0
+        else:
+            dtype = np.dtype(dtype).type
+            self.func, self.default, self._status = self._get_from_dtype(dtype)
+        # Store the list of strings corresponding to missing values.
+        if missing_values is None:
+            self.missing_values = []
+        else:
+            self.missing_values = set(list(missing_values) + [''])
+
+    def __call__(self, value):
+        if value in self.missing_values:
+            return self.default
+        return self.func(value)
+
+    def upgrade(self, value):
+        """
+        Tries to find the best converter for `value`, by testing different
+        converters in order.
+        The order in which the converters are tested is read from the
+        :attr:`_status` attribute of the instance.
+        """
+        try:
+            self.__call__(value)
+        except ValueError:
+            _statusmax = len(self.mapper)
+            if self._status == _statusmax:
+                raise ValueError("Could not find a valid conversion function")
+            elif self._status < _statusmax - 1:
+                self._status += 1
+                (self.func, self.default) = self.mapper[self._status]
+                self.upgrade(value)
+
+    def _get_from_dtype(self, dtype):
+        """
+        Sets the :attr:`func`
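The upgrade logic in the patch can be illustrated with a stripped-down version (simplified: plain int instead of int(float(x)), no bool/complex stages, no missing-value handling):

```python
# Converters tried in order; _status remembers how far we've been promoted.
mapper = [
    (int, -1),
    (float, float('nan')),
    (str, '???'),
]

class MiniConverter:
    def __init__(self):
        self._status = 0

    def __call__(self, value):
        func, _default = mapper[self._status]
        try:
            return func(value)
        except ValueError:
            if self._status >= len(mapper) - 1:
                raise ValueError("Could not find a valid conversion function")
            # Promote to the next, more permissive converter and retry.
            self._status += 1
            return self(value)
```

Once '2.5' forces the promotion to float, earlier values would have to be re-converted at the new status — which is exactly why Ryan's "hold onto the split lines" suggestion pairs well with this design.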
Re: [Numpy-discussion] More loadtxt() changes
Ryan,

Quick comments:
* I already have some unittests for StringConverter, check the file I attach.
* Your str2bool will probably mess things up in upgrade compared to the one JDH had written (the one I sent you): you don't wanna use int(bool(value)), as it'll always give you 0 or 1 when you might need a ValueError.
* Your locked version of update probably won't work either, as you force the converter to output a string (you set the status to the largest possible, which is the one that outputs strings). Why don't you set the status to the current one (make a tmp one if needed)?
* I'd probably get rid of StringConverter._get_from_dtype, as it is not needed outside the __init__. You may wanna stick to the original __init__.

# pylint disable-msg=E1101, W0212, W0621

import os
import tempfile
import numpy as np
import numpy.ma as ma
from numpy.ma.testutils import *
from StringIO import StringIO

from _preview import *


class TestStringConverter(TestCase):
    "Test StringConverter"
    #
    def test_upgrade(self):
        "Tests the upgrade method."
        converter = StringConverter()
        assert_equal(converter._status, 0)
        converter.upgrade('0')
        assert_equal(converter._status, 1)
        converter.upgrade('0.')
        assert_equal(converter._status, 2)
        converter.upgrade('0j')
        assert_equal(converter._status, 3)
        converter.upgrade('a')
        assert_equal(converter._status, len(converter.mapper)-1)
    #
    def test_missing(self):
        "Tests the use of missing values."
        converter = StringConverter(missing_values=('missing', 'missed'))
        converter.upgrade('0')
        assert_equal(converter('0'), 0)
        assert_equal(converter(''), converter.default)
        assert_equal(converter('missing'), converter.default)
        assert_equal(converter('missed'), converter.default)
        try:
            converter('miss')
        except ValueError:
            pass


class TestLineReader(TestCase):
    "Tests the LineReader class"
    #
    def test_spacedelimiter(self):
        "Tests the use of space as delimiter."
        data = StringIO("0 1\n2 3\n4 5 6")
        reader = LineReader(data)
        nbfields = [len(line) for line in reader]
        assert_equal(nbfields, [2, 2, 3])
    #
    def test_get_first_row(self):
        "Tests the access of the first row."
        data = StringIO("0 1\n2 3\n4 5 6")
        reader = LineReader(data)
        assert_equal(reader.get_first_valid_row(), ['0', '1'])


class TestLoadTxt(TestCase):
    "Test the `loadtxt` function."
    #
    def setUp(self):
        "Pre-processing and initialization."
        data = "0 1\n2 3"
        (self.fhdw, self.fhnw) = tempfile.mkstemp()
        (self.fhdwo, self.fhnwo) = tempfile.mkstemp()
        os.write(self.fhdw, "A B\n")
        os.write(self.fhdwo, data)
        os.write(self.fhdw, data)
        os.close(self.fhdw)
        os.close(self.fhdwo)
    #
    def tearDown(self):
        "Post-processing."
        os.remove(self.fhnw)
        os.remove(self.fhnwo)
    #
    def test_noheader(self):
        "Tests loadtxt in absence of a header."
        data = self.fhnwo
        # No dtype
        test = loadtxt(data)
        assert_equal(test.shape, (1,))
        assert_equal(test.item(), (2, 3))
        assert_equal(test.dtype.names, ['0', '1'])
        # w/ basic dtype
        test = loadtxt(data, dtype=np.float)
        control = ma.array([[0, 1], [2, 3]], mask=False)
        assert_equal(test, control)
        # w/ flexible dtype
        dtype = [('A', np.int), ('B', np.float)]
        test = loadtxt(data, dtype=dtype)
        control = ma.array([(0, 1), (2, 3)], mask=(False, False), dtype=dtype)
        assert_equal(test, control)
        # w/ descriptor
        descriptor = {'names': ('A', 'B'), 'formats': (np.int, np.float)}
        test = loadtxt(data, dtype=descriptor)
        control = ma.array([(0, 1), (2, 3)], mask=(False, False), dtype=dtype)
        assert_equal(test, control)
        # w/ names
        test = loadtxt(data, names="a,b")
        dtype = [('a', np.int), ('b', np.int)]
        assert_equal(test, np.array([(0, 1), (2, 3)], dtype=dtype))
        assert_equal(test['a'].dtype, np.dtype(np.int))
    #
    def test_with_noheader_with_missing(self):
        "Tests `loadtxt` on a file w/o header, but w/ missing values."
        data = StringIO("0 1\n2 ")
        test = loadtxt(data, dtype=float)
        assert_equal(test, [[0, 1], [2, 3]])
        assert_equal(test.mask, [[0, 0], [0, 1]])
    #
    def test_with_header(self):
        "Tests `loadtxt` on a file w/ header."
        data = self.fhnw
        control = ma.array([(0, 1), (2, 3)],
                           dtype=[('a', np.int), ('b', np.int)])
        # No dtype
        test = loadtxt(data)
        assert_equal(test.dtype.names, ['a', 'b'])
        assert_equal(test, control)
        # W dtype: should fail, as there's already a header
        dtype = [('A', np.float), ('B', np.int)]
        try:
Re: [Numpy-discussion] More loadtxt() changes
On Tue, Nov 25, 2008 at 5:00 PM, Pierre GM [EMAIL PROTECTED] wrote:
> [snip]
> All, another question: What's the best way to have some kind of sandbox for code like the one Ryan is writing? So that we can try it, modify it, without committing anything to SVN yet?

Probably make a branch and do commits there. If you don't want to hassle with a merge, just copy the file over to the trunk when you are done and commit it from there, then remove the branch. Instructions on making branches are at http://projects.scipy.org/scipy/numpy/wiki/MakingBranches .

[snip]

Chuck
[Numpy-discussion] Problems building numpy on solaris 10 x86
Back in the beginning of the summer, I jumped through a lot of hoops to build numpy+scipy on solaris, 64-bit with gcc. I received a lot of help from David C., and ended up, by some very ugly hacking, building an acceptable numpy+scipy+matplotlib trio for use at my company. However, I'm back at it again, trying to build the same tools with both a 32-bit ABI and a 64-bit ABI. I'm starting with the 32-bit build, because I suspect it'd be simpler (less trouble adding things like -m64 and other such flags). However, I've run into a very basic problem right at the get-go. This time, instead of bothering David at the beginning of my build, I was hoping that other people may have experience to contribute to resolving my issues.

Here is my build environment:
1) gcc-4.3.1
2) Solaris 10 update 3
3) sunperf libraries (for blas+lapack support)

I can provide more detail, since that's not a very specific list. Anyway, when I try building numpy-1.2.1 after setting up my site.cfg and build-related environment, this is what I get:

Setting the site.cfg
Running from numpy source directory.
F2PY Version 2_5972
non-existing path in 'numpy/core': 'code_generators/array_api_order.txt'
[continues...]
scons: Reading SConscript files ...
scons: warning: Ignoring missing SConscript 'build/scons/numpy/core/SConscript'
File "/usr/local/python-2.5.1/lib/python2.5/site-packages/numscons-0.9.4-py2.5.egg/numscons/core/numpyenv.py", line 108, in DistutilsSConscript
scons: done reading SConscript files.
scons: Building targets ...
scons: *** [Errno 2] No such file or directory: 'numpy/core/../../build/scons/numpy/core/sconsign.dblite'
scons: building terminated because of errors.
error: Error while executing scons command. See above for more information.
If you think it is a problem in numscons, you can also try executing the scons command with --log-level option for more detailed output of what numscons is doing, for example --log-level=0; the lowest the level is, the more detailed the output it.
[etc.]

then similar errors repeat themselves over and over, including the ignored missing SConscript and the missing sconsign.dblite file, until the build bombs out. I've got numscons installed from pypi:

>>> import numscons.version
>>> numscons.version.VERSION
'0.9.4'

Can anyone get me on the right track here?

Thanks,
-Peter
Re: [Numpy-discussion] Problems building numpy on solaris 10 x86
On Tue, Nov 25, 2008 at 4:54 PM, Peter Norton [EMAIL PROTECTED] wrote:
> Back at the beginning of the summer, I jumped through a lot of hoops to
> build numpy+scipy on Solaris, 64-bit with gcc. [...] Anyway, when I try
> building numpy-1.2.1 after setting up my site.cfg and build-related
> environment, this is what I get:
> [...]
> scons: warning: Ignoring missing SConscript 'build/scons/numpy/core/SConscript'
> [...]
> scons: *** [Errno 2] No such file or directory:
> 'numpy/core/../../build/scons/numpy/core/sconsign.dblite'
> scons: building terminated because of errors.
> [...]
> Can anyone get me on the right track here?

What happens if you go the usual python setup.py {build,install} route?

Chuck
Re: [Numpy-discussion] More loadtxt() changes
Pierre GM wrote:
> Ryan,
> Quick comments:
> * I already have some unittests for StringConverter, check the file I attach.

Ok, great.

> * Your str2bool will probably mess things up in upgrade compared to the one
> JDH had written (the one I sent you): you don't wanna use int(bool(value)),
> as it'll always give you 0 or 1 when you might need a ValueError.

Ok, I wasn't sure. I was trying to merge what the old code used with the new str2bool you supplied. That's probably not all that necessary.

> * Your locked version of update probably won't work either, as you force the
> converter to output a string (you set the status to the largest possible,
> which is the one that outputs strings). Why don't you set the status to the
> current one (make a tmp one if needed)?

Looking at the code, it looks like mapper is only used in the upgrade() method. My goal in setting status to the largest possible is to lock the converter to the supplied function. That way, for user-supplied converters, the StringConverter doesn't try to upgrade away from it. My thinking was that if the user-supplied converter function fails, the user should know. (Though I got this wrong the first time.)

> * I'd probably get rid of StringConverter._get_from_dtype, as it is not
> needed outside __init__. You may wanna stick to the original __init__.

Done.

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
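The int(bool(value)) pitfall discussed above is easy to demonstrate: bool() is True for any non-empty string, so int(bool('False')) silently yields 1, whereas a strict converter raises ValueError, which is what lets a converter chain fall through to the next type. A minimal sketch (the helper names here are mine, not from the patch):

```python
def str2bool_lenient(value):
    # Pitfall: bool() on any non-empty string is True, so this never
    # raises and maps the string 'False' to 1.
    return int(bool(value))

def str2bool_strict(value):
    # Raises ValueError for anything but 'True'/'False' (case independent),
    # letting a caller fall through to the next converter in a chain.
    value = value.upper()
    if value == 'TRUE':
        return True
    elif value == 'FALSE':
        return False
    raise ValueError("Invalid boolean")

print(str2bool_lenient('False'))  # 1 -- silently wrong
print(str2bool_strict('False'))   # False
```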
Re: [Numpy-discussion] More loadtxt() changes
On Nov 25, 2008, at 10:02 PM, Ryan May wrote:
> My goal in setting status to the largest possible is to lock the converter
> to the supplied function. That way, for user-supplied converters, the
> StringConverter doesn't try to upgrade away from it. My thinking was that
> if the user-supplied converter function fails, the user should know.

Then define a _locked attribute in StringConverter, and prevent upgrade from running if self._locked is True.
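Pierre's suggestion can be sketched with a toy class (this is not the real numpy.lib.io.StringConverter; the names and the single-step fallback are simplified for illustration): a _locked flag, set for user-supplied converters, stops upgrade() from silently replacing a failing function with a more general one.

```python
class LockedConverter(object):
    def __init__(self, func, locked=False):
        self.func = func
        self._locked = locked  # True for user-supplied converters

    def upgrade(self, value):
        try:
            return self.func(value)
        except ValueError:
            if self._locked:
                # User-supplied converter: the user should know it failed.
                raise
            # Simplified fallback to the most general converter.
            self.func = str
            return self.func(value)

conv = LockedConverter(int)
print(conv.upgrade('1.5'))  # int() fails, so it upgrades to str
locked = LockedConverter(int, locked=True)
# locked.upgrade('1.5') raises ValueError instead of upgrading
```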
Re: [Numpy-discussion] More loadtxt() changes
Pierre GM wrote:
> Then define a _locked attribute in StringConverter, and prevent upgrade
> from running if self._locked is True.

Sure, if you're into logic and sound design. I was going more for hackish and obtuse. (No, seriously, I don't know why I didn't think of that.)

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
[Numpy-discussion] Minimum dtype
Hi,

I'm running on a 64-bit machine and see the following:

>>> numpy.array(64.6).dtype
dtype('float64')
>>> numpy.array(64).dtype
dtype('int64')

Is there any function/setting to make these default to 32-bit types except where necessary? I don't mean by specifying dtype=numpy.float32 or dtype=numpy.int32.

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Re: [Numpy-discussion] Minimum dtype
On Tue, Nov 25, 2008 at 21:57, Ryan May [EMAIL PROTECTED] wrote:
> I'm running on a 64-bit machine and see the following:
>
> >>> numpy.array(64.6).dtype
> dtype('float64')
> >>> numpy.array(64).dtype
> dtype('int64')
>
> Is there any function/setting to make these default to 32-bit types except
> where necessary?

Nope.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
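Since there is no global switch for the default dtypes, the usual workaround is to request 32-bit types explicitly at creation time, or to cast afterwards:

```python
import numpy as np

# No global setting exists for numpy's default dtypes, so be explicit:
a = np.array(64.6)                    # float64 (platform default)
b = np.array(64.6, dtype=np.float32)  # request 32 bits up front
c = a.astype(np.float32)              # or downcast an existing array
d = np.array(64, dtype=np.int32)      # likewise for integers

print(a.dtype, b.dtype, c.dtype, d.dtype)
```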
Re: [Numpy-discussion] Problems building numpy on solaris 10 x86
Charles R Harris wrote:
> What happens if you go the usual python setup.py {build,install} route?

Won't go far, since it does not handle sunperf.

David
Re: [Numpy-discussion] More loadtxt() changes
Pierre GM wrote:
> Then define a _locked attribute in StringConverter, and prevent upgrade
> from running if self._locked is True.

Updated patch attached. This includes:
* Updated docstring
* New tests
* Fixes for previous issues
* Fixes to make new tests actually work

I appreciate any and all feedback.

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

Index: numpy/lib/io.py
===================================================================
--- numpy/lib/io.py	(revision 6107)
+++ numpy/lib/io.py	(working copy)
@@ -233,29 +233,136 @@
     for name in todel:
         os.remove(name)
 
-# Adapted from matplotlib
+def _string_like(obj):
+    try: obj + ''
+    except (TypeError, ValueError): return False
+    return True
 
-def _getconv(dtype):
-    typ = dtype.type
-    if issubclass(typ, np.bool_):
-        return lambda x: bool(int(x))
-    if issubclass(typ, np.integer):
-        return lambda x: int(float(x))
-    elif issubclass(typ, np.floating):
-        return float
-    elif issubclass(typ, np.complex):
-        return complex
+def str2bool(value):
+    """
+    Tries to transform a string supposed to represent a boolean to a boolean.
+
+    Raises
+    ------
+    ValueError
+        If the string is not 'True' or 'False' (case independent)
+    """
+    value = value.upper()
+    if value == 'TRUE':
+        return True
+    elif value == 'FALSE':
+        return False
     else:
-        return str
+        raise ValueError("Invalid boolean")
+
+class StringConverter(object):
+    """
+    Factory class for functions transforming a string into another object
+    (int, float).
 
-def _string_like(obj):
-    try: obj + ''
-    except (TypeError, ValueError): return 0
-    return 1
+    After initialization, an instance can be called to transform a string
+    into another object. If the string is recognized as representing a
+    missing value, a default value is returned.
+
+    Parameters
+    ----------
+    dtype : dtype, optional
+        Input data type, used to define a basic function and a default value
+        for missing data. For example, when `dtype` is float, the
+        :attr:`func` attribute is set to ``float`` and the default value to
+        `np.nan`.
+    missing_values : sequence, optional
+        Sequence of strings indicating a missing value.
+
+    Attributes
+    ----------
+    func : function
+        Function used for the conversion.
+    default : var
+        Default value to return when the input corresponds to a missing
+        value.
+    mapper : sequence of tuples
+        Sequence of tuples (function, default value) to evaluate in order.
+    """
+
+    from numpy.core import nan  # To avoid circular import
+    mapper = [(str2bool, None),
+              (int, -1),  # Needs to be int so that it can fail and promote
+                          # to float
+              (float, nan),
+              (complex, nan + 0j),
+              (str, '???')]
+
+    def __init__(self, dtype=None, missing_values=None):
+        self._locked = False
+        if dtype is None:
+            self.func = str2bool
+            self.default = None
+            self._status = 0
+        else:
+            dtype = np.dtype(dtype).type
+            if issubclass(dtype, np.bool_):
+                (self.func, self.default, self._status) = (str2bool, 0, 0)
+            elif issubclass(dtype, np.integer):
+                # Needs to be int(float(x)) so that floating point values
+                # will be coerced to int when specified by dtype
+                (self.func, self.default, self._status) = \
+                    (lambda x: int(float(x)), -1, 1)
+            elif issubclass(dtype, np.floating):
+                (self.func, self.default, self._status) = (float, np.nan, 2)
+            elif issubclass(dtype, np.complex):
+                (self.func, self.default, self._status) = \
+                    (complex, np.nan + 0j, 3)
+            else:
+                (self.func, self.default, self._status) = (str, '???', -1)
+
+        # Store the list of strings corresponding to missing values.
+        if missing_values is None:
+            self.missing_values = []
+        else:
+            self.missing_values = set(list(missing_values) + [''])
+
+    def __call__(self, value):
+        if value in self.missing_values:
+            return self.default
+        return
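The mapper list in the patch above encodes an ordered promotion chain (bool, int, float, complex, str). A stripped-down sketch of that idea, independent of the patch itself (bool is omitted for brevity, and the names here are illustrative): try each (converter, default) pair in order, moving to the next, more general type whenever conversion fails.

```python
import math

# Simplified promotion chain: each entry is (converter, default-for-missing).
mapper = [(int, -1),
          (float, math.nan),
          (complex, complex(math.nan, 0)),
          (str, '???')]

def convert(value, status=0):
    """Return (converted value, index of the converter that succeeded)."""
    for func, default in mapper[status:]:
        try:
            return func(value), status
        except ValueError:
            status += 1  # promote to the next, more general type
    return value, status

print(convert('12'))   # stays an int
print(convert('1.5'))  # int() fails, promoted to float
print(convert('a'))    # falls all the way through to str
```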
[Numpy-discussion] ANN: SciPy 0.7.0b1 (beta release)
I'm pleased to announce the first beta release of SciPy 0.7.0.

SciPy is a package of tools for science and engineering for Python. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more.

This beta release comes almost one year after the 0.6.0 release and contains many new features, numerous bug fixes, improved test coverage, and better documentation. Please note that SciPy 0.7.0b1 requires Python 2.4 or greater and NumPy 1.2.0 or greater.

For information, please see the release notes:
http://sourceforge.net/project/shownotes.php?group_id=27747&release_id=642769

You can download the release from here:
http://sourceforge.net/project/showfiles.php?group_id=27747&package_id=19531&release_id=642769

Thank you to everybody who contributed to this release.

Enjoy,

--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/
[Numpy-discussion] 2D phase unwrapping
Is there a 2D phase unwrapping routine for Python? I read a presentation by GERI (http://www.ljmu.ac.uk/GERI) saying that their code is implemented in scipy, but I could not find it.

Nadav.
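For what it's worth, numpy itself ships only the 1-D numpy.unwrap. A crude 2-D stopgap, nothing like GERI's quality-guided algorithm and not robust to noise or phase residues, is to unwrap along each axis in turn. A sketch (function name is mine):

```python
import numpy as np

def unwrap2d_naive(phase):
    # Unwrap row-wise, then column-wise. Works for smooth phase surfaces
    # where neighbouring samples differ by less than pi; fails on noisy
    # data or residues, which dedicated 2-D unwrappers handle.
    out = np.unwrap(phase, axis=1)
    return np.unwrap(out, axis=0)

# A smooth ramp wrapped into (-pi, pi] is recovered because every
# neighbouring pair of samples differs by well under pi.
x = np.linspace(0, 8 * np.pi, 64)
ramp = x[None, :] + x[:, None]
wrapped = np.angle(np.exp(1j * ramp))
print(np.allclose(unwrap2d_naive(wrapped), ramp))
```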
Re: [Numpy-discussion] Problems building numpy on solaris 10 x86
On Wed, Nov 26, 2008 at 8:54 AM, Peter Norton [EMAIL PROTECTED] wrote:
> scons: warning: Ignoring missing SConscript 'build/scons/numpy/core/SConscript'
> File "/usr/local/python-2.5.1/lib/python2.5/site-packages/numscons-0.9.4-py2.5.egg/numscons/core/numpyenv.py", line 108, in DistutilsSConscript
> scons: done reading SConscript files.
> scons: Building targets ...
> scons: *** [Errno 2] No such file or directory:

It could be considered a bug, because the error message is bad: the real problem is the missing SConscript (it is not so easy to handle, because scons runs in a different process than distutils, so it is difficult to get useful information back from the scons process).

Which version of numpy are you using?

David