Re: [Numpy-discussion] Proposal for changing the names of inverse trigonometrical/hyperbolic functions

2008-11-25 Thread Francesc Alted
On Monday 24 November 2008, Jarrod Millman wrote:
 On Mon, Nov 24, 2008 at 10:45 AM, Francesc Alted [EMAIL PROTECTED] 
wrote:
  So, IMHO, I think it would be better to rename the inverse
  trigonometric functions from ``arc*`` to ``a*`` prefix.  Of course,
  in order to do that correctly, one should add the new names and add
  a
  ``DeprecationWarning`` informing that people should start to use
  the new names.  After two or three NumPy versions, the old function
  names can be removed safely.
 
  What do people think?

 +1
 It seems there is a fair amount of favor for adding the new names.
 There is some resistance to removing the old ones.  I would be happy
 to deprecate the old ones, but leave them in until we release a new
 major release (i.e., NumPy 2.0.0).  We could start creating a list of
 API/ABI clean-ups for whenever we find a compelling reason to release
 a new major version.  In the meantime, we can leave the old names in
 and just add a deprecation note to the docs.  Once we are ready to
 release 2.0, we can release a 1.x with deprecation warnings.

Sounds like a plan. +1 on this.  If there are worries about portability
issues, I'd even leave the old names in 2.0 (with the deprecation
warning, of course), although if the 1.x series is going to live a long
time (say, at least a year), I don't think this will be necessary.
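
For concreteness, a minimal sketch of the mechanics (hypothetical code,
not an actual numpy patch): the new name becomes the canonical one, and
the old name turns into a thin wrapper that warns.

    import warnings
    import numpy as np

    asin = np.arcsin   # new-style name, aliasing the existing ufunc

    def arcsin(x):
        """Deprecated alias for asin."""
        warnings.warn("arcsin is deprecated, use asin instead",
                      DeprecationWarning, stacklevel=2)
        return asin(x)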

-- 
Francesc Alted
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy on Mac OS X python 2.6

2008-11-25 Thread Pierre GM
FYI,
I can't reproduce David's failures on my machine (Intel Core 2 Duo w/
10.5.5):
* python 2.6 from macports
* numpy svn 6098
* GCC 4.0.1 (Apple Inc. build 5488)

I have only 1 failure:
FAIL: test_umath.TestComplexFunctions.test_against_cmath
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/local/lib/python2.6/site-packages/nose-0.10.4-py2.6.egg/nose/case.py", line 182, in runTest
    self.test(*self.arg)
  File "/Users/pierregm/Computing/.pythonenvs/default26/lib/python2.6/site-packages/numpy/core/tests/test_umath.py", line 423, in test_against_cmath
    assert abs(a - b) < atol, "%s %s: %s; cmath: %s" % (fname, p, a, b)
AssertionError: arcsin 2: (1.57079632679-1.31695789692j); cmath: (1.57079632679+1.31695789692j)

----------------------------------------------------------------------

(Well, there's another one in numpy.ma.min, but that's a different  
matter).



On Nov 25, 2008, at 2:19 AM, David Cournapeau wrote:

 On Mon, 2008-11-24 at 22:06 -0700, Charles R Harris wrote:


 Well, it may not be that easy to figure.  The (generated)
 pyconfig-32.h has

 /* Define to 1 if your processor stores words with the most significant byte
    first (like Motorola and SPARC, unlike Intel and VAX).

    The block below does compile-time checking for endianness on platforms
    that use GCC and therefore allows compiling fat binaries on OSX by using
    '-arch ppc -arch i386' as the compile flags. The phrasing was chosen
    such that the configure-result is used on systems that don't use GCC.
  */
 #ifdef __BIG_ENDIAN__
 #define WORDS_BIGENDIAN 1
 #else
 #ifndef __LITTLE_ENDIAN__
 /* #undef WORDS_BIGENDIAN */
 #endif
 #endif


 Hm, interesting: just by grepping, I do have WORDS_BIGENDIAN defined to
 1 on *both* python 2.5 and python 2.6 on Mac OS X (running Intel).
 Looking closer, I do have the above code (conditional) in 2.5, but not
 in 2.6: it is unconditionally defined to 1 on 2.6!  That's actually part
 of something I have wondered about for quite some time regarding fat
 binaries: how do you handle config headers, since they are generated
 only once for every fat binary, but they should really be generated for
 each arch?

 And I guess that __BIG_ENDIAN__ is a compiler flag; it isn't in any
 of the include files. In any case, this looks like a Python bug, or the
 Python folks have switched their API on us.

 Hm, actually, it is a bug in numpy as much as in python: python should
 NOT include any config.h in their public namespace, and we should not
 rely on it.

 But with this info, it should be relatively easy to fix (by setting the
 correct endianness ourselves with some detection code).

 David


 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy on Mac OS X python 2.6

2008-11-25 Thread David Cournapeau
Pierre GM wrote:
 FYI,
 I can't reproduce David's failures on my machine (intel core2 duo w/  
 10.5.5)
 * python 2.6 from macports
   

I think that's the main difference. I feel more and more that the
problem is linked to fat binaries (more exactly, multi-arch builds in one
autoconf run: since only one pyconfig.h is generated for all archs, only
one value is defined for CPU-specific configurations). On my machine,
pyconfig.h has WORDS_BIGENDIAN defined to 1, which I can only explain
by the binary being built on ppc (unfortunately, I can't find this
information from python itself - maybe in the release notes). And that
cannot work on Intel.
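
For anyone who wants to check their own build, a quick sketch using only
stdlib calls -- nothing numpy-specific:

    import sys
    from distutils import sysconfig

    print "runtime byte order:", sys.byteorder
    # versus what the generated pyconfig.h claims:
    for line in open(sysconfig.get_config_h_filename()):
        if 'WORDS_BIGENDIAN' in line:
            print line.rstrip()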

The general solution would be to generate different arch-specific config
files, and import them conditionally in the main config file.  But doing
so in a platform-neutral manner is not trivial.

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] PIL.Image.fromarray bug in numpy interface

2008-11-25 Thread Stéfan van der Walt
2008/11/24 Chris Barker [EMAIL PROTECTED]:
 Robert Kern wrote:
 Jim Vickroy wrote:
 While using the PIL interface to numpy, I rediscovered a logic error
 in the PIL.Image.fromarray() procedure.  The problem (and a solution)
 was mentioned earlier at:

 Tell them that we approve of the change. We don't have commit access
 to PIL, so I believe that our approval is the only reason they could
 possibly send you over here.

 Just for the record, it was me that sent him over here. I thought it
 would be good for a numpy dev to check out the patch for correctness
 -- it looked like a numpy API issue, and I figured Fredrik wouldn't want
 to look too hard at it to determine if it was correct.

 So if your approval means you've looked at the fix and think it's
 correct, great!

I also submitted an issue in 2007:

http://mail.python.org/pipermail/image-sig/2007-August/004570.html

I recently reminded Fredrik, who replied:

"The NumPy support was somewhat broken and has been partially rewritten
for PIL 1.2; I'll compare those fixes with your patch when I find the
time."

So, I guess we should try the latest PIL and see if the problems are
still there?

Cheers
Stéfan
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] working on multiple matrices of the same shape

2008-11-25 Thread Stéfan van der Walt
2008/11/24 Sébastien Barthélemy [EMAIL PROTECTED]:
 Are you sure?  Here it reports
 ValueError: setting an array element with a sequence.
 probably because theta, sintheta and costheta are 1-d arrays of n > 1 elements.

Sorry, I missed that detail.

Cheers
Stéfan
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] CorePy 1.0 Release (x86, Cell BE, BSD!)

2008-11-25 Thread Matthieu Brucher
Exactly what I thought this morning ;)
I'm reading your PhD thesis, Chris; it's great!

Matthieu

2008/11/25 Brian Granger [EMAIL PROTECTED]:
 Chris,

 Wow, this is fantastic...both the BSD license and the x86 support.  I
 look forward to playing with this!

 Cheers,

 Brian

 On Mon, Nov 24, 2008 at 7:49 PM, Chris Mueller [EMAIL PROTECTED] wrote:
 Announcing CorePy 1.0 - http://www.corepy.org

 We are pleased to announce the latest release of CorePy. CorePy is a
 complete system for developing machine-level programs in Python.
 CorePy lets developers build and execute assembly-level programs
 interactively from the Python command prompt, embed them directly in
 Python applications, or export them to standard assembly languages.

 CorePy's straightforward APIs enable the creation of complex,
 high-performance applications that take advantage of processor
 features usually inaccessible from high-level scripting languages,
 such as multi-core execution and vector instruction sets (SSE, VMX,
 SPU).

 This version addresses the two most frequently asked questions about
 CorePy:

 1) Does CorePy support x86 processors?
Yes! CorePy now has extensive support for 32/64-bit x86 and SSE
ISAs on Linux and OS X*.

 2) Is CorePy Open Source?
Yes!  CorePy now uses the standard BSD license.

 Of course, CorePy still supports PowerPC and Cell BE SPU processors.
 In fact, for this release, the Cell run-time was redesigned from the
 ground up to remove the dependency on IBM's libspe and now uses the
 system-level interfaces to work directly with the SPUs (and, CorePy is
 still the most fun way to program the PS3).

 CorePy is written almost entirely in Python.  Its run-time system
 does not rely on any external compilers or assemblers.

 If you have the need to write tight, fast code from Python, want
 to demystify machine-level code generation, or just miss the good-old
 days of assembly hacking, check out CorePy!

 And, if you don't believe us, here's our favorite user quote:

 "CorePy makes assembly fun again!"


 __credits__ = """
   CorePy is developed by Chris Mueller, Andrew Friedley, and Ben
   Martin and is supported by the Open Systems Lab at Indiana
   University.

   Chris can be reached at cmueller[underscore]dev[at]yahoo[dot]com.
 """

 __footnote__ = """
   *Any volunteers for a Windows port? :)
 """




 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion


 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion




-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Hi,

I have a couple more changes to loadtxt() that I'd like to code up in time
for 1.3, but I thought I should run them by the list before doing too much
work.  These are already implemented in some fashion in
matplotlib.mlab.csv2rec(), but the code bases are different enough that
pretty much only the idea can be lifted.  All of these changes would be done
in a manner that is backwards compatible with the current API.

1) Support for setting the names of fields in the returned structured array
without using dtype.  This can be a passed-in list of names or reading the
names of fields from the first line of the file.  Many files have a header
line that gives a name for each column.  Adding this would obviously make
loadtxt much more general and allow for more generic code, IMO.  My current
thinking is to add a *names* keyword parameter that defaults to None, for no
support for reading names.  Setting it to True would tell loadtxt() to read
the names from the first line (after skiprows).  The other option would be
to set names to a list of strings.

2) Support for automatic dtype inference.  Instead of assuming all values
are floats, this would try a list of options until one worked.  For strings,
this would keep track of the longest string within a given field before
setting the dtype.  This would allow reading of files containing a mixture
of types much more easily, without having to go to the trouble of
constructing a full dtype by hand.  This would work alongside any custom
converters one passes in.  My current thinking on the API would just be to
add the option of passing the string 'auto' as the dtype parameter.

3) Better support for missing values.  The docstring mentions a way of
handling missing values by passing in a converter.  The problem with this is
that you have to pass in a converter for *every column* that will contain
missing values.  If you have a text file with 50 columns, writing this
dictionary of converters seems like ugly and needless boilerplate.  I'm
unsure of how best to pass in both what values indicate missing values and
what values to fill in their place.  I'd love suggestions.

Here's an example of my use case (without 50 columns):

ID,First Name,Last Name,Homework1,Homework2,Quiz1,Homework3,Final
1234,Joe,Smith,85,90,,76,
5678,Jane,Doe,65,99,,78,
9123,Joe,Plumber,45,90,,92,

Currently, reading in this file requires a bit of boilerplate (declaring
dtypes, converters).  While it's nothing I can't write, it would still be
easier to write it once within loadtxt and have it for everyone.
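
To make this concrete, the calls might look something like the following
(every keyword shown is hypothetical -- this is the API being proposed,
not anything loadtxt currently supports; 'grades.csv' stands for the
sample data above saved to a file):

    import numpy as np

    # 1) read the field names from the header line
    data = np.loadtxt('grades.csv', delimiter=',', names=True)

    # 2) infer a per-column dtype instead of assuming float
    data = np.loadtxt('grades.csv', delimiter=',', names=True, dtype='auto')

    # 3) one global missing marker plus a fill value (exact spelling undecided)
    data = np.loadtxt('grades.csv', delimiter=',', names=True, dtype='auto',
                      missing='', filling=-99)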

Any support for *any* of these ideas?  Any suggestions on how the user
should pass in the information?

Thanks,

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy on Mac OS X python 2.6

2008-11-25 Thread David Cournapeau
On Tue, Nov 25, 2008 at 10:55 PM, David Cournapeau
[EMAIL PROTECTED] wrote:


 I used the path of least resistance: instead of using the
 WORDS_BIGENDIAN macro, I added a numpy header which gives the endianness
 every time it is included. IOW, instead of the endianness being fixed at
 numpy build time (which would fail for universal builds), it is set
 every time the numpy headers are included (which is the only way to make
 it work). A better solution IMO would be to avoid any endianness
 dependency at all in the headers, but that does not seem possible
 without breaking the API (because the endianness-related macros
 PyArray_NBO and co would need to be set as functions instead).

Hm, for reference, I came across this:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg14382.html

So some people thought about the same problem.

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy on Mac OS X python 2.6

2008-11-25 Thread Charles R Harris
On Tue, Nov 25, 2008 at 8:03 AM, David Cournapeau [EMAIL PROTECTED] wrote:

 On Tue, Nov 25, 2008 at 10:55 PM, David Cournapeau
 [EMAIL PROTECTED] wrote:

 
  I used the path of least resistance: instead of using the
  WORDS_BIGENDIAN macro, I added a numpy header which gives the endianness
  every time it is included. IOW, instead of the endianness being fixed at
  numpy build time (which would fail for universal builds), it is set
  every time the numpy headers are included (which is the only way to make
  it work). A better solution IMO would be to avoid any endianness
  dependency at all in the headers, but that does not seem possible
  without breaking the API (because the endianness-related macros
  PyArray_NBO and co would need to be set as functions instead).

 Hm, for reference, I came across this:

 http://www.mail-archive.com/[EMAIL PROTECTED]/msg14382.html

 So some people thought about the same problem.


Apart from the Mac, the ppc can be configured to run either big-endian or
little-endian, so the hardware encompasses more than just the cpu; it's the
whole darn board.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Bilateral filter

2008-11-25 Thread Stéfan van der Walt
Hi Nadav

2008/8/6 Nadav Horesh [EMAIL PROTECTED]:
 I made the following modifications to the source code; I hope it is ready to
 be included in scipy.

 Added a BSD licence declaration.
 Small optimisation.
 The code is split into a Cython back-end and a python front-end.

 All remarks are welcome,

Thanks for working on a bilateral filter implementation.  Some comments:

1. Needs a setup.py file to build the Cython module (simplest possible
is attached)
2. numpy.numarray.nd_image should be scipy.ndimage
3. For inclusion in SciPy, we'll need some tests and preferably some examples.
4. Docstrings should be in SciPy format.
5. ndarray.h should be numpy/ndarray.h

Thanks for writing this filter; I found it useful!

Cheers
Stéfan
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy

setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [Extension("bilateral_base", ["bilateral_base.pyx"],
                             include_dirs=[numpy.get_include()],
                             extra_compile_args=['-O3'])]
)
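(With Cython installed, and assuming bilateral_base.pyx sits next to this
setup.py, the module would then be built in place with something like
``python setup.py build_ext --inplace``.)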
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal for changing the names of inverse trigonometrical/hyperbolic functions

2008-11-25 Thread Perry Greenfield

On Nov 24, 2008, at 5:55 PM, Jarrod Millman wrote:

 On Mon, Nov 24, 2008 at 10:45 AM, Francesc Alted  
 [EMAIL PROTECTED] wrote:
 So, IMHO, I think it would be better to rename the inverse  
 trigonometric
 functions from ``arc*`` to ``a*`` prefix.  Of course, in order to do
 that correctly, one should add the new names and add a
 ``DeprecationWarning`` informing that people should start to use the
 new names.  After two or three NumPy versions, the old function names
 can be removed safely.

  What do people think?

 +1
 It seems there is a fair amount of favor for adding the new names.
 There is some resistance to removing the old ones.  I would be happy
 to deprecate the old ones, but leave them in until we release a new
 major release (i.e., NumPy 2.0.0).  We could start creating a list of
 API/ABI clean-ups for whenever we find a compelling reason to release
 a new major version.  In the meantime, we can leave the old names in
 and just add a deprecation note to the docs.  Once we are ready to
 release 2.0, we can release a 1.x with deprecation warnings.

I tend to favor this approach.

Perry

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal for changing the names of inverse trigonometrical/hyperbolic functions

2008-11-25 Thread Joris De Ridder

On 24 Nov 2008, at 19:45, Francesc Alted wrote:

 standards in computer science.  For example, where Python writes:

 asin, acos, atan, asinh, acosh, atanh

 NumPy choose:

 arcsin, arccos, arctan, arcsinh, arccosh, arctanh

 So, IMHO, I think it would be better to rename the inverse  
 trigonometric
 functions from ``arc*`` to ``a*`` prefix.

-1

The current slightly deviating (and in fact clearer) naming
convention of Numpy is IMO not even remotely enough reason to break
the API.
Adding honey by introducing a transition period with a deprecation
warning postpones, but doesn't avoid, breaking the API.

Joris


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM

Ryan,
FYI, I've been coding over the last couple of weeks an extension of
loadtxt for better support of masked data, with the option to read
column names in a header. Please find an example below (I also have
unittests). Most of the work is actually inspired by matplotlib's
mlab.csv2rec. It might be worth not duplicating efforts.

Cheers,
P.




"""
:mod:`_preview`
===============

A collection of utilities from incoming versions of numpy.ma
"""

import itertools
import numpy as np
import numpy.ma as ma


_string_like = np.lib.io._string_like

def _to_filehandle(fname, flag='r', return_opened=False):
    """
    Returns the filehandle corresponding to a string or a file.
    If the string ends in '.gz', the file is automatically unzipped.

    Parameters
    ----------
    fname : string, filehandle
        Name of the file whose filehandle must be returned.
    flag : string, optional
        Flag indicating the status of the file ('r' for read, 'w' for write).
    return_opened : boolean, optional
        Whether to return the opening status of the file.
    """
    if _string_like(fname):
        if fname.endswith('.gz'):
            import gzip
            fhd = gzip.open(fname, flag)
        else:
            fhd = file(fname, flag)
        opened = True
    elif hasattr(fname, 'seek'):
        fhd = fname
        opened = False
    else:
        raise ValueError('fname must be a string or file handle')
    if return_opened:
        return fhd, opened
    return fhd


def flatten_dtype(dtp):
    """
    Unpack a structured data-type.
    """
    if dtp.names is None:
        return [dtp]
    else:
        types = []
        for field in dtp.names:
            (typ, _) = dtp.fields[field]
            flat_dt = flatten_dtype(typ)
            types.extend(flat_dt)
        return types


class LineReader:
    """
    File reader that automatically splits each line. This reader behaves like
    an iterator.

    Parameters
    ----------
    fhd : filehandle
        File handle of the underlying file.
    comment : string, optional
        The character used to indicate the start of a comment.
    delimiter : string, optional
        The string used to separate values.  By default, this is any
        whitespace.
    """
    #
    def __init__(self, fhd, comment='#', delimiter=None):
        self.fh = fhd
        self.comment = comment
        self.delimiter = delimiter
        if delimiter == ' ':
            self.delimiter = None
    #
    def close(self):
        "Close the current reader."
        self.fh.close()
    #
    def seek(self, arg):
        """
        Moves to a new position in the file.

        See Also
        --------
        file.seek
        """
        self.fh.seek(arg)
    #
    def splitter(self, line):
        """
        Splits the line at each current delimiter.
        Comments are stripped beforehand.
        """
        line = line.split(self.comment)[0].strip()
        delimiter = self.delimiter
        if line:
            return line.split(delimiter)
        else:
            return []
    #
    def next(self):
        """
        Moves to the next line or raises :exc:`StopIteration`.
        """
        return self.splitter(self.fh.next())
    #
    def __iter__(self):
        for line in self.fh:
            yield self.splitter(line)

    def readline(self):
        """
        Returns the next line of the file, split at the delimiter and stripped
        of comments.
        """
        return self.splitter(self.fh.readline())

    def skiprows(self, nbrows=1):
        """
        Skips `nbrows` from the file.
        """
        for i in range(nbrows):
            self.fh.readline()

    def get_first_valid_row(self):
        """
        Returns the values in the first valid (uncommented and not empty) line
        of the file.
        """
        first_values = None
        while not first_values:
            first_line = self.fh.readline()
            if first_line == '': # EOF reached
                raise IOError('End-of-file reached before encountering data.')
            first_values = self.splitter(first_line)
        return first_values


itemdictionary = {'return': 'return_',
                  'file': 'file_',
                  'print': 'print_'
                  }


def process_header(headers):
    """
    Validates a list of strings to use as field names.
    The strings are stripped of any non-alphanumeric character, and spaces
    are replaced by `_`.
    """
    #
    # Define the characters to delete from the headers
    # (character set reconstructed from the equivalent line in matplotlib's
    # mlab.csv2rec; the archived copy of this line was garbled)
    delete = set("""~!@#$%^&*()-=+~\|]}[{';: /?.,""")
    delete.add('"')

    names = []
    seen = dict()
    for i, item in enumerate(headers):
        item = item.strip().lower().replace(' ', '_')
        item = ''.join([c for c in item if c not in delete])
        if not len(item):
            item = 'column%d' % i

        item = itemdictionary.get(item, item)
        cnt = seen.get(item, 0)
        if cnt > 0:
            names.append(item + '_%d' % cnt)
        else:
            names.append(item)
        seen[item] = cnt + 1
    return names
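
As a quick illustration of how the pieces above fit together (the data
here is made up):

    from StringIO import StringIO

    data = StringIO("ID, First Name, Last Name\n1234, Joe, Smith\n")
    reader = LineReader(data, comment='#', delimiter=',')
    names = process_header(reader.get_first_valid_row())
    print names   # -> ['id', 'first_name', 'last_name']
    for row in reader:
        # remaining rows come back already split and comment-stripped
        print dict(zip(names, [v.strip() for v in row]))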

[Numpy-discussion] int(np.nan) on python 2.6

2008-11-25 Thread Pierre GM
All,
Sorry to bump my own post, and I was kinda threadjacking anyway:

Some functions of numpy.ma (e.g., ma.max, ma.min...) accept explicit
outputs that may not be MaskedArrays.
When such an explicit output is not a MaskedArray, a value that should
have been masked is transformed into np.nan.

That worked great in 2.5, with np.nan automatically transformed to 0
when the explicit output had an int dtype. With Python 2.6, a
ValueError is raised instead, as np.nan can no longer be cast to int.

What should be the recommended behavior in this case?  Raise a
ValueError or some other exception, to follow the new Python 2.6
convention, or silently replace np.nan by some value acceptable to an int
dtype (0, or something else)?
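
The underlying change is easy to see without any masked arrays involved:

    import numpy as np
    # Python 2.5 returned a bogus platform-dependent value (0, as
    # described above); Python 2.6 raises
    # "ValueError: cannot convert float NaN to integer"
    int(np.nan)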

Thanks for any suggestion,
P.
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Christopher Barker
Pierre GM wrote:
 FYI, I've been coding over the last couple of weeks an extension of
 loadtxt for better support of masked data, with the option to read
 column names in a header. Please find an example below

Great, thanks! This could be very useful to me.

Two comments:


missing : string, optional
    A string representing a missing value, irrespective of the
    column where it appears (e.g., ``'missing'`` or ``'unused'``).


It might be nice if missing could be a sequence of strings, for cases where
there is more than one value for missing values that are not clearly mapped
to a particular field.



missing_values : {None, dictionary}, optional
    A dictionary mapping a column number to a string indicating
    whether the corresponding field should be masked.


would it be possible to specify a column header, rather than a number, here?


-Chris







-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[EMAIL PROTECTED]
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy on Mac OS X python 2.6

2008-11-25 Thread David Cournapeau
On Wed, Nov 26, 2008 at 12:59 AM, Charles R Harris
[EMAIL PROTECTED] wrote:

 Apart from the Mac, the ppc can be configured to run either bigendian or
 littleendian, so the hardware encompasses more than just the cpu, it's the
 whole darn board.

Yep, many CPU families have double endian support (MIPS, ARM, PA-RISC,
ALPHA. There is also mixed endian. Honestly, I think it is safe to
assume that we don't need to care so much about those configurations
for the time being. If it is a problem, we can then discuss about our
headers being endian-free (which is really the best approach).

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM

On Nov 25, 2008, at 12:30 PM, Christopher Barker wrote:

 
 missing : string, optional
     A string representing a missing value, irrespective of the
     column where it appears (e.g., ``'missing'`` or ``'unused'``).
 

 It might be nice if missing could be a sequence of strings, if there
 is more than one value for missing values, that are not clearly mapped
 to a particular field.

OK, easy enough.

 
 missing_values : {None, dictionary}, optional
     A dictionary mapping a column number to a string indicating
     whether the corresponding field should be masked.
 

 would it be possible to specify a column header, rather than a number, here?

A la mlab.csv2rec?  It could work with a bit more tweaking, basically
following John Hunter's et al. path. What happens when the column
names are unknown (read from the header) or wrong?

Actually, I'd like John to comment on that, hence the CC. More
generally, wouldn't it be useful to push the recarray-manipulating
functions from matplotlib.mlab to numpy?
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Christopher Barker
Pierre GM wrote:
 would it be possible to specify a column header, rather than a number, here?
 
 A la mlab.csv2rec ?

I'll have to take a look at that.

 following John Hunter's et al. path. What happens when the column  
 names are unknown (read from the header) or wrong ?

well, my use case is that I don't know column numbers, but I do know
column headers, and what missing value is associated with a given
header. You have to know something! If the header is wrong, you get an
error, though we may need to decide what "wrong" means.

In my case, I'm dealing with data that has pre-specified headers (and I 
think missing values that go with them), but in any given file I don't 
know which of those columns is there. I want to read it in, and be able 
to query the result for what data it has.


 Actually, I'd like John to comment on that, hence the CC.

I don't see a CC, but yes, it would be nice to get his input.

 More generally, wouldn't it be useful to push the recarray-manipulating
 functions from matplotlib.mlab to numpy?

I think so -- or scipy. I'd really like MPL to be about plotting, and
only plotting.

-Chris



-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[EMAIL PROTECTED]
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.ma.sort failing with bus error

2008-11-25 Thread Charles سمير Doutriaux
Thx Pierre,

don't worry about it, it's not a show stopper at all.

C.

On Nov 24, 2008, at 12:04 PM, Pierre GM wrote:

 Charles,
 Confirmed on my machine...
 I'm gonna have to clean ma.sort, as there are indeed some temporaries
 that probably don't need to be created. I must warn you however that I
 won't have a lot of time to spend on that in the next few days. In any
 case, of course, I'll keep you posted.
 Thx for reporting!


 On Nov 24, 2008, at 12:03 PM, Charles سمير Doutriaux wrote:

 I mistyped the second line of the sample failing script;
 it should obviously read:
 a=numpy.ma.ones((16800,60,96),'f')
 not numpy.ma.sort((16800,60,96),'f')

 C.

 On Nov 24, 2008, at 8:40 AM, Charles سمير Doutriaux wrote:

 Hello,

 Using numpy 1.2.1  on a mac os 10.5


 I admit the user was sort of stretching the limits but (on his
 machine)

 import numpy
 a=numpy.ones((16800,60,96),'f')
 numpy.sort(a,axis=0)

 works

 import numpy.ma
 a=numpy.ma.sort((16800,60,96),'f')
 numpy.ma.sort(a,axis=0)

 failed with some malloc error:
 python(435) malloc: *** mmap(size=2097152) failed (error code=12)
 *** error: can't allocate region
 *** set a breakpoint in malloc_error_break to debug
 Bus error

 Since there's no mask, I don't really see how much more memory it's
 using. Besides, changing 16800 to 15800 still fails (and now that
 should be using much less memory).

 Anyhow, I would expect a nicer error than a bus error :)

 Thx,

 C



 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion


 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Pierre GM wrote:
 Ryan,
 FYI, I've been coding over the last couple of weeks an extension of
 loadtxt for better support of masked data, with the option to read
 column names in a header. Please find an example below (I also have
 unittests). Most of the work is actually inspired by matplotlib's
 mlab.csv2rec. It might be worth not duplicating efforts.
 Cheers,
 P.

Absolutely!  Definitely don't want to duplicate effort here.  What I see 
here meets a lot of what I was looking for.  Here are some questions:

1) It looks like the function returns a structured array rather than a 
rec array, so that fields are obtained by doing a dictionary access. 
Since it's a dictionary access, is there any reason that the header 
needs to be munged to replace characters and reserved names?  IIUC, 
csv2rec changes names b/c it returns a rec array, which uses attribute 
lookup and hence all names need to be valid python identifiers.  This is 
not the case for a structured array.

2) Can we avoid the use of seek() in here?  I just posted a patch to 
change the check to readline, which was the only file function used 
previously.  This allowed the direct use of a file-like object returned 
by urllib2.urlopen().

3) In order to avoid breaking backwards compatibility, can we change the
default for dtype to be float32, and instead use some kind of special
value ('auto' ?) to use the automatic dtype determination?

I'm currently cooking up some of these changes myself, but thought I 
would see what you thought first.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM

On Nov 25, 2008, at 2:06 PM, Ryan May wrote:

 1) It looks like the function returns a structured array rather than a
 rec array, so that fields are obtained by doing a dictionary access.
 Since it's a dictionary access, is there any reason that the header
 needs to be munged to replace characters and reserved names?  IIUC,
 csv2rec changes names b/c it returns a rec array, which uses attribute
 lookup and hence all names need to be valid python identifiers.   
 This is
 not the case for a structured array.

Personally, I prefer flexible ndarrays to recarrays, hence the output.  
However, I still think that names should be as clean as possible to  
avoid bad surprises down the road.


 2) Can we avoid the use of seek() in here?  I just posted a patch to
 change the check to readline, which was the only file function used
 previously.  This allowed the direct use of a file-like object  
 returned
 by urllib2.urlopen().

I coded that a couple of weeks ago, before you posted your patch, and I
didn't have time to check it. Yes, we could try getting rid of seek.
However, we need to find a way to rewind to the beginning of the file
if the dtypes are not given as input (as we parse the whole file to
find the best converter in that case).


 3) In order to avoid breaking backwards compatibility, can we change the
 default for dtype to be float32, and instead use some kind of special
 value ('auto' ?) to use the automatic dtype determination?

I'm not especially concerned w/ backwards compatibility, because we're
supporting masked values (something that np.loadtxt shouldn't have to
worry about). Initially, I needed a replacement for the fromfile
function in the scikits.timeseries.trecords package. I figured it'd be
easier and more portable to get a function for generic masked arrays
that could be adapted afterwards to timeseries. In any case, I was
more considering the functions I sent you to be part of some
numpy.ma.io module than a replacement for np.loadtxt. I tried to get
the syntax as close as possible to np.loadtxt and mlab.csv2rec, but
there'll always be some differences.

So, yes, we could try to use a default dtype=float and yes, we could
have an extra parameter 'auto'. But is it really that useful?  I'm not
sure (well, no, I'm sure it's not...)

 I'm currently cooking up some of these changes myself, but thought I
 would see what you thought first.

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread John Hunter
On Tue, Nov 25, 2008 at 12:16 PM, Pierre GM [EMAIL PROTECTED] wrote:

 A la mlab.csv2rec ? It could work with a bit more tweaking, basically
 following John Hunter's et al. path. What happens when the column names are
 unknown (read from the header) or wrong ?

 Actually, I'd like John to comment on that, hence the CC. More generally,
 wouldn't be useful to push the recarray manipulating functions from
 matplotlib.mlab to numpy ?

Yes, I've said on a number of occasions I'd like to see these
functions in numpy, since a number of them make more sense as numpy
methods than as stand alone functions.

 What happens when the column names are unknown (read from the header) or 
 wrong ?

I'm not quite sure what you are looking for here.  Either the user
will have to know the correct column name or the column number or you
should raise an error.  I think supporting column names everywhere
they make sense is critical since this is how most people think about
these CSV-like files with column headers.

One other thing that is essential for me is that date support is
included.  Virtually every CSV file I work with has date data in it,
in a variety of formats, and I depend on csv2rec (via
dateutil.parser.parse which mpl ships) to be able to handle it w/o any
extra cognitive overhead, albeit at the expense of some performance
overhead, but my files aren't too big.  I'm not sure how numpy would
handle the date parsing aspect, but this came up in the date datatype
PEP discussion I think.  For me, having to manually specify a date
converter with the proper format string every time I load a CSV file
is probably not viable.

Another feature that is critical to me is to be able to get a
np.recarray back instead of a plain structured array.  I use these all day
long, and the convenience of r.date over r['date'] is too much for me to
give up.

Feel free to ignore these suggestions if they are too burdensome or
not appropriate for numpy -- I'm just letting you know some of the
things I need to see before I personally would stop using mlab.csv2rec
 and use numpy.loadtxt instead.

One last thing, I consider the masked array support in csv2rec
somewhat broken because when using a masked array you cannot get at
the data (eg datetime methods or string methods) directly using the
same interface that regular recarrays use.  Pierre, last I brought
this up you asked for some example code and indicated a willingness to
work on it but I fell behind and never posted it.  The code
illustrating the problem is below.  I'm really not sure what the right
solution is, but the current implementation -- sometimes returning a
plain-vanilla rec array, sometimes returning a masked record array --
with different interfaces is not good.

Perhaps the best solution is to force the user to ask for masked
support, and then always return a masked array whether any of the data
is masked or not.  csv2rec conditionally returns a masked array only
if some of the data are masked, which makes it difficult to use.

JDH

Here is the problem I referred to above -- in f1 none of the rows are
masked and so I can access the object attributes from the rows
directly.  In the 2nd example, row 3 has some missing data so I get an
mrecords recarray back, which does not allow me to directly access the
valid data methods.

from StringIO import StringIO
import matplotlib.mlab as mlab
f1 = StringIO("""\
date,name,age,weight
2008-10-12,'Bill',22,125.
2008-10-13,'Tom',23,135.
2008-10-14,'Sally',23,145.
""")

r1 = mlab.csv2rec(f1)
row0 = r1[0]
print row0.date.year, row0.name.upper()

f2 = StringIO("""\
date,name,age,weight
2008-10-12,'Bill',22,125.
2008-10-13,'Tom',23,135.
2008-10-14,'',,145.
""")

r2 = mlab.csv2rec(f2)
row0 = r2[0]
print row0.date.year, row0.name.upper()
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Pierre GM wrote:
 On Nov 25, 2008, at 2:06 PM, Ryan May wrote:
 1) It looks like the function returns a structured array rather than a
 rec array, so that fields are obtained by doing a dictionary access.
 Since it's a dictionary access, is there any reason that the header
 needs to be munged to replace characters and reserved names?  IIUC,
 csv2rec changes names b/c it returns a rec array, which uses attribute
 lookup and hence all names need to be valid python identifiers.   
 This is
 not the case for a structured array.
 
 Personally, I prefer flexible ndarrays to recarrays, hence the output.  
 However, I still think that names should be as clean as possible to  
 avoid bad surprises down the road.

Ok, I'm not really partial to this, I just thought it would simplify
things.  Your point is valid.

 2) Can we avoid the use of seek() in here?  I just posted a patch to
 change the check to readline, which was the only file function used
 previously.  This allowed the direct use of a file-like object  
 returned
 by urllib2.urlopen().
 
 I coded that a couple of weeks ago, before you posted your patch, and I
 didn't have time to check it. Yes, we could try getting rid of seek.
 However, we need to find a way to rewind to the beginning of the file
 if the dtypes are not given as input (as we parse the whole file to
 find the best converter in that case).

What about doing the parsing and type inference in a loop and holding
onto the already-split lines?  Then loop through the lines with the
converters that were finally chosen?  In addition to making my use case
work, this has the benefit of not doing the I/O twice.
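
In outline, that single pass could look like this -- a sketch reusing the
StringConverter from Pierre's code (_read_and_infer is a hypothetical
helper, not the actual patch, and it assumes upgrade() permanently bumps
the instance to a converter that handles the offending value):

    def _read_and_infer(lreader, ncols):
        converters = [StringConverter() for _ in range(ncols)]
        # The only pass over the file: keep every split row in memory.
        rows = [row for row in lreader if row]
        for row in rows:
            for (conv, item) in zip(converters, row):
                try:
                    conv(item)
                except ValueError:
                    conv.upgrade(item)
        # Second pass, over the in-memory rows, with the settled converters.
        return [[conv(item) for (conv, item) in zip(converters, row)]
                for row in rows]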

 3) In order to avoid breaking backwards compatibility, can we change the
 default for dtype to be float32, and instead use some kind of special
 value ('auto' ?) to use the automatic dtype determination?
 
 I'm not especially concerned w/ backwards compatibility, because we're
 supporting masked values (something that np.loadtxt shouldn't have to
 worry about). Initially, I needed a replacement for the fromfile
 function in the scikits.timeseries.trecords package. I figured it'd be
 easier and more portable to get a function for generic masked arrays
 that could be adapted afterwards to timeseries. In any case, I was
 more considering the functions I sent you to be part of some
 numpy.ma.io module than a replacement for np.loadtxt. I tried to get
 the syntax as close as possible to np.loadtxt and mlab.csv2rec, but
 there'll always be some differences.

 So, yes, we could try to use a default dtype=float and yes, we could
 have an extra parameter 'auto'. But is it really that useful?  I'm not
 sure (well, no, I'm sure it's not...)

I understand you're not concerned with backwards compatibility, but with
the exception of missing-value handling, which is probably specific to
masked arrays, I was hoping to just add functionality to loadtxt().  Numpy
doesn't need a separate text reader for most of this, and breaking the API
for any of this is likely a non-starter.  So while, yes, having float be
the default dtype is probably not the most useful, leaving it alone also
doesn't break existing code.

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM

On Nov 25, 2008, at 2:26 PM, John Hunter wrote:

 Yes, I've said on a number of occasions I'd like to see these
 functions in numpy, since a number of them make more sense as numpy
 methods than as stand alone functions.

Great. Could we think about getting that in for 1.3.x? Would you have
time? Or should we wait till early Jan.?

 One other thing that is essential for me is that date support is
 included.

As I mentioned in an earlier post, I needed to get a replacement for a
function in scikits.timeseries, where we do need dates, but I also
needed something not too specific for numpy.ma. So I thought about
extracting the conversion methods from the bulk of the function and
creating this new object, StringConverter, that takes care of the
conversion. If you need to add date support, the simplest is to extend
your StringConverter to take the date/datetime functions just after
you import _preview (or numpy.ma.io if we go that path):

    dateparser = dateutil.parser.parse
    # Update the StringConverter mapper, so that date-like columns are
    # automatically converted
    _preview.StringConverter.mapper.insert(-1, (dateparser,
                                                datetime.date(2000, 1, 1)))
That way, if a date is found in one of the columns, it'll be converted
appropriately. Seems to work pretty well for scikits.timeseries; I'll
try to post that in the next couple of weeks (once I've ironed out some
of the numpy.ma bugs...)

 Another feature that is critical to me is to be able to get a
 np.recarray back instead of a record array.  I use these all day long,
 and the convenience of r.date over r['date'] is too much for me to
 give up.

No problem: just take a view once you've got your output. I thought about
adding yet another parameter that'd take care of that directly, but
then we'd end up with far too many keywords...
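
(For instance, something along the lines of

    r = output.view(np.recarray)   # now r.date works as well as r['date']

should do it.)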

 One last thing, I consider the masked array support in csv2rec
 somewhat broken because when using a masked array you cannot get at
 the data (eg datetime methods or string methods) directly using the
 same interface that regular recarrays use.

Well, it's more that mrecords is broken. I committed some fixes a
little while back, but they might not be very robust. I need to check
that w/ your example.

 Perhaps the best solution is to force the user to ask for masked
 support, and then always return a masked array whether any of the data
 is masked or not.  csv2rec conditionally returns a masked array only
 if some of the data are masked, which makes it difficult to use.


Forcing a flexible masked array would make quite a lot of sense if we
pushed that function into numpy.ma.io. I don't think we should overload
np.loadtxt too much anyway...


On Nov 25, 2008, at 2:37 PM, Ryan May wrote:

 What about doing the parsing and type inference in a loop and holding
 onto the already-split lines?  Then loop through the lines with the
 converters that were finally chosen?  In addition to making my use case
 work, this has the benefit of not doing the I/O twice.

You mean, filling a list and relooping on it if we need to? Sounds
like a plan, but doesn't it create some extra temporaries we may not
want?

 I understand you're not concerned with backwards compatibility, but with
 the exception of missing-value handling, which is probably specific to
 masked arrays, I was hoping to just add functionality to loadtxt().  Numpy
 doesn't need a separate text reader for most of this, and breaking the API
 for any of this is likely a non-starter.  So while, yes, having float be
 the default dtype is probably not the most useful, leaving it alone also
 doesn't break existing code.

Depends on how we do it. We could have a modified np.loadtxt that
takes some of the ideas of the file I sent you (the StringConverter,
for example); then I could have a numpy.ma.io that would take care of
the missing data. And something in scikits.timeseries for the dates...

The new np.loadtxt could use the defaults of the initial one, or we
could create yet another function (np.loadfromtxt) that would match
what I was suggesting, and np.loadtxt would be a special stripped-down
case with dtype=float by default.

thoughts?





___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
 On Nov 25, 2008, at 2:37 PM, Ryan May wrote:
 What about doing the parsing and type inference in a loop and holding
 onto the already-split lines?  Then loop through the lines with the
 converters that were finally chosen?  In addition to making my use case
 work, this has the benefit of not doing the I/O twice.

 You mean, filling a list and relooping on it if we need to? Sounds
 like a plan, but doesn't it create some extra temporaries we may not
 want?

It shouldn't create any *extra* temporaries, since we already make a list
of lists before creating the final array.  It just introduces an extra
looping step. (I'd reuse the existing list of lists.)

 Depends on how we do it. We could have a modified np.loadtxt that
 takes some of the ideas of the file I sent you (the StringConverter,
 for example); then I could have a numpy.ma.io that would take care of
 the missing data. And something in scikits.timeseries for the dates...

 The new np.loadtxt could use the defaults of the initial one, or we
 could create yet another function (np.loadfromtxt) that would match
 what I was suggesting, and np.loadtxt would be a special stripped-down
 case with dtype=float by default.
 
 thoughts?

My personal opinion is that if it doesn't make loadtxt too unwieldy, we
should just add a few of the options to loadtxt() itself.  I'm working on
tweaking loadtxt() to add the auto dtype and the names, relying heavily
on your StringConverter class (nice code, btw).  If my understanding of
StringConverter is correct, tweaking the new loadtxt for ma or
timeseries would only require passing in modified versions of
StringConverter.

I'll post that when I'm done and we can see if it looks like too much 
functionality stapled together or not.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM

 It shouldn't create any *extra* temporaries, since we already make a list
 of lists before creating the final array.  It just introduces an extra
 looping step. (I'd reuse the existing list of lists.)

Cool then, go for it.

  If my understanding of
 StringConverter is correct, tweaking the new loadtxt for ma or
 timeseries would only require passing in modified versions of
 StringConverter.

Nope, we still need to double-check whether there's any missing data
in any field of the line we process, independently of the conversion.
So there must be some extra loop involved, and I'd need a special
function in numpy.ma to take care of that. So our options are:
* create a new function in numpy.ma and leave np.loadtxt as it is
* write a new np.loadtxt incorporating most of the ideas of the code I
sent, but I'd still need to adapt it to support masked values.


 I'll post that when I'm done and we can see if it looks like too much
 functionality stapled together or not.

Sounds like a plan. Wouldn't mind getting more feedback from fellow  
users before we get too deep, however...
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Pierre GM wrote:
 Nope, we still need to double-check whether there's any missing data
 in any field of the line we process, independently of the conversion.
 So there must be some extra loop involved, and I'd need a special
 function in numpy.ma to take care of that. So our options are:
 * create a new function in numpy.ma and leave np.loadtxt as it is
 * write a new np.loadtxt incorporating most of the ideas of the code I
 sent, but I'd still need to adapt it to support masked values.

You couldn't run this loop on the array returned by np.loadtxt() (by 
masking on the appropriate fill value)?

 I'll post that when I'm done and we can see if it looks like too much
 functionality stapled together or not.
 
 Sounds like a plan. Wouldn't mind getting more feedback from fellow  
 users before we get too deep, however...

Agreed.  Anyone?

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM

On Nov 25, 2008, at 3:33 PM, Ryan May wrote:

 You couldn't run this loop on the array returned by np.loadtxt() (by
 masking on the appropriate fill value)?

Yet an extra loop... Doable, yes... But meh.

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread John Hunter
On Tue, Nov 25, 2008 at 2:01 PM, Pierre GM [EMAIL PROTECTED] wrote:

 On Nov 25, 2008, at 2:26 PM, John Hunter wrote:

 Yes, I've said on a number of occasions I'd like to see these
 functions in numpy, since a number of them make more sense as numpy
 methods than as stand alone functions.

 Great. Could we think about getting that in for 1.3.x? Would you have
 time? Or should we wait till early Jan.?

I wasn't volunteering to do it, just that I support the migration if
someone else wants to do it.

I'm fully committed with mpl already...

JDH
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM
OK then, I'll take care of that over the next few weeks...


On Nov 25, 2008, at 4:56 PM, John Hunter wrote:

 On Tue, Nov 25, 2008 at 2:01 PM, Pierre GM [EMAIL PROTECTED]  
 wrote:

 On Nov 25, 2008, at 2:26 PM, John Hunter wrote:

 Yes, I've said on a number of occasions I'd like to see these
 functions in numpy, since a number of them make more sense as numpy
 methods than as stand alone functions.

  Great. Could we think about getting that in for 1.3.x? Would you have
  time? Or should we wait till early Jan.?

 I wasn't volunteering to do it, just that I support the migration if
 someone else wants to do it.

 I'm fully committed with mpl already...

 JDH
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Travis E. Oliphant
John Hunter wrote:
 On Tue, Nov 25, 2008 at 12:16 PM, Pierre GM [EMAIL PROTECTED] wrote:

   
 A la mlab.csv2rec ? It could work with a bit more tweaking, basically
 following John Hunter's et al. path. What happens when the column names are
 unknown (read from the header) or wrong ?

 Actually, I'd like John to comment on that, hence the CC. More generally,
 wouldn't be useful to push the recarray manipulating functions from
 matplotlib.mlab to numpy ?
 

 Yes, I've said on a number of occasions I'd like to see these
 functions in numpy, since a number of them make more sense as numpy
 methods than as stand alone functions.

   
John and I are in agreement here.   The issue remains that somebody
needs to step up and do the conversions (and field the questions and
the resulting discussion) for the various routines that probably ought
to go into NumPy.

This would be a great place to get involved if there is a lurker looking 
for a project.

-Travis

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Travis E. Oliphant
Pierre GM wrote:
 OK then, I'll take care of that over the next few weeks...

   
Thanks, Pierre.

-Travis


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM
Oh, don't mention it...
However, I'd be quite grateful if you could cast an eye over the problem
of mixing np.scalars and 0d subclasses of ndarray: it looks like it's a
C problem, quite out of my league...

http://scipy.org/scipy/numpy/ticket/826
http://article.gmane.org/gmane.comp.python.numeric.general/26354/match=priority+rules
http://article.gmane.org/gmane.comp.python.numeric.general/25670/match=priority+rules



On Nov 25, 2008, at 5:24 PM, Travis E. Oliphant wrote:

 Pierre GM wrote:
 OK then, I'll take care of that over the next few weeks...


 Thanks  Pierre.

 -Travis


 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May

Pierre GM wrote:
Sounds like a plan. Wouldn't mind getting more feedback from fellow  
users before we get too deep, however...


Ok, I've attached, as a first cut, a diff against SVN HEAD that does (I
think) what I'm looking for.  It passes all of the old tests and passes
my own quick test.  A more rigorous test suite will follow, but I want to
get this out the door before I need to leave for the day.


What this changeset essentially does is just add support for automatic 
dtypes along with supplying/reading names for flexible dtypes.  It 
leverages StringConverter heavily, using a few tweaks so that old 
behavior is kept.  This is by no means a final version.


Probably the biggest change from what I mentioned earlier is that 
instead of dtype='auto', I've used dtype=None to signal the detection 
code, since dtype=='auto' causes problems.
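
To make that concrete, a hypothetical session against the patched module
would look something like this (the names keyword and the dtype=None
convention come from this patch, not from released numpy, and the output
is indicative only):

>>> from StringIO import StringIO
>>> data = StringIO("1 2.5 abc\n3 4.0 xyz")
>>> # dtype=None asks loadtxt to pick a converter per column automatically
>>> test = loadtxt(data, dtype=None, names="i,x,s")
>>> test.dtype.names
('i', 'x', 's')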


I welcome any and all suggestions here, both on the code and on the 
original idea of adding these capabilities to loadtxt().


Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Index: lib/io.py
===================================================================
--- lib/io.py   (revision 6099)
+++ lib/io.py   (working copy)
@@ -233,29 +233,138 @@
         for name in todel:
             os.remove(name)
 
-# Adapted from matplotlib
+def _string_like(obj):
+    try: obj + ''
+    except (TypeError, ValueError): return False
+    return True
 
-def _getconv(dtype):
-    typ = dtype.type
-    if issubclass(typ, np.bool_):
-        return lambda x: bool(int(x))
-    if issubclass(typ, np.integer):
-        return lambda x: int(float(x))
-    elif issubclass(typ, np.floating):
-        return float
-    elif issubclass(typ, np.complex):
-        return complex
+def str2bool(value):
+    """
+    Tries to transform a string supposed to represent a boolean to a boolean.
+
+    Raises
+    ------
+    ValueError
+        If the string is not 'True' or 'False' (case independent)
+    """
+    value = value.upper()
+    if value == 'TRUE':
+        return True
+    elif value == 'FALSE':
+        return False
     else:
-        return str
+        return int(bool(value))
 
+class StringConverter(object):
+    """
+    Factory class for function transforming a string into another object (int,
+    float).
 
-def _string_like(obj):
-    try: obj + ''
-    except (TypeError, ValueError): return 0
-    return 1
+    After initialization, an instance can be called to transform a string
+    into another object. If the string is recognized as representing a missing
+    value, a default value is returned.
 
+    Parameters
+    ----------
+    dtype : dtype, optional
+        Input data type, used to define a basic function and a default value
+        for missing data. For example, when `dtype` is float, the :attr:`func`
+        attribute is set to ``float`` and the default value to `np.nan`.
+    missing_values : sequence, optional
+        Sequence of strings indicating a missing value.
+
+    Attributes
+    ----------
+    func : function
+        Function used for the conversion
+    default : var
+        Default value to return when the input corresponds to a missing value.
+    mapper : sequence of tuples
+        Sequence of tuples (function, default value) to evaluate in order.
+    """
+
+    from numpy.core import nan  # To avoid circular import
+    mapper = [(str2bool, None),
+              (lambda x: int(float(x)), -1),
+              (float, nan),
+              (complex, nan+0j),
+              (str, '???')]
+
+    def __init__(self, dtype=None, missing_values=None):
+        if dtype is None:
+            self.func = str2bool
+            self.default = None
+            self._status = 0
+        else:
+            dtype = np.dtype(dtype).type
+            self.func, self.default, self._status = self._get_from_dtype(dtype)
+
+        # Store the list of strings corresponding to missing values.
+        if missing_values is None:
+            self.missing_values = []
+        else:
+            self.missing_values = set(list(missing_values) + [''])
+
+    def __call__(self, value):
+        if value in self.missing_values:
+            return self.default
+        return self.func(value)
+
+    def upgrade(self, value):
+        """
+        Tries to find the best converter for `value`, by testing different
+        converters in order.
+        The order in which the converters are tested is read from the
+        :attr:`_status` attribute of the instance.
+        """
+        try:
+            self.__call__(value)
+        except ValueError:
+            _statusmax = len(self.mapper)
+            if self._status == _statusmax:
+                raise ValueError("Could not find a valid conversion function")
+            elif self._status < _statusmax - 1:
+                self._status += 1
+                (self.func, self.default) = self.mapper[self._status]
+                self.upgrade(value)
+
+    def _get_from_dtype(self, dtype):
+        """
+        Sets the :attr:`func`

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM

Ryan,
Quick comments:

* I already have some unittests for StringConverter; check the attached
file.


* Your str2bool will probably mess things up in upgrade compared to the
one JDH had written (the one I sent you): you don't wanna use
int(bool(value)), as it'll always give you 0 or 1 when you might need a
ValueError (see the snippet after these notes).


* Your locked version of upgrade probably won't work either, as you force
the converter to output a string (you set the status to the largest
possible value, the one that outputs strings). Why don't you set the
status to the current one (make a tmp one if needed)?


* I'd probably get rid of StringConverter._get_from_dtype, as it is not
needed outside of __init__. You may wanna stick to the original __init__.
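
To see why the str2bool point matters: any non-empty string is truthy, so
int(bool(value)) silently "succeeds" instead of raising, and upgrade()
never gets the ValueError it needs to promote the converter:

>>> int(bool('2.3'))   # no exception, so upgrade() thinks str2bool handled it
1
>>> float('2.3')       # ...and the converter never promotes to this
2.3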



# pylint: disable-msg=E1101, W0212, W0621

import os
import tempfile
import numpy as np
import numpy.ma as ma

from numpy.ma.testutils import *

from StringIO import StringIO

from _preview import *


class TestStringConverter(TestCase):
    "Test StringConverter"
    #
    def test_upgrade(self):
        "Tests the upgrade method."
        converter = StringConverter()
        assert_equal(converter._status, 0)
        converter.upgrade('0')
        assert_equal(converter._status, 1)
        converter.upgrade('0.')
        assert_equal(converter._status, 2)
        converter.upgrade('0j')
        assert_equal(converter._status, 3)
        converter.upgrade('a')
        assert_equal(converter._status, len(converter.mapper) - 1)
    #
    def test_missing(self):
        "Tests the use of missing values."
        converter = StringConverter(missing_values=('missing', 'missed'))
        converter.upgrade('0')
        assert_equal(converter('0'), 0)
        assert_equal(converter(''), converter.default)
        assert_equal(converter('missing'), converter.default)
        assert_equal(converter('missed'), converter.default)
        try:
            converter('miss')
        except ValueError:
            pass


class TestLineReader(TestCase):
    "Tests the LineReader class"
    #
    def test_spacedelimiter(self):
        "Tests the use of space as delimiter."
        data = StringIO("0 1\n2   3\n4 5   6")
        reader = LineReader(data)
        nbfields = [len(line) for line in reader]
        assert_equal(nbfields, [2, 2, 3])
    #
    def test_get_first_row(self):
        "Tests the access of the first row."
        data = StringIO("0 1\n2   3\n4 5   6")
        reader = LineReader(data)
        assert_equal(reader.get_first_valid_row(), ['0', '1'])


class TestLoadTxt(TestCase):
    "Test the `loadtxt` function."
    #
    def setUp(self):
        "Pre-processing and initialization."
        data = "0 1\n2 3"
        (self.fhdw, self.fhnw) = tempfile.mkstemp()
        (self.fhdwo, self.fhnwo) = tempfile.mkstemp()
        os.write(self.fhdw, "A B\n")
        os.write(self.fhdwo, data)
        os.write(self.fhdw, data)
        os.close(self.fhdw)
        os.close(self.fhdwo)
    #
    def tearDown(self):
        "Post-processing."
        os.remove(self.fhnw)
        os.remove(self.fhnwo)
    #
    def test_noheader(self):
        "Tests loadtxt in the absence of a header."
        data = self.fhnwo
        # No dtype
        test = loadtxt(data)
        assert_equal(test.shape, (1,))
        assert_equal(test.item(), (2, 3))
        assert_equal(test.dtype.names, ['0', '1'])
        # w/ basic dtype
        test = loadtxt(data, dtype=np.float)
        control = ma.array([[0, 1], [2, 3]], mask=False)
        assert_equal(test, control)
        # w/ flexible dtype
        dtype = [('A', np.int), ('B', np.float)]
        test = loadtxt(data, dtype=dtype)
        control = ma.array([(0, 1), (2, 3)], mask=(False, False), dtype=dtype)
        assert_equal(test, control)
        # w/ descriptor
        descriptor = {'names': ('A', 'B'), 'formats': (np.int, np.float)}
        test = loadtxt(data, dtype=descriptor)
        control = ma.array([(0, 1), (2, 3)], mask=(False, False), dtype=dtype)
        assert_equal(test, control)
        # w/ names
        test = loadtxt(data, names="a,b")
        dtype = [('a', np.int), ('b', np.int)]
        assert_equal(test, np.array([(0, 1), (2, 3)], dtype=dtype))
        assert_equal(test['a'].dtype, np.dtype(np.int))
    #
    def test_with_noheader_with_missing(self):
        "Tests `loadtxt` on a file w/o header, but w/ missing values."
        data = StringIO("0 1\n2  ")
        test = loadtxt(data, dtype=float)
        assert_equal(test, [[0, 1], [2, 3]])
        assert_equal(test.mask, [[0, 0], [0, 1]])
    #
    def test_with_header(self):
        "Tests `loadtxt` on a file w/ header."
        data = self.fhnw
        control = ma.array([(0, 1), (2, 3)],
                           dtype=[('a', np.int), ('b', np.int)])
        # No dtype
        test = loadtxt(data)
        assert_equal(test.dtype.names, ['a', 'b'])
        assert_equal(test, control)
        # W dtype: should fail, as there's already a header
        dtype = [('A', np.float), ('B', np.int)]
        try:

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Charles R Harris
On Tue, Nov 25, 2008 at 5:00 PM, Pierre GM [EMAIL PROTECTED] wrote:
snip

 All, another question:
 What's the best way to have some kind of sandbox for code like the one
 Ryan is writing, so that we can try it and modify it without committing
 anything to SVN yet?


Probably make a branch and do commits there. If you don't want to hassle
with a merge, just copy the file over to the trunk when you are done and
commit it from there, then remove the branch. Instructions on making
branches are at http://projects.scipy.org/scipy/numpy/wiki/MakingBranches .

snip

Chuck


[Numpy-discussion] Problems building numpy on solaris 10 x86

2008-11-25 Thread Peter Norton
Back in the beginning of the summer, I jumped through a lot of hoops to
build numpy+scipy on solaris, 64-bit with gcc. I received a lot of help from
David C., and ended up, by some very ugly hacking, building an acceptable
numpy+scipy+matplotlib trio for use at my company.

However, I'm back at it again trying to build the same tools in both a
32-bit ABI and a 64-bit ABI. I'm starting with the 32-bit build, because I
suspect it'd be simpler (less trouble adding things like -m64 and other such
flags). However, I've run into a very basic problem right at the get-go.
This time instead of bothering David at the beginning of my build, I was
hoping that other people may have experience to contribute to resolving my
issues.

Here is my build environment:

1) gcc-4.3.1
2) Solaris 10 update 3
3) sunperf libraries (for blas+lapack support)

I can provide more detail since that's not a very specific list.

Anyway, when I try building numpy-1.2.1 after setting up my site.cfg and
build-related environment this is what I get:


Setting the site.cfg
Running from numpy source directory.
F2PY Version 2_5972
non-existing path in 'numpy/core': 'code_generators/array_api_order.txt'
[continues...]
scons: Reading SConscript files ...

scons: warning: Ignoring missing SConscript
'build/scons/numpy/core/SConscript'
File
/usr/local/python-2.5.1/lib/python2.5/site-packages/numscons-0.9.4-py2.5.egg/numscons/core/numpyenv.py,
line 108, in DistutilsSConscript
scons: done reading SConscript files.
scons: Building targets ...
scons: *** [Errno 2] No such file or directory:
'numpy/core/../../build/scons/numpy/core/sconsign.dblite'
scons: building terminated because of errors.
error: Error while executing scons command. See above for more information.
If you think it is a problem in numscons, you can also try executing the
scons
command with the --log-level option for more detailed output of what numscons is
doing, for example --log-level=0; the lower the level, the more detailed
the output is.
[etc.]

then similar errors repeat themselves over and over, including the ignored
missing SConscript and the missing sconsign.dblite file, until the build bombs out.

I've got numscons installed from pypi:
>>> import numscons.version
>>> numscons.version.VERSION
'0.9.4'

Can anyone get me on the right track here?

Thanks,

-Peter


Re: [Numpy-discussion] Problems building numpy on solaris 10 x86

2008-11-25 Thread Charles R Harris
On Tue, Nov 25, 2008 at 4:54 PM, Peter Norton 
[EMAIL PROTECTED] wrote:

snip


What happens if you go the usual python setup.py {build,install} route?

Chuck


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Pierre GM wrote:
 Ryan,
 Quick comments:
 
 * I already have some unittests for StringConverter, check the file I 
 attach.

Ok, great.

 * Your str2bool will probably mess things up in upgrade compared to the
 one JDH had written (the one I sent you): you don't wanna use
 int(bool(value)), as it'll always give you 0 or 1 when you might need a
 ValueError.

Ok, I wasn't sure.  I was trying to merge what the old code used with 
the new str2bool you supplied.  That's probably not all that necessary.

 * Your locked version of upgrade probably won't work either, as you force
 the converter to output a string (you set the status to the largest
 possible value, the one that outputs strings). Why don't you set the
 status to the current one (make a tmp one if needed)?

Looking at the code, it looks like mapper is only used in the upgrade()
method. My goal in setting status to the largest possible value is to
lock the converter to the supplied function.  That way, for user-supplied
converters, the StringConverter doesn't try to upgrade away from it.  My
thinking was that if the user-supplied converter function fails, the
user should know. (Though I got this wrong the first time.)

 * I'd probably get rid of StringConverter._get_from_dtype, as it is not 
 needed outside the __init__. You may wanna stick to the original __init__.

Done.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM

On Nov 25, 2008, at 10:02 PM, Ryan May wrote:
 Pierre GM wrote:
 * Your locked version of upgrade probably won't work either, as you
 force the converter to output a string (you set the status to the
 largest possible value, the one that outputs strings). Why don't you
 set the status to the current one (make a tmp one if needed)?

 Looking at the code, it looks like mapper is only used in the upgrade()
 method. My goal in setting status to the largest possible value is to
 lock the converter to the supplied function.  That way, for user-supplied
 converters, the StringConverter doesn't try to upgrade away from it.  My
 thinking was that if the user-supplied converter function fails, the
 user should know. (Though I got this wrong the first time.)


Then, define a _locked attribute in StringConverter, and prevent
upgrade from running if self._locked is True.
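
Something along these lines (an untested sketch against Ryan's current
patch, not final code):

    def upgrade(self, value):
        "Promote the converter for `value` if needed, unless locked."
        if self._locked:
            # A user-supplied converter should fail loudly rather than be
            # silently swapped for a more general one.
            raise ValueError("Converter is locked and cannot be upgraded")
        try:
            self.__call__(value)
        except ValueError:
            if self._status < len(self.mapper) - 1:
                self._status += 1
                (self.func, self.default) = self.mapper[self._status]
                self.upgrade(value)
            else:
                raise ValueError("Could not find a valid conversion function")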




Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Pierre GM wrote:
 snip
 Then, define a _locked attribute in StringConverter, and prevent
 upgrade from running if self._locked is True.

Sure, if you're into logic and sound design.  I was going more for
hackish and obtuse.

(No, seriously, I don't know why I didn't think of that.)

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


[Numpy-discussion] Minimum dtype

2008-11-25 Thread Ryan May
Hi,

I'm running on a 64-bit machine, and see the following:

>>> numpy.array(64.6).dtype
dtype('float64')

>>> numpy.array(64).dtype
dtype('int64')

Is there any function/setting to make these default to 32-bit types 
except where necessary? I don't mean by specifying dtype=numpy.float32 
or dtype=numpy.int32.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] Minimum dtype

2008-11-25 Thread Robert Kern
On Tue, Nov 25, 2008 at 21:57, Ryan May [EMAIL PROTECTED] wrote:
 Hi,

 I'm running on a 64-bit machine, and see the following:

 >>> numpy.array(64.6).dtype
 dtype('float64')

 >>> numpy.array(64).dtype
 dtype('int64')

 Is there any function/setting to make these default to 32-bit types
 except where necessary?

Nope.
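
(For completeness, the explicit spellings remain the only route to the
narrower types:)

>>> numpy.array(64.6, dtype=numpy.float32).dtype
dtype('float32')
>>> numpy.array(64, dtype=numpy.int32).dtype
dtype('int32')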

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Problems building numpy on solaris 10 x86

2008-11-25 Thread David Cournapeau
Charles R Harris wrote:


 What happens if you go the usual python setup.py {build,install} route?

Won't go far since it does not handle sunperf.

David


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May

Pierre GM wrote:

snip


Updated patch attached.  This includes:
 * Updated docstring
 * New tests
 * Fixes for previous issues
 * Fixes to make new tests actually work

I appreciate any and all feedback.
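
For reference, with the corrected str2bool the promotion now behaves
roughly like this (hypothetical session, assuming the patched
StringConverter is importable; outputs are indicative only):

>>> conv = StringConverter()   # starts with str2bool
>>> conv.upgrade('1')          # str2bool('1') raises, so promote to int
>>> conv('1')
1
>>> conv.upgrade('1.5')        # int('1.5') raises, so promote to float
>>> conv('1.5')
1.5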

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Index: numpy/lib/io.py
===================================================================
--- numpy/lib/io.py (revision 6107)
+++ numpy/lib/io.py (working copy)
@@ -233,29 +233,136 @@
         for name in todel:
             os.remove(name)
 
-# Adapted from matplotlib
+def _string_like(obj):
+    try: obj + ''
+    except (TypeError, ValueError): return False
+    return True
 
-def _getconv(dtype):
-    typ = dtype.type
-    if issubclass(typ, np.bool_):
-        return lambda x: bool(int(x))
-    if issubclass(typ, np.integer):
-        return lambda x: int(float(x))
-    elif issubclass(typ, np.floating):
-        return float
-    elif issubclass(typ, np.complex):
-        return complex
+def str2bool(value):
+    """
+    Tries to transform a string supposed to represent a boolean to a boolean.
+
+    Raises
+    ------
+    ValueError
+        If the string is not 'True' or 'False' (case independent)
+    """
+    value = value.upper()
+    if value == 'TRUE':
+        return True
+    elif value == 'FALSE':
+        return False
     else:
-        return str
+        raise ValueError("Invalid boolean")
 
+class StringConverter(object):
+    """
+    Factory class for function transforming a string into another object (int,
+    float).
 
-def _string_like(obj):
-    try: obj + ''
-    except (TypeError, ValueError): return 0
-    return 1
+    After initialization, an instance can be called to transform a string
+    into another object. If the string is recognized as representing a missing
+    value, a default value is returned.
 
+    Parameters
+    ----------
+    dtype : dtype, optional
+        Input data type, used to define a basic function and a default value
+        for missing data. For example, when `dtype` is float, the :attr:`func`
+        attribute is set to ``float`` and the default value to `np.nan`.
+    missing_values : sequence, optional
+        Sequence of strings indicating a missing value.
+
+    Attributes
+    ----------
+    func : function
+        Function used for the conversion
+    default : var
+        Default value to return when the input corresponds to a missing value.
+    mapper : sequence of tuples
+        Sequence of tuples (function, default value) to evaluate in order.
+    """
+
+    from numpy.core import nan  # To avoid circular import
+    mapper = [(str2bool, None),
+              (int, -1),  # Needs to be int so that it can fail and promote
+                          # to float
+              (float, nan),
+              (complex, nan+0j),
+              (str, '???')]
+
+    def __init__(self, dtype=None, missing_values=None):
+        self._locked = False
+        if dtype is None:
+            self.func = str2bool
+            self.default = None
+            self._status = 0
+        else:
+            dtype = np.dtype(dtype).type
+            if issubclass(dtype, np.bool_):
+                (self.func, self.default, self._status) = (str2bool, 0, 0)
+            elif issubclass(dtype, np.integer):
+                # Needs to be int(float(x)) so that floating point values will
+                # be coerced to int when specified by dtype
+                (self.func, self.default, self._status) = (lambda x: int(float(x)), -1, 1)
+            elif issubclass(dtype, np.floating):
+                (self.func, self.default, self._status) = (float, np.nan, 2)
+            elif issubclass(dtype, np.complex):
+                (self.func, self.default, self._status) = (complex, np.nan + 0j, 3)
+            else:
+                (self.func, self.default, self._status) = (str, '???', -1)
+
+        # Store the list of strings corresponding to missing values.
+        if missing_values is None:
+            self.missing_values = []
+        else:
+            self.missing_values = set(list(missing_values) + [''])
+
+    def __call__(self, value):
+        if value in self.missing_values:
+            return self.default
+        return

[Numpy-discussion] ANN: SciPy 0.7.0b1 (beta release)

2008-11-25 Thread Jarrod Millman
I'm pleased to announce the first beta release of SciPy 0.7.0.

SciPy is a package of tools for science and engineering for Python.
It includes modules for statistics, optimization, integration, linear
algebra, Fourier transforms, signal and image processing, ODE solvers,
and more.

This beta release comes almost one year after the 0.6.0 release and
contains many new features, numerous bug-fixes, improved test
coverage, and better documentation.  Please note that SciPy 0.7.0b1
requires Python 2.4 or greater and NumPy 1.2.0 or greater.

For information, please see the release notes:
http://sourceforge.net/project/shownotes.php?group_id=27747&release_id=642769

You can download the release from here:
http://sourceforge.net/project/showfiles.php?group_id=27747&package_id=19531&release_id=642769

Thank you to everybody who contributed to this release.

Enjoy,

-- 
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/


[Numpy-discussion] 2D phase unwrapping

2008-11-25 Thread Nadav Horesh

Is there a 2D phase unwrapping routine for Python?
I read a presentation by GERI (http://www.ljmu.ac.uk/GERI) saying that their
code is implemented in scipy, but I could not find it.

  Nadav.


Re: [Numpy-discussion] Problems building numpy on solaris 10 x86

2008-11-25 Thread David Cournapeau
On Wed, Nov 26, 2008 at 8:54 AM, Peter Norton
[EMAIL PROTECTED] wrote:


 scons: warning: Ignoring missing SConscript
 'build/scons/numpy/core/SConscript'
 File
 /usr/local/python-2.5.1/lib/python2.5/site-packages/numscons-0.9.4-py2.5.egg/numscons/core/numpyenv.py,
 line 108, in DistutilsSConscript
 scons: done reading SConscript files.
 scons: Building targets ...
 scons: *** [Errno 2] No such file or directory:

It could be considered a bug because the error message is bad: the
problem really is the missing SConscript (it is not so easy to handle
because scons runs in a different process than distutils, so it is
difficult to get useful information back from the scons process).

Which version of numpy are you using?

David