Re: [Numpy-discussion] genloadtxt: second serving

2008-12-05 Thread Manuel Metz
Pierre GM wrote:
> On Dec 4, 2008, at 7:22 AM, Manuel Metz wrote:
>> Will loadtxt in that case remain as is? Or will the _faulttolerantconv
>> class be used?
>
> No idea, we need to discuss it. There's a problem with
> _faulttolerantconv: using np.nan as the default value will not work in
> Python 2.6 if the output is to be int, as an exception will be raised.

Okay, that's something I did not check. If numpy.nan is converted to 0, 
it's basically useless -- 0 might be a valid number in the data and 
cannot be distinguished from nan in that case. Here masked arrays are 
the only sensible approach. So the _faulttolerantconv (ftc) class is 
applicable to floats and complex numbers only.
   It might nevertheless be useful to use the ftc class, since (i) it 
results in almost no performance loss and (ii) at the same time you get 
at least a minimum of fault tolerance, which can be very useful for many 
applications.
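The Python 2.6 behavior Pierre mentions is easy to demonstrate: converting a NaN to an integer raises an exception rather than silently producing 0, which is why a NaN default cannot back an int converter. A minimal illustration (not from the original thread):

```python
# Converting NaN to an integer raises rather than yielding a value,
# so np.nan cannot serve as the default for an int converter.
nan = float('nan')
try:
    int(nan)
    result = 'converted'
except ValueError:
    result = 'raised ValueError'
print(result)  # int(nan) raises ValueError on modern Pythons
```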

   I personally will switch to AstroAsciiData (thanks Jarrod for 
pointing this out), because that seems to be exactly what I need!

Manuel

> Therefore, we'd need to change the default to something else when
> defining _faulttolerantconv. The easiest would be to define a class
> and set the argument at instantiation, but then we're going back
> dangerously close to StringConverter...



___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] subclassing ndarray

2008-12-05 Thread Steve Schmerler
Hi all

I'm subclassing ndarray following [1] and I'd like to know if I'm doing
it right. My goals are

- ndarray subclass MyArray with additional methods
- replacement for np.array, np.asarray on module level returning MyArray
  instances
- expose new methods as functions on module level

import numpy as np

class MyArray(np.ndarray):

    def __new__(cls, arr, **kwargs):
        return np.asarray(arr, **kwargs).view(dtype=arr.dtype, type=cls)

    # define new methods here ...

    def print_shape(self):
        print self.shape


# replace np.array()
def array(*args, **kwargs):
    return MyArray(np.array(*args, **kwargs))

# replace np.asarray()
def asarray(*args, **kwargs):
    return MyArray(*args, **kwargs)

# expose array method as function
def ps(a):
    asarray(a).print_shape()

Would that work?
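As a quick sanity check of the pattern above (not from the original post, and updated to Python 3 print syntax): view-casting in __new__ means slices and views keep the subclass type automatically.

```python
import numpy as np

class MyArray(np.ndarray):
    """Sketch of the subclass above, with Python 3 print syntax."""

    def __new__(cls, arr, **kwargs):
        # View-cast whatever np.asarray gives us into the subclass.
        return np.asarray(arr, **kwargs).view(type=cls)

    def print_shape(self):
        print(self.shape)

a = MyArray([[1, 2], [3, 4]])
print(type(a).__name__)     # MyArray
print(type(a[0]).__name__)  # slices stay MyArray, via view casting
```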

PS: I found a little error in [1]:

In section __new__ and __init__, the class def should read

class C(object):
    def __new__(cls, *args):
+       print 'cls is:', cls
        print 'Args in __new__:', args
        return object.__new__(cls, *args)
    def __init__(self, *args):
+       print 'self is:', self
        print 'Args in __init__:', args


[1] http://docs.scipy.org/doc/numpy/user/basics.subclassing.html

best,
steve


Re: [Numpy-discussion] checksum on numpy float array

2008-12-05 Thread Francesc Alted
On Friday 05 December 2008, Brennan Williams wrote:
> Robert Kern wrote:
>> On Thu, Dec 4, 2008 at 18:54, Brennan Williams
>> [EMAIL PROTECTED] wrote:
>>> Thanks
>>>
>>> [EMAIL PROTECTED] wrote:
>>>> I didn't check what this does behind the scenes, but try this
>>>>
>>>> import hashlib  # standard python library
>>>> import numpy as np
>>>>
>>>> m = hashlib.md5()
>>>> m.update(np.array(range(100)))
>>>> m.update(np.array(range(200)))
>>
>> I would recommend doing this on the strings before you make arrays
>> from them. You don't know if the network cut out in the middle of
>> an 8-byte double.
>>
>> Of course, sending the lengths and other metadata first, then the
>> data would let you check without needing to do expensivish hashes
>> or checksums. If truncation is your problem rather than corruption,
>> then that would be sufficient. You may also consider using the NPY
>> format in numpy 1.2 to implement that.
>
> Thanks for the ideas. I'm definitely going to add some more basic
> checks on lengths etc as well.
> Unfortunately the problem is happening at a client site, so (a) I
> can't reproduce it and (b) most of the time they can't reproduce it
> either. This is a Windows Python app running on Citrix
> reading/writing data to a Linux networked drive.
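Robert's suggestion above -- sending the length first and checksumming the raw bytes rather than the parsed array -- can be sketched with the standard library alone (the function names here are illustrative, not from the original thread):

```python
import hashlib
import struct

def pack_payload(data):
    """Prefix raw bytes with their length and an MD5 digest."""
    header = struct.pack('<Q', len(data)) + hashlib.md5(data).digest()
    return header + data

def unpack_payload(blob):
    """Verify length and checksum before handing the bytes to np.frombuffer."""
    (length,) = struct.unpack('<Q', blob[:8])
    digest, data = blob[8:24], blob[24:]
    if len(data) != length:
        raise ValueError('truncated payload')
    if hashlib.md5(data).digest() != digest:
        raise ValueError('corrupted payload')
    return data

# e.g. the raw bytes of 100 float64 values
payload = pack_payload(b'\x00' * 800)
assert unpack_payload(payload) == b'\x00' * 800
```

Truncation shows up as a length mismatch before the hash is even checked, which matches Robert's point that a plain length header already catches the cheap case.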

Another possibility would be to use HDF5 as a data container.  It 
supports the fletcher32 filter [1], which basically computes a checksum 
for every data chunk written to disk and then always checks that the 
data read satisfies the checksum kept on disk.  So, if the HDF5 layer 
doesn't complain, you are basically safe.

There are at least two usable HDF5 interfaces for Python and NumPy: 
PyTables [2] and h5py [3].  PyTables supports this right 
out-of-the-box.  Not sure about h5py though (a quick search in the docs 
doesn't reveal anything).

[1] http://rfc.sunsite.dk/rfc/rfc1071.html
[2] http://www.pytables.org
[3] http://h5py.alfven.org

Hope it helps,

-- 
Francesc Alted


Re: [Numpy-discussion] checksum on numpy float array

2008-12-05 Thread Andrew Collette

> Another possibility would be to use HDF5 as a data container.  It
> supports the fletcher32 filter [1], which basically computes a checksum
> for every data chunk written to disk and then always checks that the
> data read satisfies the checksum kept on disk.  So, if the HDF5 layer
> doesn't complain, you are basically safe.
>
> There are at least two usable HDF5 interfaces for Python and NumPy:
> PyTables [2] and h5py [3].  PyTables supports this right
> out-of-the-box.  Not sure about h5py though (a quick search in the docs
> doesn't reveal anything).
>
> [1] http://rfc.sunsite.dk/rfc/rfc1071.html
> [2] http://www.pytables.org
> [3] http://h5py.alfven.org
>
> Hope it helps,
>

Just to confirm that h5py does in fact have fletcher32; it's one of the
options you can specify when creating a dataset, although it could use
better documentation:

http://h5py.alfven.org/docs/guide/hl.html#h5py.highlevel.Group.create_dataset

Like other checksums, fletcher32 provides error-detection but not
error-correction.  You'll still need to throw away data which can't be
read.  However, I believe that you can still read sections of the
dataset which aren't corrupted.
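For reference, enabling the filter in h5py is a one-keyword affair. A minimal sketch (assuming h5py and numpy are installed; the guard simply skips the demo otherwise):

```python
import os
import tempfile

try:
    import h5py
    import numpy as np
except ImportError:
    print('h5py/numpy not available; skipping demo')
else:
    path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
    with h5py.File(path, 'w') as f:
        # fletcher32=True attaches the checksum filter to every chunk
        # (h5py switches the dataset to chunked storage automatically).
        f.create_dataset('x', data=np.arange(1000.0), fletcher32=True)
    with h5py.File(path, 'r') as f:
        # A corrupted chunk would raise an error here instead of
        # returning bad data.
        data = f['x'][:]
    print(data.sum())
```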

Andrew Collette



[Numpy-discussion] genloadtxt : last call

2008-12-05 Thread Pierre GM

All,
Here's the latest version of genloadtxt, with some recent corrections.  
With just a couple of tweaks, we end up with some decent speed: it's  
still slower than np.loadtxt, but only by about 15% according to the  
test at the end of the package.


And so, now what? Should I put the module in numpy.lib.io? Elsewhere?

Thx for any comments and suggestions.


Proposal:
Here's an extension to np.loadtxt, designed to take missing values into account.





import itertools
import numpy as np
import numpy.ma as ma


def _is_string_like(obj):
    """
    Check whether obj behaves like a string.
    """
    try:
        obj + ''
    except (TypeError, ValueError):
        return False
    return True

def _to_filehandle(fname, flag='r', return_opened=False):
    """
    Returns the filehandle corresponding to a string or a file.
    If the string ends in '.gz', the file is automatically unzipped.

    Parameters
    ----------
    fname : string, filehandle
        Name of the file whose filehandle must be returned.
    flag : string, optional
        Flag indicating the status of the file ('r' for read, 'w' for write).
    return_opened : boolean, optional
        Whether to return the opening status of the file.
    """
    if _is_string_like(fname):
        if fname.endswith('.gz'):
            import gzip
            fhd = gzip.open(fname, flag)
        elif fname.endswith('.bz2'):
            import bz2
            fhd = bz2.BZ2File(fname)
        else:
            fhd = file(fname, flag)
        opened = True
    elif hasattr(fname, 'seek'):
        fhd = fname
        opened = False
    else:
        raise ValueError('fname must be a string or file handle')
    if return_opened:
        return fhd, opened
    return fhd


def flatten_dtype(ndtype):
    """
    Unpack a structured data-type.
    """
    names = ndtype.names
    if names is None:
        return [ndtype]
    else:
        types = []
        for field in names:
            (typ, _) = ndtype.fields[field]
            flat_dt = flatten_dtype(typ)
            types.extend(flat_dt)
        return types


def nested_masktype(datatype):
    """
    Construct the dtype of a mask for nested elements.
    """
    names = datatype.names
    if names:
        descr = []
        for name in names:
            (ndtype, _) = datatype.fields[name]
            descr.append((name, nested_masktype(ndtype)))
        return descr
    # Is this some kind of composite a la (np.float, 2) ?
    elif datatype.subdtype:
        mdescr = list(datatype.subdtype)
        mdescr[0] = np.dtype(bool)
        return tuple(mdescr)
    else:
        return np.bool



class LineSplitter:
    """
    Defines a function to split a string at a given delimiter or at
    given places.

    Parameters
    ----------
    comments : {'#', string}
        Character used to mark the beginning of a comment.
    delimiter : var, optional
        If a string, character used to delimit consecutive fields.
        If an integer or a sequence of integers, width(s) of each field.
    autostrip : boolean, optional
        Whether to strip each individual field.
    """

    def autostrip(self, method):
        "Wrapper to strip each member of the output of `method`."
        return lambda input: [_.strip() for _ in method(input)]
    #
    def __init__(self, delimiter=None, comments='#', autostrip=True):
        self.comments = comments
        # Delimiter is a character
        if (delimiter is None) or _is_string_like(delimiter):
            delimiter = delimiter or None
            _handyman = self._delimited_splitter
        # Delimiter is a list of field widths
        elif hasattr(delimiter, '__iter__'):
            _handyman = self._variablewidth_splitter
            idx = np.cumsum([0] + list(delimiter))
            delimiter = [slice(i, j) for (i, j) in zip(idx[:-1], idx[1:])]
        # Delimiter is a single integer
        elif int(delimiter):
            (_handyman, delimiter) = (self._fixedwidth_splitter, int(delimiter))
        else:
            (_handyman, delimiter) = (self._delimited_splitter, None)
        self.delimiter = delimiter
        if autostrip:
            self._handyman = self.autostrip(_handyman)
        else:
            self._handyman = _handyman
    #
    def _delimited_splitter(self, line):
        line = line.split(self.comments)[0].strip()
        if not line:
            return []
        return line.split(self.delimiter)
    #
    def _fixedwidth_splitter(self, line):
        line = line.split(self.comments)[0]
        if not line:
            return []
        fixed = self.delimiter
        slices = [slice(i, i + fixed) for i in range(len(line))[::fixed]]
        return [line[s] for s in slices]
    #
    def _variablewidth_splitter(self, line):
        line = line.split(self.comments)[0]
        if not line:
            return []
        slices = self.delimiter
        return [line[s] for s in slices]
    #
    def __call__(self, line):
        return self._handyman(line)



class 
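The listing is cut off at this point in the archive. For illustration, here is a minimal, self-contained re-statement of the delimited-splitter behavior that LineSplitter above implements -- comment stripping, then splitting, then per-field stripping. It is a sketch, not the full class:

```python
def split_line(line, delimiter=None, comments='#'):
    """Drop any trailing comment, split on the delimiter, strip each field."""
    line = line.split(comments)[0].strip()
    if not line:
        return []
    return [field.strip() for field in line.split(delimiter)]

print(split_line('1, 2, 3  # a comment', delimiter=','))  # ['1', '2', '3']
print(split_line('# only a comment'))                     # []
```

With delimiter=None this falls back to whitespace splitting, mirroring the `delimiter or None` branch in LineSplitter.__init__.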

Re: [Numpy-discussion] ANNOUNCE: EPD with Py2.5 version 4.0.30002 RC2 available for testing

2008-12-05 Thread Darren Dale
On Mon, Dec 1, 2008 at 10:30 AM, Darren Dale [EMAIL PROTECTED] wrote:

> On Mon, Dec 1, 2008 at 3:12 AM, Gael Varoquaux
> [EMAIL PROTECTED] wrote:
>
>> On Mon, Dec 01, 2008 at 12:44:10PM +0900, David Cournapeau wrote:
>>> On Mon, Dec 1, 2008 at 7:00 AM, Darren Dale [EMAIL PROTECTED] wrote:
>>>> I tried installing 4.0.300x on a machine running 64-bit windows vista
>>>> home edition and ran into problems with PyQt and some related
>>>> packages. So I uninstalled all the python-related software, EPD took
>>>> over 30 minutes to uninstall, and tried to install EPD 4.1 beta.
>>>
>>> My guess is that EPD is only a 32-bit installer, so you run it on
>>> WOW (Windows on Windows) on 64-bit Windows, which is kind of slow (but
>>> usable for most tasks).
>>
>> On top of that, Vista is not supported with EPD. I had a chat with the
>> EPD guys about that, and they say it does work with Vista... most of
>> the time. They don't really understand the failures, and haven't had
>> time to investigate much, because so far professionals and labs are
>> simply avoiding Vista. Hopefully someone from the EPD team will give a
>> more accurate answer soon.
>
> Thanks Gael and David. I would avoid windows altogether if I could. When
> I bought a new laptop I had the option to pay extra to downgrade to XP
> pro; I should have done some more research before I settled for Vista.
> In the meantime I'll borrow an XP machine when I need to build python
> package installers for windows.
>
> Hopefully a solution can be found at some point for python and Vista.
> Losing compatibility on such a major platform will become increasingly
> problematic.



I just wanted to follow up: it looks like the Vista installation issues have
been ironed out with the release of python-2.6.1. I was able to install
32-bit python-2.6.1 from the msi file distributed at python.org in a
straightforward manner, no need to mess around with user account controls
or other such nonsense. I even have setuptools working with python 2.6; I
built and installed a setuptools msi without much trouble (distutils just
doesn't like setuptools version numbering).

One pleasant surprise: python-2.6 is built with Visual C++ 2008, which has
a free express edition available, so building python extension modules
might be a little more convenient than it was in the past.

Darren