Re: [Numpy-discussion] genloadtxt: second serving
Pierre GM wrote:

On Dec 4, 2008, at 7:22 AM, Manuel Metz wrote:

Will loadtxt in that case remain as is? Or will the _faulttolerantconv class be used?

No idea, we need to discuss it. There's a problem with _faulttolerantconv: using np.nan as the default value will not work in Python 2.6 if the output is to be int, as an exception will be raised.

Okay, that's something I did not check. If numpy.nan is converted to 0, it's basically useless -- 0 might be a valid number in the data and cannot be distinguished from nan in that case. Here masked arrays are the only sensible approach. So the fault-tolerant converter (ftc) class is applicable to floats and complex numbers only. It might nevertheless be useful to use the ftc class, since (i) it results in almost no performance loss and (ii) at the same time you get at least a minimum of fault tolerance, which can be very useful for many applications.

I personally will switch to AstroAsciiData (thanks, Jarrod, for pointing this out), because that seems to be exactly what I need!

Manuel

Therefore, we'd need to change the default to something else when defining _faulttolerantconv. The easiest would be to define a class and set the argument at instantiation, but then we're getting dangerously close to StringConverter...

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
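For readers following along, the kind of converter being discussed might look like the sketch below. The class name, its arguments, and the fallback behavior are illustrative assumptions, not the actual _faulttolerantconv or StringConverter code:

```python
import numpy as np

class FaultTolerantConv:
    """Hypothetical sketch of a fault-tolerant converter: apply `func`
    to a string and fall back to `default` when conversion fails."""
    def __init__(self, func=float, default=np.nan):
        self.func = func
        self.default = default

    def __call__(self, value):
        try:
            return self.func(value)
        except ValueError:
            return self.default

conv = FaultTolerantConv()  # float conversion, nan on failure
values = [conv(s) for s in ["1.5", "N/A", "2.0"]]
```

With default=np.nan this only makes sense for float or complex output, which is exactly the limitation raised above: for an int column the default would have to be some sentinel integer chosen at instantiation.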
[Numpy-discussion] subclassing ndarray
Hi all,

I'm subclassing ndarray following [1] and I'd like to know if I'm doing it right. My goals are:
- an ndarray subclass MyArray with additional methods
- replacements for np.array, np.asarray at module level, returning MyArray instances
- exposing the new methods as functions at module level

import numpy as np

class MyArray(np.ndarray):
    def __new__(cls, arr, **kwargs):
        return np.asarray(arr, **kwargs).view(dtype=arr.dtype, type=cls)
    # define new methods here ...
    def print_shape(self):
        print self.shape

# replace np.array()
def array(*args, **kwargs):
    return MyArray(np.array(*args, **kwargs))

# replace np.asarray()
def asarray(*args, **kwargs):
    return MyArray(*args, **kwargs)

# expose array method as function
def ps(a):
    asarray(a).print_shape()

Would that work?

PS: I found a little error in [1]: in the section "__new__ and __init__", the class definition should read:

class C(object):
    def __new__(cls, *args):
+       print 'cls is:', cls
        print 'Args in __new__:', args
        return object.__new__(cls, *args)
    def __init__(self, *args):
+       print 'self is:', self
        print 'Args in __init__:', args

[1] http://docs.scipy.org/doc/numpy/user/basics.subclassing.html

best, steve
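For what it's worth, a minimal self-contained variant of this pattern (written in Python 3 syntax, and viewing directly as cls so that plain lists work too -- the arr.dtype in the post would raise AttributeError for a list input) behaves as intended:

```python
import numpy as np

class MyArray(np.ndarray):
    """ndarray subclass with an extra method, following the subclassing guide."""
    def __new__(cls, arr, **kwargs):
        # view() reinterprets the existing buffer as a MyArray
        return np.asarray(arr, **kwargs).view(cls)

    def print_shape(self):
        print(self.shape)

def asarray(*args, **kwargs):
    """Module-level replacement for np.asarray that returns MyArray."""
    return MyArray(np.asarray(*args, **kwargs))

a = asarray([[1, 2], [3, 4]])
```

Note that views and slices of a MyArray also come back as MyArray, which is usually what you want; if the subclass added new attributes, __array_finalize__ would also be needed, as the guide [1] explains.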
Re: [Numpy-discussion] checksum on numpy float array
On Friday, 05 December 2008, Brennan Williams wrote:

Robert Kern wrote:

On Thu, Dec 4, 2008 at 18:54, Brennan Williams [EMAIL PROTECTED] wrote: Thanks. [EMAIL PROTECTED] wrote: I didn't check what this does behind the scenes, but try this:

import hashlib  # standard python library
import numpy as np
m = hashlib.md5()
m.update(np.array(range(100)))
m.update(np.array(range(200)))

I would recommend doing this on the strings before you make arrays from them. You don't know if the network cut out in the middle of an 8-byte double. Of course, sending the lengths and other metadata first, then the data, would let you check without needing to do expensivish hashes or checksums. If truncation is your problem rather than corruption, then that would be sufficient. You may also consider using the NPY format in numpy 1.2 to implement that.

Thanks for the ideas. I'm definitely going to add some more basic checks on lengths etc. as well. Unfortunately the problem is happening at a client site, so (a) I can't reproduce it and (b) most of the time they can't reproduce it either. This is a Windows Python app running on Citrix, reading/writing data to a Linux networked drive.

Another possibility would be to use HDF5 as a data container. It supports the fletcher32 filter [1], which basically computes a checksum for every data chunk written to disk and then always checks that the data read satisfies the checksum kept on disk. So, if the HDF5 layer doesn't complain, you are basically safe. There are at least two usable HDF5 interfaces for Python and NumPy: PyTables [2] and h5py [3]. PyTables has support for that right out of the box. Not sure about h5py, though (a quick search in the docs doesn't reveal anything).

[1] http://rfc.sunsite.dk/rfc/rfc1071.html
[2] http://www.pytables.org
[3] http://h5py.alfven.org

Hope it helps,

-- Francesc Alted
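Robert Kern's suggestion (hash the raw bytes and send the length first, then data) could be sketched as follows. The framing used here -- an 8-byte length prefix, the payload, then a trailing md5 digest -- is an illustrative assumption, not the poster's actual protocol:

```python
import hashlib
import struct
import numpy as np

def pack_with_checksum(arr):
    """Serialize an array as: 8-byte little-endian length, raw bytes, 16-byte md5."""
    payload = arr.tobytes()
    digest = hashlib.md5(payload).digest()
    return struct.pack('<Q', len(payload)) + payload + digest

def unpack_with_checksum(blob, dtype):
    """Verify length and checksum before reconstructing the array."""
    (n,) = struct.unpack('<Q', blob[:8])
    payload, digest = blob[8:8 + n], blob[8 + n:8 + n + 16]
    if len(payload) != n or hashlib.md5(payload).digest() != digest:
        raise ValueError("data truncated or corrupted")
    return np.frombuffer(payload, dtype=dtype)

data = np.arange(100, dtype=np.float64)
roundtrip = unpack_with_checksum(pack_with_checksum(data), np.float64)
```

Because the hash is computed over the raw bytes rather than an already-constructed array, a transfer that cuts out mid-double is caught before any array is built, which is exactly the failure mode described above.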
Re: [Numpy-discussion] checksum on numpy float array
Another possibility would be to use HDF5 as a data container. It supports the fletcher32 filter [1], which basically computes a checksum for every data chunk written to disk and then always checks that the data read satisfies the checksum kept on disk. So, if the HDF5 layer doesn't complain, you are basically safe. There are at least two usable HDF5 interfaces for Python and NumPy: PyTables [2] and h5py [3]. PyTables has support for that right out of the box. Not sure about h5py, though (a quick search in the docs doesn't reveal anything).

[1] http://rfc.sunsite.dk/rfc/rfc1071.html
[2] http://www.pytables.org
[3] http://h5py.alfven.org

Hope it helps,

Just to confirm that h5py does in fact have fletcher32; it's one of the options you can specify when creating a dataset, although it could use better documentation:

http://h5py.alfven.org/docs/guide/hl.html#h5py.highlevel.Group.create_dataset

Like other checksums, fletcher32 provides error detection but not error correction. You'll still need to throw away data which can't be read. However, I believe that you can still read sections of the dataset which aren't corrupted.

Andrew Collette
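As a side note on what fletcher32 actually computes: the checksum itself is simple enough to sketch in pure Python. This is the textbook Fletcher-32 over little-endian 16-bit words, shown for illustration only; it is not the HDF5 library's internal implementation:

```python
def fletcher32(data):
    """Textbook Fletcher-32: two running sums over 16-bit words, modulo 65535."""
    if len(data) % 2:  # zero-pad to an even number of bytes
        data += b'\x00'
    sum1 = sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)  # little-endian 16-bit word
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1

checksum = fletcher32(b"abcde")  # 0xF04FC729, the standard test vector
```

The second running sum makes the checksum sensitive to word order as well as content, so transpositions are detected -- but, as noted above, it is detection only: a mismatch tells you a chunk is bad, not how to repair it.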
[Numpy-discussion] genloadtxt : last call
All,

Here's the latest version of genloadtxt, with some recent corrections. With just a couple of tweaks, we end up with decent speed: it's still slower than np.loadtxt, but only by about 15% according to the test at the end of the package. And so, now what? Should I put the module in numpy.lib.io? Elsewhere? Thanks for any comments and suggestions.

Proposal: here's an extension to np.loadtxt, designed to take missing values into account.

import itertools
import numpy as np
import numpy.ma as ma

def _is_string_like(obj):
    """Check whether obj behaves like a string."""
    try:
        obj + ''
    except (TypeError, ValueError):
        return False
    return True

def _to_filehandle(fname, flag='r', return_opened=False):
    """
    Returns the filehandle corresponding to a string or a file.
    If the string ends in '.gz', the file is automatically unzipped.

    Parameters
    ----------
    fname : string, filehandle
        Name of the file whose filehandle must be returned.
    flag : string, optional
        Flag indicating the status of the file ('r' for read, 'w' for write).
    return_opened : boolean, optional
        Whether to return the opening status of the file.
    """
    if _is_string_like(fname):
        if fname.endswith('.gz'):
            import gzip
            fhd = gzip.open(fname, flag)
        elif fname.endswith('.bz2'):
            import bz2
            fhd = bz2.BZ2File(fname)
        else:
            fhd = file(fname, flag)
        opened = True
    elif hasattr(fname, 'seek'):
        fhd = fname
        opened = False
    else:
        raise ValueError('fname must be a string or file handle')
    if return_opened:
        return fhd, opened
    return fhd

def flatten_dtype(ndtype):
    """Unpack a structured data-type."""
    names = ndtype.names
    if names is None:
        return [ndtype]
    else:
        types = []
        for field in names:
            (typ, _) = ndtype.fields[field]
            flat_dt = flatten_dtype(typ)
            types.extend(flat_dt)
        return types

def nested_masktype(datatype):
    """Construct the dtype of a mask for nested elements."""
    names = datatype.names
    if names:
        descr = []
        for name in names:
            (ndtype, _) = datatype.fields[name]
            descr.append((name, nested_masktype(ndtype)))
        return descr
    # Is this some kind of composite a la (np.float, 2)?
    elif datatype.subdtype:
        mdescr = list(datatype.subdtype)
        mdescr[0] = np.dtype(bool)
        return tuple(mdescr)
    else:
        return np.bool

class LineSplitter:
    """
    Defines a function to split a string at a given delimiter or at given places.

    Parameters
    ----------
    comments : {'#', string}
        Character used to mark the beginning of a comment.
    delimiter : var, optional
        If a string, character used to delimit consecutive fields.
        If an integer or a sequence of integers, width(s) of each field.
    autostrip : boolean, optional
        Whether to strip each individual field.
    """
    def autostrip(self, method):
        """Wrapper to strip each member of the output of `method`."""
        return lambda input: [_.strip() for _ in method(input)]
    #
    def __init__(self, delimiter=None, comments='#', autostrip=True):
        self.comments = comments
        # Delimiter is a character
        if (delimiter is None) or _is_string_like(delimiter):
            delimiter = delimiter or None
            _handyman = self._delimited_splitter
        # Delimiter is a list of field widths
        elif hasattr(delimiter, '__iter__'):
            _handyman = self._variablewidth_splitter
            idx = np.cumsum([0] + list(delimiter))
            delimiter = [slice(i, j) for (i, j) in zip(idx[:-1], idx[1:])]
        # Delimiter is a single integer
        elif int(delimiter):
            (_handyman, delimiter) = (self._fixedwidth_splitter, int(delimiter))
        else:
            (_handyman, delimiter) = (self._delimited_splitter, None)
        self.delimiter = delimiter
        if autostrip:
            self._handyman = self.autostrip(_handyman)
        else:
            self._handyman = _handyman
    #
    def _delimited_splitter(self, line):
        line = line.split(self.comments)[0].strip()
        if not line:
            return []
        return line.split(self.delimiter)
    #
    def _fixedwidth_splitter(self, line):
        line = line.split(self.comments)[0]
        if not line:
            return []
        fixed = self.delimiter
        slices = [slice(i, i + fixed) for i in range(len(line))[::fixed]]
        return [line[s] for s in slices]
    #
    def _variablewidth_splitter(self, line):
        line = line.split(self.comments)[0]
        if not line:
            return []
        slices = self.delimiter
        return [line[s] for s in slices]
    #
    def __call__(self, line):
        return self._handyman(line)

class
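To make the intended splitting behavior concrete, here are standalone one-function equivalents of the first two splitter modes (the function names are mine, not part of the proposal; the autostrip behavior is folded into the delimited version):

```python
def delimited_split(line, delimiter=None, comments='#'):
    """Split on a delimiter after discarding comments, stripping each field."""
    line = line.split(comments)[0].strip()
    if not line:
        return []
    return [field.strip() for field in line.split(delimiter)]

def fixedwidth_split(line, width, comments='#'):
    """Cut a line into consecutive fields of a fixed width."""
    line = line.split(comments)[0]
    return [line[i:i + width] for i in range(0, len(line), width)]

row = delimited_split("1, 2, 3  # trailing comment", delimiter=',')
cols = fixedwidth_split("aaabbbccc", 3)
```

The variable-width mode is the same idea as the fixed-width one, except that the slices are precomputed from the cumulative sum of the given field widths, as __init__ above shows.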
Re: [Numpy-discussion] ANNOUNCE: EPD with Py2.5 version 4.0.30002 RC2 available for testing
On Mon, Dec 1, 2008 at 10:30 AM, Darren Dale [EMAIL PROTECTED] wrote:

On Mon, Dec 1, 2008 at 3:12 AM, Gael Varoquaux [EMAIL PROTECTED] wrote:

On Mon, Dec 01, 2008 at 12:44:10PM +0900, David Cournapeau wrote:

On Mon, Dec 1, 2008 at 7:00 AM, Darren Dale [EMAIL PROTECTED] wrote: I tried installing 4.0.300x on a machine running 64-bit Windows Vista Home Edition and ran into problems with PyQt and some related packages. So I uninstalled all the Python-related software (EPD took over 30 minutes to uninstall) and tried to install EPD 4.1 beta.

My guess is that EPD ships only a 32-bit installer, so you are running it under WOW (Windows-on-Windows) on 64-bit Windows, which is kind of slow (but usable for most tasks). On top of that, Vista is not supported with EPD.

I had a chat with the EPD guys about that, and they say it does work with Vista... most of the time. They don't really understand the failures, and haven't had time to investigate much, because so far professionals and labs are simply avoiding Vista. Hopefully someone from the EPD team will give a more accurate answer soon.

Thanks Gael and David. I would avoid Windows altogether if I could. When I bought a new laptop I had the option to pay extra to downgrade to XP Pro; I should have done more research before I settled for Vista. In the meantime I'll borrow an XP machine when I need to build Python package installers for Windows. Hopefully a solution can be found at some point for Python and Vista; losing compatibility on such a major platform would become increasingly problematic.

I just wanted to follow up: it looks like the Vista installation issues have been ironed out with the release of python-2.6.1. I was able to install 32-bit python-2.6.1 from the msi file distributed at python.org in a straightforward manner, with no need to mess around with user account controls or other such nonsense.

I even have setuptools working with Python 2.6; I built and installed a setuptools msi without much trouble (distutils just doesn't like setuptools' version numbering). One pleasant surprise: python-2.6 is built with Visual C++ 2008, which has a free Express edition available, so building Python extension modules might be a little more convenient than it was in the past.

Darren