Re: [Numpy-discussion] ANNOUNCE: EPD with Py2.5 version 4.0.30002 RC2 available for testing
On Mon, Dec 01, 2008 at 12:44:10PM +0900, David Cournapeau wrote:

> On Mon, Dec 1, 2008 at 7:00 AM, Darren Dale [EMAIL PROTECTED] wrote:
>> I tried installing 4.0.300x on a machine running 64-bit Windows Vista Home Edition and ran into problems with PyQt and some related packages. So I uninstalled all the python-related software (EPD took over 30 minutes to uninstall) and tried to install EPD 4.1 beta.
>
> My guess is that EPD is a 32-bit-only installer, so that you run it under WOW (Windows on Windows) on 64-bit Windows, which is kind of slow (but usable for most tasks). On top of that, Vista is not supported with EPD.

I had a chat with the EPD guys about that, and they say it does work with Vista... most of the time. They don't really understand the failures, and haven't had time to investigate much, because so far professionals and labs are simply avoiding Vista. Hopefully someone from the EPD team will give a more accurate answer soon.

Gaël

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] optimising single value functions for array calculations
Hello, I am developing a module which bases its calculations on another specialised module. My module uses numpy arrays a lot. The problem is that the other module I am building upon does not work with (whole) arrays but with single values. Therefore, I am currently forced to loop over the array:

###
a = numpy.arange(100)
b = numpy.arange(100, 200)
for i in range(a.size):
    a[i] = myfunc(a[i]) * b[i]
###

The results come out well. But the problem is that this way of calculation is very inefficient and takes time. Can anyone give me a hint on how I can improve my code without having to modify the package I am building upon? I do not want to change it a lot because I would always have to run behind the changes in the other package. To summarise: how do I make a calculation function array-aware? Thanks in advance, Timmie
Re: [Numpy-discussion] optimising single value functions for array calculations
Hello Timmie, numpy.vectorize(myfunc) should do what you want. Cheers, Emmanuelle

> The problem is that the other module I am building upon does not work with (whole) arrays but with single values. [...] To summarise: how do I make a calculation function array-aware?
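For readers following along, a minimal sketch of Emmanuelle's suggestion (myfunc here is a stand-in for the original poster's scalar-only function):

```python
import numpy as np

def myfunc(x):
    # stand-in for a scalar-only function from another package
    return x + 1

a = np.arange(100)
b = np.arange(100, 200)

vfunc = np.vectorize(myfunc)   # wraps the scalar function so it accepts arrays
result = vfunc(a) * b          # replaces the explicit Python for-loop
```

The wrapper gives a clean array interface without any change to the underlying package.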
Re: [Numpy-discussion] optimising single value functions for array calculations
It does not solve the slowness problem. I think I read on the list about some experimental code for fast vectorization. Nadav.

-Original Message- From: [EMAIL PROTECTED] on behalf of Emmanuelle Gouillart Sent: Mon 01-Dec-08 12:28 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] optimising single value functions for array calculations

> Hello Timmie, numpy.vectorize(myfunc) should do what you want. Cheers, Emmanuelle
Re: [Numpy-discussion] optimising single value functions for array calculations
2008/12/1 Timmie [EMAIL PROTECTED]:

> The problem is that the other module I am building upon does not work with (whole) arrays but with single values. Therefore, I am currently forced to loop over the array:
>
> ###
> a = numpy.arange(100)
> b = numpy.arange(100, 200)
> for i in range(a.size):
>     a[i] = myfunc(a[i]) * b[i]
> ###

Hi, Unless myfunc() itself uses numpy functions, numpy has no way of optimizing your computation. vectorize() will help you to have a clean interface, but it will not enhance speed. Matthieu -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher
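To illustrate Matthieu's point, a sketch using an invented scalar-only function: vectorize still calls the Python function once per element, whereas expressing the same logic with numpy's own ufuncs moves the loop into C.

```python
import numpy as np

def myfunc(x):
    # scalar-only stand-in: plain-Python conditional, one value at a time
    return x * x if x > 0 else 0.0

a = np.arange(-50.0, 50.0)
b = np.arange(100.0, 200.0)

# Convenience only: the Python-level loop is merely hidden inside vectorize.
slow = np.vectorize(myfunc)(a) * b

# True vectorization: the same logic expressed with ufuncs runs in C.
fast = np.where(a > 0, a * a, 0.0) * b
```

Both produce identical results; only the second avoids per-element Python calls.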
[Numpy-discussion] memmap dtype issue
For a long time now, numpy's memmap has me puzzled by its behavior. When I use memmap straightforwardly on a file it seems to work fine, but whenever I try to do a memmap using a dtype it seems to gobble up the whole file into memory. This, of course, makes the use of memmap futile. I would expect that the result of such an operation would give me a true memmap and that the data would be converted to dtype on the fly. I've seen this behavior in versions 1.0.4, 1.1.1 and still in 1.2.1. I'm working on Windows and haven't tried it on Linux. Am I doing something wrong? Are my expectations wrong? Or is this an issue somewhere deeper in numpy? I looked at memmap.py and it seems to me that most of the work is delegated to numpy.ndarray.__new__. Something wrong there maybe? Can somebody help please? Thanks! Regards, Wim Bakker
Re: [Numpy-discussion] optimising single value functions for array calculations
2008/12/1 Nadav Horesh [EMAIL PROTECTED]:

> It does not solve the slowness problem. I think I read on the list about some experimental code for fast vectorization.

The choices are basically weave, fast_vectorize (http://projects.scipy.org/scipy/scipy/ticket/727), ctypes, cython or f2py. Any I left out? Ilan's fast_vectorize should have been included in SciPy a while ago already. Volunteers for patch review? Cheers Stéfan
Re: [Numpy-discussion] optimising single value functions for array calculations
Hi, thanks for all your answers. I will certainly test it.

> numpy.vectorize(myfunc) should do what you want.

Just to add a better example based on a recent discussion here on this list [1]:

def myfunc(x):
    res = math.sin(x)
    return res

a = numpy.arange(1, 20)

=> myfunc(a) will not work.
=> myfunc needs a way to pass single values to math.sin, either through iteration (see my initial email) or through other options.

(I know that numpy has an array-aware sine but wanted to use it as an example here.) My original problem comes from timeseries computing [2]. Well, I will test and report back further. Thanks again and until soon, Timmie

[1]: http://thread.gmane.org/gmane.comp.python.numeric.general/26417/focus=26418
[2]: http://thread.gmane.org/gmane.comp.python.scientific.user/18253
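Timmie's math.sin example can indeed be made array-aware with the suggested wrapper; a quick sketch:

```python
import math
import numpy as np

def myfunc(x):
    res = math.sin(x)   # math.sin accepts only single values
    return res

a = np.arange(1, 20)

# myfunc(a) raises TypeError; the wrapped version handles whole arrays:
vmyfunc = np.vectorize(myfunc)
out = vmyfunc(a)
```

The result matches numpy's own array-aware sine, just without the C-level speed.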
Re: [Numpy-discussion] memmap dtype issue
Wim Bakker wrote:

> For a long time now, numpy's memmap has me puzzled by its behavior. When I use memmap straightforwardly on a file it seems to work fine, but whenever I try to do a memmap using a dtype it seems to gobble up the whole file into memory.

I don't understand your question. From my experience, the memmap is working fine. Please post an example that illustrates your point.

> This, of course, makes the use of memmap futile. I would expect that the result of such an operation would give me a true memmap and that the data would be converted to dtype on the fly.

There is no conversion on the fly when you use memmap. You construct an array of the same data-type as is in the file and then manipulate portions of it as needed.

> Am I doing something wrong? Are my expectations wrong?

My guess is that your expectations are not accurate, but example code would help sort it out. Best regards, -Travis
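A sketch of the usage Travis describes (the file path and int16 data are invented for illustration): the dtype passed to memmap must describe the bytes as they already sit on disk, and any conversion to another type is a separate, explicit in-memory copy.

```python
import os
import tempfile
import numpy as np

# Write a small binary file of int16 values to map (invented example data).
path = os.path.join(tempfile.mkdtemp(), 'data.bin')
np.arange(10, dtype=np.int16).tofile(path)

# The dtype matches what is actually in the file, so the view stays lazy
# instead of pulling the whole file into memory.
m = np.memmap(path, dtype=np.int16, mode='r')

# Converting to another dtype is an explicit copy, not part of the memmap:
as_float = np.asarray(m, dtype=np.float64)
```

Passing a dtype that does not match the on-disk layout simply reinterprets (or forces copying of) the bytes, which is likely the behavior Wim observed.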
Re: [Numpy-discussion] ANNOUNCE: EPD with Py2.5 version 4.0.30002 RC2 available for testing
On Mon, Dec 1, 2008 at 3:12 AM, Gael Varoquaux [EMAIL PROTECTED] wrote:

> I had a chat with the EPD guys about that, and they say it does work with Vista... most of the time. They don't really understand the failures, and haven't had time to investigate much, because so far professionals and labs are simply avoiding Vista. Hopefully someone from the EPD team will give a more accurate answer soon.

Thanks Gael and David. I would avoid Windows altogether if I could. When I bought a new laptop I had the option to pay extra to downgrade to XP Pro; I should have done some more research before I settled for Vista. In the meantime I'll borrow an XP machine when I need to build python package installers for Windows. Hopefully a solution can be found at some point for python and Vista. Losing compatibility on such a major platform will become increasingly problematic.
[Numpy-discussion] np.loadtxt : yet a new implementation...
All, please find attached to this message another implementation of np.loadtxt, which focuses on missing values. It's basically a combination of John Hunter's et al. mlab.csv2rec, Ryan May's patches, and pieces of code I'd been working on over the last few weeks. Besides some helper classes (StringConverter to convert a string into something else, NameValidator to check names...), you'll find 3 functions:

* `genloadtxt` is the base function that does all the work. It outputs 2 arrays, one for the data (missing values being substituted by the appropriate default) and one for the mask. It would go in np.lib.io.
* `loadtxt` would replace the current np.loadtxt. It outputs an ndarray, with missing data filled in. It would also go in np.lib.io.
* `mloadtxt` would go into np.ma.io (to be created) and be renamed `loadtxt`. Right now, I needed a different name to avoid conflicts. It combines the outputs of `genloadtxt` into a single masked array.

You'll also find several series of tests that you can use as examples. Please give it a try and send me some feedback (bugs, wishes, suggestions). I'd like it to make the 1.3.0 release (I need some of the functionalities to improve the corresponding function in scikits.timeseries, currently fubar...) P.
Re: [Numpy-discussion] np.loadtxt : yet a new implementation...
And now for the tests:

"Proposal: Here's an extension to np.loadtxt, designed to take missing values into account."

from genload_proposal import *
from numpy.ma.testutils import *
import StringIO

class TestLineSplitter(TestCase):
    #
    def test_nodelimiter(self):
        "Test LineSplitter w/o delimiter"
        strg = "1 2 3 4 5 # test"
        test = LineSplitter(' ')(strg)
        assert_equal(test, ['1', '2', '3', '4', '5'])
        test = LineSplitter()(strg)
        assert_equal(test, ['1', '2', '3', '4', '5'])
    #
    def test_delimiter(self):
        "Test LineSplitter on delimiter"
        strg = "1,2,3,4,,5"
        test = LineSplitter(',')(strg)
        assert_equal(test, ['1', '2', '3', '4', '', '5'])
        #
        strg = "1,2,3,4,,5 # test"
        test = LineSplitter(',')(strg)
        assert_equal(test, ['1', '2', '3', '4', '', '5'])
        #
        strg = "1 2 3 4 5 # test"
        test = LineSplitter(' ')(strg)
        assert_equal(test, ['1', '2', '3', '4', '5'])
    #
    def test_fixedwidth(self):
        "Test LineSplitter w/ fixed-width fields"
        strg = "1 2 3 4 5 # test"
        test = LineSplitter(3)(strg)
        assert_equal(test, ['1', '2', '3', '4', '', '5', ''])
        #
        strg = "1 3 4 5 6# test"
        test = LineSplitter((3,6,6,3))(strg)
        assert_equal(test, ['1', '3', '4 5', '6'])
        #
        strg = "1 3 4 5 6# test"
        test = LineSplitter((6,6,9))(strg)
        assert_equal(test, ['1', '3 4', '5 6'])
        #
        strg = "1 3 4 5 6# test"
        test = LineSplitter(20)(strg)
        assert_equal(test, ['1 3 4 5 6'])
        #
        strg = "1 3 4 5 6# test"
        test = LineSplitter(30)(strg)
        assert_equal(test, ['1 3 4 5 6'])

class TestStringConverter(TestCase):
    "Test StringConverter"
    #
    def test_creation(self):
        "Test creation of a StringConverter"
        converter = StringConverter(int, -9)
        assert_equal(converter._status, 1)
        assert_equal(converter.default, -9)
    #
    def test_upgrade(self):
        "Tests the upgrade method."
        converter = StringConverter()
        assert_equal(converter._status, 0)
        converter.upgrade('0')
        assert_equal(converter._status, 1)
        converter.upgrade('0.')
        assert_equal(converter._status, 2)
        converter.upgrade('0j')
        assert_equal(converter._status, 3)
        converter.upgrade('a')
        assert_equal(converter._status, len(converter._mapper)-1)
    #
    def test_missing(self):
        "Tests the use of missing values."
        converter = StringConverter(missing_values=('missing','missed'))
        converter.upgrade('0')
        assert_equal(converter('0'), 0)
        assert_equal(converter(''), converter.default)
        assert_equal(converter('missing'), converter.default)
        assert_equal(converter('missed'), converter.default)
        try:
            converter('miss')
        except ValueError:
            pass
    #
    def test_upgrademapper(self):
        "Tests upgrade_mapper"
        import dateutil.parser
        import datetime
        dateparser = dateutil.parser.parse
        StringConverter.upgrade_mapper(dateparser, datetime.date(2000,1,1))
        convert = StringConverter(dateparser, datetime.date(2000, 1, 1))
        test = convert('2001-01-01')
        assert_equal(test, datetime.datetime(2001, 01, 01, 00, 00, 00))

class TestLoadTxt(TestCase):
    #
    def test_record(self):
        "Test w/ explicit dtype"
        data = StringIO.StringIO('1 2\n3 4')
        #data.seek(0)
        test = loadtxt(data, dtype=[('x', np.int32), ('y', np.int32)])
        control = np.array([(1, 2), (3, 4)], dtype=[('x', 'i4'), ('y', 'i4')])
        assert_equal(test, control)
        #
        data = StringIO.StringIO('M 64.0 75.0\nF 25.0 60.0')
        #data.seek(0)
        descriptor = {'names': ('gender','age','weight'),
                      'formats': ('S1', 'i4', 'f4')}
        control = np.array([('M', 64.0, 75.0), ('F', 25.0, 60.0)],
                           dtype=descriptor)
        test = loadtxt(data, dtype=descriptor)
        assert_equal(test, control)

    def test_array(self):
        "Test outputting a standard ndarray"
        data = StringIO.StringIO('1 2\n3 4')
        control = np.array([[1,2],[3,4]], dtype=int)
        test = loadtxt(data, dtype=int)
        assert_array_equal(test, control)
        #
        data.seek(0)
        control = np.array([[1,2],[3,4]], dtype=float)
        test = np.loadtxt(data, dtype=float)
        assert_array_equal(test, control)

    def test_1D(self):
        "Test squeezing to 1D"
        control = np.array([1, 2, 3, 4], int)
        #
        data = StringIO.StringIO('1\n2\n3\n4\n')
        test = loadtxt(data, dtype=int)
        assert_array_equal(test, control)
        #
        data = StringIO.StringIO('1,2,3,4\n')
        test = loadtxt(data, dtype=int, delimiter=',')
Re: [Numpy-discussion] np.loadtxt : yet a new implementation...
2008/12/1 Pierre GM [EMAIL PROTECTED]:

> Please find attached to this message another implementation of

Struggling to comply! Cheers Stéfan
Re: [Numpy-discussion] np.loadtxt : yet a new implementation...
Well, looks like the attachment is too big, so here's the implementation. The tests will come in another message.

"Proposal: Here's an extension to np.loadtxt, designed to take missing values into account."

import itertools
import numpy as np
import numpy.ma as ma

def _is_string_like(obj):
    "Check whether obj behaves like a string."
    try:
        obj + ''
    except (TypeError, ValueError):
        return False
    return True

def _to_filehandle(fname, flag='r', return_opened=False):
    """Returns the filehandle corresponding to a string or a file.
    If the string ends in '.gz', the file is automatically unzipped.

    Parameters
    ----------
    fname : string, filehandle
        Name of the file whose filehandle must be returned.
    flag : string, optional
        Flag indicating the status of the file ('r' for read, 'w' for write).
    return_opened : boolean, optional
        Whether to return the opening status of the file.
    """
    if _is_string_like(fname):
        if fname.endswith('.gz'):
            import gzip
            fhd = gzip.open(fname, flag)
        elif fname.endswith('.bz2'):
            import bz2
            fhd = bz2.BZ2File(fname)
        else:
            fhd = file(fname, flag)
        opened = True
    elif hasattr(fname, 'seek'):
        fhd = fname
        opened = False
    else:
        raise ValueError('fname must be a string or file handle')
    if return_opened:
        return fhd, opened
    return fhd

def flatten_dtype(ndtype):
    "Unpack a structured data-type."
    names = ndtype.names
    if names is None:
        return [ndtype]
    else:
        types = []
        for field in names:
            (typ, _) = ndtype.fields[field]
            flat_dt = flatten_dtype(typ)
            types.extend(flat_dt)
        return types

def nested_masktype(datatype):
    "Construct the dtype of a mask for nested elements."
    names = datatype.names
    if names:
        descr = []
        for name in names:
            (ndtype, _) = datatype.fields[name]
            descr.append((name, nested_masktype(ndtype)))
        return descr
    # Is this some kind of composite a la (np.float, 2)?
    elif datatype.subdtype:
        mdescr = list(datatype.subdtype)
        mdescr[0] = np.dtype(bool)
        return tuple(mdescr)
    else:
        return np.bool

class LineSplitter:
    """Defines a function to split a string at a given delimiter or at given places.

    Parameters
    ----------
    comment : {'#', string}
        Character used to mark the beginning of a comment.
    delimiter : var
    """
    def __init__(self, delimiter=None, comments='#'):
        self.comments = comments
        # Delimiter is a character
        if delimiter is None:
            self._isfixed = False
            self.delimiter = None
        elif _is_string_like(delimiter):
            self._isfixed = False
            self.delimiter = delimiter.strip() or None
        # Delimiter is a list of field widths
        elif hasattr(delimiter, '__iter__'):
            self._isfixed = True
            idx = np.cumsum([0] + list(delimiter))
            self.slices = [slice(i, j) for (i, j) in zip(idx[:-1], idx[1:])]
        # Delimiter is a single integer
        elif int(delimiter):
            self._isfixed = True
            self.slices = None
            self.delimiter = delimiter
        else:
            self._isfixed = False
            self.delimiter = None
    #
    def __call__(self, line):
        "Splits the line at each current delimiter. Comments are stripped beforehand."
        # Strip the comments
        line = line.split(self.comments)[0]
        if not line:
            return []
        # Fixed-width fields
        if self._isfixed:
            # Fields have different widths
            if self.slices is None:
                fixed = self.delimiter
                slices = [slice(i, i + fixed) for i in range(len(line))[::fixed]]
            else:
                slices = self.slices
            return [line[s].strip() for s in slices]
        else:
            return [s.strip() for s in line.split(self.delimiter)]

class NameValidator:
    """Validates a list of strings to use as field names. The strings are
    stripped of any non-alphanumeric character, and spaces are replaced by `_`.
    During instantiation, the user can define a list of names to exclude, as
    well as a list of invalid characters. Names in the exclude list are
    appended a '_' character. Once an instance has been created, it can be
    called with a list of names, and a list of valid names will be created.
    The `__call__` method accepts an optional keyword, `default`, that sets
    the default name in case of ambiguity. By default, `default = 'f'`, so
    that names will default to `f0`, `f1`...

    Parameters
    ----------
    excludelist : sequence, optional
        A list of names to
Re: [Numpy-discussion] np.loadtxt : yet a new implementation...
On Mon, Dec 1, 2008 at 12:21 PM, Pierre GM [EMAIL PROTECTED] wrote:

> Well, looks like the attachment is too big, so here's the implementation. The tests will come in another message.

It looks like I am doing something wrong -- trying to parse a CSV file with dates formatted like '2008-10-14', with::

    import datetime, sys
    import dateutil.parser
    StringConverter.upgrade_mapper(dateutil.parser.parse,
                                   default=datetime.date(1900,1,1))
    r = loadtxt(sys.argv[1], delimiter=',', names=True)
    print r.dtype

I get the following::

    Traceback (most recent call last):
      File "genload_proposal.py", line 734, in ?
        r = loadtxt(sys.argv[1], delimiter=',', names=True)
      File "genload_proposal.py", line 711, in loadtxt
        (output, _) = genloadtxt(fname, **kwargs)
      File "genload_proposal.py", line 646, in genloadtxt
        rows[i] = tuple([conv(val) for (conv, val) in zip(converters, vals)])
      File "genload_proposal.py", line 385, in __call__
        raise ValueError("Cannot convert string '%s'" % value)
    ValueError: Cannot convert string '2008-10-14'

In debug mode, I see the following where the error occurs::

    ipdb> vals
    ('2008-10-14', '116.26', '116.40', '103.14', '104.08', '70749800', '104.08')
    ipdb> converters
    [<__main__.StringConverter instance at 0xa35fa6c>,
     <__main__.StringConverter instance at 0xa35ff2c>,
     <__main__.StringConverter instance at 0xa35ff8c>,
     <__main__.StringConverter instance at 0xa35ffec>,
     <__main__.StringConverter instance at 0xa15406c>,
     <__main__.StringConverter instance at 0xa1540cc>,
     <__main__.StringConverter instance at 0xa15412c>]

It looks like my registry of a custom converter isn't working. Here is what the _mapper looks like::

    In [23]: StringConverter._mapper
    Out[23]:
    [(<type 'numpy.bool_'>, <function str2bool at 0xa2b8bc4>, None),
     (<type 'numpy.integer'>, <type 'int'>, -1),
     (<type 'numpy.floating'>, <type 'float'>, -NaN),
     (<type 'complex'>, <type 'complex'>, (-NaN+0j)),
     (<type 'numpy.object_'>, <function parse at 0x8cf1534>, datetime.date(1900, 1, 1)),
     (<type 'numpy.string_'>, <type 'str'>, '???')]
[Numpy-discussion] Fwd: np.loadtxt : yet a new implementation...
(Sorry about that, I pressed Reply instead of Reply-all. Not my day for emails...)

On Dec 1, 2008, at 1:54 PM, John Hunter wrote:

> It looks like I am doing something wrong -- trying to parse a CSV file with dates formatted like '2008-10-14', with::
>
>     import datetime, sys
>     import dateutil.parser
>     StringConverter.upgrade_mapper(dateutil.parser.parse,
>                                    default=datetime.date(1900,1,1))
>     r = loadtxt(sys.argv[1], delimiter=',', names=True)

John, the problem you have is that the default dtype is 'float' (for backwards compatibility w/ the original np.loadtxt). What you want is to automatically change the dtype according to the content of your file: you should use dtype=None::

    r = loadtxt(sys.argv[1], delimiter=',', names=True, dtype=None)

As you'll want a recarray, we could make a np.records.loadtxt function where dtype=None would be the default...
Re: [Numpy-discussion] Fwd: np.loadtxt : yet a new implementation...
On Mon, Dec 1, 2008 at 1:14 PM, Pierre GM [EMAIL PROTECTED] wrote:

> What you want is to automatically change the dtype according to the content of your file: you should use dtype=None::
>
>     r = loadtxt(sys.argv[1], delimiter=',', names=True, dtype=None)
>
> As you'll want a recarray, we could make a np.records.loadtxt function where dtype=None would be the default...

OK, that worked great. I do think a default impl in np.rec which returned a recarray would be nice. It might also be nice to have a method like np.rec.fromcsv which defaults to delimiter=',', names=True and dtype=None. Since csv is one of the most common data interchange formats in the world, it would be nice to have some obvious function that works with it with little or no customization required. Fernando and I have taught a scientific computing course on a number of occasions, and on the last round we taught undergrads. Most of these students have little or no programming experience; for many, the concept of an array is something they struggle with, and dtypes are a difficult concept, but we found that they responded very well to our csv2rec example, because with no syntactic cruft they were able to load a file and do some stats on the columns. I would like to see that ease of use preserved. JDH
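A minimal sketch of the convenience John is asking for (the name `fromcsv` is hypothetical, and this stands in for the proposed genloadtxt machinery by parsing with the stdlib csv module and guessing per-column types):

```python
import csv
import numpy as np
from io import StringIO

def fromcsv(fileobj):
    """Hypothetical helper: first row gives field names, types are guessed."""
    rows = list(csv.reader(fileobj))
    names, body = rows[0], rows[1:]
    columns = list(zip(*body))          # transpose rows into columns

    def guess(col):
        # Try int, then float, and fall back to strings.
        for typ in (int, float):
            try:
                return np.array([typ(v) for v in col])
            except ValueError:
                pass
        return np.array(col)

    return np.rec.fromarrays([guess(col) for col in columns], names=names)

r = fromcsv(StringIO("date,close\n2008-10-14,104.08\n2008-10-15,101.50\n"))
```

The point is the call-site ergonomics: one call, no dtype spelled out, columns accessible by name (`r.close`, `r.date`).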
Re: [Numpy-discussion] Fwd: np.loadtxt : yet a new implementation...
On Dec 1, 2008, at 2:26 PM, John Hunter wrote:

> OK, that worked great. I do think a default impl in np.rec which returned a recarray would be nice. It might also be nice to have a method like np.rec.fromcsv which defaults to delimiter=',', names=True and dtype=None. Since csv is one of the most common data interchange formats in the world, it would be nice to have some obvious function that works with it with little or no customization required.

Quite agreed. Personally, I'd ditch the default dtype=float in favor of dtype=None, but compatibility is an issue. However, if we all agree on genloadtxt, we can use tailor-made versions in different modules, like you suggest. There's an extra issue for which we have a solution I'm not completely satisfied with: names=True. It might be simpler for a basic user not to set names=True, and have the first line recognized as a header automatically when needed (by processing the first line after the others, and using it as a header if it's found to be a list of names, or inserting it back at the beginning otherwise)...
[Numpy-discussion] fromiter typo?
Says it takes a default dtype arg, but doesn't act like it's an optional arg:

    fromiter(iterator or generator, dtype=None)

    Construct an array from an iterator or a generator. Only handles
    1-dimensional cases. By default the data-type is determined from the
    objects returned from the iterator.

    ---> 20 z = fromiter(y)
    TypeError: function takes at least 2 arguments (1 given)
Re: [Numpy-discussion] fromiter typo?
Mon, 01 Dec 2008 14:43:11 -0500, Neal Becker wrote:

> Says it takes a default dtype arg, but doesn't act like it's an optional arg:
>
>     ---> 20 z = fromiter(y)
>     TypeError: function takes at least 2 arguments (1 given)

The docstring is correct in 1.2.1 and in the documentation; I suppose you have an older version of Numpy. -- Pauli Virtanen
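For reference, in released versions the dtype is a required positional argument; a quick sketch:

```python
import numpy as np

# dtype is required: fromiter cannot inspect the elements up front,
# because it consumes the iterator lazily into a preallocated buffer.
y = (i * i for i in range(5))
z = np.fromiter(y, dtype=np.int64)

# An optional count hint avoids intermediate buffer reallocations:
z2 = np.fromiter((i * i for i in range(5)), dtype=np.int64, count=5)
```

Calling `np.fromiter(y)` without a dtype raises the TypeError Neal reported.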
Re: [Numpy-discussion] np.loadtxt : yet a new implementation...
Hi Pierre

2008/12/1 Pierre GM [EMAIL PROTECTED]:

> * `genloadtxt` is the base function that does all the work. It outputs 2 arrays, one for the data (missing values being substituted by the appropriate default) and one for the mask. It would go in np.lib.io.

I see the code length increased from 200 lines to 800. This made me wonder about the execution time: initial benchmarks suggest a 3x slow-down. Could this be a problem for loading large text files? If so, should we consider keeping both versions around, or by default bypassing all the extra hooks? Regards Stéfan
Re: [Numpy-discussion] np.loadtxt : yet a new implementation...
Stéfan van der Walt wrote:

> I see the code length increased from 200 lines to 800. This made me wonder about the execution time: initial benchmarks suggest a 3x slow-down. Could this be a problem for loading large text files? If so, should we consider keeping both versions around, or by default bypassing all the extra hooks?

I've wondered about this being an issue. On one hand, you hate to make existing code noticeably slower. On the other hand, if speed is important to you, why are you using ascii I/O? I personally am not entirely against having two versions of loadtxt-like functions. However, the idea seems a little odd, seeing as how loadtxt was already supposed to be the swiss army knife of text reading. I'm seeing a similar slowdown with Pierre's version of the code. The version of loadtxt that I cobbled together with the StringConverter class (and no missing value support) shows about a 50% slowdown, so clearly there's a performance penalty for trying to make a generic function that can be all things to all people. On the other hand, this approach reduces code duplication. I'm not really opinionated on what the right approach is here. My only opinion is that this functionality *really* needs to be in numpy in some fashion. For my own use case, with the old version, I could read a text file and by hand separate out columns and mask values. Now, I open a file and get a structured array with an automatically detected dtype (names and types!) plus masked values. My $0.02. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma
Re: [Numpy-discussion] np.loadtxt : yet a new implementation...
2008/12/1 Ryan May [EMAIL PROTECTED]:

> I've wondered about this being an issue. On one hand, you hate to make existing code noticeably slower. On the other hand, if speed is important to you, why are you using ascii I/O?

More I than O! But I think numpy.fromfile, once fixed up, could fill this niche nicely.

> I personally am not entirely against having two versions of loadtxt-like functions. However, the idea seems a little odd, seeing as how loadtxt was already supposed to be the swiss army knife of text reading.

I haven't investigated the code in too much detail, but wouldn't it be possible to implement the current set of functionality in a base class, which is then specialised to add the rest? That way, one could always instantiate TextReader yourself for some added speed.

> My only opinion is that this functionality *really* needs to be in numpy in some fashion. For my own use case, with the old version, I could read a text file and by hand separate out columns and mask values. Now, I open a file and get a structured array with an automatically detected dtype (names and types!) plus masked values.

That's neat! Cheers Stéfan
Re: [Numpy-discussion] np.loadtxt : yet a new implementation...
I agree, genloadtxt is a bit bloated, and it's no surprise it's slower than the initial one. I think that in order to be fair, comparisons should be made with matplotlib.mlab.csv2rec, which also implements autodetection of the dtype. I'm quite in favor of keeping a lite version around.

On Dec 1, 2008, at 4:47 PM, Stéfan van der Walt wrote:
> I haven't investigated the code in too much detail, but wouldn't it be possible to implement the current set of functionality in a base-class, which is then specialised to add the rest? That way, one could always instantiate a TextReader oneself for some added speed.

Well, one of the issues is that we need to keep the function compatible with urllib.urlretrieve (Ryan, am I right?), which means not being able to go back to the beginning of a file (no call to .seek). Another issue comes from the possibility of defining the dtype automatically: you need to keep track of the converters, and then have to do a second loop over the data. Those converters are likely the bottleneck, as you need to check whether each value can be interpreted as missing or not and respond appropriately.

I thought about creating a base class, with a specific subclass taking care of the missing values. I found out it would have duplicated a lot of code. In any case, I think that's secondary: we can always optimize pieces of the code afterwards. I'd like more feedback on corner cases and usage...
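The converter-plus-second-loop scheme Pierre describes can be illustrated with a toy sketch. `UpgradingConverter` is my own simplified stand-in for the idea behind genloadtxt's StringConverter (not the real class): a first pass over the data upgrades each column's type (int, then float, then str), while checking each value against the missing-value set, and a second pass actually converts:

```python
import numpy as np

class UpgradingConverter:
    """Per-column converter that upgrades itself: int -> float -> str.
    Toy sketch of the StringConverter idea, not numpy's real class."""

    _ladder = [(int, -1), (float, np.nan), (str, '???')]

    def __init__(self, missing_values=('', 'N/A')):
        self.missing_values = set(missing_values)
        self._level = 0

    def upgrade(self, value):
        """First loop over the data: climb to the lowest type that accepts `value`."""
        while True:
            func, _ = self._ladder[self._level]
            try:
                if value.strip() not in self.missing_values:
                    func(value)
                return
            except ValueError:
                self._level += 1

    def __call__(self, value):
        """Second loop: convert, with a per-type default for missing values."""
        func, default = self._ladder[self._level]
        if value.strip() in self.missing_values:
            return default
        return func(value)

# Two-pass usage, mirroring the second loop over the data:
column = ['1', '2.5', 'N/A', '4']
conv = UpgradingConverter()
for v in column:
    conv.upgrade(v)                    # settles on float
print([conv(v) for v in column])       # [1.0, 2.5, nan, 4.0]
```

The per-value membership test in `__call__` is exactly the kind of check Pierre identifies as the likely bottleneck.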
[Numpy-discussion] bug in ma.masked_all()?
Pierre,

ma.masked_all does not seem to work with fancy dtypes and more than one dimension:

In [1]: import numpy as np
In [2]: dt = np.dtype({'names': ['a', 'b'], 'formats': ['f', 'f']})
In [3]: x = np.ma.masked_all((2,), dtype=dt)
In [4]: x
Out[4]:
masked_array(data = [(--, --) (--, --)],
             mask = [(True, True) (True, True)],
             fill_value=(1.000200408773e+20, 1.000200408773e+20))

In [5]: x = np.ma.masked_all((2,2), dtype=dt)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/efiring/<ipython console> in <module>()
/usr/local/lib/python2.5/site-packages/numpy/ma/extras.pyc in masked_all(shape, dtype)
     78
     79     a = masked_array(np.empty(shape, dtype),
---> 80                      mask=np.ones(shape, bool))
     81     return a
     82
/usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in __new__(cls, data, mask, dtype, copy, subok, ndmin, fill_value, keep_mask, hard_mask, flag, shrink, **options)
   1304     except TypeError:
   1305         mask = np.array([tuple([m]*len(mdtype)) for m in mask],
---> 1306                        dtype=mdtype)
   1307     # Make sure the mask and the data have the same shape
   1308     if mask.shape != _data.shape:

TypeError: expected a readable buffer object

- Eric
Re: [Numpy-discussion] np.loadtxt : yet a new implementation...
Stéfan van der Walt wrote:
>> important to you, why are you using ascii I/O?

ascii I/O is slow, so that's a reason in itself to want it not to be slower!

> More I than O! But I think numpy.fromfile, once fixed up, could fill this niche nicely.

I agree -- for the simple cases, fromfile() could work very well -- perhaps it could even be used to speed up some special cases of loadtxt. But is anyone working on fromfile()?

By the way, I think overloading fromfile() for text files is a bit misleading for users -- I propose we have a fromtextfile() or something instead.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/ORR (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
[EMAIL PROTECTED]
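For the simple cases Chris mentions, numpy.fromfile can already parse plain text through its `sep` argument, which is presumably the niche being discussed; a minimal sketch:

```python
import numpy as np
import os
import tempfile

# Write a small whitespace-separated text file.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write("1 2 3\n4 5 6\n")

# A non-empty sep switches fromfile into text mode; any whitespace
# (including newlines) then acts as a separator.
a = np.fromfile(path, dtype=float, sep=' ')
os.unlink(path)

print(a.reshape(2, 3))
```

Note that this flat text mode has no notion of columns, comments, or missing values -- which is why loadtxt exists at all.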
Re: [Numpy-discussion] np.loadtxt : yet a new implementation...
Pierre GM wrote:
> Another issue comes from the possibility to define the dtype automatically:

Does all that get bypassed if the dtype(s) is specified? Is it still slow in that case?

-Chris
[Numpy-discussion] fast way to convolve a 2d array with 1d filter
Hi,

I need to convolve a 1d filter with 8 coefficients with a 2d array of the shape (6,7). I can use convolve to perform the operation for each row. This will involve a for loop with a counter of 6. I wonder if there is a fast way to do this in numpy without using a for loop. Does anyone know how to do it?

Thanks

Frank
[Numpy-discussion] ANN: HDF5 for Python 1.0
= Announcing HDF5 for Python (h5py) 1.0 =

What is h5py?
-------------

HDF5 for Python (h5py) is a general-purpose Python interface to the Hierarchical Data Format library, version 5. HDF5 is a versatile, mature scientific software library designed for the fast, flexible storage of enormous amounts of data.

From a Python programmer's perspective, HDF5 provides a robust way to store data, organized by name in a tree-like fashion. You can create datasets (arrays on disk) hundreds of gigabytes in size, and perform random-access I/O on desired sections. Datasets are organized in a filesystem-like hierarchy using containers called groups, and accessed using the traditional POSIX /path/to/resource syntax.

This is the fourth major release of h5py, and represents the end of the unstable (0.X.X) design phase.

Why should I use it?
--------------------

H5py provides a simple, robust read/write interface to HDF5 data from Python. Existing Python and NumPy concepts are used for the interface; for example, datasets on disk are represented by a proxy class that supports slicing, and has dtype and shape attributes. HDF5 groups are presented using a dictionary metaphor, indexed by name.

A major design goal of h5py is interoperability; you can read your existing data in HDF5 format, and create new files that any HDF5-aware program can understand. No Python-specific extensions are used; you're free to implement whatever file structure your application desires.

Almost all HDF5 features are available from Python, including things like compound datatypes (as used with NumPy recarray types), HDF5 attributes, hyperslab and point-based I/O, and more recent features in HDF 1.8 like resizable datasets and recursive iteration over entire files.

The foundation of h5py is a near-complete wrapping of the HDF5 C API. HDF5 identifiers are first-class objects which participate in Python reference counting, and expose the C API via methods.
This low-level interface is also made available to Python programmers, and is exhaustively documented.

See the Quick-Start Guide for a longer introduction with code examples:

http://h5py.alfven.org/docs/guide/quick.html

Where to get it
---------------

* Main website, documentation: http://h5py.alfven.org
* Downloads, bug tracker: http://h5py.googlecode.com
* The HDF Group website also contains a good introduction:
  http://www.hdfgroup.org/HDF5/doc/H5.intro.html

Requires
--------

* UNIX-like platform (Linux or Mac OS-X); a Windows version is in progress.
* Python 2.5 or 2.6
* NumPy 1.0.3 or later (1.1.0 or later recommended)
* HDF5 1.6.5 or later, including 1.8. Some features are only available when compiled against HDF5 1.8.
* Optionally, Cython (see cython.org) if you want to use custom install options. You'll need version 0.9.8.1.1 or later.

About this version
------------------

Version 1.0 follows version 0.3.1 as the latest public release. The major design phase (which began in May of 2008) is now over; the design of the high-level API will be supported as-is for the rest of the 1.X series, with minor enhancements. This is the first version to support Python 2.6, and the first to use Cython for the low-level interface. The license remains 3-clause BSD.

** This project is NOT affiliated with The HDF Group. **

Thanks
------

Thanks to D. Dale, E. Lawrence and others for their continued support and comments. Also thanks to the PyTables project, for inspiration and generously providing their code to the community, and to everyone at the HDF Group for creating such a useful piece of software.
Re: [Numpy-discussion] bug in ma.masked_all()?
On Dec 1, 2008, at 6:09 PM, Eric Firing wrote:
> Pierre, ma.masked_all does not seem to work with fancy dtypes and more than one dimension:

Eric,

Should be fixed in SVN (r6130). There were indeed problems with nested dtypes. Tricky beasts they are.

Thanks for reporting!
Re: [Numpy-discussion] ANN: HDF5 for Python 1.0
> Requires
> * UNIX-like platform (Linux or Mac OS-X); Windows version is in progress

I installed version 0.3.0 back in August on Windows XP, and as far as I remember there were no problems at all with the install, and all tests passed. I thought the interface was really easy to use. But after trying it out I realized that my matlab is too old to understand the generated hdf5 files in an easy-to-use way, and I had to go back to csv-files.

Josef
Re: [Numpy-discussion] [SciPy-user] os x, intel compilers mkl, and fink python
On 28-Nov-08, at 5:38 PM, Gideon Simpson wrote:
> Has anyone gotten the combination of OS X with a fink python distribution to successfully build numpy/scipy with the intel compilers and the mkl? If so, how'd you do it?

IIRC David Cournapeau has had some success building numpy with MKL on OS X, but I doubt it was the fink distribution. Is there a reason you prefer fink's python rather than the Python.org universal framework build? Also, which particular python version (2.4, 2.5, 2.6? I know fink typically has a couple).

David
Re: [Numpy-discussion] fast way to convolve a 2d array with 1d filter
Hi Frank

2008/12/2 frank wang [EMAIL PROTECTED]:
> I need to convolve a 1d filter with 8 coefficients with a 2d array of the shape (6,7). I can use convolve to perform the operation for each row. This will involve a for loop with a counter of 6. I wonder if there is a fast way to do this in numpy without using a for loop. Does anyone know how to do it?

Since 6x7 is quite small, you can afford this trick:

a) Pad the 6,7 array to 6,14.
b) Flatten the array
c) Perform convolution
d) Unflatten array
e) Take out valid values

Cheers
Stéfan
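Spelled out as code, the five steps above might look like this sketch (`convolve_rows` is a name I made up; the padding amount is written as len(h) - 1 rather than a fixed 7, so the same trick works for any filter length):

```python
import numpy as np

def convolve_rows(a, h):
    """Full convolution of every row of 2-D `a` with 1-D `h`,
    using a single 1-D convolution on a padded, flattened copy."""
    nrows, ncols = a.shape
    m = len(h)
    W = ncols + m - 1                    # length of one row's full convolution
    # a) pad each row with m-1 zeros so the filter cannot bleed across rows
    padded = np.hstack([a, np.zeros((nrows, m - 1))])
    # b), c) flatten and convolve once
    flat = np.convolve(padded.ravel(), h)
    # d), e) cut out the per-row results and restore the 2-D shape
    return flat[:nrows * W].reshape(nrows, W)

a = np.arange(42.0).reshape(6, 7)
h = np.ones(8)                           # 8 filter coefficients
out = convolve_rows(a, h)
# Same result as convolving row by row:
assert np.allclose(out, [np.convolve(row, h) for row in a])
```

The trailing m-1 zeros on each row guarantee that an output sample for row k never mixes in data from row k+1, which is why the single 1-D convolution gives the same answer as the loop.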
Re: [Numpy-discussion] fast way to convolve a 2d array with 1d filter
This is what I thought to do. However, I am not sure whether this is a fast way to do it, and I also want to find a more general way to do it. I thought there might be a more elegant way to do it.

Thanks

Frank

> Date: Tue, 2 Dec 2008 07:42:27 +0200
> From: [EMAIL PROTECTED]
> To: numpy-discussion@scipy.org
> Subject: Re: [Numpy-discussion] fast way to convolve a 2d array with 1d filter
>
> Since 6x7 is quite small, you can afford this trick:
>
> a) Pad the 6,7 array to 6,14.
> b) Flatten the array
> c) Perform convolution
> d) Unflatten array
> e) Take out valid values
>
> Cheers
> Stéfan
Re: [Numpy-discussion] bug in ma.masked_all()?
Pierre,

Your change fixed masked_all for the example I gave, but I think it introduced a new failure in zeros:

dt = np.dtype([((' Pressure, Digiquartz [db]', 'P'), 'f4'),
               ((' Depth [salt water, m]', 'D'), 'f4'),
               ((' Temperature [ITS-90, deg C]', 'T'), 'f4'),
               ((' Descent Rate [m/s]', 'w'), 'f4'),
               ((' Salinity [PSU]', 'S'), 'f4'),
               ((' Density [sigma-theta, Kg/m^3]', 'sigtheta'), 'f4'),
               ((' Potential Temperature [ITS-90, deg C]', 'theta'), 'f4')])
np.ma.zeros((2,2), dt)

results in:

ValueError                                Traceback (most recent call last)
/home/efiring/<ipython console> in <module>()
/usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in __call__(self, a, *args, **params)
   4533     #
   4534     def __call__(self, a, *args, **params):
-> 4535         return self._func.__call__(a, *args, **params).view(MaskedArray)
   4536
   4537 arange = _convert2ma('arange')

/usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in __array_finalize__(self, obj)
   1548         odtype = obj.dtype
   1549         if odtype.names:
-> 1550             _mask = getattr(obj, '_mask', make_mask_none(obj.shape, odtype))
   1551         else:
   1552             _mask = getattr(obj, '_mask', nomask)

/usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in make_mask_none(newshape, dtype)
    921         result = np.zeros(newshape, dtype=MaskType)
    922     else:
--> 923         result = np.zeros(newshape, dtype=make_mask_descr(dtype))
    924     return result
    925

/usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in make_mask_descr(ndtype)
    819     if not isinstance(ndtype, np.dtype):
    820         ndtype = np.dtype(ndtype)
--> 821     return np.dtype(_make_descr(ndtype))
    822
    823 def get_mask(a):

/usr/local/lib/python2.5/site-packages/numpy/ma/core.pyc in _make_descr(datatype)
    806     descr = []
    807     for name in names:
--> 808         (ndtype, _) = datatype.fields[name]
    809         descr.append((name, _make_descr(ndtype)))
    810     return descr

ValueError: too many values to unpack
Re: [Numpy-discussion] fast way to convolve a 2d array with 1d filter
On Mon, Dec 1, 2008 at 11:14 PM, frank wang [EMAIL PROTECTED] wrote:
> This is what I thought to do. However, I am not sure whether this is a fast way to do it, and I also want to find a more general way to do it. I thought there might be a more elegant way to do it.

Well, for just the one matrix, not much will speed it up. If you have lots of matrices and the coefficients are fixed, then you can set up a convolution matrix whose columns are the coefficients shifted appropriately. Then just do a matrix multiply.

Chuck
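Chuck's convolution-matrix idea can be sketched as follows; `conv_matrix` is a hypothetical helper name, and the matrix is built once and then reused for every 6x7 array:

```python
import numpy as np

def conv_matrix(h, n):
    """Matrix H holding shifted copies of `h` in its columns, so that
    H @ x equals np.convolve(x, h) for any length-n vector x."""
    m = len(h)
    H = np.zeros((n + m - 1, n))
    for j in range(n):
        H[j:j + m, j] = h      # column j is h shifted down by j
    return H

# Build H once, then convolve any number of 6x7 arrays by matrix multiply:
h = np.arange(1.0, 9.0)        # 8 filter coefficients
H = conv_matrix(h, 7)
a = np.arange(42.0).reshape(6, 7)
out = a @ H.T                  # row k of `out` == np.convolve(a[k], h)
assert np.allclose(out, [np.convolve(row, h) for row in a])
```

Since BLAS handles the matrix multiply, this amortizes well when the same fixed filter is applied to many arrays, which is exactly the case Chuck describes.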