Re: [Numpy-discussion] How to start at line # x when using numpy.memmap
On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin <jlcon...@gmail.com> wrote:
> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen <p...@iki.fi> wrote:
>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
>>> I would like to use numpy's memmap on some data files I have. The first
>>> 12 or so lines of the files contain text (header information) and the
>>> remainder has the numerical data. Is there a way I can tell memmap to
>>> skip a specified number of lines instead of a number of bytes?
>>
>> First use standard Python I/O functions to determine the number of bytes
>> to skip at the beginning and the number of data items. Then pass in
>> `offset` and `shape` parameters to numpy.memmap.
>
> Thanks for that suggestion. However, I'm unfamiliar with the I/O
> functions you are referring to. Can you point me to the documentation?
>
> Thanks again,
> Jeremy
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

this might get you started:

  import numpy as np

  # make some fake data with 12 header lines.
  with open('test.mm', 'w') as fhw:
      print >>fhw, "\n".join('header' for i in range(12))
      np.arange(100, dtype=np.uint).tofile(fhw)

  # use normal python io to determine the offset after 12 lines.
  with open('test.mm') as fhr:
      for i in range(12):
          fhr.readline()
      offset = fhr.tell()

  # use the offset in your call to np.memmap.
  a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset)
  assert all(a == np.arange(100))
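For anyone reading this archive on Python 3, where the `print >>fh` syntax is gone, here is a sketch of the same idea; the filename 'test.mm' and the uint64 dtype are placeholders for this example. The key points are the same: read past the header with readline() on a binary-mode file, then hand fhr.tell() to np.memmap as `offset`.

```python
import numpy as np

# Build a small file with 12 text header lines followed by binary data.
with open('test.mm', 'wb') as fhw:
    fhw.write(b''.join(b'header\n' for _ in range(12)))
    np.arange(100, dtype=np.uint64).tofile(fhw)

# Use ordinary file I/O to find the byte offset just past the header.
with open('test.mm', 'rb') as fhr:
    for _ in range(12):
        fhr.readline()
    offset = fhr.tell()

# Hand that offset to np.memmap so it skips the text header; the shape
# is inferred from the remaining file size when not given.
a = np.memmap('test.mm', mode='r', dtype=np.uint64, offset=offset)
assert (a == np.arange(100)).all()
```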
Re: [Numpy-discussion] How to start at line # x when using numpy.memmap
On Fri, Aug 19, 2011 at 9:09 AM, Jeremy Conlin <jlcon...@gmail.com> wrote:
> On Fri, Aug 19, 2011 at 8:01 AM, Brent Pedersen <bpede...@gmail.com> wrote:
>> this might get you started:
>>
>>   import numpy as np
>>
>>   # make some fake data with 12 header lines.
>>   with open('test.mm', 'w') as fhw:
>>       print >>fhw, "\n".join('header' for i in range(12))
>>       np.arange(100, dtype=np.uint).tofile(fhw)
>>
>>   # use normal python io to determine the offset after 12 lines.
>>   with open('test.mm') as fhr:
>>       for i in range(12):
>>           fhr.readline()
>>       offset = fhr.tell()
>>
>>   # use the offset in your call to np.memmap.
>>   a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset)
>
> Thanks, that looks good. I tried it, but it doesn't get the correct
> data. I really don't understand what is going on. A simple code and
> sample data is attached if anyone has a chance to look at it.
> Thanks,
> Jeremy

in that case, i would use:

  np.loadtxt('tmp.dat', skiprows=12)
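Since the data in the attached file is evidently text rather than binary (which is why the memmap offset approach read garbage), np.loadtxt with skiprows is indeed the simpler route. A self-contained sketch; the file name 'tmp.dat' and its contents are made up for illustration:

```python
import numpy as np

# Write a file with 12 header lines followed by whitespace-separated numbers,
# a stand-in for the poster's data file.
with open('tmp.dat', 'w') as fh:
    for i in range(12):
        fh.write('header line %d\n' % i)
    for row in range(3):
        fh.write('%d %d %d\n' % (row, row + 1, row + 2))

# skiprows tells loadtxt to ignore the text header entirely.
data = np.loadtxt('tmp.dat', skiprows=12)
assert data.shape == (3, 3)
```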
Re: [Numpy-discussion] Memmap with multiprocessing
On Wed, Apr 27, 2011 at 4:07 PM, Christoph Gohlke <cgoh...@uci.edu> wrote:
> I don't think this was working correctly in numpy 1.4 either. The
> underlying problem seems to be that instance attributes of ndarray
> subtypes get lost during pickling:
>
>   import pickle
>   import numpy as np
>
>   class aarray(np.ndarray):
>       def __new__(subtype):
>           self = np.ndarray.__new__(subtype, (1,))
>           self.attr = 'attr'
>           return self
>       def __array_finalize__(self, obj):
>           self.attr = getattr(obj, 'attr', None)
>
>   a = aarray()
>   b = pickle.loads(a.dumps())
>   assert a.attr == b.attr, (a.attr, b.attr)
>
>   AssertionError: ('attr', None)
>
> Christoph

possibly related to this ticket: http://projects.scipy.org/numpy/ticket/1452

> On 4/26/2011 2:21 PM, Ralf Gommers wrote:
>> On Mon, Apr 25, 2011 at 1:16 PM, Thiago Franco Moraes
>> <totonixs...@gmail.com> wrote:
>>> Hi, Has anyone confirmed if this is a bug? Should I post this in the
>>> bug tracker?
>>
>> I see the same thing with recent master. Something very strange is
>> going on in the memmap.__array_finalize__ method under Windows. Can
>> you file a bug?
>>
>> Ralf
>
>>> Thanks!
>>>
>>> On Tue, Apr 19, 2011 at 9:01 PM, Thiago Franco de Moraes
>>> <totonixs...@gmail.com> wrote:
>>>> Hi all,
>>>>
>>>> I'm having an error using memmap objects shared among processes
>>>> created by the multiprocessing module. This error only happens on
>>>> Windows with numpy 1.5 or above; with numpy 1.4.1 it doesn't happen,
>>>> and on Linux and Mac OS X it doesn't happen.
>>>> This error is demonstrated by this little example script here
>>>> https://gist.github.com/929168 , and the traceback is below (between
>>>> <traceback> tags):
>>>>
>>>> <traceback>
>>>> Process Process-1:
>>>> Traceback (most recent call last):
>>>>   File "C:\Python26\Lib\multiprocessing\process.py", line 232, in _bootstrap
>>>>     self.run()
>>>>   File "C:\Python26\Lib\multiprocessing\process.py", line 88, in run
>>>>     self._target(*self._args, **self._kwargs)
>>>>   File "C:\Documents and Settings\phamorim\Desktop\test.py", line 7, in print_matrix
>>>>     print matrix
>>>>   File "C:\Python26\Lib\site-packages\numpy\core\numeric.py", line 1379, in array_str
>>>>     return array2string(a, max_line_width, precision, suppress_small, ' ', "", str)
>>>>   File "C:\Python26\Lib\site-packages\numpy\core\arrayprint.py", line 309, in array2string
>>>>     separator, prefix)
>>>>   File "C:\Python26\Lib\site-packages\numpy\core\arrayprint.py", line 189, in _array2string
>>>>     data = _leading_trailing(a)
>>>>   File "C:\Python26\Lib\site-packages\numpy\core\arrayprint.py", line 162, in _leading_trailing
>>>>     min(len(a), _summaryEdgeItems))]
>>>>   File "C:\Python26\Lib\site-packages\numpy\core\memmap.py", line 257, in __array_finalize__
>>>>     self.filename = obj.filename
>>>> AttributeError: 'memmap' object has no attribute 'filename'
>>>> Exception AttributeError: AttributeError("'NoneType' object has no
>>>> attribute 'tell'",) in <bound method memmap.__del__ of
>>>> memmap([0, 0, ..., 0], dtype=int16)> ignored
>>>> </traceback>
>>>>
>>>> I don't know if it's a bug, but I thought it's important to report
>>>> because version 1.4.1 was working and 1.5.0 and above were not.
>>>>
>>>> Thanks!
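The attribute loss Christoph demonstrates above can be worked around in the subclass itself, independently of the patch discussed in this thread, by extending __reduce__/__setstate__ so the extra attribute rides along with the pickled state. This is the usual recipe for pickling ndarray subclasses, shown here in Python 3 syntax as a sketch:

```python
import pickle
import numpy as np

class aarray(np.ndarray):
    """ndarray subclass that carries an extra attribute through pickling."""

    def __new__(subtype):
        self = np.ndarray.__new__(subtype, (1,))
        self.attr = 'attr'
        return self

    def __array_finalize__(self, obj):
        self.attr = getattr(obj, 'attr', None)

    def __reduce__(self):
        # Append our attribute to the state tuple returned by ndarray.
        reconstruct, args, state = super().__reduce__()
        return reconstruct, args, state + (self.attr,)

    def __setstate__(self, state):
        # Pop our attribute back off, then let ndarray restore the rest.
        self.attr = state[-1]
        super().__setstate__(state[:-1])

a = aarray()
b = pickle.loads(pickle.dumps(a))
assert a.attr == b.attr == 'attr'
```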
[Numpy-discussion] moving window product
hi, is there a way to take the product along a 1-d array in a moving
window? -- similar to convolve, with product in place of sum?

currently, i'm column_stacking the array with offsets of itself into
window_size columns and then taking the product at axis 1. like::

  w = np.column_stack(a[i:-window_size+i] for i in range(0, window_size))
  window_product = np.product(w, axis=1)

but then there are the edge effects/array size issues--like those
handled in np.convolve. is there something in numpy/scipy that
addresses this, or that does the column_stacking with an offset?

thanks,
-brent
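For later readers: on NumPy 1.20 or newer, np.lib.stride_tricks.sliding_window_view does the column-stacking-with-offset part directly, returning a strided view (no copy) whose rows are the windows; the moving product is then just a product along the last axis. A sketch:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(1.0, 8.0)  # [1, 2, ..., 7]
window_size = 3

# Each row of w is one length-3 window into a; w is a view, not a copy.
w = sliding_window_view(a, window_size)
moving_prod = w.prod(axis=1)

# 'valid'-style output: one value per fully contained window.
assert moving_prod.shape == (len(a) - window_size + 1,)
assert moving_prod[0] == 1 * 2 * 3
```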
Re: [Numpy-discussion] moving window product
On Mon, Mar 21, 2011 at 11:19 AM, Keith Goodman <kwgood...@gmail.com> wrote:
> On Mon, Mar 21, 2011 at 10:10 AM, Brent Pedersen <bpede...@gmail.com> wrote:
>> hi, is there a way to take the product along a 1-d array in a moving
>> window? -- similar to convolve, with product in place of sum?
>> [...]
>
> The Bottleneck package has a fast moving window sum (bn.move_sum and
> bn.move_nansum). You could use that along with
>
>   >>> a = np.random.rand(5)
>   >>> a.prod()
>   0.015877866878931741
>   >>> np.exp(np.log(a).sum())
>   0.015877866878931751
>
> Or you could use strides or scipy.ndimage as in
> https://github.com/kwgoodman/bottleneck/blob/master/bottleneck/slow/move.py

ah yes, of course. thank you.

  def moving_product(a, window_size, mode='same'):
      return np.exp(np.convolve(np.log(a), np.ones(window_size), mode))

i'll have a closer look at your strided version in bottleneck as well.
Re: [Numpy-discussion] moving window product
On Mon, Mar 21, 2011 at 11:57 AM, Keith Goodman <kwgood...@gmail.com> wrote:
> On Mon, Mar 21, 2011 at 10:34 AM, Brent Pedersen <bpede...@gmail.com> wrote:
>> ah yes, of course. thank you.
>>
>>   def moving_product(a, window_size, mode='same'):
>>       return np.exp(np.convolve(np.log(a), np.ones(window_size), mode))
>>
>> i'll have a closer look at your strided version in bottleneck as well.
>
> I don't know what size problem you are working on or if speed is an
> issue, but here are some timings:
>
>   >>> a = np.random.rand(100)
>   >>> window_size = 1000
>   >>> timeit np.exp(np.convolve(np.log(a), np.ones(window_size), 'same'))
>   1 loops, best of 3: 889 ms per loop
>   >>> timeit np.exp(bn.move_sum(np.log(a), window_size))
>   10 loops, best of 3: 82.5 ms per loop
>
> Most all that time is spent in np.exp(np.log(a)):
>
>   >>> timeit bn.move_sum(a, window_size)
>   100 loops, best of 3: 3.72 ms per loop
>
> So I assume if I made a bn.move_prod the time would be around 200x
> compared to convolve.
> BTW, you could do the exp inplace:
>
>   >>> timeit b = bn.move_sum(np.log(a), window_size); np.exp(b, b)
>   10 loops, best of 3: 76.3 ms per loop

my current use-case is to do this 24 times on arrays of about 200K
elements. file IO is the major bottleneck.
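As a sanity check on the log/exp trick used in this thread, the following sketch (plain NumPy, no bottleneck) compares np.exp(np.convolve(np.log(a), ...)) against a direct windowed product. It uses 'valid' mode, which sidesteps the edge effects discussed above by emitting one value per fully contained window:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(1000) + 0.5  # keep values strictly positive so log is defined
window_size = 5

# Direct windowed product, for reference.
direct = np.array([a[i:i + window_size].prod()
                   for i in range(len(a) - window_size + 1)])

# Same thing via the trick: a moving *sum* of logs, exponentiated.
via_logs = np.exp(np.convolve(np.log(a), np.ones(window_size), mode='valid'))

assert np.allclose(direct, via_logs)
```

Note the trick only works for strictly positive inputs; zeros or negatives in `a` would need separate handling (e.g. tracking signs and zero counts).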
Re: [Numpy-discussion] 2D binning
On Tue, Jun 1, 2010 at 1:51 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> On Tue, Jun 1, 2010 at 4:49 PM, Zachary Pincus <zachary.pin...@yale.edu> wrote:
>>> Hi
>>>
>>> Can anyone think of a clever (non-looping) solution to the following?
>>>
>>> A have a list of latitudes, a list of longitudes, and list of data
>>> values. All lists are the same length.
>>>
>>> I want to compute an average of data values for each lat/lon pair.
>>> e.g. if lat[1001] == lat[2001] and lon[1001] == lon[2001], then
>>> data[1001] = (data[1001] + data[2001])/2
>>>
>>> Looping is going to take way too long.
>>
>> As a start, are the equal lat/lon pairs exactly equal (i.e. either not
>> floating-point, or floats that will always compare equal, that is, the
>> floating-point bit-patterns will be guaranteed to be identical) or
>> approximately equal to float tolerance?
>>
>> If you're in the approx-equal case, then look at the KD-tree in scipy
>> for doing near-neighbors queries.
>>
>> If you're in the exact-equal case, you could consider hashing the
>> lat/lon pairs or something. At least then the looping is O(N) and not
>> O(N^2):
>>
>>   import collections
>>   grouped = collections.defaultdict(list)
>>   for lt, ln, da in zip(lat, lon, data):
>>       grouped[(lt, ln)].append(da)
>>
>>   averaged = dict((ltln, numpy.mean(da)) for ltln, da in grouped.items())
>>
>> Is that fast enough?
>>
>> Zach
>
> This is a pretty good example of the group-by problem that will
> hopefully work its way into a future edition of NumPy. Given that, a
> good approach would be to produce a unique key from the lat and lon
> vectors, and pass that off to the groupby routine (when it exists).

meanwhile groupby from itertools will work but might be a bit slower
since it'll have to convert every row to tuple and group in a list.
  import numpy as np
  import itertools

  # fake data
  N = 1
  lats = np.repeat(180 * (np.random.ranf(N / 250) - 0.5), 250)
  lons = np.repeat(360 * (np.random.ranf(N / 250) - 0.5), 250)
  np.random.shuffle(lats)
  np.random.shuffle(lons)
  vals = np.arange(N)

  inds = np.lexsort((lons, lats))
  sorted_lats = lats[inds]
  sorted_lons = lons[inds]
  sorted_vals = vals[inds]
  llv = np.array((sorted_lats, sorted_lons, sorted_vals)).T

  for (lat, lon), group in itertools.groupby(llv, lambda row: tuple(row[:2])):
      group_vals = [g[-1] for g in group]
      print lat, lon, np.mean(group_vals)

  # make sure the mean for the last lat/lon from the loop matches the mean
  # for that lat/lon from original data.
  tests_idx, = np.where((lats == lat) & (lons == lon))
  assert np.mean(vals[tests_idx]) == np.mean(group_vals)
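With a modern NumPy the group-by mean itself needs no Python-level loop at all: np.unique(..., axis=0, return_inverse=True) yields one integer label per distinct (lat, lon) pair, and np.bincount with weights then averages within each label. A small sketch with made-up data (the arrays here are illustrative, not the poster's):

```python
import numpy as np

lats = np.array([10.0, 20.0, 10.0, 20.0, 30.0])
lons = np.array([ 1.0,  2.0,  1.0,  2.0,  3.0])
data = np.array([ 1.0,  2.0,  3.0,  4.0,  5.0])

# One integer label per distinct (lat, lon) row.
pairs = np.column_stack((lats, lons))
uniq, inv = np.unique(pairs, axis=0, return_inverse=True)
inv = inv.ravel()  # guard against inverse-shape differences across versions

# Sum of data per label divided by count per label = per-group mean.
means = np.bincount(inv, weights=data) / np.bincount(inv)

# Groups sort as (10,1), (20,2), (30,3) -> means 2.0, 3.0, 5.0
assert np.allclose(means, [2.0, 3.0, 5.0])
```

This is the exact-equal case only; for float-tolerance matching the KD-tree suggestion above still applies.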
Re: [Numpy-discussion] [Patch] Fix memmap pickling
On Mon, May 24, 2010 at 3:37 PM, Gael Varoquaux <gael.varoqu...@normalesup.org> wrote:
> On Mon, May 24, 2010 at 03:33:09PM -0700, Brent Pedersen wrote:
>> On Mon, May 24, 2010 at 3:25 PM, Gael Varoquaux
>> <gael.varoqu...@normalesup.org> wrote:
>>> Memmapped arrays don't pickle right. I know that to get them to really
>>> pickle and restore identically, we would need some effort. However, in
>>> the current status, pickling and restoring a memmapped array leads to
>>> tracebacks that seem like they could be avoided. I am attaching a patch
>>> with a test that shows the problem, and a fix. Should I create a
>>> ticket, or is this light enough to be applied immediately?
>>
>> also check this: http://projects.scipy.org/numpy/ticket/1452
>> still needs work.
>
> Does look good. Is there an ETA for your patch to be applied? Right now
> this bug is making code crash when memmapped arrays are used (e.g.
> multiprocessing), so a hot fix can be useful, without removing any merit
> from your work that addresses the underlying problem.
>
> Cheers,
> Gaël

gael, not sure about ETA of application. i think the main remaining
problem (other than more tests) is py3 support--as charris points out
in the ticket. i have a start which shadows numpy's __getitem__, but
havent fixed all the bugs--and not sure that's a good idea. my original
patch was quite simple as well, but once it starts supporting all
versions and more edge cases ...
Re: [Numpy-discussion] [Patch] Fix memmap pickling
On Mon, May 24, 2010 at 3:25 PM, Gael Varoquaux <gael.varoqu...@normalesup.org> wrote:
> Memmapped arrays don't pickle right. I know that to get them to really
> pickle and restore identically, we would need some effort. However, in
> the current status, pickling and restoring a memmapped array leads to
> tracebacks that seem like they could be avoided. I am attaching a patch
> with a test that shows the problem, and a fix. Should I create a
> ticket, or is this light enough to be applied immediately?
>
> Cheers,
> Gaël

also check this: http://projects.scipy.org/numpy/ticket/1452
still needs work.
Re: [Numpy-discussion] faster code
On Sun, May 16, 2010 at 12:14 PM, Davide Lasagna <lasagnadav...@gmail.com> wrote:
> Hi all,
>
> What is the fastest and lowest memory consumption way to compute this?
>
>   y = np.arange(2**24)
>   bases = y[1:] + y[:-1]
>
> Actually it is already quite fast, but i'm not sure whether the
> summation is occupying some temporary memory. Any help is appreciated.
>
> Cheers
>
> Davide

how about something like this? may have an off-by-1 somewhere.

  bases = np.arange(1, 2*2**24-1, 2)
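For the record, there is no off-by-one in the suggestion: adjacent integers sum to consecutive odd numbers, so y[1:] + y[:-1] is exactly the odd numbers 1, 3, ..., 2n-3, which arange can generate directly with no temporary y array. A quick check on a smaller n (2**10 here, standing in for the 2**24 in the question):

```python
import numpy as np

n = 2**10  # small stand-in for the poster's 2**24

y = np.arange(n)
bases = y[1:] + y[:-1]  # the original pairwise sum

# Same sequence generated directly: 1, 3, 5, ..., 2n-3.
direct = np.arange(1, 2 * n - 1, 2)

assert (bases == direct).all()
```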
Re: [Numpy-discussion] patch to pickle np.memmap
On Tue, Apr 13, 2010 at 8:59 PM, Brent Pedersen <bpede...@gmail.com> wrote:
> On Tue, Apr 13, 2010 at 8:52 PM, Brent Pedersen <bpede...@gmail.com> wrote:
>> hi, i posted a patch to allow pickling of np.memmap objects.
>> http://projects.scipy.org/numpy/ticket/1452
>> currently, it always returns 'r' for the mode. is that the best thing
>> to do there? any other changes?
>> -brent
>
> and i guess it should (but does not with that patch) correctly handle:
>
>   a = np.memmap(...)
>   b = a[2:]
>   cPickle.dumps(b)
>
> any thoughts?

it still always loads in mode='r', but i updated the patch to handle
slicing so it works like this:

  >>> import numpy as np
  >>> a = np.memmap('t.bin', mode='w+', shape=(10,))
  >>> a[:] = np.arange(10)
  >>> a
  memmap([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8)
  >>> np.loads(a[1:4].dumps())
  memmap([1, 2, 3], dtype=uint8)
  >>> np.loads(a[-2:].dumps())
  memmap([8, 9], dtype=uint8)
  >>> np.loads(a[-4:-1].dumps())
  memmap([6, 7, 8], dtype=uint8)

http://projects.scipy.org/numpy/ticket/1452

-brent
[Numpy-discussion] patch to pickle np.memmap
hi, i posted a patch to allow pickling of np.memmap objects.

http://projects.scipy.org/numpy/ticket/1452

currently, it always returns 'r' for the mode. is that the best thing
to do there? any other changes?

-brent
Re: [Numpy-discussion] patch to pickle np.memmap
On Tue, Apr 13, 2010 at 8:52 PM, Brent Pedersen <bpede...@gmail.com> wrote:
> hi, i posted a patch to allow pickling of np.memmap objects.
> http://projects.scipy.org/numpy/ticket/1452
> currently, it always returns 'r' for the mode. is that the best thing
> to do there? any other changes?
> -brent

and i guess it should (but does not with that patch) correctly handle:

  a = np.memmap(...)
  b = a[2:]
  cPickle.dumps(b)
Re: [Numpy-discussion] Name of the file associated with a memmap
On Mon, Apr 12, 2010 at 1:59 PM, Robert Kern <robert.k...@gmail.com> wrote:
> On Mon, Apr 12, 2010 at 15:52, Charles R Harris <charlesr.har...@gmail.com> wrote:
>> On Mon, Apr 12, 2010 at 2:37 PM, Charles R Harris wrote:
>>> On Mon, Apr 12, 2010 at 1:55 PM, Brent Pedersen <bpede...@gmail.com> wrote:
>>>> On Mon, Apr 12, 2010 at 9:49 AM, Robert Kern <robert.k...@gmail.com> wrote:
>>>>> On Mon, Apr 12, 2010 at 04:03, Nadav Horesh <nad...@visionsense.com> wrote:
>>>>>> Is there a way to get the file-name given a memmap array object?
>>>>>
>>>>> Not at this time. This would be very useful, though, so patches are
>>>>> welcome.
>>>>
>>>> this sounded easy, so i gave it a shot:
>>>> http://projects.scipy.org/numpy/ticket/1451
>>>> i think that works.
>>>
>>> Looks OK, I applied it. Could you add some documentation?
>>
>> And maybe the filename should be the whole path? Thoughts?
>
> Yes, that would help. While you are looking at it, you may want to
> consider recording some of the other information that is computed in
> or provided to __new__, like offset.
>
> --
> Robert Kern
>
> I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth. -- Umberto Eco

copying what i asked in the ticket:

where should i write the docs? in the file itself or through the doc
editor?

also re path, since it can be a file-like, that would have to be
something like:

  if isinstance(filename, basestring):
      filename = os.path.abspath(filename)
  self.filename = filename

ok with that?
Re: [Numpy-discussion] Name of the file associated with a memmap
On Mon, Apr 12, 2010 at 2:46 PM, Robert Kern <robert.k...@gmail.com> wrote:
> On Mon, Apr 12, 2010 at 16:43, Gael Varoquaux
> <gael.varoqu...@normalesup.org> wrote:
>> On Mon, Apr 12, 2010 at 04:39:23PM -0500, Robert Kern wrote:
>>>> where should i write the docs? in the file itself or through the doc
>>>> editor? also re path, since it can be a file-like, that would have to
>>>> be something like:
>>>>
>>>>   if isinstance(filename, basestring):
>>>>       filename = os.path.abspath(filename)
>>>>   self.filename = filename
>>>>
>>>> ok with that?
>>>
>>> In the case of file object, we should grab the filename from it.
>>> Whether the filename argument to the constructor was a file name or a
>>> file object, self.filename should always be the file name, IMO.
>>
>> +1. Once this is in, would it make it possible/desirable to have
>> memmapped arrays pickle (I know that it would require work, I am just
>> asking).
>
> You need some more information from the constructor arguments, but yes.

anything other than offset and mode?
Re: [Numpy-discussion] Name of the file associated with a memmap
On Mon, Apr 12, 2010 at 3:08 PM, Robert Kern <robert.k...@gmail.com> wrote:
> On Mon, Apr 12, 2010 at 17:00, Brent Pedersen <bpede...@gmail.com> wrote:
>> anything other than offset and mode?
>
> I think that's about it.

i added a new patch to the ticket and updated the docs here:
http://docs.scipy.org/numpy/docs/numpy.core.memmap.memmap/

the preview is somehow rendering the sections in a different order than
they appear in the RST, not sure what's going on there.

-b
Re: [Numpy-discussion] Name of the file associated with a memmap
On Mon, Apr 12, 2010 at 3:31 PM, Brent Pedersen <bpede...@gmail.com> wrote:
> i added a new patch to the ticket and updated the docs here:
> http://docs.scipy.org/numpy/docs/numpy.core.memmap.memmap/
>
> the preview is somehow rendering the sections in a different order than
> they appear in the RST, not sure what's going on there.
>
> -b

Charles, thanks for committing, i just added another patch for just the
tests which i forgot to include when i diffed last time.
Re: [Numpy-discussion] Utility function to find array items are in ascending order
On Tue, Feb 9, 2010 at 7:42 AM, Vishal Rana <ranavis...@gmail.com> wrote:
> Hi,
>
> Is there any utility function to find if values in the array are in
> ascending or descending order.
>
> Example:
>   arr = [1, 2, 4, 6] should return true
>   arr2 = [1, 0, 2, -2] should return false
>
> Thanks
> Vishal

i dont know if there's a utility function, but i'd use:

  np.all(a[1:] >= a[:-1])
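An equivalent spelling uses np.diff, which reads a little more directly as "no step decreases"; this small sketch checks it against both of the poster's examples:

```python
import numpy as np

def is_ascending(a):
    # Non-decreasing: every consecutive difference is >= 0.
    # For strictly increasing, use > 0 instead.
    return bool(np.all(np.diff(a) >= 0))

assert is_ascending(np.array([1, 2, 4, 6]))
assert not is_ascending(np.array([1, 0, 2, -2]))
```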
[Numpy-discussion] documenting optional out parameter
hi, i've seen this section:
http://docs.scipy.org/numpy/Questions+Answers/#the-out-argument

should _all_ functions with an optional out parameter have exactly that
text? so if i find a docstring with reasonable, but different doc for
out, should it be changed to that?

and if a docstring of a function with an optional out that needs review
does not have the out parameter documented, should it be marked as
'Needs Work'?

thanks,
-brentp
Re: [Numpy-discussion] vectorizing
On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman <kwgood...@gmail.com> wrote:
>> On Fri, Jun 5, 2009 at 12:53 PM, <josef.p...@gmail.com> wrote:
>>> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais <bbl...@bryant.edu> wrote:
>>>> Hello,
>>>>
>>>> I have a vectorizing problem that I don't see an obvious way to solve.
>>>> What I have is a vector like:
>>>>
>>>>   obs = array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>>>>
>>>> and a matrix
>>>>
>>>>   T = zeros((6,6))
>>>>
>>>> and what I want in T is a count of all of the transitions in obs,
>>>> e.g. T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1
>>>> because the sequence 3-4 only happens once, etc. I can do it
>>>> unvectorized like:
>>>>
>>>>   for o1,o2 in zip(obs[:-1],obs[1:]):
>>>>       T[o1,o2] += 1
>>>>
>>>> which gives the correct answer from above, which is:
>>>>
>>>>   array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>>          [ 0.,  0.,  3.,  0.,  0.,  1.],
>>>>          [ 0.,  3.,  0.,  1.,  0.,  0.],
>>>>          [ 0.,  0.,  2.,  0.,  1.,  0.],
>>>>          [ 0.,  0.,  0.,  2.,  0.,  0.],
>>>>          [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>>
>>>> but I thought there would be a better way. I tried:
>>>>
>>>>   o1 = obs[:-1]
>>>>   o2 = obs[1:]
>>>>   T[o1,o2] += 1
>>>>
>>>> but this doesn't give a count, it just yields 1's at the transition
>>>> points, like:
>>>>
>>>>   array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>>          [ 0.,  0.,  1.,  0.,  0.,  1.],
>>>>          [ 0.,  1.,  0.,  1.,  0.,  0.],
>>>>          [ 0.,  0.,  1.,  0.,  1.,  0.],
>>>>          [ 0.,  0.,  0.,  1.,  0.,  0.],
>>>>          [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>>
>>>> Is there a clever way to do this? I could write a quick Cython
>>>> solution, but I wanted to keep this as an all-numpy implementation
>>>> if I can.
>>>
>>> histogram2d or its imitation, there was a discussion on histogram2d a
>>> short time ago
>>>
>>>   >>> obs = np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>>>   >>> obs2 = obs - 1
>>>   >>> trans = np.hstack((0, np.bincount(obs2[:-1]*6+6+obs2[1:]), 0)).reshape(6,6)
>>>   >>> re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>   ...                [ 0.,  0.,  3.,  0.,  0.,  1.],
>>>   ...                [ 0.,  3.,  0.,  1.,  0.,  0.],
>>>   ...                [ 0.,  0.,  2.,  0.,  1.,  0.],
>>>   ...                [ 0.,  0.,  0.,  2.,  0.,  0.],
>>>   ...                [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>   >>> np.all(re == trans)
>>>   True
>>>
>>> or
>>>
>>>   >>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
>>>   >>> np.all(re == h)
>>>   True
>>
>> There's no way my list method can beat that. But by adding
>>
>>   import psyco
>>   psyco.full()
>>
>> I get a total speed up of a factor of 15 when obs is length 1.
>
> Actually, it is faster:
>
> histogram:
>
>   >>> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
>   100 loops, best of 3: 4.14 ms per loop
>
> lists:
>
>   >>> timeit test(obs3, T3)
>   1000 loops, best of 3: 1.32 ms per loop

here's a go:

  import numpy as np
  import random
  from itertools import groupby

  def test1(obs, T):
      for o1, o2 in zip(obs[:-1], obs[1:]):
          T[o1][o2] += 1
      return T

  def test2(obs, T):
      s = zip(obs[:-1], obs[1:])
      for idx, g in groupby(sorted(s)):
          T[idx] = len(list(g))
      return T

  obs = [random.randint(0, 5) for z in range(1)]

  print test2(obs, np.zeros((6, 6)))
  print test1(obs, np.zeros((6, 6)))

  ##
  In [10]: timeit test1(obs, np.zeros((6, 6)))
  100 loops, best of 3: 18.8 ms per loop

  In [11]: timeit test2(obs, np.zeros((6, 6)))
  100 loops, best of 3: 6.91 ms per loop
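For later readers: NumPy 1.8 introduced np.add.at (unbuffered in-place addition), which does exactly the accumulation that the fancy-indexing attempt T[o1, o2] += 1 silently fails to do, so the transition count can now be fully vectorized:

```python
import numpy as np

obs = np.array([1, 2, 3, 4, 3, 2, 1, 2, 1, 2, 1, 5, 4, 3, 2])
T = np.zeros((6, 6))

# np.add.at performs unbuffered in-place addition, so repeated index
# pairs accumulate; T[obs[:-1], obs[1:]] += 1 would write 1 only once
# per distinct pair, as noted in the thread.
np.add.at(T, (obs[:-1], obs[1:]), 1)

assert T[1, 2] == 3  # the sequence 1-2 happens 3 times
assert T[3, 4] == 1  # the sequence 3-4 happens once
```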
Re: [Numpy-discussion] vectorizing
On Fri, Jun 5, 2009 at 1:27 PM, Keith Goodman kwgood...@gmail.com wrote:
On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen bpede...@gmail.com wrote: [...]

Nice! Try adding

import psyco
psyco.full()

to test1. Or is that cheating?

it is if you're running 64bit. :-)

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
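An aside on the bincount trick from earlier in this thread: the offset/hstack bookkeeping it needed can be avoided on NumPy 1.6+ with bincount's minlength argument, flattening each transition (i, j) into the single index i*6 + j:

```python
import numpy as np

obs = np.array([1, 2, 3, 4, 3, 2, 1, 2, 1, 2, 1, 5, 4, 3, 2])
# flatten each transition (i, j) to index i*6 + j of a 6x6 grid;
# minlength=36 guarantees all bins exist even if trailing ones are empty
trans = np.bincount(obs[:-1] * 6 + obs[1:], minlength=36).reshape(6, 6)

assert trans[1, 2] == 3
assert trans.sum() == len(obs) - 1  # one count per adjacent pair
```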
Re: [Numpy-discussion] vectorizing
On Fri, Jun 5, 2009 at 2:01 PM, Keith Goodman kwgood...@gmail.com wrote:
On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen bpede...@gmail.com wrote: [...]

Wait, you tested the list method with an array. Try

timeit test1(obs, np.zeros((6, 6)).tolist())

Probably best to move the array/list creation out of the timeit loop. Then my method won't have to pay the cost of converting to a list :)

ah right, your test1 is faster than test2 that way. i'm just stoked to find out about ndimage.histogram. using this:

histogram(obs[:-1]*6 + obs[1:], 0, 36, 36).reshape(6,6)

it gives the exact result as test1/test2. -b

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt slow
On Sun, Mar 1, 2009 at 11:29 AM, Michael Gilbert michael.s.gilb...@gmail.com wrote:
On Sun, 1 Mar 2009 16:12:14 -0500 Gideon Simpson wrote:

So I have some data sets of about 16 floating point numbers stored in text files. I find that loadtxt is rather slow. Is this to be expected? Would it be faster if it were loading binary data?

i have run into this as well. loadtxt uses a python list to allocate memory for the data it reads in, so once you get to about 1/4th of your available memory, it will start allocating the updated list (every time it reads a new value from your data file) in swap instead of main memory, which is ridiculously slow (in fact it causes my system to be quite unresponsive, with a jumpy cursor). i have rewritten loadtxt to be smarter about allocating memory, but it is slower overall and doesn't support all of the original arguments/options (yet). i have some ideas to make it smarter/more efficient, but have not had the time to work on it recently. i will send the current version to the list tomorrow when i have access to the system that it is on. best wishes, mike

to address the slowness, i use wrappers around savetxt/loadtxt that save/load a .npy file along with/instead of the .txt file -- and the loadtxt wrapper checks if the .npy is up-to-date. code here: http://rafb.net/p/dGBJjg80.html of course it's still slow the first time. i look forward to your speedups. -brentp

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
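The pastebin link above is long dead; a minimal sketch of the same caching idea (a hypothetical `cached_loadtxt` helper, assuming the .txt file is the source of truth and a sibling .npy file may be written next to it) looks like:

```python
import os
import numpy as np

def cached_loadtxt(txt_path, **kwargs):
    """np.loadtxt with a binary .npy cache stored next to the text file.

    The parsed array is saved as txt_path + '.npy'; later calls load the
    cache as long as it is at least as new as the text file.
    """
    npy_path = txt_path + '.npy'
    if (os.path.exists(npy_path)
            and os.path.getmtime(npy_path) >= os.path.getmtime(txt_path)):
        return np.load(npy_path)
    arr = np.loadtxt(txt_path, **kwargs)
    np.save(npy_path, arr)
    return arr
```

The first read still pays the text-parsing cost; every later read is a fast binary load.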
Re: [Numpy-discussion] loadtxt issues
On Tue, Feb 10, 2009 at 9:40 PM, A B python6...@gmail.com wrote:

Hi, How do I write a loadtxt command to read in the following file and store each data point as the appropriate data type:

12|h|34.5|44.5
14552|bbb|34.5|42.5

Do the strings have to be read in separately from the numbers? Why would anyone use 'S10' instead of 'string'?

dt = {'names': ('gender','age','weight','bal'), 'formats': ('i4', 'S4','f4', 'f4')}
a = loadtxt('sample_data.txt', dtype=dt)

gives ValueError: need more than 1 value to unpack. I can do

a = loadtxt('sample_data.txt', dtype='string')

but can't use 'string' instead of 'S4', and then all my data is read into strings. Seems like all the examples on-line use either numeric or textual input, but not both. Thanks.

works for me, but not sure i understand the problem -- did you try setting the delimiter?

import numpy as np
from cStringIO import StringIO

txt = StringIO("""\
12|h|34.5|44.5
14552|bbb|34.5|42.5""")

dt = {'names': ('gender','age','weight','bal'), 'formats': ('i4', 'S4','f4', 'f4')}
a = np.loadtxt(txt, dtype=dt, delimiter="|")
print a.dtype

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
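For readers on current NumPy/Python 3, the same mixed-type read still works; only the StringIO import and the quoting of the delimiter change. A sketch keeping the thread's field names:

```python
import numpy as np
from io import StringIO

txt = StringIO("12|h|34.5|44.5\n14552|bbb|34.5|42.5")
dt = {'names': ('gender', 'age', 'weight', 'bal'),
      'formats': ('i4', 'S4', 'f4', 'f4')}

# the structured dtype and the delimiter must both be given
a = np.loadtxt(txt, dtype=dt, delimiter='|')

assert a['gender'].tolist() == [12, 14552]
assert a['bal'].tolist() == [44.5, 42.5]
```

Each named field comes back with its own type; there is no all-strings fallback.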
Re: [Numpy-discussion] Comparison of arrays
On Mon, Feb 9, 2009 at 6:02 AM, Neil neilcrigh...@gmail.com wrote:

I have two integer arrays of different shape, e.g.

>>> a
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> b
array([ 3, 4, 5, 6, 7, 8, 9, 10])

How can I extract the values that belong to the array a exclusively, i.e. array([1, 2])?

You could also use numpy.setmember1d to get a boolean mask that selects the values:

In [21]: a = np.array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
In [22]: b = np.array([ 3, 4, 5, 6, 7, 8, 9, 10])
In [23]: ismember = np.setmember1d(a, b)
In [24]: a[~ismember]
Out[24]: array([1, 2])

Neil

there's also np.setdiff1d() which does the above in a single line. -brent

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
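Note for current readers: setmember1d was later removed from NumPy; its replacements are np.isin (boolean mask) alongside the still-present np.setdiff1d one-liner:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
b = np.array([3, 4, 5, 6, 7, 8, 9, 10])

# one-liner: sorted unique values of a that are not in b
assert np.setdiff1d(a, b).tolist() == [1, 2]

# mask route (np.isin replaced setmember1d/in1d); preserves a's order
mask = np.isin(a, b)
assert a[~mask].tolist() == [1, 2]
```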
[Numpy-discussion] genfromtxt view with object dtype
hi, i am using genfromtxt, with a dtype like this:

[('seqid', '|S24'), ('source', '|S16'), ('type', '|S16'), ('start', 'i4'), ('end', 'i4'), ('score', 'f8'), ('strand', '|S1'), ('phase', 'i4'), ('attrs', '|O4')]

where i'm having problems with the attrs column, which i'd like to be a dict. i can specify a converter to parse a string into a dict, and it is correctly converted to a dict, but then in io.py it tries to take a view() of that dtype and it gives the error:

  A = np.genfromtxt(fname, **kwargs)
  File "/usr/lib/python2.5/site-packages/numpy/lib/io.py", line 922, in genfromtxt
    output = rows.view(dtype)
TypeError: Cannot change data-type for object array.

is there any way around this, or must that col be kept as a string? it seems like genfromtxt expects you to specify either a dtype _or_ a converter, not both. thanks, -brent

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
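One workaround that sidesteps the view-over-object-dtype problem entirely is to let genfromtxt keep the attrs column as a plain string and build the dicts in a second pass. A sketch on modern NumPy, with a hypothetical `parse_attrs` helper and made-up sample data:

```python
import numpy as np
from io import StringIO

def parse_attrs(kvstr):
    # "ID=x;match=y" -> {'ID': 'x', 'match': 'y'}
    return dict(kv.split('=', 1) for kv in kvstr.split(';'))

data = StringIO("chr1\t100\tID=a;match=m1\nchr2\t200\tID=b;match=m2")
rows = np.genfromtxt(data, delimiter='\t',
                     dtype=[('seqid', 'U12'), ('start', 'i4'),
                            ('attrs', 'U64')])

# second pass: convert the string column into dicts outside the array
attrs = [parse_attrs(s) for s in rows['attrs']]
assert attrs[0]['match'] == 'm1'
assert rows['start'].tolist() == [100, 200]
```

The structured array stays homogeneous and viewable; the dicts live in an ordinary list alongside it.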
Re: [Numpy-discussion] Renaming a field of an object array
On Wed, Feb 4, 2009 at 2:50 PM, Pierre GM pgmdevl...@gmail.com wrote:

All, I'm a tad puzzled by the following behavior (I'm trying to correct a bug in genfromtxt): I'm creating an empty structured ndarray, using np.object as dtype.

>>> a = np.empty(1, dtype=[('', np.object)])
array([(None,)], dtype=[('f0', '|O4')])

Now, I'd like to rename the field:

>>> a.view([('NAME', np.object)])
TypeError: Cannot change data-type for object array.

I understand why I can't change the *type* of the field, but not why I can't change its name that way. What would be an option that wouldn't involve creating a new array? Thx in advance.

hi, i was looking at this as well. the code in arrayobject.c doesn't match the error string. i changed the code to do what the error string says and things seem to work. i think the if-block below it should also use xor (not changed in this patch), but i'm not a c programmer so i may be missing something obvious.

svn diff numpy/core/src/arrayobject.c

Index: numpy/core/src/arrayobject.c
===================================================================
--- numpy/core/src/arrayobject.c	(revision 6338)
+++ numpy/core/src/arrayobject.c	(working copy)
@@ -6506,9 +6506,16 @@
         PyErr_SetString(PyExc_TypeError, "invalid data-type for array");
         return -1;
     }
-    if (PyDataType_FLAGCHK(newtype, NPY_ITEM_HASOBJECT) ||
-        PyDataType_FLAGCHK(newtype, NPY_ITEM_IS_POINTER) ||
-        PyDataType_FLAGCHK(self->descr, NPY_ITEM_HASOBJECT) ||
+    if (PyDataType_FLAGCHK(newtype, NPY_ITEM_HASOBJECT) ^
+        PyDataType_FLAGCHK(self->descr, NPY_ITEM_HASOBJECT)) {
+        PyErr_SetString(PyExc_TypeError, \
+                        "Cannot change data-type for object " \
+                        "array.");
+        Py_DECREF(newtype);
+        return -1;
+    }
+
+    if (PyDataType_FLAGCHK(newtype, NPY_ITEM_IS_POINTER) ||
         PyDataType_FLAGCHK(self->descr, NPY_ITEM_IS_POINTER)) {
         PyErr_SetString(PyExc_TypeError, \
                         "Cannot change data-type for object " \
                         "array.");

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
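For Pierre's narrower goal -- renaming a field without copying -- assigning to the array's dtype.names works in place and never hits the view check at all (note this mutates the dtype object, which may be shared between arrays):

```python
import numpy as np

a = np.empty(1, dtype=[('f0', object)])
a.dtype.names = ('NAME',)  # in-place rename; no copy, no view
assert a.dtype.names == ('NAME',)
```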
Re: [Numpy-discussion] genfromtxt view with object dtype
On Wed, Feb 4, 2009 at 8:51 PM, Pierre GM pgmdevl...@gmail.com wrote:

OK, Brent, try r6341. I fixed genfromtxt for cases like yours (explicit dtype involving a np.object). Note that the fix won't work if the dtype is nested and involves np.objects (as we would hit the pb of renaming fields we observed...). Let me know how it goes. P.

that fixes it. thanks again pierre! -b

On Feb 4, 2009, at 4:03 PM, Brent Pedersen wrote:
On Wed, Feb 4, 2009 at 9:36 AM, Pierre GM pgmdevl...@gmail.com wrote:
On Feb 4, 2009, at 12:09 PM, Brent Pedersen wrote:

hi, i am using genfromtxt, with a dtype like this: [('seqid', '|S24'), ('source', '|S16'), ('type', '|S16'), ('start', 'i4'), ('end', 'i4'), ('score', 'f8'), ('strand', '|S1'), ('phase', 'i4'), ('attrs', '|O4')]

Brent, Please post a simple, self-contained example with a few lines of the file you want to load.

hi pierre, here is an example. thanks, -brent

##
import numpy as np
from cStringIO import StringIO

gffstr = """\
##gff-version 3
1\tucb\tgene\t2234602\t2234702\t.\t-\t.\tID=grape_1_2234602_2234702;match=EVM_prediction_supercontig_1.248,EVM_prediction_supercontig_1.248.mRNA
1\tucb\tgene\t2300292\t2302123\t.\t+\t.\tID=grape_1_2300292_2302123;match=EVM_prediction_supercontig_244.8
1\tucb\tgene\t2303615\t2303967\t.\t+\t.\tID=grape_1_2303615_2303967;match=EVM_prediction_supercontig_244.8
1\tucb\tgene\t2303616\t2303966\t.\t+\t.\tParent=grape_1_2303615_2303967
1\tucb\tgene\t3596400\t3596503\t.\t-\t.\tID=grape_1_3596400_3596503;match=evm.TU.supercontig_167.27
1\tucb\tgene\t3600651\t3600977\t.\t-\t.\tmatch=evm.model.supercontig_1217.1,evm.model.supercontig_1217.1.mRNA
"""

dtype = {'names': ('seqid', 'source', 'type', 'start', 'end',
                   'score', 'strand', 'phase', 'attrs'),
         'formats': ['S24', 'S16', 'S16', 'i4', 'i4', 'f8', 'S1', 'i4', 'S128']}

# OK with S128 for attrs
print np.genfromtxt(StringIO(gffstr), dtype=dtype)

def _attr(kvstr):
    pairs = [kv.split("=") for kv in kvstr.split(";")]
    return dict(pairs)

# change S128 to object to have col attrs as dictionary
dtype['formats'][-1] = 'O'
converters = {8: _attr}

# NOT OK
print np.genfromtxt(StringIO(gffstr), dtype=dtype, converters=converters)

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] genfromtxt
hi, i'm using the new genfromtxt stuff in numpy svn -- looks great pierre and any who contributed. is there a way to have the header commented and still be able to have it recognized as the header? e.g.

#gender age weight
M 21 72.10
F 35 58.33
M 33 21.99

if i use np.loadtxt or genfromtxt, it tries to use the 2nd row (the first non-commented) as the header, and i get an array like:

array([('M', 21, 72.094), ('F', 35, 58.328), ('M', 33, 21.988)],
      dtype=[('f0', '|S1'), ('f1', 'i4'), ('f2', 'f8')])

when i want:

array([('M', 21, 72.094), ('F', 35, 58.328), ('M', 33, 21.988)],
      dtype=[('gender', '|S1'), ('age', 'i4'), ('weight', 'f8')])

i can get that by uncommenting the header, but it's useful to have that for other code. is there any way to do that? thanks, -brent

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] genfromtxt
On Wed, Jan 21, 2009 at 9:39 PM, Pierre GM pgmdevl...@gmail.com wrote: Brent, Mind trying r6330 and let me know if it works for you ? Make sure that you use names=True to detect a header. P. yes, works perfectly. thanks! -brent ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
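That fix survives into current NumPy: with names=True, genfromtxt reads the field names from the first line even when that line starts with the comment character. A modern sketch of brent's example:

```python
import numpy as np
from io import StringIO

data = StringIO("#gender age weight\nM 21 72.10\nF 35 58.33\nM 33 21.99")
# names=True takes the (optionally commented) first line as field names
a = np.genfromtxt(data, names=True, dtype=None, encoding='utf-8')

assert a.dtype.names == ('gender', 'age', 'weight')
assert a['age'].tolist() == [21, 35, 33]
```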
Re: [Numpy-discussion] numpy array change notifier?
On Mon, Oct 27, 2008 at 1:56 PM, Robert Kern [EMAIL PROTECTED] wrote:
On Mon, Oct 27, 2008 at 15:54, Erik Tollerud [EMAIL PROTECTED] wrote:

Is there any straightforward way of notifying on change of a numpy array that leaves the numpy arrays still efficient?

Not currently, no.

-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

out of curiosity, would something like this affect efficiency (and/or work):

class Notify(numpy.ndarray):
    def __setitem__(self, *args):
        self.notify(*args)
        return super(Notify, self).__setitem__(*args)

    def notify(self, *args):
        print 'notify:', args

with also overriding __setslice__?

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
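On Python 3 the sketch gets simpler, since __setslice__ is gone and slice assignment also routes through __setitem__. A toy version that just records the keys it sees (the caveat behind Robert's "no" still stands: writes through plain-ndarray views of the same data bypass the hook):

```python
import numpy as np

class Notify(np.ndarray):
    """ndarray subclass calling a hook on every item/slice assignment."""

    def __setitem__(self, key, value):
        self.notify(key, value)
        super().__setitem__(key, value)

    def notify(self, key, value):
        # toy hook: record assignment keys; a real one might fire callbacks
        self._log = getattr(self, '_log', []) + [key]

a = np.zeros(4).view(Notify)
a[1] = 5.0
a[2:4] = 7.0
assert a.tolist() == [0.0, 5.0, 7.0, 7.0]
assert len(a._log) == 2  # one entry per assignment, item or slice
```

The per-assignment Python call is the efficiency cost Erik asked about: elementwise loops over such an array get noticeably slower.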
Re: [Numpy-discussion] array gymnastics
On Thu, Sep 11, 2008 at 6:03 PM, Alan Jackson [EMAIL PROTECTED] wrote:

There has got to be a simple way to do this, but I'm just not seeing it.

a = array([[1,2,3,4,5,6],
           [7,8,9,10,11,12]])
b = array([21,22,23,24,25,26])

What I want to end up with is:

c = array([[1,7,21],
           [2,8,22],
           ..
           [6,12,26]])

---------------------------------------------------------------
| Alan K. Jackson   | To see a World in a Grain of Sand       |
| [EMAIL PROTECTED] | And a Heaven in a Wild Flower,          |
| www.ajackson.org  | Hold Infinity in the palm of your hand  |
| Houston, Texas    | And Eternity in an hour. - Blake        |
---------------------------------------------------------------

hi, i think numpy.column_stack((a[0], a[1], b)) does what you want.

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
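column_stack is the direct answer; an equivalent spelling is transposing the vertically stacked rows. A quick check of both:

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6],
              [7, 8, 9, 10, 11, 12]])
b = np.array([21, 22, 23, 24, 25, 26])

c = np.column_stack((a[0], a[1], b))   # rows of a and b become columns
assert c[0].tolist() == [1, 7, 21]
assert c[5].tolist() == [6, 12, 26]

# same result: stack all three as rows, then transpose
assert np.array_equal(c, np.vstack((a, b)).T)
```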
Re: [Numpy-discussion] a good polygon class?
hi, http://pypi.python.org/pypi/Shapely supports the array interface, and has all the geos geometry operations: http://gispython.org/shapely/manual.html#contains

On Tue, Aug 19, 2008 at 8:31 AM, mark [EMAIL PROTECTED] wrote:

Hello List - I am looking for a good polygon class. My main interest is to be able to figure out if a point is inside or outside the polygon, which can have any shape (well, as long as it is a polygon). Any suggestions? Thanks, Mark

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
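Shapely is the full-featured route; if the only need is the inside/outside test itself, a small pure-NumPy ray-casting sketch works too (my own illustration, assuming a simple non-self-intersecting polygon; points exactly on an edge may land on either side):

```python
import numpy as np

def points_in_polygon(points, poly):
    """Even-odd rule: a point is inside iff a horizontal ray to the
    right crosses the polygon's edges an odd number of times."""
    x, y = np.asarray(points, dtype=float).T
    px, py = np.asarray(poly, dtype=float).T
    inside = np.zeros(x.shape, dtype=bool)
    # silence the harmless divide-by-zero from horizontal edges,
    # which the straddle test filters out anyway
    with np.errstate(divide='ignore', invalid='ignore'):
        for x1, y1, x2, y2 in zip(px, py, np.roll(px, -1), np.roll(py, -1)):
            # edge straddles the ray's y, and the crossing lies right of x
            crosses = ((y1 > y) != (y2 > y)) & \
                      (x < (x2 - x1) * (y - y1) / (y2 - y1) + x1)
            inside ^= crosses
    return inside

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
assert points_in_polygon([(1, 1), (3, 1)], square).tolist() == [True, False]
```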