[Numpy-discussion] fast duplicate of array
Suppose x and y are conformable 2d arrays. I now want x to become a duplicate of y. I could create a new array:

    x = y.copy()

or I could assign the values of y to x:

    x[:,:] = y

As expected the latter is faster (no array creation). Are there better ways?

Thanks,
Alan Isaac
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] fast duplicate of array
2010/1/23 Alan G Isaac ais...@american.edu:
> Suppose x and y are conformable 2d arrays. I now want x to become a duplicate of y. I could create a new array:
>     x = y.copy()
> or I could assign the values of y to x:
>     x[:,:] = y
> As expected the latter is faster (no array creation). Are there better ways?

If both arrays are C contiguous, or more generally contiguous blocks of memory with the same strided structure, you might get faster copying by flattening them first, so that it can go in a single memcpy(). For really large arrays that use complete pages, some low-level hackery involving memmap() might be able to make a shared copy-on-write copy at almost no cost until you start modifying one array or the other.

But both of these tricks are intended for the regime where copying the data is the expensive part, not fabricating the array object; for that, I'm not sure you can accelerate things much.

Anne
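A rough sketch of the first suggestion above, for readers who want something concrete. The helper name `copy_into` is hypothetical (it is not a NumPy function), and whether the flat path is actually any faster than plain assignment depends on the NumPy version and the machine; the sketch only shows the contiguity check and the fallback.

```python
import numpy as np

def copy_into(dst, src):
    """Copy src into dst in place (hypothetical helper, not part of NumPy)."""
    same_layout = (
        dst.shape == src.shape
        and dst.dtype == src.dtype
        and dst.flags.c_contiguous
        and src.flags.c_contiguous
    )
    if same_layout:
        # reshape(-1) on a C-contiguous array returns a 1-d view, not a copy,
        # so this assignment writes straight into dst's own buffer.
        dst.reshape(-1)[:] = src.reshape(-1)
    else:
        # General case: let NumPy walk the strides itself.
        dst[...] = src
    return dst

x = np.zeros((4, 3))
y = np.arange(12.0).reshape(4, 3)
copy_into(x, y)          # both C-contiguous: flat path

z = np.zeros((3, 4)).T   # shape (4, 3) but not C-contiguous
copy_into(z, y)          # takes the fallback path
```

Either way the destination keeps its own buffer, which is the property the original question cares about.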
Re: [Numpy-discussion] fast duplicate of array
On 1/23/2010 5:01 PM, Anne Archibald wrote:
> If both arrays are C contiguous, or more generally contiguous blocks of memory with the same strided structure, you might get faster copying by flattening them first, so that it can go in a single memcpy().

I may misunderstand this. Did you just mean

    x.flat = y.flat

? If so, I find that to be *much* slower.

Thanks,
Alan

    import time
    import numpy as np

    x = np.random.random((1000,1000))
    y = x.copy()

    # 1. allocate a new array on every pass
    t0 = time.perf_counter()  # time.clock() was removed in Python 3.8
    for t in range(1000):
        x = y.copy()
    print(time.perf_counter() - t0)

    # 2. assign into the existing array
    t0 = time.perf_counter()
    for t in range(1000):
        x[:,:] = y
    print(time.perf_counter() - t0)

    # 3. assign through the .flat iterators
    t0 = time.perf_counter()
    for t in range(1000):
        x.flat = y.flat
    print(time.perf_counter() - t0)
Re: [Numpy-discussion] fast duplicate of array
On Sat, Jan 23, 2010 at 2:31 PM, Alan G Isaac ais...@american.edu wrote:
> I may misunderstand this. Did you just mean x.flat = y.flat ? If so, I find that to be *much* slower.

I don't know what a view is, but it is fast:

    x = y.view()

    def speed():
        import numpy as np
        import time

        x = np.random.random((1000,1000))
        y = x.copy()

        # 1. new array on every pass
        t0 = time.perf_counter()  # time.clock() was removed in Python 3.8
        for t in range(1000):
            x = y.copy()
        print(time.perf_counter() - t0)

        # 2. assign into the existing array
        t0 = time.perf_counter()
        for t in range(1000):
            x[:,:] = y
        print(time.perf_counter() - t0)

        # 3. assign through the .flat iterators
        t0 = time.perf_counter()
        for t in range(1000):
            x.flat = y.flat
        print(time.perf_counter() - t0)

        # 4. take a view (no data is copied)
        t0 = time.perf_counter()
        for t in range(1000):
            x = y.view()
        print(time.perf_counter() - t0)

    speed()
    1.3
    2.07
    15.0
    0.01
Re: [Numpy-discussion] fast duplicate of array
On Sat, Jan 23, 2010 at 4:00 PM, Keith Goodman kwgood...@gmail.com wrote:
> I don't know what a view is, but it is fast:
>     x = y.view()

In this case x isn't a copy of y, it is a reference to the same data in memory. It is fast because no copying is done.

Chuck
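A small illustration of the point above, using toy arrays that are not from the thread: a view and its parent share one buffer, so a write through the view is visible in the parent, while a copy is unaffected.

```python
import numpy as np

y = np.arange(6.0).reshape(2, 3)

v = y.view()    # new array object, same underlying buffer as y
c = y.copy()    # new array object with its own buffer

v[0, 0] = 99.0  # writing through the view also changes y

shares_v = np.shares_memory(v, y)  # True
shares_c = np.shares_memory(c, y)  # False
print(y[0, 0], c[0, 0])            # prints: 99.0 0.0
```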
Re: [Numpy-discussion] fast duplicate of array
2010/1/23 Alan G Isaac ais...@american.edu:
> On 1/23/2010 5:01 PM, Anne Archibald wrote:
>> If both arrays are C contiguous, or more generally contiguous blocks of memory with the same strided structure, you might get faster copying by flattening them first, so that it can go in a single memcpy().
>
> I may misunderstand this. Did you just mean x.flat = y.flat ?

No, .flat constructs an iterator that traverses the object as if it were flat. I had in mind accessing the underlying data through views that were flat:

    In [3]: x = np.random.random((1000,1000))
    In [4]: y = np.random.random((1000,1000))
    In [5]: xf = x.view()
    In [6]: xf.shape = (-1,)
    In [7]: yf = y.view()
    In [8]: yf.shape = (-1,)
    In [9]: yf[:] = xf[:]

This may still use a loop instead of a memcpy(), in which case you'd want to look for an explicit memcpy()-based implementation. But when manipulating multidimensional arrays you have (in principle, anyway) nested loops, which may not be executed in the cache-optimal order. Ideally numpy would automatically notice when operations can be done on flattened versions of arrays and get rid of some of the looping and indexing, but I wouldn't count on it. At one point I remember finding that the loops were reordered not for cache coherence but to make the inner loop over the biggest dimension (to minimize looping overhead).

Anne
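To make the distinction above concrete, here is a short sketch. It uses `reshape(-1)` instead of assigning to `.shape`; for a C-contiguous array that is assumed here to be equivalent, since both give a 1-d array over the same buffer.

```python
import numpy as np

x = np.random.random((1000, 1000))

it = x.flat          # a numpy.flatiter: an iterator object, not an ndarray
xf = x.reshape(-1)   # a 1-d ndarray viewing x's buffer (x is C-contiguous)

is_iterator = isinstance(it, np.flatiter)
is_array_view = isinstance(xf, np.ndarray) and np.shares_memory(xf, x)

# Writes through the flat view land in x itself.
xf[0] = -1.0
```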
Re: [Numpy-discussion] fast duplicate of array
On 1/23/2010 6:00 PM, Keith Goodman wrote:
> x = y.view()

Thanks, but I'm not looking for a view. And I need x to own its data.

Alan
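One way to check the ownership requirement stated above is the `owndata` flag. The arrays below are toy examples, not from the thread; in-place assignment leaves x with its own buffer, whereas a view borrows y's.

```python
import numpy as np

y = np.arange(6.0).reshape(2, 3)
x = np.empty_like(y)   # freshly allocated, so x owns its buffer

x[:, :] = y            # fills x in place; x still owns its data
v = y.view()           # borrows y's buffer

x_owns = x.flags.owndata   # True
v_owns = v.flags.owndata   # False
```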
Re: [Numpy-discussion] fast duplicate of array
On 1/23/2010 7:29 PM, Anne Archibald wrote:
> I had in mind accessing the underlying data through views that were flat:
>
>     In [3]: x = np.random.random((1000,1000))
>     In [4]: y = np.random.random((1000,1000))
>     In [5]: xf = x.view()
>     In [6]: xf.shape = (-1,)
>     In [7]: yf = y.view()
>     In [8]: yf.shape = (-1,)
>     In [9]: yf[:] = xf[:]

Yup, that's a bit faster.

Thanks,
Alan
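For anyone wanting to repeat the comparison, a minimal timing sketch with `timeit` follows. It copies y into x (the direction of the original question), the function names and repeat counts are arbitrary choices, and no particular outcome is claimed: the gap between the two variants may be small or absent on current NumPy.

```python
import timeit
import numpy as np

x = np.random.random((1000, 1000))
y = np.random.random((1000, 1000))

# 1-d views over the same buffers (both arrays are C-contiguous).
xf = x.reshape(-1)
yf = y.reshape(-1)

def assign_2d():
    x[:, :] = y

def assign_flat():
    xf[:] = yf

n = 20  # copies per timing run
t_2d = min(timeit.repeat(assign_2d, number=n, repeat=3)) / n
t_flat = min(timeit.repeat(assign_flat, number=n, repeat=3)) / n

print(f"2-d assignment:  {t_2d * 1e3:.3f} ms per copy")
print(f"flat assignment: {t_flat * 1e3:.3f} ms per copy")
```

Taking the minimum over several repeats reduces noise from other processes, which matters when the difference being measured is slight.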