Re: [Numpy-discussion] argsort speed
On Fri, Feb 21, 2014 at 11:09 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Feb 21, 2014 at 10:35 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote: On Mon, Feb 17, 2014 at 11:40 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Feb 17, 2014 at 11:32 AM, Julian Taylor jtaylor.deb...@googlemail.com wrote: On 17.02.2014 15:18, Francesc Alted wrote: On 2/17/14, 1:08 AM, josef.p...@gmail.com wrote: On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com wrote: On 16 February 2014 23:43, josef.p...@gmail.com wrote: What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? On numpy latest version: for kind in ['quicksort', 'mergesort', 'heapsort']: print kind %timeit np.sort(data, kind=kind) %timeit np.argsort(data, kind=kind) quicksort 1 loops, best of 3: 3.55 s per loop 1 loops, best of 3: 10.3 s per loop mergesort 1 loops, best of 3: 4.84 s per loop 1 loops, best of 3: 9.49 s per loop heapsort 1 loops, best of 3: 12.1 s per loop 1 loops, best of 3: 39.3 s per loop It looks quicksort is quicker sorting, but mergesort is marginally faster sorting args. The diference is slim, but upon repetition, it remains significant. Why is that? Probably part of the reason is what Eelco said, and part is that in sort comparison are done accessing the array elements directly, but in argsort you have to index the array, introducing some overhead. Thanks, both. I also gain a second with mergesort. matlab would be nicer in my case, it returns both. I still need to use the argsort to index into the array to also get the sorted array. Many years ago I needed something similar, so I made some functions for sorting and argsorting in one single shot. Maybe you want to reuse them. Here it is an example of the C implementation: https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619 and here the Cython wrapper for all of them: https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129 Francesc that doesn't really make a big difference if the data is randomly distributed. the sorting operation is normally much more expensive than latter applying the indices: In [1]: d = np.arange(1000) In [2]: np.random.shuffle(d) In [3]: %timeit np.argsort(d) 1 loops, best of 3: 1.99 s per loop In [4]: idx = np.argsort(d) In [5]: %timeit d[idx] 1 loops, best of 3: 213 ms per loop But if your data is not random it can make a difference as even quicksort can be a lot faster then. timsort would be a nice addition to numpy, it performs very well for partially sorted data. Unfortunately its quite complicated to implement. Quicksort and shellsort gain speed by having simple inner loops. I have the impression that timsort is optimal when compares and memory access are expensive, but I haven't seen any benchmarks for native types in contiguous memory. I found some benchmarks for continuous memory here: https://github.com/swenson/sort/ https://github.com/gfx/cpp-TimSort The first one seems the best, it probably can be directly reused in numpy. The only issue is that it only sorts the array, but does not provide argsort. I'm impressed by the heapsort time. Heapsort is the slowest of the numpy sorts. So either the heapsort implementation is better than ours or the other sort are worse ;) Partially sorted sequence are pretty common, so timsort might be a worthy addition. Last time I looked, JDK was using timsort for sorting objects, and quicksort for native types. Another sort is dual pivot quicksort that I've heard some good things about. Adding indirect sorts isn't so difficult once the basic sort is available. Since the memory access tends to be larger as it gets randomly accessed, timsort might be a good choice for that. Indeed, I think one has to be very careful about these benchmarks, since it highly depends on the structure of the arrays being sorted. I've been looking into this a bit, since I need some fast algorithm in Fortran, that returns indices that sort the array. So far I use quicksort, but this Timsort might perform better for partially sorted arrays, which is typically my use case. Ondrej ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On Mon, Feb 17, 2014 at 11:40 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Feb 17, 2014 at 11:32 AM, Julian Taylor jtaylor.deb...@googlemail.com wrote: On 17.02.2014 15:18, Francesc Alted wrote: On 2/17/14, 1:08 AM, josef.p...@gmail.com wrote: On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com wrote: On 16 February 2014 23:43, josef.p...@gmail.com wrote: What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? On numpy latest version: for kind in ['quicksort', 'mergesort', 'heapsort']: print kind %timeit np.sort(data, kind=kind) %timeit np.argsort(data, kind=kind) quicksort 1 loops, best of 3: 3.55 s per loop 1 loops, best of 3: 10.3 s per loop mergesort 1 loops, best of 3: 4.84 s per loop 1 loops, best of 3: 9.49 s per loop heapsort 1 loops, best of 3: 12.1 s per loop 1 loops, best of 3: 39.3 s per loop It looks quicksort is quicker sorting, but mergesort is marginally faster sorting args. The diference is slim, but upon repetition, it remains significant. Why is that? Probably part of the reason is what Eelco said, and part is that in sort comparison are done accessing the array elements directly, but in argsort you have to index the array, introducing some overhead. Thanks, both. I also gain a second with mergesort. matlab would be nicer in my case, it returns both. I still need to use the argsort to index into the array to also get the sorted array. Many years ago I needed something similar, so I made some functions for sorting and argsorting in one single shot. Maybe you want to reuse them. Here it is an example of the C implementation: https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619 and here the Cython wrapper for all of them: https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129 Francesc that doesn't really make a big difference if the data is randomly distributed. the sorting operation is normally much more expensive than latter applying the indices: In [1]: d = np.arange(1000) In [2]: np.random.shuffle(d) In [3]: %timeit np.argsort(d) 1 loops, best of 3: 1.99 s per loop In [4]: idx = np.argsort(d) In [5]: %timeit d[idx] 1 loops, best of 3: 213 ms per loop But if your data is not random it can make a difference as even quicksort can be a lot faster then. timsort would be a nice addition to numpy, it performs very well for partially sorted data. Unfortunately its quite complicated to implement. Quicksort and shellsort gain speed by having simple inner loops. I have the impression that timsort is optimal when compares and memory access are expensive, but I haven't seen any benchmarks for native types in contiguous memory. I found some benchmarks for continuous memory here: https://github.com/swenson/sort/ https://github.com/gfx/cpp-TimSort The first one seems the best, it probably can be directly reused in numpy. The only issue is that it only sorts the array, but does not provide argsort. Ondrej ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On Fri, Feb 21, 2014 at 10:35 PM, Ondřej Čertík ondrej.cer...@gmail.comwrote: On Mon, Feb 17, 2014 at 11:40 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Feb 17, 2014 at 11:32 AM, Julian Taylor jtaylor.deb...@googlemail.com wrote: On 17.02.2014 15:18, Francesc Alted wrote: On 2/17/14, 1:08 AM, josef.p...@gmail.com wrote: On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com wrote: On 16 February 2014 23:43, josef.p...@gmail.com wrote: What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? On numpy latest version: for kind in ['quicksort', 'mergesort', 'heapsort']: print kind %timeit np.sort(data, kind=kind) %timeit np.argsort(data, kind=kind) quicksort 1 loops, best of 3: 3.55 s per loop 1 loops, best of 3: 10.3 s per loop mergesort 1 loops, best of 3: 4.84 s per loop 1 loops, best of 3: 9.49 s per loop heapsort 1 loops, best of 3: 12.1 s per loop 1 loops, best of 3: 39.3 s per loop It looks quicksort is quicker sorting, but mergesort is marginally faster sorting args. The diference is slim, but upon repetition, it remains significant. Why is that? Probably part of the reason is what Eelco said, and part is that in sort comparison are done accessing the array elements directly, but in argsort you have to index the array, introducing some overhead. Thanks, both. I also gain a second with mergesort. matlab would be nicer in my case, it returns both. I still need to use the argsort to index into the array to also get the sorted array. Many years ago I needed something similar, so I made some functions for sorting and argsorting in one single shot. Maybe you want to reuse them. Here it is an example of the C implementation: https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619 and here the Cython wrapper for all of them: https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129 Francesc that doesn't really make a big difference if the data is randomly distributed. the sorting operation is normally much more expensive than latter applying the indices: In [1]: d = np.arange(1000) In [2]: np.random.shuffle(d) In [3]: %timeit np.argsort(d) 1 loops, best of 3: 1.99 s per loop In [4]: idx = np.argsort(d) In [5]: %timeit d[idx] 1 loops, best of 3: 213 ms per loop But if your data is not random it can make a difference as even quicksort can be a lot faster then. timsort would be a nice addition to numpy, it performs very well for partially sorted data. Unfortunately its quite complicated to implement. Quicksort and shellsort gain speed by having simple inner loops. I have the impression that timsort is optimal when compares and memory access are expensive, but I haven't seen any benchmarks for native types in contiguous memory. I found some benchmarks for continuous memory here: https://github.com/swenson/sort/ https://github.com/gfx/cpp-TimSort The first one seems the best, it probably can be directly reused in numpy. The only issue is that it only sorts the array, but does not provide argsort. I'm impressed by the heapsort time. Heapsort is the slowest of the numpy sorts. So either the heapsort implementation is better than ours or the other sort are worse ;) Partially sorted sequence are pretty common, so timsort might be a worthy addition. Last time I looked, JDK was using timsort for sorting objects, and quicksort for native types. Another sort is dual pivot quicksort that I've heard some good things about. Adding indirect sorts isn't so difficult once the basic sort is available. Since the memory access tends to be larger as it gets randomly accessed, timsort might be a good choice for that. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On 2/17/14, 1:08 AM, josef.p...@gmail.com wrote: On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com wrote: On 16 February 2014 23:43, josef.p...@gmail.com wrote: What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? On numpy latest version: for kind in ['quicksort', 'mergesort', 'heapsort']: print kind %timeit np.sort(data, kind=kind) %timeit np.argsort(data, kind=kind) quicksort 1 loops, best of 3: 3.55 s per loop 1 loops, best of 3: 10.3 s per loop mergesort 1 loops, best of 3: 4.84 s per loop 1 loops, best of 3: 9.49 s per loop heapsort 1 loops, best of 3: 12.1 s per loop 1 loops, best of 3: 39.3 s per loop It looks quicksort is quicker sorting, but mergesort is marginally faster sorting args. The diference is slim, but upon repetition, it remains significant. Why is that? Probably part of the reason is what Eelco said, and part is that in sort comparison are done accessing the array elements directly, but in argsort you have to index the array, introducing some overhead. Thanks, both. I also gain a second with mergesort. matlab would be nicer in my case, it returns both. I still need to use the argsort to index into the array to also get the sorted array. Many years ago I needed something similar, so I made some functions for sorting and argsorting in one single shot. Maybe you want to reuse them. Here it is an example of the C implementation: https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619 and here the Cython wrapper for all of them: https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129 Francesc Josef I seem unable to find the code for ndarray.sort, so I can't check. I have tried to grep it tring all possible combinations of def ndarray, self.sort, etc. Where is it? /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Francesc Alted ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On Mon, Feb 17, 2014 at 9:18 AM, Francesc Alted franc...@continuum.io wrote: On 2/17/14, 1:08 AM, josef.p...@gmail.com wrote: On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com wrote: On 16 February 2014 23:43, josef.p...@gmail.com wrote: What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? On numpy latest version: for kind in ['quicksort', 'mergesort', 'heapsort']: print kind %timeit np.sort(data, kind=kind) %timeit np.argsort(data, kind=kind) quicksort 1 loops, best of 3: 3.55 s per loop 1 loops, best of 3: 10.3 s per loop mergesort 1 loops, best of 3: 4.84 s per loop 1 loops, best of 3: 9.49 s per loop heapsort 1 loops, best of 3: 12.1 s per loop 1 loops, best of 3: 39.3 s per loop It looks quicksort is quicker sorting, but mergesort is marginally faster sorting args. The diference is slim, but upon repetition, it remains significant. Why is that? Probably part of the reason is what Eelco said, and part is that in sort comparison are done accessing the array elements directly, but in argsort you have to index the array, introducing some overhead. Thanks, both. I also gain a second with mergesort. matlab would be nicer in my case, it returns both. I still need to use the argsort to index into the array to also get the sorted array. Many years ago I needed something similar, so I made some functions for sorting and argsorting in one single shot. Maybe you want to reuse them. Here it is an example of the C implementation: https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619 and here the Cython wrapper for all of them: https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129 Thanks Francesc That would be very useful to have. However, I don't speak C, and it would require too much maintenance time if I were to include that in statsmodels. I'm trying to concentrate more on stats (and my backlog there), and leave other parts to developers that know those better. Josef Francesc Josef I seem unable to find the code for ndarray.sort, so I can't check. I have tried to grep it tring all possible combinations of def ndarray, self.sort, etc. Where is it? /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Francesc Alted ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On 17.02.2014 15:18, Francesc Alted wrote: On 2/17/14, 1:08 AM, josef.p...@gmail.com wrote: On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com wrote: On 16 February 2014 23:43, josef.p...@gmail.com wrote: What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? On numpy latest version: for kind in ['quicksort', 'mergesort', 'heapsort']: print kind %timeit np.sort(data, kind=kind) %timeit np.argsort(data, kind=kind) quicksort 1 loops, best of 3: 3.55 s per loop 1 loops, best of 3: 10.3 s per loop mergesort 1 loops, best of 3: 4.84 s per loop 1 loops, best of 3: 9.49 s per loop heapsort 1 loops, best of 3: 12.1 s per loop 1 loops, best of 3: 39.3 s per loop It looks quicksort is quicker sorting, but mergesort is marginally faster sorting args. The diference is slim, but upon repetition, it remains significant. Why is that? Probably part of the reason is what Eelco said, and part is that in sort comparison are done accessing the array elements directly, but in argsort you have to index the array, introducing some overhead. Thanks, both. I also gain a second with mergesort. matlab would be nicer in my case, it returns both. I still need to use the argsort to index into the array to also get the sorted array. Many years ago I needed something similar, so I made some functions for sorting and argsorting in one single shot. Maybe you want to reuse them. Here it is an example of the C implementation: https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619 and here the Cython wrapper for all of them: https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129 Francesc that doesn't really make a big difference if the data is randomly distributed. the sorting operation is normally much more expensive than latter applying the indices: In [1]: d = np.arange(1000) In [2]: np.random.shuffle(d) In [3]: %timeit np.argsort(d) 1 loops, best of 3: 1.99 s per loop In [4]: idx = np.argsort(d) In [5]: %timeit d[idx] 1 loops, best of 3: 213 ms per loop But if your data is not random it can make a difference as even quicksort can be a lot faster then. timsort would be a nice addition to numpy, it performs very well for partially sorted data. Unfortunately its quite complicated to implement. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On Mon, Feb 17, 2014 at 11:32 AM, Julian Taylor jtaylor.deb...@googlemail.com wrote: On 17.02.2014 15:18, Francesc Alted wrote: On 2/17/14, 1:08 AM, josef.p...@gmail.com wrote: On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com wrote: On 16 February 2014 23:43, josef.p...@gmail.com wrote: What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? On numpy latest version: for kind in ['quicksort', 'mergesort', 'heapsort']: print kind %timeit np.sort(data, kind=kind) %timeit np.argsort(data, kind=kind) quicksort 1 loops, best of 3: 3.55 s per loop 1 loops, best of 3: 10.3 s per loop mergesort 1 loops, best of 3: 4.84 s per loop 1 loops, best of 3: 9.49 s per loop heapsort 1 loops, best of 3: 12.1 s per loop 1 loops, best of 3: 39.3 s per loop It looks quicksort is quicker sorting, but mergesort is marginally faster sorting args. The diference is slim, but upon repetition, it remains significant. Why is that? Probably part of the reason is what Eelco said, and part is that in sort comparison are done accessing the array elements directly, but in argsort you have to index the array, introducing some overhead. Thanks, both. I also gain a second with mergesort. matlab would be nicer in my case, it returns both. I still need to use the argsort to index into the array to also get the sorted array. Many years ago I needed something similar, so I made some functions for sorting and argsorting in one single shot. Maybe you want to reuse them. Here it is an example of the C implementation: https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619 and here the Cython wrapper for all of them: https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129 Francesc that doesn't really make a big difference if the data is randomly distributed. the sorting operation is normally much more expensive than latter applying the indices: In [1]: d = np.arange(1000) In [2]: np.random.shuffle(d) In [3]: %timeit np.argsort(d) 1 loops, best of 3: 1.99 s per loop In [4]: idx = np.argsort(d) In [5]: %timeit d[idx] 1 loops, best of 3: 213 ms per loop But if your data is not random it can make a difference as even quicksort can be a lot faster then. timsort would be a nice addition to numpy, it performs very well for partially sorted data. Unfortunately its quite complicated to implement. Quicksort and shellsort gain speed by having simple inner loops. I have the impression that timsort is optimal when compares and memory access are expensive, but I haven't seen any benchmarks for native types in contiguous memory. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] argsort speed
currently using numpy 1.6.1 What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? Is there a reason that np.argsort is almost 3 times slower than np.sort? I'm doing semi-systematic timing for a stats(models) algorithm. Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
My guess; First of all, you are actually manipulating twice as much data as opposed to an inplace sort. Moreover, an inplace sort gains locality as it is being sorted, whereas the argsort is continuously making completely random memory accesses. -Original Message- From: josef.p...@gmail.com Sent: Sunday, February 16, 2014 11:43 PM To: Discussion of Numerical Python Subject: [Numpy-discussion] argsort speed currently using numpy 1.6.1 What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? Is there a reason that np.argsort is almost 3 times slower than np.sort? I'm doing semi-systematic timing for a stats(models) algorithm. Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On 16 February 2014 23:43, josef.p...@gmail.com wrote: What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? On numpy latest version: for kind in ['quicksort', 'mergesort', 'heapsort']: print kind %timeit np.sort(data, kind=kind) %timeit np.argsort(data, kind=kind) quicksort 1 loops, best of 3: 3.55 s per loop 1 loops, best of 3: 10.3 s per loop mergesort 1 loops, best of 3: 4.84 s per loop 1 loops, best of 3: 9.49 s per loop heapsort 1 loops, best of 3: 12.1 s per loop 1 loops, best of 3: 39.3 s per loop It looks quicksort is quicker sorting, but mergesort is marginally faster sorting args. The diference is slim, but upon repetition, it remains significant. Why is that? Probably part of the reason is what Eelco said, and part is that in sort comparison are done accessing the array elements directly, but in argsort you have to index the array, introducing some overhead. I seem unable to find the code for ndarray.sort, so I can't check. I have tried to grep it tring all possible combinations of def ndarray, self.sort, etc. Where is it? /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On 17 February 2014 00:12, Daπid davidmen...@gmail.com wrote: I seem unable to find the code for ndarray.sort, so I can't check. I have tried to grep it tring all possible combinations of def ndarray, self.sort, etc. Where is it? Nevermind, it is in core/src/multiarray/methods.c ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: My guess; First of all, you are actually manipulating twice as much data as opposed to an inplace sort. Moreover, an inplace sort gains locality as it is being sorted, whereas the argsort is continuously making completely random memory accesses. -Original Message- From: josef.p...@gmail.com Sent: Sunday, February 16, 2014 11:43 PM To: Discussion of Numerical Python Subject: [Numpy-discussion] argsort speed currently using numpy 1.6.1 What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? Is there a reason that np.argsort is almost 3 times slower than np.sort? I'm doing semi-systematic timing for a stats(models) algorithm. I was using np.sort, inplace sort is only a little bit faster It looks like sorting first, and then argsorting is faster than argsort alon pvals.sort() sortind = np.argsort(pvals) replacing the inplace sort in the above reduces speed only a bit -- import time use_master = True if use_master: import sys sys.path.insert(0, rE:\Josef\!reps\numpy\dist\Programs\Python27\Lib\site-packages) import numpy as np print np.__version__ =, np.__version__ n = 5300 pvals = np.random.rand(n**2) t0 = time.time() p = np.sort(pvals) t1 = time.time() sortind = np.argsort(pvals) t2 = time.time() pvals.sort() sortind = np.argsort(pvals) t3 = time.time() print t1 - t0, t2 - t1, t3 - t2 print (t2 - t1) * 1. / (t1 - t0), (t3 - t2) * 1. / (t1 - t0) np.__version__ = 1.9.0.dev-2868dc4 3.91900014877 9.556218 4.92900013924 2.43863219163 1.2577187936 Josef Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On Sun, Feb 16, 2014 at 6:15 PM, josef.p...@gmail.com wrote: On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: My guess; First of all, you are actually manipulating twice as much data as opposed to an inplace sort. Moreover, an inplace sort gains locality as it is being sorted, whereas the argsort is continuously making completely random memory accesses. -Original Message- From: josef.p...@gmail.com Sent: Sunday, February 16, 2014 11:43 PM To: Discussion of Numerical Python Subject: [Numpy-discussion] argsort speed currently using numpy 1.6.1 What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? Is there a reason that np.argsort is almost 3 times slower than np.sort? I'm doing semi-systematic timing for a stats(models) algorithm. I was using np.sort, inplace sort is only a little bit faster It looks like sorting first, and then argsorting is faster than argsort alon pvals.sort() sortind = np.argsort(pvals) Ok, that was useless, that won't be anything I want. Josef replacing the inplace sort in the above reduces speed only a bit -- import time use_master = True if use_master: import sys sys.path.insert(0, rE:\Josef\!reps\numpy\dist\Programs\Python27\Lib\site-packages) import numpy as np print np.__version__ =, np.__version__ n = 5300 pvals = np.random.rand(n**2) t0 = time.time() p = np.sort(pvals) t1 = time.time() sortind = np.argsort(pvals) t2 = time.time() pvals.sort() sortind = np.argsort(pvals) t3 = time.time() print t1 - t0, t2 - t1, t3 - t2 print (t2 - t1) * 1. / (t1 - t0), (t3 - t2) * 1. / (t1 - t0) np.__version__ = 1.9.0.dev-2868dc4 3.91900014877 9.556218 4.92900013924 2.43863219163 1.2577187936 Josef Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com wrote: On 16 February 2014 23:43, josef.p...@gmail.com wrote: What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? On numpy latest version: for kind in ['quicksort', 'mergesort', 'heapsort']: print kind %timeit np.sort(data, kind=kind) %timeit np.argsort(data, kind=kind) quicksort 1 loops, best of 3: 3.55 s per loop 1 loops, best of 3: 10.3 s per loop mergesort 1 loops, best of 3: 4.84 s per loop 1 loops, best of 3: 9.49 s per loop heapsort 1 loops, best of 3: 12.1 s per loop 1 loops, best of 3: 39.3 s per loop It looks quicksort is quicker sorting, but mergesort is marginally faster sorting args. The diference is slim, but upon repetition, it remains significant. Why is that? Probably part of the reason is what Eelco said, and part is that in sort comparison are done accessing the array elements directly, but in argsort you have to index the array, introducing some overhead. Thanks, both. I also gain a second with mergesort. matlab would be nicer in my case, it returns both. I still need to use the argsort to index into the array to also get the sorted array. Josef I seem unable to find the code for ndarray.sort, so I can't check. I have tried to grep it tring all possible combinations of def ndarray, self.sort, etc. Where is it? /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On Sun, Feb 16, 2014 at 4:18 PM, josef.p...@gmail.com wrote: On Sun, Feb 16, 2014 at 6:15 PM, josef.p...@gmail.com wrote: On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: My guess; First of all, you are actually manipulating twice as much data as opposed to an inplace sort. Moreover, an inplace sort gains locality as it is being sorted, whereas the argsort is continuously making completely random memory accesses. -Original Message- From: josef.p...@gmail.com Sent: Sunday, February 16, 2014 11:43 PM To: Discussion of Numerical Python Subject: [Numpy-discussion] argsort speed currently using numpy 1.6.1 What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? Is there a reason that np.argsort is almost 3 times slower than np.sort? I'm doing semi-systematic timing for a stats(models) algorithm. I was using np.sort, inplace sort is only a little bit faster It looks like sorting first, and then argsorting is faster than argsort alon pvals.sort() sortind = np.argsort(pvals) Ok, that was useless, that won't be anything I want. I think locality is the most important thing. The argsort routines used to move both the indexes and the array to argsort, the new ones only move the indexes. It is a tradeoff, twice as many moves vs locality. It's probably possible to invent an algorithm that mixes the two. snip Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On Sun, Feb 16, 2014 at 7:13 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Feb 16, 2014 at 4:18 PM, josef.p...@gmail.com wrote: On Sun, Feb 16, 2014 at 6:15 PM, josef.p...@gmail.com wrote: On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: My guess; First of all, you are actually manipulating twice as much data as opposed to an inplace sort. Moreover, an inplace sort gains locality as it is being sorted, whereas the argsort is continuously making completely random memory accesses. -Original Message- From: josef.p...@gmail.com Sent: Sunday, February 16, 2014 11:43 PM To: Discussion of Numerical Python Subject: [Numpy-discussion] argsort speed currently using numpy 1.6.1 What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? Is there a reason that np.argsort is almost 3 times slower than np.sort? I'm doing semi-systematic timing for a stats(models) algorithm. I was using np.sort, inplace sort is only a little bit faster It looks like sorting first, and then argsorting is faster than argsort alon pvals.sort() sortind = np.argsort(pvals) Ok, that was useless, that won't be anything I want. I think locality is the most important thing. The argsort routines used to move both the indexes and the array to argsort, the new ones only move the indexes. It is a tradeoff, twice as many moves vs locality. It's probably possible to invent an algorithm that mixes the two. If that's the way it is, then that's the way it is. I just never realized that argsort can take so long, since I usually use only smaller arrays. I was surprised that argsort took almost 10 out of around 12 seconds in my function, and this after I cleaned up my code and removed duplicate and avoidable argsorts. Josef snip Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On Sun, Feb 16, 2014 at 4:12 PM, Daπid davidmen...@gmail.com wrote: On 16 February 2014 23:43, josef.p...@gmail.com wrote: What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? On numpy latest version: for kind in ['quicksort', 'mergesort', 'heapsort']: print kind %timeit np.sort(data, kind=kind) %timeit np.argsort(data, kind=kind) quicksort 1 loops, best of 3: 3.55 s per loop 1 loops, best of 3: 10.3 s per loop mergesort 1 loops, best of 3: 4.84 s per loop 1 loops, best of 3: 9.49 s per loop heapsort 1 loops, best of 3: 12.1 s per loop 1 loops, best of 3: 39.3 s per loop Interesting. I get slightly different results quicksort 1 loops, best of 3: 3.25 s per loop 1 loops, best of 3: 6.16 s per loop mergesort 1 loops, best of 3: 3.99 s per loop 1 loops, best of 3: 6.97 s per loop heapsort 1 loops, best of 3: 10.1 s per loop 1 loops, best of 3: 29.3 s per loop Possibly faster memory here. snip Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort
Hi, Thanks everybody for all the answers that make perfect sense when axis=0. Now suppose I want to sort the array in such a way that each row is sorted individually. Then I suppose I should do this: from numpy import * v = array([[4,3], [1,12], [23,7], [11,6], [8,9]]) idx = argsort(v, axis=1) idx is then [[1 0] [0 1] [1 0] [1 0] [0 1]] which makes sense, since these are the indices in an order that would sort each row. But when I try a[idx, variuos_additional_arguments] I just get strange results. Anybody that can point me towards the correct solution. Best regards, Mads On 01/15/2013 09:53 PM, eat wrote: Hi, On Tue, Jan 15, 2013 at 1:50 PM, Mads Ipsen madsip...@gmail.com mailto:madsip...@gmail.com wrote: Hi, I simply can't understand this. I'm trying to use argsort to produce indices that can be used to sort an array: from numpy import * indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]]) args = argsort(indices, axis=0) print indices[args] gives: [[[ 1 12] [ 4 3]] [[ 4 3] [11 6]] [[ 8 9] [23 7]] [[11 6] [ 8 9]] [[23 7] [ 1 12]]] I thought this should produce a sorted version of the indices array. Any help is appreciated. Perhaps these three different point of views will help you a little bit more to move on: In []: x Out[]: array([[ 4, 3], [ 1, 12], [23, 7], [11, 6], [ 8, 9]]) In []: ind= x.argsort(axis= 0) In []: ind Out[]: array([[1, 0], [0, 3], [4, 2], [3, 4], [2, 1]]) In []: x[ind[:, 0]] Out[]: array([[ 1, 12], [ 4, 3], [ 8, 9], [11, 6], [23, 7]]) In []: x[ind[:, 1]] Out[]: array([[ 4, 3], [11, 6], [23, 7], [ 8, 9], [ 1, 12]]) In []: x[ind, [0, 1]] Out[]: array([[ 1, 3], [ 4, 6], [ 8, 7], [11, 9], [23, 12]]) -eat Best regards, Mads -- +-+ | Mads Ipsen | +--+--+ | Gåsebæksvej 7, 4. tv | | | DK-2500 Valby| phone:+45-29716388 tel:%2B45-29716388 | | Denmark | email:mads.ip...@gmail.com mailto:mads.ip...@gmail.com | +--+--+ ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org mailto:NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- +-+ | Mads Ipsen | +--+--+ | Gåsebæksvej 7, 4. tv | | | DK-2500 Valby| phone: +45-29716388 | | Denmark | email: mads.ip...@gmail.com | +--+--+ ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort
On Wed, Jan 16, 2013 at 9:30 AM, Mads Ipsen madsip...@gmail.com wrote: Hi, Thanks everybody for all the answers that make perfect sense when axis=0. Now suppose I want to sort the array in such a way that each row is sorted individually. Then I suppose I should do this: from numpy import * v = array([[4,3], [1,12], [23,7], [11,6], [8,9]]) idx = argsort(v, axis=1) idx is then [[1 0] [0 1] [1 0] [1 0] [0 1]] which makes sense, since these are the indices in an order that would sort each row. But when I try a[idx, variuos_additional_arguments] I just get strange results. Anybody that can point me towards the correct solution. Please have a look at the documentation again. If idx has indices for the second axis, you need to put it into the second place. http://docs.scipy.org/doc/numpy/user/basics.indexing.html#indexing-multi-dimensional-arrays http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing [~] |4 idx0 = np.arange(v.shape[0])[:,np.newaxis] [~] |5 idx0 array([[0], [1], [2], [3], [4]]) [~] |7 v[idx0, idx] array([[ 3, 4], [ 1, 12], [ 7, 23], [ 6, 11], [ 8, 9]]) -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] argsort
Hi, I simply can't understand this. I'm trying to use argsort to produce indices that can be used to sort an array: from numpy import * indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]]) args = argsort(indices, axis=0) print indices[args] gives: [[[ 1 12] [ 4 3]] [[ 4 3] [11 6]] [[ 8 9] [23 7]] [[11 6] [ 8 9]] [[23 7] [ 1 12]]] I thought this should produce a sorted version of the indices array. Any help is appreciated. Best regards, Mads -- +-+ | Mads Ipsen | +--+--+ | Gåsebæksvej 7, 4. tv | | | DK-2500 Valby| phone: +45-29716388 | | Denmark | email: mads.ip...@gmail.com | +--+--+ ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort
On Tue, Jan 15, 2013 at 4:50 AM, Mads Ipsen madsip...@gmail.com wrote: Hi, I simply can't understand this. I'm trying to use argsort to produce indices that can be used to sort an array: from numpy import * indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]]) args = argsort(indices, axis=0) print indices[args] gives: [[[ 1 12] [ 4 3]] [[ 4 3] [11 6]] [[ 8 9] [23 7]] [[11 6] [ 8 9]] [[23 7] [ 1 12]]] I thought this should produce a sorted version of the indices array. Any help is appreciated. Fancy indexing is a funny creature and not easy to understand in more than one dimension. What is happening is that each index is replaced by the corresponding row of a and the result is of shape (5,2,2). To do what you want to do: In [20]: a[i, [[0,1]]*5] Out[20]: array([[ 1, 3], [ 4, 6], [ 8, 7], [11, 9], [23, 12]]) I agree that there should be an easier way to do this. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort
On Tue, Jan 15, 2013 at 3:44 PM, Charles R Harris charlesr.har...@gmail.com wrote: Fancy indexing is a funny creature and not easy to understand in more than one dimension. What is happening is that each index is replaced by the corresponding row of a and the result is of shape (5,2,2). To do what you want to do: In [20]: a[i, [[0,1]]*5] Out[20]: array([[ 1, 3], [ 4, 6], [ 8, 7], [11, 9], [23, 12]]) I agree that there should be an easier way to do this. Slightly easier, though no more transparent: a[i, [0,1]] http://docs.scipy.org/doc/numpy/user/basics.indexing.html#indexing-multi-dimensional-arrays -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort
Hi, On Tue, Jan 15, 2013 at 1:50 PM, Mads Ipsen madsip...@gmail.com wrote: Hi, I simply can't understand this. I'm trying to use argsort to produce indices that can be used to sort an array: from numpy import * indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]]) args = argsort(indices, axis=0) print indices[args] gives: [[[ 1 12] [ 4 3]] [[ 4 3] [11 6]] [[ 8 9] [23 7]] [[11 6] [ 8 9]] [[23 7] [ 1 12]]] I thought this should produce a sorted version of the indices array. Any help is appreciated. Perhaps these three different point of views will help you a little bit more to move on: In []: x Out[]: array([[ 4, 3], [ 1, 12], [23, 7], [11, 6], [ 8, 9]]) In []: ind= x.argsort(axis= 0) In []: ind Out[]: array([[1, 0], [0, 3], [4, 2], [3, 4], [2, 1]]) In []: x[ind[:, 0]] Out[]: array([[ 1, 12], [ 4, 3], [ 8, 9], [11, 6], [23, 7]]) In []: x[ind[:, 1]] Out[]: array([[ 4, 3], [11, 6], [23, 7], [ 8, 9], [ 1, 12]]) In []: x[ind, [0, 1]] Out[]: array([[ 1, 3], [ 4, 6], [ 8, 7], [11, 9], [23, 12]]) -eat Best regards, Mads -- +-+ | Mads Ipsen | +--+--+ | Gåsebæksvej 7, 4. tv | | | DK-2500 Valby| phone: +45-29716388 | | Denmark | email: mads.ip...@gmail.com | +--+--+ ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Argsort
Hi, Ok, what am I missing here: x = np.array([[4,2],[5,3]]) x[x.argsort(1)] array([[[5, 3], [4, 2]], [[5, 3], [4, 2]]]) I was expecting: array([[2,4],[3,5]]) Certainly not a 3D array. What am I doing wrong? Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort in descending order
On Fri, 05 Sep 2008 07:02:38 -0700, SimonPalmer wrote: another newb question I suspect, but is there a way to instruct argsort to sort in descending order or should I just sort and reverse? Just sort and subtract to get the reverse order. I can think of two reasonable ways to do it with no copying: from numpy import * a = rand(100) # this does NOT make a copy of a, simply indexes it backward, and should # be very fast reverseorder = argsort(a[::-1]) # this way doesn't make a copy either, and probably is a little slower # since it has to do many subtractions reverseorder = len(a)-1-argsort(a) HTH, Dan ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] argsort in descending order
another newb question I suspect, but is there a way to instruct argsort to sort in descending order or should I just sort and reverse? ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort in descending order
On Fri, Sep 5, 2008 at 8:02 AM, SimonPalmer [EMAIL PROTECTED] wrote: another newb question I suspect, but is there a way to instruct argsort to sort in descending order or should I just sort and reverse? You'll just have to reverse the indices. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort memory problem?
* Robin [EMAIL PROTECTED] [2008-01-29 19:23:11 +]: On Jan 29, 2008 7:16 PM, Lou Pecora [EMAIL PROTECTED] wrote: Hmmm... Interesting. I am using Python 2.4.4. It would be nice to have other Mac people with same/other Python and numpy versions try the argsort bug code. I don't see any memory leak with the test code. Mac OS X 10.5.1 Python 2.5.1 (not apple one) Numpy 1.0.5.dev4722 I have run the test1 code again, this time on my laptop PC (no MAC-user, sorry) using the last stable numpy release. The memory problem does _not_ show up now. I'm running with: - Ubuntu Feisty (kernel 2.6.17-12-generic i686) - python 2.5.1 (Feisty package) - numpy 1.0.4 (compiled with gcc version 4.1.2) However, the memory leak appears on my laptop if I use python 2.5.1 with numpy 1.0.1. At least here, this seems to be an issue dependent only on the numpy version, and a solved one. -- oriol ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort memory problem?
This still occurs in numpy 1.0.3.1 so must have been fixed between that and your 1.0.4-5 version. By the way the memory problem crashes my Intel Mac Book Pro (system 10.4.11) with the gray screen and black dialog box telling me to restart my computer. A very UN-unix like and UN-Mac like way of handling a memory problem IMHO. Let us Mac people not be too smug. -- Lou Pecora --- Alexandre Fayolle [EMAIL PROTECTED] wrote: On Tue, Jan 29, 2008 at 02:58:15PM +0100, Oriol Vendrell wrote: Hi all, I've noticed something that looks like an odd behaviour in array.argsort(). # test1 - from numpy import array while True: a=array([8.0,7.0,6.0,5.0,4.0,2.0]) i=a.argsort() # --- # test2 - from numpy import array a=array([8.0,7.0,6.0,5.0,4.0,2.0]) while True: i=a.argsort() # --- test1 runs out of memory after a few minutes, it seems that in each cycle some memory is allocated and never returned back. test2 runs fine until killed. I'm unsure if I'm missing something or if this could be a bug. I'm using numpy 1.0.1 with python 2.4.4 in a debian stable system. Certainly a bug, but it has been fixed and I cannot reproduce in debian sid (using 1.0.4-5) -- Alexandre Fayolle Never miss a thing. Make Yahoo your home page. http://www.yahoo.com/r/hs ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort memory problem?
On Tue, Jan 29, 2008 at 02:58:15PM +0100, Oriol Vendrell wrote: Hi all, I've noticed something that looks like an odd behaviour in array.argsort(). # test1 - from numpy import array while True: a=array([8.0,7.0,6.0,5.0,4.0,2.0]) i=a.argsort() # --- # test2 - from numpy import array a=array([8.0,7.0,6.0,5.0,4.0,2.0]) while True: i=a.argsort() # --- test1 runs out of memory after a few minutes, it seems that in each cycle some memory is allocated and never returned back. test2 runs fine until killed. I'm unsure if I'm missing something or if this could be a bug. I'm using numpy 1.0.1 with python 2.4.4 in a debian stable system. Certainly a bug, but it has been fixed and I cannot reproduce in debian sid (using 1.0.4-5) -- Alexandre Fayolle LOGILAB, Paris (France) Formations Python, Zope, Plone, Debian: http://www.logilab.fr/formations Développement logiciel sur mesure: http://www.logilab.fr/services Informatique scientifique: http://www.logilab.fr/science signature.asc Description: Digital signature ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] argsort memory problem?
Hi all, I've noticed something that looks like an odd behaviour in array.argsort(). # test1 - from numpy import array while True: a=array([8.0,7.0,6.0,5.0,4.0,2.0]) i=a.argsort() # --- # test2 - from numpy import array a=array([8.0,7.0,6.0,5.0,4.0,2.0]) while True: i=a.argsort() # --- test1 runs out of memory after a few minutes, it seems that in each cycle some memory is allocated and never returned back. test2 runs fine until killed. I'm unsure if I'm missing something or if this could be a bug. I'm using numpy 1.0.1 with python 2.4.4 in a debian stable system. Any suggestions? Thanks, -- Oriol Vendrell [EMAIL PROTECTED] ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort memory problem?
On Jan 29, 2008 7:16 PM, Lou Pecora [EMAIL PROTECTED] wrote: Hmmm... Interesting. I am using Python 2.4.4. It would be nice to have other Mac people with same/other Python and numpy versions try the argsort bug code. I don't see any memory leak with the test code. Mac OS X 10.5.1 Python 2.5.1 (not apple one) Numpy 1.0.5.dev4722 Robin ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort memory problem?
Hmmm... Interesting. I am using Python 2.4.4. It would be nice to have other Mac people with same/other Python and numpy versions try the argsort bug code. -- Lou Pecora --- Francesc Altet [EMAIL PROTECTED] wrote: A Tuesday 29 January 2008, Lou Pecora escrigué: This still occurs in numpy 1.0.3.1 so must have been fixed between that and your 1.0.4-5 version. It works here and I'm using NumPy 1.0.3, Python 2.5.1 on a Ubuntu 7.10 / Pentium4 machine. By the way the memory problem crashes my Intel Mac Book Pro (system 10.4.11) with the gray screen and black dialog box telling me to restart my computer. A very UN-unix like and UN-Mac like way of handling a memory problem IMHO. Let us Mac people not be too smug. Um, it would be nice if some other Mac-user can reproduce your problem. Perhaps you are suffering some other problem that can be exposed by this code snip. Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ -- Lou Pecora, my views are my own. Never miss a thing. Make Yahoo your home page. http://www.yahoo.com/r/hs ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] argsort and take along arbitrary axes
Hello all, I've got a few questions that came up as I tried to calculate various statistics about an image time-series. For example, I have an array of shape (t,x,y) representing t frames of a time-lapse of resolution (x,y). Now, say I want to both argsort and sort this time-series, pixel- wise. (For example.) In 1-d it's easy: indices = a.argsort() sorted = a[indices] I would have thought that doing this on my 3-d array would work similarly: indices = a.argsort(axis=0) sorted = a.take(indices, axis=0) Unfortunately, this gives a ValueError of dimensions too large. Now, I know that 'a.sort(axis=0)' works fine for the given example, but I'm curious about how to this sort of indexing operation in the general case. Thanks for any insight, Zach Pincus Program in Biomedical Informatics and Department of Biochemistry Stanford University School of Medicine ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort and take along arbitrary axes
I've got a few questions that came up as I tried to calculate various statistics about an image time-series. For example, I have an array of shape (t,x,y) representing t frames of a time-lapse of resolution (x,y). Now, say I want to both argsort and sort this time-series, pixel- wise. (For example.) In 1-d it's easy: indices = a.argsort() sorted = a[indices] I would have thought that doing this on my 3-d array would work similarly: indices = a.argsort(axis=0) sorted = a.take(indices, axis=0) Unfortunately, this gives a ValueError of dimensions too large. Now, I know that 'a.sort(axis=0)' works fine for the given example, but I'm curious about how to this sort of indexing operation in the general case. Unfortunately, argsort doesn't work transparently with take or fancy indexing for multidimensional arrays. I am thinking of adding a function argtake for this, and also for the results returned by argmax and argmin, but at the moment you have to fill in the values of the other indices and use fancy indexing. For now, it is probably simpler, prettier, and faster to just sort the array. Thanks Charles. Unfortunately, the argsort/sort buisness was, as I mentioned, just an example of the kind of 'take' operation that I am trying to figure out how to do. There are other operations that will have similarly-formatted 'indices' arrays (as above) that aren't generated from argsort... As such, how do I fill in the values of the other indices and use fancy indexing? Even after reading the numpy book about that, and reading the docstring for numpy.take, I'm still vague on this. Would I use numpy.indices to get a list of index arrays, and then swap in (at the correct position in this list) the result of argsort (or the other operations), and use that for fancy indexing? Is there an easier/faster way? Thanks again, Zach ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort and take along arbitrary axes
On 5/14/07, Zachary Pincus [EMAIL PROTECTED] wrote: I've got a few questions that came up as I tried to calculate various statistics about an image time-series. For example, I have an array of shape (t,x,y) representing t frames of a time-lapse of resolution (x,y). Now, say I want to both argsort and sort this time-series, pixel- wise. (For example.) In 1-d it's easy: indices = a.argsort() sorted = a[indices] I would have thought that doing this on my 3-d array would work similarly: indices = a.argsort(axis=0) sorted = a.take(indices, axis=0) Unfortunately, this gives a ValueError of dimensions too large. Now, I know that 'a.sort(axis=0)' works fine for the given example, but I'm curious about how to this sort of indexing operation in the general case. Unfortunately, argsort doesn't work transparently with take or fancy indexing for multidimensional arrays. I am thinking of adding a function argtake for this, and also for the results returned by argmax and argmin, but at the moment you have to fill in the values of the other indices and use fancy indexing. For now, it is probably simpler, prettier, and faster to just sort the array. Thanks Charles. Unfortunately, the argsort/sort buisness was, as I mentioned, just an example of the kind of 'take' operation that I am trying to figure out how to do. There are other operations that will have similarly-formatted 'indices' arrays (as above) that aren't generated from argsort... As such, how do I fill in the values of the other indices and use fancy indexing? Even after reading the numpy book about that, and reading the docstring for numpy.take, I'm still vague on this. Would I use numpy.indices to get a list of index arrays, and then swap in (at the correct position in this list) the result of argsort (or the other operations), and use that for fancy indexing? Is there an easier/faster way? I've attached a quick, mostly untested, version of argtake. It's in Python, probably not too fast, but see if it works for you. Chuck import numpy as N def argtake(arr, ind, axis=-1) : Take using output of argsort *Description* The usual output of argsort is not easily used with take when the relevent array is multidimensional. This is a quick and dirty standin. *Parameters*: arr : ndarray The array from which the elements are taken. Typically this will be the same array to which argsort was applied. ind : ndarray Indices returned by argsort or lexsort. axis : integer Axis to which argsort or lexsort was applied. Must match the original call. Defaults to -1. *Returns*: reordered_array : same type as arr The input array reordered by ind along the axis. if arr.shape != ind.shape : raise shape mismatch if arr.ndim == 1 : return N.take(arr, ind) else : naxis = arr.shape[axis] aswap = N.swapaxes(arr, axis, -1) iswap = N.swapaxes(ind, axis, -1) shape = aswap.shape aswap = aswap.reshape(-1, naxis) iswap = iswap.reshape(-1, naxis) oswap = N.empty_like(aswap) for i in xrange(len(aswap)) : N.take(aswap[i], iswap[i], out=oswap[i]) oswap = oswap.reshape(shape) oswap = N.swapaxes(oswap, axis, -1) return oswap ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion