Re: [Numpy-discussion] argsort speed

2014-02-24 Thread Ondřej Čertík
On Fri, Feb 21, 2014 at 11:09 PM, Charles R Harris
charlesr.har...@gmail.com wrote:



 On Fri, Feb 21, 2014 at 10:35 PM, Ondřej Čertík ondrej.cer...@gmail.com
 wrote:

 On Mon, Feb 17, 2014 at 11:40 AM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 
 
 
  On Mon, Feb 17, 2014 at 11:32 AM, Julian Taylor
  jtaylor.deb...@googlemail.com wrote:
 
  On 17.02.2014 15:18, Francesc Alted wrote:
   On 2/17/14, 1:08 AM, josef.p...@gmail.com wrote:
   On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com
   wrote:
   On 16 February 2014 23:43, josef.p...@gmail.com wrote:
   What's the fastest argsort for a 1d array with around 28 Million
   elements, roughly uniformly distributed, random order?
  
   On numpy latest version:
  
   for kind in ['quicksort', 'mergesort', 'heapsort']:
print kind
%timeit np.sort(data, kind=kind)
%timeit np.argsort(data, kind=kind)
  
  
   quicksort
   1 loops, best of 3: 3.55 s per loop
   1 loops, best of 3: 10.3 s per loop
   mergesort
   1 loops, best of 3: 4.84 s per loop
   1 loops, best of 3: 9.49 s per loop
   heapsort
   1 loops, best of 3: 12.1 s per loop
   1 loops, best of 3: 39.3 s per loop
  
  
   It looks quicksort is quicker sorting, but mergesort is marginally
   faster
   sorting args. The diference is slim, but upon repetition, it
   remains
   significant.
  
   Why is that? Probably part of the reason is what Eelco said, and
   part
   is
   that in sort comparison are done accessing the array elements
   directly, but
   in argsort you have to index the array, introducing some overhead.
   Thanks, both.
  
   I also gain a second with mergesort.
  
   matlab would be nicer in my case, it returns both.
   I still need to use the argsort to index into the array to also get
   the sorted array.
  
   Many years ago I needed something similar, so I made some functions
   for
   sorting and argsorting in one single shot.  Maybe you want to reuse
   them.  Here it is an example of the C implementation:
  
   https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619
  
   and here the Cython wrapper for all of them:
  
  
  
   https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129
  
   Francesc
  
 
  that doesn't really make a big difference if the data is randomly
  distributed.
  the sorting operation is normally much more expensive than latter
  applying the indices:
 
  In [1]: d = np.arange(1000)
 
  In [2]: np.random.shuffle(d)
 
  In [3]: %timeit  np.argsort(d)
  1 loops, best of 3: 1.99 s per loop
 
  In [4]: idx = np.argsort(d)
 
  In [5]: %timeit d[idx]
  1 loops, best of 3: 213 ms per loop
 
 
 
  But if your data is not random it can make a difference as even
  quicksort can be a lot faster then.
  timsort would be a nice addition to numpy, it performs very well for
  partially sorted data. Unfortunately its quite complicated to
  implement.
 
 
  Quicksort and shellsort gain speed by having simple inner loops. I have
  the
  impression that timsort is optimal when compares and memory access are
  expensive, but I haven't seen any benchmarks for native types in
  contiguous
  memory.

 I found some benchmarks for continuous memory here:

 https://github.com/swenson/sort/
 https://github.com/gfx/cpp-TimSort

 The first one seems the best, it probably can be directly reused in numpy.
 The only issue is that it only sorts the array, but does not provide
 argsort.


 I'm impressed by the heapsort time. Heapsort is the slowest of the numpy
 sorts. So either the heapsort implementation is better than ours or the
 other sort are worse ;)

 Partially sorted sequence are pretty common, so timsort might be a worthy
 addition. Last time I looked, JDK was using timsort for sorting objects, and
 quicksort for native types. Another sort is dual pivot quicksort that I've
 heard some good things about.

 Adding indirect sorts isn't so difficult once the basic sort is available.
 Since the memory access tends to be larger as it gets randomly accessed,
 timsort might be a good choice for that.

Indeed, I think one has to be very careful about these benchmarks, since
it highly depends on the structure of the arrays being sorted.

I've been looking into this a bit, since I need some fast algorithm in Fortran,
that returns indices that sort the array. So far I use quicksort, but
this Timsort
might perform better for partially sorted arrays, which is typically
my use case.

Ondrej
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-21 Thread Ondřej Čertík
On Mon, Feb 17, 2014 at 11:40 AM, Charles R Harris
charlesr.har...@gmail.com wrote:



 On Mon, Feb 17, 2014 at 11:32 AM, Julian Taylor
 jtaylor.deb...@googlemail.com wrote:

 On 17.02.2014 15:18, Francesc Alted wrote:
  On 2/17/14, 1:08 AM, josef.p...@gmail.com wrote:
  On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com wrote:
  On 16 February 2014 23:43, josef.p...@gmail.com wrote:
  What's the fastest argsort for a 1d array with around 28 Million
  elements, roughly uniformly distributed, random order?
 
  On numpy latest version:
 
  for kind in ['quicksort', 'mergesort', 'heapsort']:
   print kind
   %timeit np.sort(data, kind=kind)
   %timeit np.argsort(data, kind=kind)
 
 
  quicksort
  1 loops, best of 3: 3.55 s per loop
  1 loops, best of 3: 10.3 s per loop
  mergesort
  1 loops, best of 3: 4.84 s per loop
  1 loops, best of 3: 9.49 s per loop
  heapsort
  1 loops, best of 3: 12.1 s per loop
  1 loops, best of 3: 39.3 s per loop
 
 
  It looks quicksort is quicker sorting, but mergesort is marginally
  faster
  sorting args. The diference is slim, but upon repetition, it remains
  significant.
 
  Why is that? Probably part of the reason is what Eelco said, and part
  is
  that in sort comparison are done accessing the array elements
  directly, but
  in argsort you have to index the array, introducing some overhead.
  Thanks, both.
 
  I also gain a second with mergesort.
 
  matlab would be nicer in my case, it returns both.
  I still need to use the argsort to index into the array to also get
  the sorted array.
 
  Many years ago I needed something similar, so I made some functions for
  sorting and argsorting in one single shot.  Maybe you want to reuse
  them.  Here it is an example of the C implementation:
 
  https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619
 
  and here the Cython wrapper for all of them:
 
 
  https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129
 
  Francesc
 

 that doesn't really make a big difference if the data is randomly
 distributed.
 the sorting operation is normally much more expensive than latter
 applying the indices:

 In [1]: d = np.arange(1000)

 In [2]: np.random.shuffle(d)

 In [3]: %timeit  np.argsort(d)
 1 loops, best of 3: 1.99 s per loop

 In [4]: idx = np.argsort(d)

 In [5]: %timeit d[idx]
 1 loops, best of 3: 213 ms per loop



 But if your data is not random it can make a difference as even
 quicksort can be a lot faster then.
 timsort would be a nice addition to numpy, it performs very well for
 partially sorted data. Unfortunately its quite complicated to implement.


 Quicksort and shellsort gain speed by having simple inner loops. I have the
 impression that timsort is optimal when compares and memory access are
 expensive, but I haven't seen any benchmarks for native types in contiguous
 memory.

I found some benchmarks for continuous memory here:

https://github.com/swenson/sort/
https://github.com/gfx/cpp-TimSort

The first one seems the best, it probably can be directly reused in numpy.
The only issue is that it only sorts the array, but does not provide argsort.

Ondrej
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-21 Thread Charles R Harris
On Fri, Feb 21, 2014 at 10:35 PM, Ondřej Čertík ondrej.cer...@gmail.comwrote:

 On Mon, Feb 17, 2014 at 11:40 AM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 
 
 
  On Mon, Feb 17, 2014 at 11:32 AM, Julian Taylor
  jtaylor.deb...@googlemail.com wrote:
 
  On 17.02.2014 15:18, Francesc Alted wrote:
   On 2/17/14, 1:08 AM, josef.p...@gmail.com wrote:
   On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com
 wrote:
   On 16 February 2014 23:43, josef.p...@gmail.com wrote:
   What's the fastest argsort for a 1d array with around 28 Million
   elements, roughly uniformly distributed, random order?
  
   On numpy latest version:
  
   for kind in ['quicksort', 'mergesort', 'heapsort']:
print kind
%timeit np.sort(data, kind=kind)
%timeit np.argsort(data, kind=kind)
  
  
   quicksort
   1 loops, best of 3: 3.55 s per loop
   1 loops, best of 3: 10.3 s per loop
   mergesort
   1 loops, best of 3: 4.84 s per loop
   1 loops, best of 3: 9.49 s per loop
   heapsort
   1 loops, best of 3: 12.1 s per loop
   1 loops, best of 3: 39.3 s per loop
  
  
   It looks quicksort is quicker sorting, but mergesort is marginally
   faster
   sorting args. The diference is slim, but upon repetition, it remains
   significant.
  
   Why is that? Probably part of the reason is what Eelco said, and
 part
   is
   that in sort comparison are done accessing the array elements
   directly, but
   in argsort you have to index the array, introducing some overhead.
   Thanks, both.
  
   I also gain a second with mergesort.
  
   matlab would be nicer in my case, it returns both.
   I still need to use the argsort to index into the array to also get
   the sorted array.
  
   Many years ago I needed something similar, so I made some functions
 for
   sorting and argsorting in one single shot.  Maybe you want to reuse
   them.  Here it is an example of the C implementation:
  
   https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619
  
   and here the Cython wrapper for all of them:
  
  
  
 https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129
  
   Francesc
  
 
  that doesn't really make a big difference if the data is randomly
  distributed.
  the sorting operation is normally much more expensive than latter
  applying the indices:
 
  In [1]: d = np.arange(1000)
 
  In [2]: np.random.shuffle(d)
 
  In [3]: %timeit  np.argsort(d)
  1 loops, best of 3: 1.99 s per loop
 
  In [4]: idx = np.argsort(d)
 
  In [5]: %timeit d[idx]
  1 loops, best of 3: 213 ms per loop
 
 
 
  But if your data is not random it can make a difference as even
  quicksort can be a lot faster then.
  timsort would be a nice addition to numpy, it performs very well for
  partially sorted data. Unfortunately its quite complicated to implement.
 
 
  Quicksort and shellsort gain speed by having simple inner loops. I have
 the
  impression that timsort is optimal when compares and memory access are
  expensive, but I haven't seen any benchmarks for native types in
 contiguous
  memory.

 I found some benchmarks for continuous memory here:

 https://github.com/swenson/sort/
 https://github.com/gfx/cpp-TimSort

 The first one seems the best, it probably can be directly reused in numpy.
 The only issue is that it only sorts the array, but does not provide
 argsort.


I'm impressed by the heapsort time. Heapsort is the slowest of the numpy
sorts. So either the heapsort implementation is better than ours or the
other sort are worse ;)

Partially sorted sequence are pretty common, so timsort might be a worthy
addition. Last time I looked, JDK was using timsort for sorting objects,
and quicksort for native types. Another sort is dual pivot quicksort that
I've heard some good things about.

Adding indirect sorts isn't so difficult once the basic sort is available.
Since the memory access tends to be larger as it gets randomly accessed,
timsort might be a good choice for that.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-17 Thread Francesc Alted
On 2/17/14, 1:08 AM, josef.p...@gmail.com wrote:
 On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com wrote:
 On 16 February 2014 23:43, josef.p...@gmail.com wrote:
 What's the fastest argsort for a 1d array with around 28 Million
 elements, roughly uniformly distributed, random order?

 On numpy latest version:

 for kind in ['quicksort', 'mergesort', 'heapsort']:
  print kind
  %timeit np.sort(data, kind=kind)
  %timeit np.argsort(data, kind=kind)


 quicksort
 1 loops, best of 3: 3.55 s per loop
 1 loops, best of 3: 10.3 s per loop
 mergesort
 1 loops, best of 3: 4.84 s per loop
 1 loops, best of 3: 9.49 s per loop
 heapsort
 1 loops, best of 3: 12.1 s per loop
 1 loops, best of 3: 39.3 s per loop


 It looks quicksort is quicker sorting, but mergesort is marginally faster
 sorting args. The diference is slim, but upon repetition, it remains
 significant.

 Why is that? Probably part of the reason is what Eelco said, and part is
 that in sort comparison are done accessing the array elements directly, but
 in argsort you have to index the array, introducing some overhead.
 Thanks, both.

 I also gain a second with mergesort.

 matlab would be nicer in my case, it returns both.
 I still need to use the argsort to index into the array to also get
 the sorted array.

Many years ago I needed something similar, so I made some functions for 
sorting and argsorting in one single shot.  Maybe you want to reuse 
them.  Here it is an example of the C implementation:

https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619

and here the Cython wrapper for all of them:

https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129

Francesc


 Josef


 I seem unable to find the code for ndarray.sort, so I can't check. I have
 tried to grep it tring all possible combinations of def ndarray,
 self.sort, etc. Where is it?


 /David.


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-17 Thread josef . pktd
On Mon, Feb 17, 2014 at 9:18 AM, Francesc Alted franc...@continuum.io wrote:
 On 2/17/14, 1:08 AM, josef.p...@gmail.com wrote:
 On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com wrote:
 On 16 February 2014 23:43, josef.p...@gmail.com wrote:
 What's the fastest argsort for a 1d array with around 28 Million
 elements, roughly uniformly distributed, random order?

 On numpy latest version:

 for kind in ['quicksort', 'mergesort', 'heapsort']:
  print kind
  %timeit np.sort(data, kind=kind)
  %timeit np.argsort(data, kind=kind)


 quicksort
 1 loops, best of 3: 3.55 s per loop
 1 loops, best of 3: 10.3 s per loop
 mergesort
 1 loops, best of 3: 4.84 s per loop
 1 loops, best of 3: 9.49 s per loop
 heapsort
 1 loops, best of 3: 12.1 s per loop
 1 loops, best of 3: 39.3 s per loop


 It looks quicksort is quicker sorting, but mergesort is marginally faster
 sorting args. The diference is slim, but upon repetition, it remains
 significant.

 Why is that? Probably part of the reason is what Eelco said, and part is
 that in sort comparison are done accessing the array elements directly, but
 in argsort you have to index the array, introducing some overhead.
 Thanks, both.

 I also gain a second with mergesort.

 matlab would be nicer in my case, it returns both.
 I still need to use the argsort to index into the array to also get
 the sorted array.

 Many years ago I needed something similar, so I made some functions for
 sorting and argsorting in one single shot.  Maybe you want to reuse
 them.  Here it is an example of the C implementation:

 https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619

 and here the Cython wrapper for all of them:

 https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129


Thanks Francesc

That would be very useful to have.
However, I don't speak C, and it would require too much maintenance
time if I were to include that in statsmodels.

I'm trying to concentrate more on stats (and my backlog there), and
leave other parts to developers that know those better.

Josef


 Francesc


 Josef


 I seem unable to find the code for ndarray.sort, so I can't check. I have
 tried to grep it tring all possible combinations of def ndarray,
 self.sort, etc. Where is it?


 /David.


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


 --
 Francesc Alted

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-17 Thread Julian Taylor
On 17.02.2014 15:18, Francesc Alted wrote:
 On 2/17/14, 1:08 AM, josef.p...@gmail.com wrote:
 On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com wrote:
 On 16 February 2014 23:43, josef.p...@gmail.com wrote:
 What's the fastest argsort for a 1d array with around 28 Million
 elements, roughly uniformly distributed, random order?

 On numpy latest version:

 for kind in ['quicksort', 'mergesort', 'heapsort']:
  print kind
  %timeit np.sort(data, kind=kind)
  %timeit np.argsort(data, kind=kind)


 quicksort
 1 loops, best of 3: 3.55 s per loop
 1 loops, best of 3: 10.3 s per loop
 mergesort
 1 loops, best of 3: 4.84 s per loop
 1 loops, best of 3: 9.49 s per loop
 heapsort
 1 loops, best of 3: 12.1 s per loop
 1 loops, best of 3: 39.3 s per loop


 It looks quicksort is quicker sorting, but mergesort is marginally faster
 sorting args. The diference is slim, but upon repetition, it remains
 significant.

 Why is that? Probably part of the reason is what Eelco said, and part is
 that in sort comparison are done accessing the array elements directly, but
 in argsort you have to index the array, introducing some overhead.
 Thanks, both.

 I also gain a second with mergesort.

 matlab would be nicer in my case, it returns both.
 I still need to use the argsort to index into the array to also get
 the sorted array.
 
 Many years ago I needed something similar, so I made some functions for 
 sorting and argsorting in one single shot.  Maybe you want to reuse 
 them.  Here it is an example of the C implementation:
 
 https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619
 
 and here the Cython wrapper for all of them:
 
 https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129
 
 Francesc
 

that doesn't really make a big difference if the data is randomly
distributed.
the sorting operation is normally much more expensive than latter
applying the indices:

In [1]: d = np.arange(1000)

In [2]: np.random.shuffle(d)

In [3]: %timeit  np.argsort(d)
1 loops, best of 3: 1.99 s per loop

In [4]: idx = np.argsort(d)

In [5]: %timeit d[idx]
1 loops, best of 3: 213 ms per loop



But if your data is not random it can make a difference as even
quicksort can be a lot faster then.
timsort would be a nice addition to numpy, it performs very well for
partially sorted data. Unfortunately its quite complicated to implement.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-17 Thread Charles R Harris
On Mon, Feb 17, 2014 at 11:32 AM, Julian Taylor 
jtaylor.deb...@googlemail.com wrote:

 On 17.02.2014 15:18, Francesc Alted wrote:
  On 2/17/14, 1:08 AM, josef.p...@gmail.com wrote:
  On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com wrote:
  On 16 February 2014 23:43, josef.p...@gmail.com wrote:
  What's the fastest argsort for a 1d array with around 28 Million
  elements, roughly uniformly distributed, random order?
 
  On numpy latest version:
 
  for kind in ['quicksort', 'mergesort', 'heapsort']:
   print kind
   %timeit np.sort(data, kind=kind)
   %timeit np.argsort(data, kind=kind)
 
 
  quicksort
  1 loops, best of 3: 3.55 s per loop
  1 loops, best of 3: 10.3 s per loop
  mergesort
  1 loops, best of 3: 4.84 s per loop
  1 loops, best of 3: 9.49 s per loop
  heapsort
  1 loops, best of 3: 12.1 s per loop
  1 loops, best of 3: 39.3 s per loop
 
 
  It looks quicksort is quicker sorting, but mergesort is marginally
 faster
  sorting args. The diference is slim, but upon repetition, it remains
  significant.
 
  Why is that? Probably part of the reason is what Eelco said, and part
 is
  that in sort comparison are done accessing the array elements
 directly, but
  in argsort you have to index the array, introducing some overhead.
  Thanks, both.
 
  I also gain a second with mergesort.
 
  matlab would be nicer in my case, it returns both.
  I still need to use the argsort to index into the array to also get
  the sorted array.
 
  Many years ago I needed something similar, so I made some functions for
  sorting and argsorting in one single shot.  Maybe you want to reuse
  them.  Here it is an example of the C implementation:
 
  https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619
 
  and here the Cython wrapper for all of them:
 
 
 https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129
 
  Francesc
 

 that doesn't really make a big difference if the data is randomly
 distributed.
 the sorting operation is normally much more expensive than latter
 applying the indices:

 In [1]: d = np.arange(1000)

 In [2]: np.random.shuffle(d)

 In [3]: %timeit  np.argsort(d)
 1 loops, best of 3: 1.99 s per loop

 In [4]: idx = np.argsort(d)

 In [5]: %timeit d[idx]
 1 loops, best of 3: 213 ms per loop



 But if your data is not random it can make a difference as even
 quicksort can be a lot faster then.
 timsort would be a nice addition to numpy, it performs very well for
 partially sorted data. Unfortunately its quite complicated to implement.


Quicksort and shellsort gain speed by having simple inner loops. I have the
impression that timsort is optimal when compares and memory access are
expensive, but I haven't seen any benchmarks for native types in contiguous
memory.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] argsort speed

2014-02-16 Thread josef . pktd
currently using numpy 1.6.1

What's the fastest argsort for a 1d array with around 28 Million
elements, roughly uniformly distributed, random order?

Is there a reason that np.argsort is almost 3 times slower than np.sort?

I'm doing semi-systematic timing for a stats(models) algorithm.

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread Eelco Hoogendoorn

My guess;

First of all, you are actually manipulating twice as much data as opposed to 
an inplace sort.

Moreover, an inplace sort gains locality as it is being sorted, whereas the 
argsort is continuously making completely random memory accesses.


-Original Message- 
From: josef.p...@gmail.com
Sent: Sunday, February 16, 2014 11:43 PM
To: Discussion of Numerical Python
Subject: [Numpy-discussion] argsort speed

currently using numpy 1.6.1

What's the fastest argsort for a 1d array with around 28 Million
elements, roughly uniformly distributed, random order?

Is there a reason that np.argsort is almost 3 times slower than np.sort?

I'm doing semi-systematic timing for a stats(models) algorithm.

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion 

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread Daπid
On 16 February 2014 23:43, josef.p...@gmail.com wrote:

 What's the fastest argsort for a 1d array with around 28 Million
 elements, roughly uniformly distributed, random order?


On numpy latest version:

for kind in ['quicksort', 'mergesort', 'heapsort']:
print kind
%timeit np.sort(data, kind=kind)
%timeit np.argsort(data, kind=kind)


quicksort
1 loops, best of 3: 3.55 s per loop
1 loops, best of 3: 10.3 s per loop
mergesort
1 loops, best of 3: 4.84 s per loop
1 loops, best of 3: 9.49 s per loop
heapsort
1 loops, best of 3: 12.1 s per loop
1 loops, best of 3: 39.3 s per loop


It looks quicksort is quicker sorting, but mergesort is marginally faster
sorting args. The diference is slim, but upon repetition, it remains
significant.

Why is that? Probably part of the reason is what Eelco said, and part is
that in sort comparison are done accessing the array elements directly, but
in argsort you have to index the array, introducing some overhead.

I seem unable to find the code for ndarray.sort, so I can't check. I have
tried to grep it tring all possible combinations of def ndarray,
self.sort, etc. Where is it?


/David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread Daπid
On 17 February 2014 00:12, Daπid davidmen...@gmail.com wrote:

 I seem unable to find the code for ndarray.sort, so I can't check. I have
 tried to grep it tring all possible combinations of def ndarray,
 self.sort, etc. Where is it?



Nevermind, it is in core/src/multiarray/methods.c
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread josef . pktd
On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn
hoogendoorn.ee...@gmail.com wrote:

 My guess;

 First of all, you are actually manipulating twice as much data as opposed to
 an inplace sort.

 Moreover, an inplace sort gains locality as it is being sorted, whereas the
 argsort is continuously making completely random memory accesses.


 -Original Message-
 From: josef.p...@gmail.com
 Sent: Sunday, February 16, 2014 11:43 PM
 To: Discussion of Numerical Python
 Subject: [Numpy-discussion] argsort speed

 currently using numpy 1.6.1

 What's the fastest argsort for a 1d array with around 28 Million
 elements, roughly uniformly distributed, random order?

 Is there a reason that np.argsort is almost 3 times slower than np.sort?

 I'm doing semi-systematic timing for a stats(models) algorithm.

I was using np.sort, inplace sort is only a little bit faster

It looks like sorting first, and then argsorting is faster than argsort alon

pvals.sort()
sortind = np.argsort(pvals)

replacing the inplace sort in the above reduces speed only a bit

--
import time
use_master = True
if use_master:
import sys
sys.path.insert(0,
rE:\Josef\!reps\numpy\dist\Programs\Python27\Lib\site-packages)

import numpy as np
print np.__version__ =, np.__version__

n = 5300
pvals = np.random.rand(n**2)

t0 = time.time()

p = np.sort(pvals)
t1 = time.time()

sortind = np.argsort(pvals)
t2 = time.time()

pvals.sort()
sortind = np.argsort(pvals)
t3 = time.time()

print t1 - t0, t2 - t1, t3 - t2
print (t2 - t1) * 1. / (t1 - t0),  (t3 - t2) * 1. / (t1 - t0)


np.__version__ = 1.9.0.dev-2868dc4
3.91900014877 9.556218 4.92900013924
2.43863219163 1.2577187936

Josef



 Josef
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread josef . pktd
On Sun, Feb 16, 2014 at 6:15 PM,  josef.p...@gmail.com wrote:
 On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn
 hoogendoorn.ee...@gmail.com wrote:

 My guess;

 First of all, you are actually manipulating twice as much data as opposed to
 an inplace sort.

 Moreover, an inplace sort gains locality as it is being sorted, whereas the
 argsort is continuously making completely random memory accesses.


 -Original Message-
 From: josef.p...@gmail.com
 Sent: Sunday, February 16, 2014 11:43 PM
 To: Discussion of Numerical Python
 Subject: [Numpy-discussion] argsort speed

 currently using numpy 1.6.1

 What's the fastest argsort for a 1d array with around 28 Million
 elements, roughly uniformly distributed, random order?

 Is there a reason that np.argsort is almost 3 times slower than np.sort?

 I'm doing semi-systematic timing for a stats(models) algorithm.

 I was using np.sort, inplace sort is only a little bit faster

 It looks like sorting first, and then argsorting is faster than argsort alon

 pvals.sort()
 sortind = np.argsort(pvals)

Ok, that was useless, that won't be anything I want.

Josef



 replacing the inplace sort in the above reduces speed only a bit

 --
 import time
 use_master = True
 if use_master:
 import sys
 sys.path.insert(0,
 rE:\Josef\!reps\numpy\dist\Programs\Python27\Lib\site-packages)

 import numpy as np
 print np.__version__ =, np.__version__

 n = 5300
 pvals = np.random.rand(n**2)

 t0 = time.time()

 p = np.sort(pvals)
 t1 = time.time()

 sortind = np.argsort(pvals)
 t2 = time.time()

 pvals.sort()
 sortind = np.argsort(pvals)
 t3 = time.time()

 print t1 - t0, t2 - t1, t3 - t2
 print (t2 - t1) * 1. / (t1 - t0),  (t3 - t2) * 1. / (t1 - t0)
 

 np.__version__ = 1.9.0.dev-2868dc4
 3.91900014877 9.556218 4.92900013924
 2.43863219163 1.2577187936

 Josef



 Josef
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread josef . pktd
On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com wrote:
 On 16 February 2014 23:43, josef.p...@gmail.com wrote:

 What's the fastest argsort for a 1d array with around 28 Million
 elements, roughly uniformly distributed, random order?


 On numpy latest version:

 for kind in ['quicksort', 'mergesort', 'heapsort']:
 print kind
 %timeit np.sort(data, kind=kind)
 %timeit np.argsort(data, kind=kind)


 quicksort
 1 loops, best of 3: 3.55 s per loop
 1 loops, best of 3: 10.3 s per loop
 mergesort
 1 loops, best of 3: 4.84 s per loop
 1 loops, best of 3: 9.49 s per loop
 heapsort
 1 loops, best of 3: 12.1 s per loop
 1 loops, best of 3: 39.3 s per loop


 It looks quicksort is quicker sorting, but mergesort is marginally faster
 sorting args. The diference is slim, but upon repetition, it remains
 significant.

 Why is that? Probably part of the reason is what Eelco said, and part is
 that in sort comparison are done accessing the array elements directly, but
 in argsort you have to index the array, introducing some overhead.

Thanks, both.

I also gain a second with mergesort.

matlab would be nicer in my case, it returns both.
I still need to use the argsort to index into the array to also get
the sorted array.

Josef



 I seem unable to find the code for ndarray.sort, so I can't check. I have
 tried to grep it tring all possible combinations of def ndarray,
 self.sort, etc. Where is it?


 /David.


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread Charles R Harris
On Sun, Feb 16, 2014 at 4:18 PM, josef.p...@gmail.com wrote:

 On Sun, Feb 16, 2014 at 6:15 PM,  josef.p...@gmail.com wrote:
  On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn
  hoogendoorn.ee...@gmail.com wrote:
 
  My guess;
 
  First of all, you are actually manipulating twice as much data as
 opposed to
  an inplace sort.
 
  Moreover, an inplace sort gains locality as it is being sorted, whereas
 the
  argsort is continuously making completely random memory accesses.
 
 
  -Original Message-
  From: josef.p...@gmail.com
  Sent: Sunday, February 16, 2014 11:43 PM
  To: Discussion of Numerical Python
  Subject: [Numpy-discussion] argsort speed
 
  currently using numpy 1.6.1
 
  What's the fastest argsort for a 1d array with around 28 Million
  elements, roughly uniformly distributed, random order?
 
  Is there a reason that np.argsort is almost 3 times slower than np.sort?
 
  I'm doing semi-systematic timing for a stats(models) algorithm.
 
  I was using np.sort, inplace sort is only a little bit faster
 
  It looks like sorting first, and then argsorting is faster than argsort
 alon
 
  pvals.sort()
  sortind = np.argsort(pvals)

 Ok, that was useless, that won't be anything I want.


I think locality is the most important thing. The argsort routines used to
move both the indexes and the array to argsort, the new ones only move the
indexes. It is a tradeoff, twice as many moves vs locality. It's probably
possible to invent an algorithm that mixes the two.

snip

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread josef . pktd
On Sun, Feb 16, 2014 at 7:13 PM, Charles R Harris
charlesr.har...@gmail.com wrote:



 On Sun, Feb 16, 2014 at 4:18 PM, josef.p...@gmail.com wrote:

 On Sun, Feb 16, 2014 at 6:15 PM,  josef.p...@gmail.com wrote:
  On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn
  hoogendoorn.ee...@gmail.com wrote:
 
  My guess;
 
  First of all, you are actually manipulating twice as much data as
  opposed to
  an inplace sort.
 
  Moreover, an inplace sort gains locality as it is being sorted, whereas
  the
  argsort is continuously making completely random memory accesses.
 
 
  -Original Message-
  From: josef.p...@gmail.com
  Sent: Sunday, February 16, 2014 11:43 PM
  To: Discussion of Numerical Python
  Subject: [Numpy-discussion] argsort speed
 
  currently using numpy 1.6.1
 
  What's the fastest argsort for a 1d array with around 28 Million
  elements, roughly uniformly distributed, random order?
 
  Is there a reason that np.argsort is almost 3 times slower than
  np.sort?
 
  I'm doing semi-systematic timing for a stats(models) algorithm.
 
  I was using np.sort, inplace sort is only a little bit faster
 
  It looks like sorting first, and then argsorting is faster than argsort
  alon
 
  pvals.sort()
  sortind = np.argsort(pvals)

 Ok, that was useless, that won't be anything I want.


 I think locality is the most important thing. The argsort routines used to
 move both the indexes and the array to argsort, the new ones only move the
 indexes. It is a tradeoff, twice as many moves vs locality. It's probably
 possible to invent an algorithm that mixes the two.

If that's the way it is, then that's the way it is.

I just never realized that argsort can take so long, since I usually
use only smaller arrays.
I was surprised that argsort took almost 10 out of around 12 seconds
in my function, and this after I cleaned up my code and removed
duplicate and avoidable argsorts.

Josef



 snip

 Chuck

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread Charles R Harris
On Sun, Feb 16, 2014 at 4:12 PM, Daπid davidmen...@gmail.com wrote:

 On 16 February 2014 23:43, josef.p...@gmail.com wrote:

 What's the fastest argsort for a 1d array with around 28 Million
 elements, roughly uniformly distributed, random order?


 On numpy latest version:

 for kind in ['quicksort', 'mergesort', 'heapsort']:
 print kind
 %timeit np.sort(data, kind=kind)
 %timeit np.argsort(data, kind=kind)


 quicksort
 1 loops, best of 3: 3.55 s per loop
 1 loops, best of 3: 10.3 s per loop
 mergesort
 1 loops, best of 3: 4.84 s per loop
 1 loops, best of 3: 9.49 s per loop
 heapsort
 1 loops, best of 3: 12.1 s per loop
 1 loops, best of 3: 39.3 s per loop


Interesting. I get slightly different results

quicksort
1 loops, best of 3: 3.25 s per loop
1 loops, best of 3: 6.16 s per loop
mergesort
1 loops, best of 3: 3.99 s per loop
1 loops, best of 3: 6.97 s per loop
heapsort
1 loops, best of 3: 10.1 s per loop
1 loops, best of 3: 29.3 s per loop

Possibly faster memory here.

snip

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort

2013-01-16 Thread Mads Ipsen

Hi,

Thanks everybody for all the answers that make perfect sense when axis=0.

Now suppose I want to sort the array in such a way that each row is 
sorted individually. Then I suppose I should do this:


from numpy import *

v = array([[4,3],
   [1,12],
   [23,7],
   [11,6],
   [8,9]])
idx = argsort(v, axis=1)

idx is then

[[1 0]
 [0 1]
 [1 0]
 [1 0]
 [0 1]]

which makes sense, since these are the indices in an order that would 
sort each row. But when I try


a[idx, variuos_additional_arguments]

I just get strange results. Anybody that can point me towards the 
correct solution.


Best regards,

Mads


On 01/15/2013 09:53 PM, eat wrote:

Hi,

On Tue, Jan 15, 2013 at 1:50 PM, Mads Ipsen madsip...@gmail.com 
mailto:madsip...@gmail.com wrote:


Hi,

I simply can't understand this. I'm trying to use argsort to
produce indices that can be used to sort an array:

  from numpy import *

  indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]])
  args = argsort(indices, axis=0)
  print indices[args]

gives:

[[[ 1 12]
  [ 4  3]]

 [[ 4  3]
  [11  6]]

 [[ 8  9]
  [23  7]]

 [[11  6]
  [ 8  9]]

 [[23  7]
  [ 1 12]]]

I thought this should produce a sorted version of the indices array.

Any help is appreciated.

Perhaps these three different point of views will help you a little 
bit more to move on:

In []: x
Out[]:
array([[ 4,  3],
   [ 1, 12],
   [23,  7],
   [11,  6],
   [ 8,  9]])
In []: ind= x.argsort(axis= 0)
In []: ind
Out[]:
array([[1, 0],
   [0, 3],
   [4, 2],
   [3, 4],
   [2, 1]])

In []: x[ind[:, 0]]
Out[]:
array([[ 1, 12],
   [ 4,  3],
   [ 8,  9],
   [11,  6],
   [23,  7]])

In []: x[ind[:, 1]]
Out[]:
array([[ 4,  3],
   [11,  6],
   [23,  7],
   [ 8,  9],
   [ 1, 12]])

In []: x[ind, [0, 1]]
Out[]:
array([[ 1,  3],
   [ 4,  6],
   [ 8,  7],
   [11,  9],
   [23, 12]])
-eat


Best regards,

Mads

-- 
+-+

| Mads Ipsen  |
+--+--+
| Gåsebæksvej 7, 4. tv |  |
| DK-2500 Valby| phone:+45-29716388  tel:%2B45-29716388  |
| Denmark  | email:mads.ip...@gmail.com  
mailto:mads.ip...@gmail.com  |
+--+--+


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org mailto:NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion



--
+-+
| Mads Ipsen  |
+--+--+
| Gåsebæksvej 7, 4. tv |  |
| DK-2500 Valby| phone:  +45-29716388 |
| Denmark  | email:  mads.ip...@gmail.com |
+--+--+

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort

2013-01-16 Thread Robert Kern
On Wed, Jan 16, 2013 at 9:30 AM, Mads Ipsen madsip...@gmail.com wrote:
 Hi,

 Thanks everybody for all the answers that make perfect sense when axis=0.

 Now suppose I want to sort the array in such a way that each row is sorted
 individually. Then I suppose I should do this:

 from numpy import *


 v = array([[4,3],
[1,12],
[23,7],
[11,6],
[8,9]])
 idx = argsort(v, axis=1)

 idx is then

 [[1 0]
  [0 1]
  [1 0]
  [1 0]
  [0 1]]

 which makes sense, since these are the indices in an order that would sort
 each row. But when I try

 a[idx, variuos_additional_arguments]

 I just get strange results. Anybody that can point me towards the correct
 solution.

Please have a look at the documentation again. If idx has indices for
the second axis, you need to put it into the second place.

  
http://docs.scipy.org/doc/numpy/user/basics.indexing.html#indexing-multi-dimensional-arrays
  
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing


[~]
|4 idx0 = np.arange(v.shape[0])[:,np.newaxis]

[~]
|5 idx0
array([[0],
   [1],
   [2],
   [3],
   [4]])

[~]
|7 v[idx0, idx]
array([[ 3,  4],
   [ 1, 12],
   [ 7, 23],
   [ 6, 11],
   [ 8,  9]])

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] argsort

2013-01-15 Thread Mads Ipsen

Hi,

I simply can't understand this. I'm trying to use argsort to produce 
indices that can be used to sort an array:


  from numpy import *

  indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]])
  args = argsort(indices, axis=0)
  print indices[args]

gives:

[[[ 1 12]
  [ 4  3]]

 [[ 4  3]
  [11  6]]

 [[ 8  9]
  [23  7]]

 [[11  6]
  [ 8  9]]

 [[23  7]
  [ 1 12]]]

I thought this should produce a sorted version of the indices array.

Any help is appreciated.

Best regards,

Mads

--
+-+
| Mads Ipsen  |
+--+--+
| Gåsebæksvej 7, 4. tv |  |
| DK-2500 Valby| phone:  +45-29716388 |
| Denmark  | email:  mads.ip...@gmail.com |
+--+--+

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort

2013-01-15 Thread Charles R Harris
On Tue, Jan 15, 2013 at 4:50 AM, Mads Ipsen madsip...@gmail.com wrote:

  Hi,

 I simply can't understand this. I'm trying to use argsort to produce
 indices that can be used to sort an array:

   from numpy import *

   indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]])
   args = argsort(indices, axis=0)
   print indices[args]

 gives:

 [[[ 1 12]
   [ 4  3]]

  [[ 4  3]
   [11  6]]

  [[ 8  9]
   [23  7]]

  [[11  6]
   [ 8  9]]

  [[23  7]
   [ 1 12]]]

 I thought this should produce a sorted version of the indices array.

 Any help is appreciated.


Fancy indexing is a funny creature and not easy to understand in more than
one dimension. What is happening is that each index is replaced by the
corresponding row of a and the result is of shape (5,2,2). To do what you
want to do:

In [20]: a[i, [[0,1]]*5]
Out[20]:
array([[ 1,  3],
   [ 4,  6],
   [ 8,  7],
   [11,  9],
   [23, 12]])

I agree that there should be an easier way to do this.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort

2013-01-15 Thread Robert Kern
On Tue, Jan 15, 2013 at 3:44 PM, Charles R Harris
charlesr.har...@gmail.com wrote:
 Fancy indexing is a funny creature and not easy to understand in more than
 one dimension. What is happening is that each index is replaced by the
 corresponding row of a and the result is of shape (5,2,2). To do what you
 want to do:

 In [20]: a[i, [[0,1]]*5]
 Out[20]:
 array([[ 1,  3],
[ 4,  6],
[ 8,  7],
[11,  9],
[23, 12]])

 I agree that there should be an easier way to do this.

Slightly easier, though no more transparent:

  a[i, [0,1]]

http://docs.scipy.org/doc/numpy/user/basics.indexing.html#indexing-multi-dimensional-arrays

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort

2013-01-15 Thread eat
Hi,

On Tue, Jan 15, 2013 at 1:50 PM, Mads Ipsen madsip...@gmail.com wrote:

  Hi,

 I simply can't understand this. I'm trying to use argsort to produce
 indices that can be used to sort an array:

   from numpy import *

   indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]])
   args = argsort(indices, axis=0)
   print indices[args]

 gives:

 [[[ 1 12]
   [ 4  3]]

  [[ 4  3]
   [11  6]]

  [[ 8  9]
   [23  7]]

  [[11  6]
   [ 8  9]]

  [[23  7]
   [ 1 12]]]

 I thought this should produce a sorted version of the indices array.

 Any help is appreciated.

Perhaps these three different point of views will help you a little bit
more to move on:
In []: x
Out[]:
array([[ 4,  3],
   [ 1, 12],
   [23,  7],
   [11,  6],
   [ 8,  9]])
In []: ind= x.argsort(axis= 0)
In []: ind
Out[]:
array([[1, 0],
   [0, 3],
   [4, 2],
   [3, 4],
   [2, 1]])

In []: x[ind[:, 0]]
Out[]:
array([[ 1, 12],
   [ 4,  3],
   [ 8,  9],
   [11,  6],
   [23,  7]])

In []: x[ind[:, 1]]
Out[]:
array([[ 4,  3],
   [11,  6],
   [23,  7],
   [ 8,  9],
   [ 1, 12]])

In []: x[ind, [0, 1]]
Out[]:
array([[ 1,  3],
   [ 4,  6],
   [ 8,  7],
   [11,  9],
   [23, 12]])
-eat


 Best regards,

 Mads

  --
 +-+
 | Mads Ipsen  |
 +--+--+
 | Gåsebæksvej 7, 4. tv |  |
 | DK-2500 Valby| phone:  +45-29716388 |
 | Denmark  | email:  mads.ip...@gmail.com |
 +--+--+



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Argsort

2009-02-05 Thread Ryan May
Hi,

Ok, what am I missing here:

x = np.array([[4,2],[5,3]])
x[x.argsort(1)]

array([[[5, 3],
[4, 2]],

   [[5, 3],
[4, 2]]])

I was expecting:

array([[2,4],[3,5]])

Certainly not a 3D array.  What am I doing wrong?

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort in descending order

2008-09-06 Thread Dan Lenski
On Fri, 05 Sep 2008 07:02:38 -0700, SimonPalmer wrote:

 another newb question I suspect, but is there a way to instruct argsort
 to sort in descending order or should I just sort and reverse?

Just sort and subtract to get the reverse order.  I can think of two 
reasonable ways to do it with no copying:

from numpy import *
a = rand(100)

# this does NOT make a copy of a, simply indexes it backward, and should 
# be very fast
reverseorder = argsort(a[::-1])

# this way doesn't make a copy either, and probably is a little slower
# since it has to do many subtractions
reverseorder = len(a)-1-argsort(a)

HTH,
Dan

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] argsort in descending order

2008-09-05 Thread SimonPalmer
another newb question I suspect, but is there a way to instruct
argsort to sort in descending order or should I just sort and reverse?
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort in descending order

2008-09-05 Thread Charles R Harris
On Fri, Sep 5, 2008 at 8:02 AM, SimonPalmer [EMAIL PROTECTED] wrote:

 another newb question I suspect, but is there a way to instruct
 argsort to sort in descending order or should I just sort and reverse?


You'll just have to reverse the indices.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort memory problem?

2008-01-30 Thread Oriol Vendrell
* Robin [EMAIL PROTECTED] [2008-01-29 19:23:11 +]:

 On Jan 29, 2008 7:16 PM, Lou Pecora [EMAIL PROTECTED] wrote:
  Hmmm... Interesting.  I am using Python 2.4.4.  It
  would be nice to have other Mac people with same/other
  Python and numpy versions try the argsort bug code.
 
 I don't see any memory leak with the test code.
 Mac OS X 10.5.1
 Python 2.5.1 (not apple one)
 Numpy 1.0.5.dev4722

I have run the test1 code again, this time on my laptop PC (no MAC-user,
sorry) using the last stable numpy release. The memory problem does
_not_ show up now.  I'm running with:
 - Ubuntu Feisty (kernel 2.6.17-12-generic i686)
 - python 2.5.1 (Feisty package)
 - numpy 1.0.4 (compiled with gcc version 4.1.2)

However, the memory leak appears on my laptop if I use python 2.5.1 with
numpy 1.0.1. At least here, this seems to be an issue dependent only on
the numpy version, and a solved one.

-- oriol
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort memory problem?

2008-01-29 Thread Lou Pecora
This still occurs in numpy 1.0.3.1  so must have been
fixed between that and your 1.0.4-5 version. 

By the way the memory problem crashes my Intel Mac
Book Pro (system 10.4.11) with the gray screen and
black dialog box telling me to restart my computer.  A
very UN-unix like and UN-Mac like way of handling a
memory problem IMHO.  Let us Mac people not be too
smug.

-- Lou Pecora


--- Alexandre Fayolle [EMAIL PROTECTED]
wrote:

 On Tue, Jan 29, 2008 at 02:58:15PM +0100, Oriol
 Vendrell wrote:
  Hi all,
  
  I've noticed something that looks like an odd
 behaviour in array.argsort().
  
  
  # test1 -
  from numpy import array
  while True:
  a=array([8.0,7.0,6.0,5.0,4.0,2.0])
  i=a.argsort()
  # ---
  
  # test2 -
  from numpy import array
  a=array([8.0,7.0,6.0,5.0,4.0,2.0])
  while True:
  i=a.argsort()
  # ---
  
  
  test1 runs out of memory after a few minutes, it
 seems that in each cycle
  some memory is allocated and never returned back.
  test2 runs fine until killed.
  
  I'm unsure if I'm missing something or if this
 could be a bug. I'm using
  numpy 1.0.1 with python 2.4.4 in a debian stable
 system.
 
 Certainly a bug, but it has been fixed and I cannot
 reproduce in debian
 sid (using 1.0.4-5) 
 
 -- 
 Alexandre Fayolle 



  

Never miss a thing.  Make Yahoo your home page. 
http://www.yahoo.com/r/hs
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort memory problem?

2008-01-29 Thread Alexandre Fayolle
On Tue, Jan 29, 2008 at 02:58:15PM +0100, Oriol Vendrell wrote:
 Hi all,
 
 I've noticed something that looks like an odd behaviour in array.argsort().
 
 
 # test1 -
 from numpy import array
 while True:
 a=array([8.0,7.0,6.0,5.0,4.0,2.0])
 i=a.argsort()
 # ---
 
 # test2 -
 from numpy import array
 a=array([8.0,7.0,6.0,5.0,4.0,2.0])
 while True:
 i=a.argsort()
 # ---
 
 
 test1 runs out of memory after a few minutes, it seems that in each cycle
 some memory is allocated and never returned back.
 test2 runs fine until killed.
 
 I'm unsure if I'm missing something or if this could be a bug. I'm using
 numpy 1.0.1 with python 2.4.4 in a debian stable system.

Certainly a bug, but it has been fixed and I cannot reproduce in debian
sid (using 1.0.4-5) 

-- 
Alexandre Fayolle  LOGILAB, Paris (France)
Formations Python, Zope, Plone, Debian:  http://www.logilab.fr/formations
Développement logiciel sur mesure:   http://www.logilab.fr/services
Informatique scientifique:   http://www.logilab.fr/science


signature.asc
Description: Digital signature
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] argsort memory problem?

2008-01-29 Thread Oriol Vendrell
Hi all,

I've noticed something that looks like an odd behaviour in array.argsort().


# test1 -
from numpy import array
while True:
a=array([8.0,7.0,6.0,5.0,4.0,2.0])
i=a.argsort()
# ---

# test2 -
from numpy import array
a=array([8.0,7.0,6.0,5.0,4.0,2.0])
while True:
i=a.argsort()
# ---


test1 runs out of memory after a few minutes, it seems that in each cycle
some memory is allocated and never returned back.
test2 runs fine until killed.

I'm unsure if I'm missing something or if this could be a bug. I'm using
numpy 1.0.1 with python 2.4.4 in a debian stable system.

Any suggestions?

Thanks,

-- 
Oriol Vendrell
[EMAIL PROTECTED]
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort memory problem?

2008-01-29 Thread Robin
On Jan 29, 2008 7:16 PM, Lou Pecora [EMAIL PROTECTED] wrote:
 Hmmm... Interesting.  I am using Python 2.4.4.  It
 would be nice to have other Mac people with same/other
 Python and numpy versions try the argsort bug code.

I don't see any memory leak with the test code.
Mac OS X 10.5.1
Python 2.5.1 (not apple one)
Numpy 1.0.5.dev4722

Robin
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort memory problem?

2008-01-29 Thread Lou Pecora
Hmmm... Interesting.  I am using Python 2.4.4.  It
would be nice to have other Mac people with same/other
Python and numpy versions try the argsort bug code. 


-- Lou Pecora

--- Francesc Altet [EMAIL PROTECTED] wrote:

 A Tuesday 29 January 2008, Lou Pecora escrigué:
  This still occurs in numpy 1.0.3.1  so must have
 been
  fixed between that and your 1.0.4-5 version.
 
 It works here and I'm using NumPy 1.0.3, Python
 2.5.1 on a Ubuntu 7.10 / 
 Pentium4 machine.
 
  By the way the memory problem crashes my Intel Mac
  Book Pro (system 10.4.11) with the gray screen and
  black dialog box telling me to restart my
 computer.  A
  very UN-unix like and UN-Mac like way of handling
 a
  memory problem IMHO.  Let us Mac people not be too
  smug.
 
 Um, it would be nice if some other Mac-user can
 reproduce your problem. 
 Perhaps you are suffering some other problem that
 can be exposed by 
 this code snip.
 
 Cheers,
 
 -- 
 0,0   Francesc Altet     http://www.carabos.com/


-- Lou Pecora,   my views are my own.


  

Never miss a thing.  Make Yahoo your home page. 
http://www.yahoo.com/r/hs
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] argsort and take along arbitrary axes

2007-05-14 Thread Zachary Pincus
Hello all,

I've got a few questions that came up as I tried to calculate various  
statistics about an image time-series. For example, I have an array  
of shape (t,x,y) representing t frames of a time-lapse of resolution  
(x,y).

Now, say I want to both argsort and sort this time-series, pixel- 
wise. (For example.)

In 1-d it's easy:
indices = a.argsort()
sorted = a[indices]

I would have thought that doing this on my 3-d array would work  
similarly:
indices = a.argsort(axis=0)
sorted = a.take(indices, axis=0)

Unfortunately, this gives a ValueError of dimensions too large.  
Now, I know that 'a.sort(axis=0)' works fine for the given example,  
but I'm curious about how to this sort of indexing operation in the  
general case.

Thanks for any insight,

Zach Pincus

Program in Biomedical Informatics and Department of Biochemistry
Stanford University School of Medicine

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort and take along arbitrary axes

2007-05-14 Thread Zachary Pincus
 I've got a few questions that came up as I tried to calculate various
 statistics about an image time-series. For example, I have an array
 of shape (t,x,y) representing t frames of a time-lapse of resolution
 (x,y).

 Now, say I want to both argsort and sort this time-series, pixel-
 wise. (For example.)

 In 1-d it's easy:
 indices = a.argsort()
 sorted = a[indices]

 I would have thought that doing this on my 3-d array would work
 similarly:
 indices = a.argsort(axis=0)
 sorted = a.take(indices, axis=0)

 Unfortunately, this gives a ValueError of dimensions too large.
 Now, I know that 'a.sort(axis=0)' works fine for the given example,
 but I'm curious about how to this sort of indexing operation in the
 general case.

 Unfortunately, argsort doesn't work transparently with take or  
 fancy indexing for multidimensional arrays. I am thinking of adding  
 a function argtake for this, and also for the results returned by  
 argmax and argmin, but at the moment you have to fill in the   
 values of the other indices and use fancy indexing. For now, it is  
 probably simpler, prettier, and faster to just sort the array.

Thanks Charles. Unfortunately, the argsort/sort buisness was, as I  
mentioned, just an example of the kind of 'take' operation that I am  
trying to figure out how to do. There are other operations that will  
have similarly-formatted 'indices' arrays (as above) that aren't  
generated from argsort...

As such, how do I fill in the values of the other indices and use  
fancy indexing? Even after reading the numpy book about that, and  
reading the docstring for numpy.take, I'm still vague on this. Would  
I use numpy.indices to get a list of index arrays, and then swap in  
(at the correct position in this list) the result of argsort (or the  
other operations), and use that for fancy indexing? Is there an  
easier/faster way?

Thanks again,

Zach





___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort and take along arbitrary axes

2007-05-14 Thread Charles R Harris

On 5/14/07, Zachary Pincus [EMAIL PROTECTED] wrote:


 I've got a few questions that came up as I tried to calculate various
 statistics about an image time-series. For example, I have an array
 of shape (t,x,y) representing t frames of a time-lapse of resolution
 (x,y).

 Now, say I want to both argsort and sort this time-series, pixel-
 wise. (For example.)

 In 1-d it's easy:
 indices = a.argsort()
 sorted = a[indices]

 I would have thought that doing this on my 3-d array would work
 similarly:
 indices = a.argsort(axis=0)
 sorted = a.take(indices, axis=0)

 Unfortunately, this gives a ValueError of dimensions too large.
 Now, I know that 'a.sort(axis=0)' works fine for the given example,
 but I'm curious about how to this sort of indexing operation in the
 general case.

 Unfortunately, argsort doesn't work transparently with take or
 fancy indexing for multidimensional arrays. I am thinking of adding
 a function argtake for this, and also for the results returned by
 argmax and argmin, but at the moment you have to fill in the
 values of the other indices and use fancy indexing. For now, it is
 probably simpler, prettier, and faster to just sort the array.

Thanks Charles. Unfortunately, the argsort/sort buisness was, as I
mentioned, just an example of the kind of 'take' operation that I am
trying to figure out how to do. There are other operations that will
have similarly-formatted 'indices' arrays (as above) that aren't
generated from argsort...

As such, how do I fill in the values of the other indices and use
fancy indexing? Even after reading the numpy book about that, and
reading the docstring for numpy.take, I'm still vague on this. Would
I use numpy.indices to get a list of index arrays, and then swap in
(at the correct position in this list) the result of argsort (or the
other operations), and use that for fancy indexing? Is there an
easier/faster way?



I've attached  a quick, mostly untested, version of argtake. It's in Python,
probably not too fast, but see if it works for you.

Chuck
import numpy as N

def argtake(arr, ind, axis=-1) :
Take using output of argsort

*Description*

The usual output of argsort is not easily used with take when the relevent
array is multidimensional. This is a quick and dirty standin.

*Parameters*:

arr : ndarray
The array from which the elements are taken. Typically this will be
the same array to which argsort was applied.

ind : ndarray
Indices returned by argsort or lexsort.

axis : integer
Axis to which argsort or lexsort was applied. Must match the
original call. Defaults to -1.

*Returns*:

reordered_array : same type as arr
The input array reordered by ind along the axis.


if arr.shape != ind.shape :
raise shape mismatch
if arr.ndim == 1 :
return N.take(arr, ind)
else :
naxis = arr.shape[axis]
aswap = N.swapaxes(arr, axis, -1)
iswap = N.swapaxes(ind, axis, -1)
shape = aswap.shape
aswap = aswap.reshape(-1, naxis)
iswap = iswap.reshape(-1, naxis)
oswap = N.empty_like(aswap)
for i in xrange(len(aswap)) :
N.take(aswap[i], iswap[i], out=oswap[i])
oswap = oswap.reshape(shape)
oswap = N.swapaxes(oswap, axis, -1)
return oswap
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion