[Numpy-discussion] k maximal elements

2011-06-06 Thread Alex Ter-Sarkissov
I have a vector of positive integers length n. Is there a simple (i.e.
without sorting/ranking) of 'pulling out' k larrgest (or smallest) values.
Something like

*sum(x[sum(x,1)(max(sum(x,1)+min(sum(x,1/2,])*

but smarter
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] k maximal elements

2011-06-06 Thread Olivier Delalleau
I don't really understand your proposed solution, but you can do something
like:

import heapq
q = list(x)
heapq.heapify(q)
k_smallest = [heapq.heappop(q) for i in xrange(k)]

which is in O(n + k log n)

-=- Olivier

2011/6/6 Alex Ter-Sarkissov ater1...@gmail.com

 I have a vector of positive integers length n. Is there a simple (i.e.
 without sorting/ranking) of 'pulling out' k larrgest (or smallest) values.
 Something like

 *sum(x[sum(x,1)(max(sum(x,1)+min(sum(x,1/2,])*

 but smarter

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] k maximal elements

2011-06-06 Thread David Cournapeau
On Mon, Jun 6, 2011 at 3:15 PM, Alex Ter-Sarkissov ater1...@gmail.com wrote:
 I have a vector of positive integers length n. Is there a simple (i.e.
 without sorting/ranking) of 'pulling out' k larrgest (or smallest) values.

Maybe not so simple, but does not require sorting (and its associated
o(NlogN) cost): http://en.wikipedia.org/wiki/Selection_algorithm
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] k maximal elements

2011-06-06 Thread Keith Goodman
On Sun, Jun 5, 2011 at 11:15 PM, Alex Ter-Sarkissov ater1...@gmail.com wrote:
 I have a vector of positive integers length n. Is there a simple (i.e.
 without sorting/ranking) of 'pulling out' k larrgest (or smallest) values.
 Something like

 sum(x[sum(x,1)(max(sum(x,1)+min(sum(x,1/2,])

 but smarter

You could use bottleneck [1]. It has a partial sort written in cython:

 a = np.arange(10)
 np.random.shuffle(a)
 a
   array([6, 3, 8, 5, 4, 2, 0, 1, 9, 7])
 import bottleneck as bn
 k = 3
 b = bn.partsort(a, a.size-k)
 b[-k:]
   array([8, 9, 7])

Or you could get the indices:

 idx = bn.argpartsort(a, a.size-k)
 idx[-k:]
   array([2, 8, 9])

[1] The partial sort will be in Bottleneck 0.5.0. There is a beta2 at
https://github.com/kwgoodman/bottleneck/downloads. Release version
will be posted at http://pypi.python.org/pypi/Bottleneck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] k maximal elements

2011-06-06 Thread Keith Goodman
On Mon, Jun 6, 2011 at 6:44 AM, Keith Goodman kwgood...@gmail.com wrote:
 On Sun, Jun 5, 2011 at 11:15 PM, Alex Ter-Sarkissov ater1...@gmail.com 
 wrote:
 I have a vector of positive integers length n. Is there a simple (i.e.
 without sorting/ranking) of 'pulling out' k larrgest (or smallest) values.
 Something like

 sum(x[sum(x,1)(max(sum(x,1)+min(sum(x,1/2,])

 but smarter

 You could use bottleneck [1]. It has a partial sort written in cython:

     a = np.arange(10)
     np.random.shuffle(a)
     a
       array([6, 3, 8, 5, 4, 2, 0, 1, 9, 7])
     import bottleneck as bn
     k = 3
     b = bn.partsort(a, a.size-k)
     b[-k:]
       array([8, 9, 7])

 Or you could get the indices:

     idx = bn.argpartsort(a, a.size-k)
     idx[-k:]
       array([2, 8, 9])

 [1] The partial sort will be in Bottleneck 0.5.0. There is a beta2 at
 https://github.com/kwgoodman/bottleneck/downloads. Release version
 will be posted at http://pypi.python.org/pypi/Bottleneck

Oh, and here are some timings:

 a = np.random.rand(1e6)
 timeit a.sort()
10 loops, best of 3: 18.3 ms per loop
 timeit bn.partsort(a, 100)
100 loops, best of 3: 3.71 ms per loop

 a = np.random.rand(1e4)
 timeit a.sort()
1000 loops, best of 3: 170 us per loop
 timeit bn.partsort(a, 10)
1 loops, best of 3: 19.4 us per loop
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] k maximal elements

2011-06-06 Thread gary ruben
I learn a lot by watching the numpy and scipy lists (today Olivier
taught me about heapq :), but he may not have noticed that Python 2.4
added an nsmallest method)

import heapq
q = list(x)
heapq.heapify(q)
k_smallest = heapq.nsmallest(k,q)

On Mon, Jun 6, 2011 at 10:52 PM, Olivier Delalleau sh...@keba.be wrote:
 I don't really understand your proposed solution, but you can do something
 like:

 import heapq
 q = list(x)
 heapq.heapify(q)
 k_smallest = [heapq.heappop(q) for i in xrange(k)]

 which is in O(n + k log n)

 -=- Olivier

 2011/6/6 Alex Ter-Sarkissov ater1...@gmail.com

 I have a vector of positive integers length n. Is there a simple (i.e.
 without sorting/ranking) of 'pulling out' k larrgest (or smallest) values.
 Something like

 sum(x[sum(x,1)(max(sum(x,1)+min(sum(x,1/2,])

 but smarter

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] k maximal elements

2011-06-06 Thread Keith Goodman
On Mon, Jun 6, 2011 at 6:57 AM, gary ruben gru...@bigpond.net.au wrote:
 I learn a lot by watching the numpy and scipy lists (today Olivier
 taught me about heapq :), but he may not have noticed that Python 2.4
 added an nsmallest method)

 import heapq
 q = list(x)
 heapq.heapify(q)
 k_smallest = heapq.nsmallest(k,q)

I learned something new too---heapq.

Timings:

 alist = a.tolist()
 heapq.heapify(alist)
 timeit k_smallest = heapq.nsmallest(k, alist)
100 loops, best of 3: 1.84 ms per loop
 timeit bn.partsort(a, a.size-10)
1 loops, best of 3: 39.3 us per loop
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] k maximal elements

2011-06-06 Thread RadimRehurek
 On Mon, Jun 6, 2011 at 6:57 AM, gary ruben gru...@bigpond.net.au wrote:
  I learn a lot by watching the numpy and scipy lists (today Olivier
  taught me about heapq :), but he may not have noticed that Python 2.4
  added an nsmallest method)

I needed  indices of the selected elements as well (not just the k-smallest 
values themselves) some time ago and compared heapq vs. sort vs. numpy.argsort.

Timing results (for my specific usecase) are at 
https://github.com/piskvorky/gensim/issues/5 . The results are as expected: 
argsort is a must, forming (value, index) tuples explicitly and then sorting 
that with sort/heapq kills performance.

Best,
Radim
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion