[Numpy-discussion] k maximal elements
I have a vector of positive integers length n. Is there a simple (i.e. without sorting/ranking) of 'pulling out' k larrgest (or smallest) values. Something like *sum(x[sum(x,1)(max(sum(x,1)+min(sum(x,1/2,])* but smarter ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] k maximal elements
I don't really understand your proposed solution, but you can do something like: import heapq q = list(x) heapq.heapify(q) k_smallest = [heapq.heappop(q) for i in xrange(k)] which is in O(n + k log n) -=- Olivier 2011/6/6 Alex Ter-Sarkissov ater1...@gmail.com I have a vector of positive integers length n. Is there a simple (i.e. without sorting/ranking) of 'pulling out' k larrgest (or smallest) values. Something like *sum(x[sum(x,1)(max(sum(x,1)+min(sum(x,1/2,])* but smarter ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] k maximal elements
On Mon, Jun 6, 2011 at 3:15 PM, Alex Ter-Sarkissov ater1...@gmail.com wrote: I have a vector of positive integers length n. Is there a simple (i.e. without sorting/ranking) of 'pulling out' k larrgest (or smallest) values. Maybe not so simple, but does not require sorting (and its associated o(NlogN) cost): http://en.wikipedia.org/wiki/Selection_algorithm ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] k maximal elements
On Sun, Jun 5, 2011 at 11:15 PM, Alex Ter-Sarkissov ater1...@gmail.com wrote: I have a vector of positive integers length n. Is there a simple (i.e. without sorting/ranking) of 'pulling out' k larrgest (or smallest) values. Something like sum(x[sum(x,1)(max(sum(x,1)+min(sum(x,1/2,]) but smarter You could use bottleneck [1]. It has a partial sort written in cython: a = np.arange(10) np.random.shuffle(a) a array([6, 3, 8, 5, 4, 2, 0, 1, 9, 7]) import bottleneck as bn k = 3 b = bn.partsort(a, a.size-k) b[-k:] array([8, 9, 7]) Or you could get the indices: idx = bn.argpartsort(a, a.size-k) idx[-k:] array([2, 8, 9]) [1] The partial sort will be in Bottleneck 0.5.0. There is a beta2 at https://github.com/kwgoodman/bottleneck/downloads. Release version will be posted at http://pypi.python.org/pypi/Bottleneck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] k maximal elements
On Mon, Jun 6, 2011 at 6:44 AM, Keith Goodman kwgood...@gmail.com wrote: On Sun, Jun 5, 2011 at 11:15 PM, Alex Ter-Sarkissov ater1...@gmail.com wrote: I have a vector of positive integers length n. Is there a simple (i.e. without sorting/ranking) of 'pulling out' k larrgest (or smallest) values. Something like sum(x[sum(x,1)(max(sum(x,1)+min(sum(x,1/2,]) but smarter You could use bottleneck [1]. It has a partial sort written in cython: a = np.arange(10) np.random.shuffle(a) a array([6, 3, 8, 5, 4, 2, 0, 1, 9, 7]) import bottleneck as bn k = 3 b = bn.partsort(a, a.size-k) b[-k:] array([8, 9, 7]) Or you could get the indices: idx = bn.argpartsort(a, a.size-k) idx[-k:] array([2, 8, 9]) [1] The partial sort will be in Bottleneck 0.5.0. There is a beta2 at https://github.com/kwgoodman/bottleneck/downloads. Release version will be posted at http://pypi.python.org/pypi/Bottleneck Oh, and here are some timings: a = np.random.rand(1e6) timeit a.sort() 10 loops, best of 3: 18.3 ms per loop timeit bn.partsort(a, 100) 100 loops, best of 3: 3.71 ms per loop a = np.random.rand(1e4) timeit a.sort() 1000 loops, best of 3: 170 us per loop timeit bn.partsort(a, 10) 1 loops, best of 3: 19.4 us per loop ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] k maximal elements
I learn a lot by watching the numpy and scipy lists (today Olivier taught me about heapq :), but he may not have noticed that Python 2.4 added an nsmallest method) import heapq q = list(x) heapq.heapify(q) k_smallest = heapq.nsmallest(k,q) On Mon, Jun 6, 2011 at 10:52 PM, Olivier Delalleau sh...@keba.be wrote: I don't really understand your proposed solution, but you can do something like: import heapq q = list(x) heapq.heapify(q) k_smallest = [heapq.heappop(q) for i in xrange(k)] which is in O(n + k log n) -=- Olivier 2011/6/6 Alex Ter-Sarkissov ater1...@gmail.com I have a vector of positive integers length n. Is there a simple (i.e. without sorting/ranking) of 'pulling out' k larrgest (or smallest) values. Something like sum(x[sum(x,1)(max(sum(x,1)+min(sum(x,1/2,]) but smarter ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] k maximal elements
On Mon, Jun 6, 2011 at 6:57 AM, gary ruben gru...@bigpond.net.au wrote: I learn a lot by watching the numpy and scipy lists (today Olivier taught me about heapq :), but he may not have noticed that Python 2.4 added an nsmallest method) import heapq q = list(x) heapq.heapify(q) k_smallest = heapq.nsmallest(k,q) I learned something new too---heapq. Timings: alist = a.tolist() heapq.heapify(alist) timeit k_smallest = heapq.nsmallest(k, alist) 100 loops, best of 3: 1.84 ms per loop timeit bn.partsort(a, a.size-10) 1 loops, best of 3: 39.3 us per loop ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] k maximal elements
On Mon, Jun 6, 2011 at 6:57 AM, gary ruben gru...@bigpond.net.au wrote: I learn a lot by watching the numpy and scipy lists (today Olivier taught me about heapq :), but he may not have noticed that Python 2.4 added an nsmallest method) I needed indices of the selected elements as well (not just the k-smallest values themselves) some time ago and compared heapq vs. sort vs. numpy.argsort. Timing results (for my specific usecase) are at https://github.com/piskvorky/gensim/issues/5 . The results are as expected: argsort is a must, forming (value, index) tuples explicitly and then sorting that with sort/heapq kills performance. Best, Radim ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion