Raymond Hettinger wrote:
[Scott David Daniels]
def most_frequent(arr, N): ...
In Py2.4 and later, see heapq.nlargest().
I should have remembered this one.
In Py3.1, see collections.Counter(data).most_common(n)
This one is from Py3.1 (and the Py2.7 backport), in fact.
--Scott David Daniels
scott.dani...@acm.org
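A minimal sketch of the two library calls mentioned above, side by side on made-up sample data:

```python
from collections import Counter
import heapq

data = [1, 2, 2, 3, 3, 3, 5, 5, 5, 5]

# Counter.most_common does the counting and the ranking in one call
# (Py2.7 / Py3.1+).
top_counter = Counter(data).most_common(2)

# heapq.nlargest (Py2.4+) gives the same answer over precomputed counts.
counts = Counter(data)
top_heap = heapq.nlargest(2, counts.items(), key=lambda kv: kv[1])

# Both yield [(5, 4), (3, 3)]: 5 occurs four times, 3 occurs three times.
```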
[Scott David Daniels]
import heapq

def most_frequent(arr, N):
    '''Return the top N (freq, val) pairs in arr.'''
    counted = frequency(arr)  # an iterator of (freq, val) pairs
    heap = []
    # First, just fill up the heap with the first N distinct pairs.
    for i in range(N):
        try:
            heapq.heappush(heap, next(counted))
        except StopIteration:
            break
    # Then a later pair must beat the current smallest to enter.
    for pair in counted:
        if pair > heap[0]:
            heapq.heapreplace(heap, pair)
    return sorted(heap, reverse=True)
mclovin hanoo...@gmail.com wrote in message
news:c5332c9b-2348-4194-bfa0-d70c77107...@x3g2000yqa.googlegroups.com...
Currently I need to find the most common elements in thousands of
arrays within one large array (around 2 million instances with ~70k
unique elements), so I set up a ...
Scott David Daniels wrote:
Scott David Daniels wrote:
t = timeit.Timer('sum(part[:-1]==part[1:])',
                 'from __main__ import part')
What happens if you calculate the sum in numpy? Try
t = timeit.Timer('(part[:-1]==part[1:]).sum()',
                 'from __main__ import part')
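A runnable form of that comparison (assuming numpy is installed; `part` here is a tiny made-up array, so only the equivalence, not the speed gap, shows at this size):

```python
import timeit
import numpy as np

part = np.array([1, 1, 2, 2, 2, 3, 4, 4])

# Builtin sum() pulls numpy scalars out of the bool array one at a time;
# ndarray.sum() stays inside numpy's C loop, which is what gets measured.
t_builtin = timeit.timeit('sum(part[:-1] == part[1:])',
                          globals={'part': part}, number=100)
t_numpy = timeit.timeit('(part[:-1] == part[1:]).sum()',
                        globals={'part': part}, number=100)

# Both expressions count the same thing: the number of adjacent equal pairs.
pairs = int((part[:-1] == part[1:]).sum())  # 4 for this sample
```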
Scott David Daniels wrote:
... Here's a heuristic replacement for my previous frequency code:
I've tried to mark where you could fudge numbers if the run time
is at all close.
Boy, I cannot let go. I did a bit of a test, checking cost against
the number of discovered samples, and found ...
On Sun, 05 Jul 2009 17:30:58 -0700, Scott David Daniels wrote:
Summary: when dealing with numpy (or any bulk-to-individual-value
transition), try several ways that you think are equivalent and
_measure_.
This advice is *much* more general than numpy -- it applies to any
optimization work.
Currently I need to find the most common elements in thousands of
arrays within one large array (around 2 million instances with ~70k
unique elements), so I set up a dictionary to handle the counting: when
I am iterating, I up the count on the corresponding dictionary element.
I then iterate ...
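A bare-bones sketch of that dictionary-counting scheme (the input arrays here are invented for illustration):

```python
from collections import defaultdict

def count_elements(arrays):
    # One shared dictionary; bump the count for each element as we iterate.
    counts = defaultdict(int)
    for arr in arrays:
        for x in arr:
            counts[x] += 1
    return counts

counts = count_elements([[7, 7, 2], [2, 7, 9]])
# counts[7] == 3, counts[2] == 2, counts[9] == 1
```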
2009/7/4 Andre Engels andreeng...@gmail.com:
On Sat, Jul 4, 2009 at 9:33 AM, mclovin hanoo...@gmail.com wrote:
Currently I need to find the most common elements in thousands of
arrays ...
You can join all your arrays into a single big array with concatenate.

import numpy as np
a = np.concatenate(array_of_arrays)

Then count the number of occurrences of each unique element using this
trick with searchsorted. This should be pretty fast.

a.sort()
unique_a = np.unique(a)
count = (a.searchsorted(unique_a, side='right')
         - a.searchsorted(unique_a, side='left'))
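A complete toy run of the searchsorted counting trick (`array_of_arrays` here is made up; each value occupies one contiguous run in the sorted array, so the gap between its right and left insertion points is its count):

```python
import numpy as np

# Toy stand-in for the real data.
array_of_arrays = [np.array([5, 3, 5]), np.array([3, 5, 5, 2])]

a = np.concatenate(array_of_arrays)
a.sort()
unique_a = np.unique(a)

# Right insertion point minus left insertion point = run length = count.
count = (a.searchsorted(unique_a, side='right')
         - a.searchsorted(unique_a, side='left'))

most_common = unique_a[count.argmax()]
# unique_a -> [2 3 5], count -> [1 2 4], most_common -> 5
```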
OK then. I will try some of the strategies here, but I guess things
aren't looking too good. I need to run this over a dataset that someone
pickled. I need to run this 480,000 times, so you can see my
frustration. It doesn't need to be real time, but it would be nice if
it was done sometime this month.
mclovin wrote:
[snip]
Like I said, I need to do this 480,000 times, so to get this done
realistically I need to analyse about 5 a second. It appears that the
average matrix contains about 15 million elements.
I threaded my program using your code and I did about 1,000 in an hour,
so it is nowhere near fast enough.
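Spelling out the arithmetic behind that frustration, using only the numbers quoted in the post:

```python
runs = 480_000
target_rate = 5              # analyses per second, the stated goal
measured_rate = 1000 / 3600  # roughly 0.28 per second from the threaded run

hours_at_target = runs / target_rate / 3600   # about 26.7 hours
hours_measured = runs / measured_rate / 3600  # 480 hours, i.e. 20 days
```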
On Sat, 04 Jul 2009 15:06:29 -0700, mclovin wrote:
like I said I need to do this 480,000 times ...
Have you considered recording the element counts as ...
On Sat, 04 Jul 2009 07:19:48 -0700, Scott David Daniels wrote:
Actually the next step is to maintain a min-heap as you run down the
sorted array. Something like: ...
Not bad.
I did some tests on it, using the following sample data:
arr = np.array([xrange(i, i+7000) for i in xrange(143)] + ...
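A reconstruction (mine, not Scott's actual code) of that min-heap pass: walk the runs of an already-sorted array and keep the n largest (count, value) pairs on a min-heap, using the same adjacent-inequality comparison that was timed above to find the run boundaries:

```python
import heapq
import numpy as np

def top_n_runs(sorted_arr, n):
    '''Top-n (count, value) pairs from an already-sorted array.'''
    if len(sorted_arr) == 0:
        return []
    heap = []  # min-heap: the weakest of the current top-n sits at heap[0]
    # Run boundaries: indices where the value changes.
    changes = np.flatnonzero(sorted_arr[1:] != sorted_arr[:-1]) + 1
    starts = np.concatenate(([0], changes))
    ends = np.concatenate((changes, [len(sorted_arr)]))
    for s, e in zip(starts, ends):
        item = (int(e - s), int(sorted_arr[s]))
        if len(heap) < n:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)

result = top_n_runs(np.array([1, 2, 2, 3, 3, 3, 5, 5, 5, 5]), 2)
# result -> [(4, 5), (3, 3)]
```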