Re: finding most common elements between thousands of multiple arrays.

2009-07-10 Thread Scott David Daniels
Raymond Hettinger wrote: [Scott David Daniels] def most_frequent(arr, N): ... In Py2.4 and later, see heapq.nlargest(). I should have remembered this one. In Py3.1, see collections.Counter(data).most_common(n). This one is from Py3.2, I think. --Scott David Daniels scott.dani...@acm.org
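For readers skimming the thread, the two standard-library calls named in this exchange can be sketched roughly as below. The sample list is made up for illustration and is not from the thread; note that collections.Counter is available from Python 2.7 and 3.1 onward.

```python
import heapq
from collections import Counter

# Hypothetical sample data, not taken from the thread.
data = [3, 1, 3, 2, 3, 1, 4]

# Counter.most_common(n) returns the n most frequent (value, count) pairs.
counts = Counter(data)
top_two = counts.most_common(2)

# heapq.nlargest makes the same selection from any iterable of pairs;
# the key function picks out the count.
top_two_heap = heapq.nlargest(2, counts.items(), key=lambda item: item[1])

print(top_two)       # [(3, 3), (1, 2)]
print(top_two_heap)  # [(3, 3), (1, 2)]
```

Both calls avoid sorting the whole table of counts when only a few top entries are wanted.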

Re: finding most common elements between thousands of multiple arrays.

2009-07-08 Thread Raymond Hettinger
[Scott David Daniels] def most_frequent(arr, N):      '''Return the top N (freq, val) elements in arr'''      counted = frequency(arr) # get an iterator for freq-val pairs      heap = []      # First, just fill up the array with the first N distinct      for i in range(N):          try:    
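The quoted function is cut off in this listing. As a rough sketch of the idea its docstring describes (return the top N (freq, val) pairs by keeping a small min-heap), and not a reconstruction of Scott's actual code, something like the following would work on input that is already sorted. Using itertools.groupby in place of the thread's frequency() helper is an assumption made for this example.

```python
import heapq
from itertools import groupby


def most_frequent(sorted_values, n):
    """Return up to n (freq, val) pairs, most frequent first.

    Assumes sorted_values is already sorted, so equal values are adjacent.
    """
    heap = []  # min-heap holding the best n (freq, val) pairs seen so far
    for value, run in groupby(sorted_values):
        freq = sum(1 for _ in run)  # length of this run of equal values
        if len(heap) < n:
            heapq.heappush(heap, (freq, value))
        elif freq > heap[0][0]:
            # The new run beats the weakest entry in the heap: swap it in.
            heapq.heapreplace(heap, (freq, value))
    return sorted(heap, reverse=True)


print(most_frequent(sorted([1, 2, 2, 2, 2, 3, 3, 3]), 2))  # [(4, 2), (3, 3)]
```

The heap never holds more than n entries, so memory stays small even with ~70k unique values.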

Re: finding most common elements between thousands of multiple arrays.

2009-07-07 Thread Andrew Henshaw
mclovin hanoo...@gmail.com wrote in message news:c5332c9b-2348-4194-bfa0-d70c77107...@x3g2000yqa.googlegroups.com... Currently I need to find the most common elements in thousands of arrays within one large array (around 2 million instances with ~70k unique elements) so I set up a

Re: finding most common elements between thousands of multiple arrays.

2009-07-06 Thread Peter Otten
Scott David Daniels wrote: Scott David Daniels wrote: t = timeit.Timer('sum(part[:-1]==part[1:])', 'from __main__ import part') What happens if you calculate the sum in numpy? Try t = timeit.Timer('(part[:-1]==part[1:]).sum()', 'from __main__
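The comparison being discussed can be sketched as a small benchmark. The array below is a hypothetical stand-in for `part` (its real contents are not shown in this listing), and the function names are invented for the example; timings will vary by machine, so none are quoted here.

```python
import timeit

import numpy as np

# Hypothetical sorted sample standing in for `part` in the thread.
rng = np.random.default_rng(0)
part = np.sort(rng.integers(0, 1000, size=10_000))


def builtin_sum():
    # Python's sum() walks the boolean array one element at a time.
    return sum(part[:-1] == part[1:])


def numpy_sum():
    # ndarray.sum() performs the reduction inside numpy.
    return (part[:-1] == part[1:]).sum()


if __name__ == "__main__":
    for fn in (builtin_sum, numpy_sum):
        seconds = timeit.timeit(fn, number=10)
        print(f"{fn.__name__}: {seconds:.4f} s for 10 runs")
```

Both functions count adjacent equal pairs and return the same number; only where the loop runs differs.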

Re: finding most common elements between thousands of multiple arrays.

2009-07-06 Thread Scott David Daniels
Peter Otten wrote: Scott David Daniels wrote: Scott David Daniels wrote: t = timeit.Timer('sum(part[:-1]==part[1:])', 'from __main__ import part') What happens if you calculate the sum in numpy? Try t = timeit.Timer('(part[:-1]==part[1:]).sum()',

Re: finding most common elements between thousands of multiple arrays.

2009-07-05 Thread Scott David Daniels
Scott David Daniels wrote: ... Here's a heuristic replacement for my previous frequency code: I've tried to mark where you could fudge numbers if the run time is at all close. Boy, I cannot let go. I did a bit of a test checking for cost to calculated number of discovered samples, and found

Re: finding most common elements between thousands of multiple arrays.

2009-07-05 Thread Steven D'Aprano
On Sun, 05 Jul 2009 17:30:58 -0700, Scott David Daniels wrote: Summary: when dealing with numpy, (or any bulk - individual values transitions), try several ways that you think are equivalent and _measure_. This advice is *much* more general than numpy -- it applies to any optimization

finding most common elements between thousands of multiple arrays.

2009-07-04 Thread mclovin
Currently I need to find the most common elements in thousands of arrays within one large array (around 2 million instances with ~70k unique elements) so I set up a dictionary to handle the counting so when I am iterating I up the count on the corresponding dictionary element. I then iterate
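The dictionary-counting approach described in this post can be sketched roughly as follows. The post is cut off before it describes the second pass, so the selection step here (sorting the counts and slicing) is an assumption, and the function names and sample data are invented for the example.

```python
from collections import defaultdict


def count_elements(arrays):
    """Count how often each value occurs across all the arrays."""
    counts = defaultdict(int)
    for arr in arrays:
        for value in arr:
            counts[value] += 1  # "up the count" for this element
    return counts


def most_common(counts, n):
    """Return the n (value, count) pairs with the highest counts."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical sample data standing in for the thousands of arrays.
arrays = [[1, 2, 2], [2, 3, 3], [3, 2]]
counts = count_elements(arrays)
print(most_common(counts, 2))  # [(2, 4), (3, 3)]
```

The counting loop runs in pure Python, which is the cost the rest of the thread tries to move into numpy.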

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Chris Rebert
On Sat, Jul 4, 2009 at 12:33 AM, mclovin hanoo...@gmail.com wrote: Currently I need to find the most common elements in thousands of arrays within one large array (around 2 million instances with ~70k unique elements) so I set up a dictionary to handle the counting so when I am iterating I

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Andre Engels
On Sat, Jul 4, 2009 at 9:33 AM, mclovin hanoo...@gmail.com wrote: Currently I need to find the most common elements in thousands of arrays within one large array (around 2 million instances with ~70k unique elements) so I set up a dictionary to handle the counting so when I am iterating I

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Vilya Harvey
2009/7/4 Andre Engels andreeng...@gmail.com: On Sat, Jul 4, 2009 at 9:33 AM, mclovin hanoo...@gmail.com wrote: Currently I need to find the most common elements in thousands of arrays within one large array (around 2 million instances with ~70k unique elements) so I set up a dictionary to

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Neil Crighton
You can join all your arrays into a single big array with concatenate. import numpy as np a = np.concatenate(array_of_arrays) Then count the number of occurrences of each unique element using this trick with searchsorted. This should be pretty fast. a.sort() unique_a = np.unique(a) count =
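The snippet above is cut off at `count =`. One common way to finish the searchsorted trick, offered here as an assumption rather than as Neil's exact code, is to take the difference between the right and left insertion points of each unique value. The input arrays and the top-2 selection are made up for illustration.

```python
import numpy as np

# Hypothetical input: a few small arrays standing in for the real data.
array_of_arrays = [np.array([1, 2, 2]), np.array([2, 3, 3]), np.array([3, 2])]

a = np.concatenate(array_of_arrays)
a.sort()
unique_a = np.unique(a)

# In a sorted array, each value occupies the slice [left, right),
# so right - left is its number of occurrences.
left = np.searchsorted(a, unique_a, side="left")
right = np.searchsorted(a, unique_a, side="right")
count = right - left

# Indices of the unique values, most frequent first.
order = np.argsort(count)[::-1]
top = [(int(unique_a[i]), int(count[i])) for i in order[:2]]

print(count.tolist())  # [1, 4, 3]
print(top)             # [(2, 4), (3, 3)]
```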

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Steven D'Aprano
On Sat, 04 Jul 2009 10:55:44 +0100, Vilya Harvey wrote: 2009/7/4 Andre Engels andreeng...@gmail.com: On Sat, Jul 4, 2009 at 9:33 AM, mclovin hanoo...@gmail.com wrote: Currently I need to find the most common elements in thousands of arrays within one large array (around 2 million instances

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Scott David Daniels
Vilya Harvey wrote: 2009/7/4 Andre Engels andreeng...@gmail.com: On Sat, Jul 4, 2009 at 9:33 AM, mclovin hanoo...@gmail.com wrote: Currently I need to find the most common elements in thousands of arrays within one large array (around 2 million instances with ~70k unique elements)... Try

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Steven D'Aprano
On Sat, 04 Jul 2009 13:42:06 +, Steven D'Aprano wrote: On Sat, 04 Jul 2009 10:55:44 +0100, Vilya Harvey wrote: 2009/7/4 Andre Engels andreeng...@gmail.com: On Sat, Jul 4, 2009 at 9:33 AM, mclovin hanoo...@gmail.com wrote: Currently I need to find the most common elements in thousands of

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread mclovin
OK then. I will try some of the strategies here but I guess things aren't looking too good. I need to run this over a dataset that someone pickled. I need to run this 480,000 times so you can see my frustration. So it doesn't need to be real time but it would be nice if it was done sorting this month.

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Vilya Harvey
2009/7/4 Steven D'Aprano st...@remove-this-cybersource.com.au: On Sat, 04 Jul 2009 13:42:06 +, Steven D'Aprano wrote: On Sat, 04 Jul 2009 10:55:44 +0100, Vilya Harvey wrote: 2009/7/4 Andre Engels andreeng...@gmail.com: On Sat, Jul 4, 2009 at 9:33 AM, mclovin hanoo...@gmail.com wrote:

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Lie Ryan
mclovin wrote: OK then. I will try some of the strategies here but I guess things aren't looking too good. I need to run this over a dataset that someone pickled. I need to run this 480,000 times so you can see my frustration. So it doesn't need to be real time but it would be nice if it was done

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Scott David Daniels
mclovin wrote: OK then. I will try some of the strategies here but I guess things aren't looking too good. I need to run this over a dataset that someone pickled. I need to run this 480,000 times so you can see my frustration. So it doesn't need to be real time but it would be nice if it was done

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread mclovin
On Jul 4, 12:51 pm, Scott David Daniels scott.dani...@acm.org wrote: mclovin wrote: OK then. I will try some of the strategies here but I guess things aren't looking too good. I need to run this over a dataset that someone pickled. I need to run this 480,000 times so you can see my

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread MRAB
mclovin wrote: [snip] like I said I need to do this 480,000 times so to get this done realistically I need to analyse about 5 a second. It appears that the average matrix size contains about 15 million elements. I threaded my program using your code and I did about 1,000 in an hour so it is

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread mclovin
On Jul 4, 3:29 pm, MRAB pyt...@mrabarnett.plus.com wrote: mclovin wrote: [snip] like I said I need to do this 480,000 times so to get this done realistically I need to analyse about 5 a second. It appears that the average matrix size contains about 15 million elements. I threaded my

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Scott David Daniels
mclovin wrote: On Jul 4, 12:51 pm, Scott David Daniels scott.dani...@acm.org wrote: mclovin wrote: OK then. I will try some of the strategies here but I guess things aren't looking too good. I need to run this over a dataset that someone pickled. I need to run this 480,000 times so you can see

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread MRAB
mclovin wrote: On Jul 4, 3:29 pm, MRAB pyt...@mrabarnett.plus.com wrote: mclovin wrote: [snip] like I said I need to do this 480,000 times so to get this done realistically I need to analyse about 5 a second. It appears that the average matrix size contains about 15 million elements. I

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Emile van Sebille
On 7/4/2009 12:33 AM mclovin said... Currently I need to find the most common elements in thousands of arrays within one large array (around 2 million instances with ~70k unique elements) so I set up a dictionary to handle the counting so when I am iterating I ** up the count on the

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Steven D'Aprano
On Sat, 04 Jul 2009 15:06:29 -0700, mclovin wrote: like I said I need to do this 480,000 times so to get this done realistically I need to analyse about 5 a second. It appears that the average matrix size contains about 15 million elements. Have you considered recording the element counts as

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Steven D'Aprano
On Sat, 04 Jul 2009 07:19:48 -0700, Scott David Daniels wrote: Actually the next step is to maintain a min-heap as you run down the sorted array. Something like: Not bad. I did some tests on it, using the following sample data: arr = np.array([xrange(i, i+7000) for i in xrange(143)] +