Re: [Numpy-discussion] fast method to to count a particular value in a large matrix

2012-02-06 Thread Naresh
David: from 9-10 minutes to about 2-3 seconds, it's amazing!

Thanks,
Naresh

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fast method to to count a particular value in a large matrix

2012-02-06 Thread David Cournapeau
On Mon, Feb 6, 2012 at 1:17 AM, Wes McKinney wesmck...@gmail.com wrote:


 Whenever I get motivated enough I'm going to make a pull request on
 NumPy with something like khash.h and start fixing all the O(N log N)
 algorithms floating around that ought to be O(N). NumPy should really
 have a match function similar to R's and a lot of other things.

khash.h is not the only thing that I'd like to use in numpy if I had
more time :)

David


Re: [Numpy-discussion] fast method to to count a particular value in a large matrix

2012-02-05 Thread David Verelst
Just out of curiosity, what speed-up factor did you achieve?

Regards,
David



Re: [Numpy-discussion] fast method to to count a particular value in a large matrix

2012-02-05 Thread Wes McKinney
On Sun, Feb 5, 2012 at 7:02 PM, David Verelst david.vere...@gmail.com wrote:
 Just out of curiosity, what speed-up factor did you achieve?

np.histogram works pretty well. I'm getting timings of around 1300 ms
on float64 data. A hash-table-based solution is faster (no big
surprise here) at about 800 ms, i.e. roughly 40% less time.

Whenever I get motivated enough I'm going to make a pull request on
NumPy with something like khash.h and start fixing all the O(N log N)
algorithms floating around that ought to be O(N). NumPy should really
have a match function similar to R's and a lot of other things.

- Wes
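
The hash-table idea mentioned above can be illustrated in pure Python. This is only a sketch of the one-pass counting principle, not the khash.h-based code being timed in this message; the array values are made up, and collections.Counter stands in for a compiled hash table, so it will be far slower than the timings quoted.

```python
from collections import Counter

import numpy as np

# Small stand-in for the large float64 array (values are made up).
abc = np.array([[3.0, 1.5, 3.0],
                [7.25, 1.5, 3.0]])

# One pass over the data with one hash-table update per element:
# O(N) on average, with no sorting of the full array.
table = Counter(abc.ravel().tolist())

# Only the few distinct keys are sorted, for a stable display order.
unique_vals = sorted(table)
counts = [table[v] for v in unique_vals]

for value, count in zip(unique_vals, counts):
    print(value, count)
```

The point of the sketch is the complexity argument: sorting-based approaches such as np.unique cost O(N log N), while a hash table only needs a single pass.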


Re: [Numpy-discussion] fast method to to count a particular value in a large matrix

2012-02-04 Thread Benjamin Root
On Saturday, February 4, 2012, Naresh Pai n...@uark.edu wrote:
 I am somewhat new to Python (been coding with Matlab mostly). I am trying to
 simplify (and expedite) a piece of code that is currently a bottleneck in a
 larger code.
 I have a large array (7000 rows x 4500 columns) titled, say, abc, and I am
 trying to find a fast method to count the number of instances of each unique
 value within it. All unique values are stored in a variable, say,
 unique_elem. My current code is as follows:
 import numpy as np
 from functools import reduce
 # allocate space for storing the element counts
 elem_count = np.zeros((len(unique_elem), 1))
 # loop through and count the occurrences of each unique element
 for i in range(len(unique_elem)):
     elem_count[i] = np.sum(reduce(np.logical_or, (abc == x for x in [unique_elem[i]])))
 This loop is the bottleneck because I have about 850 unique elements and it
 takes about 9-10 minutes. Can you suggest a faster way to do this?
 Thank you,
 Naresh


np.unique() can return indices and reverse indices.  It would be trivial to
histogram the reverse indices using np.histogram().

Does that help?

Ben Root
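
A minimal sketch of this suggestion, assuming a small made-up array in place of the real one. The names u, inv, edges and counts are illustrative only.

```python
import numpy as np

# Small stand-in for the 7000 x 4500 array (values are made up).
abc = np.array([[2.5, 0.5, 2.5],
                [9.0, 0.5, 2.5]])

# u holds the sorted unique values; inv holds, for every element of abc,
# the index of its value within u.
u, inv = np.unique(abc, return_inverse=True)

# One bin per unique value: integer edges 0, 1, ..., len(u).
# np.histogram flattens its input, so the shape of inv does not matter.
edges = np.arange(len(u) + 1)
counts, _ = np.histogram(inv, bins=edges)

for value, count in zip(u.tolist(), counts.tolist()):
    print(value, count)
```

Because the inverse indices are small integers, each one falls into exactly one unit-width bin, so counts[i] is the number of occurrences of u[i].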


Re: [Numpy-discussion] fast method to to count a particular value in a large matrix

2012-02-04 Thread Jerome Kieffer
On Sat, 4 Feb 2012 14:35:08 -0600
Benjamin Root ben.r...@ou.edu wrote:

 
 np.unique() can return indices and reverse indices.  It would be trivial to
 histogram the reverse indices using np.histogram().

Or even np.histogram(abc,unique_elem) or something like it.
This works if unique_elem is ordered.

np.histogram(abc,list(unique_elem)+[unique_elem[-1]+1])[0].reshape(-1,1)
is 40x faster than the original loop and gives the same result.

-- 
Jérôme Kieffer
Data analysis unit - ESRF
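
The one-liner above can be unpacked as follows. This is a sketch on made-up data; it assumes unique_elem is sorted in ascending order and that every value in abc appears in unique_elem (a value lying between two edges would otherwise be counted in the lower bin).

```python
import numpy as np

# Small stand-in data (values are made up).
abc = np.array([[10, 20, 20],
                [40, 10, 20]])
unique_elem = np.array([10, 20, 40])  # assumed sorted ascending

# Each unique value opens a bin; one extra edge past the largest value
# closes the last bin, giving [10, 20), [20, 40), [40, 41].
edges = np.append(unique_elem, unique_elem[-1] + 1)

# np.histogram returns (counts, edges); only the counts are needed.
counts, _ = np.histogram(abc, bins=edges)

# Column vector, matching the shape of elem_count in the original loop.
elem_count = counts.reshape(-1, 1)
print(elem_count)
```

The extra edge is what keeps the largest value in a bin of its own instead of having it share the final bin with the second-largest value.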


Re: [Numpy-discussion] fast method to to count a particular value in a large matrix

2012-02-04 Thread Warren Weckesser
On Sat, Feb 4, 2012 at 2:35 PM, Benjamin Root ben.r...@ou.edu wrote:



 np.unique() can return indices and reverse indices.  It would be trivial
 to histogram the reverse indices using np.histogram().



Instead of histogram(), you can use bincount() on the inverse indices:

u, inv = np.unique(abc, return_inverse=True)
n = np.bincount(inv)


u will be an array of the unique elements, and n will be an array of the
corresponding number of occurrences.

Warren
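
For anyone who wants to try the two lines above in isolation, here is a self-contained sketch on a small made-up array. The only change is a ravel() on the inverse indices: np.bincount requires a 1-D input, and depending on the NumPy version the inverse array may come back flat or with the shape of abc.

```python
import numpy as np

# Small stand-in for the 7000 x 4500 array (values are made up).
abc = np.array([[3, 1, 3],
                [7, 1, 3]])

# u: sorted unique values; inv: index into u for every element of abc.
u, inv = np.unique(abc, return_inverse=True)

# Count how often each index occurs; ravel() guarantees a 1-D input.
n = np.bincount(inv.ravel())

for value, count in zip(u.tolist(), n.tolist()):
    print(value, count)
```

Later NumPy releases also accept return_counts=True in np.unique, which returns the counts directly without the bincount step.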


Re: [Numpy-discussion] fast method to to count a particular value in a large matrix

2012-02-04 Thread Naresh
The histogram() solution works perfectly since unique_elem is ordered. I
appreciate everyone's help.

