Re: [Numpy-discussion] fast method to to count a particular value in a large matrix
David: from 9-10 minutes to about 2-3 seconds, it's amazing! Thanks, Naresh ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] fast method to to count a particular value in a large matrix
On Mon, Feb 6, 2012 at 1:17 AM, Wes McKinney wesmck...@gmail.com wrote: Whenever I get motivated enough I'm going to make a pull request on NumPy with something like khash.h and start fixing all the O(N log N) algorithms floating around that ought to be O(N). NumPy should really have a match function similar to R's and a lot of other things. khash.h is not the only thing that I'd like to use in numpy if I had more time :) David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] fast method to to count a particular value in a large matrix
Just out of curiosity, what speed-up factor did you achieve? Regards, David On 04/02/12 22:20, Naresh wrote: Warren Weckesserwarren.weckesserat enthought.com writes: On Sat, Feb 4, 2012 at 2:35 PM, Benjamin Rootben.rootat ou.edu wrote: On Saturday, February 4, 2012, Naresh Painpaiat uark.edu wrote: I am somewhat new to Python (been coding with Matlab mostly). I am trying to simplify (and expedite) a piece of code that is currently a bottleneck in a larger code. I have a large array (7000 rows x 4500 columns) titled say, abc, and I am trying to find a fast method to count the number of instances of each unique value within it. All unique values are stored in a variable, say, unique_elem. My current code is as follows: import numpy as np #allocate space for storing element count elem_count = zeros((len(unique_elem),1)) #loop through and count number of unique_elem for i in range(len(unique_elem)): elem_count[i]= np.sum(reduce(np.logical_or,(abc== x for x in [unique_elem[i]]))) This loop is bottleneck because I have about 850 unique elements and it takes about 9-10 minutes. Can you suggest a faster way to do this? Thank you, Naresh no.unique() can return indices and reverse indices. It would be trivial to histogram the reverse indices using np.histogram(). Instead of histogram(), you can use bincount() on the inverse indices:u, inv = np.unique(abc, return_inverse=True)n = np.bincount(inv)u will be an array of the unique elements, and n will be an array of the corresponding number of occurrences.Warren ___ NumPy-Discussion mailing list NumPy-Discussionat scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion The histogram() solution works perfect since unique_elem is ordered. I appreciate everyone's help. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] fast method to to count a particular value in a large matrix
On Sun, Feb 5, 2012 at 7:02 PM, David Verelst david.vere...@gmail.com wrote: Just out of curiosity, what speed-up factor did you achieve? Regards, David On 04/02/12 22:20, Naresh wrote: Warren Weckesserwarren.weckesserat enthought.com writes: On Sat, Feb 4, 2012 at 2:35 PM, Benjamin Rootben.rootat ou.edu wrote: On Saturday, February 4, 2012, Naresh Painpaiat uark.edu wrote: I am somewhat new to Python (been coding with Matlab mostly). I am trying to simplify (and expedite) a piece of code that is currently a bottleneck in a larger code. I have a large array (7000 rows x 4500 columns) titled say, abc, and I am trying to find a fast method to count the number of instances of each unique value within it. All unique values are stored in a variable, say, unique_elem. My current code is as follows: import numpy as np #allocate space for storing element count elem_count = zeros((len(unique_elem),1)) #loop through and count number of unique_elem for i in range(len(unique_elem)): elem_count[i]= np.sum(reduce(np.logical_or,(abc== x for x in [unique_elem[i]]))) This loop is bottleneck because I have about 850 unique elements and it takes about 9-10 minutes. Can you suggest a faster way to do this? Thank you, Naresh no.unique() can return indices and reverse indices. It would be trivial to histogram the reverse indices using np.histogram(). Instead of histogram(), you can use bincount() on the inverse indices:u, inv = np.unique(abc, return_inverse=True)n = np.bincount(inv)u will be an array of the unique elements, and n will be an array of the corresponding number of occurrences.Warren ___ NumPy-Discussion mailing list NumPy-Discussionat scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion The histogram() solution works perfect since unique_elem is ordered. I appreciate everyone's help. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion np.histogram works pretty well. I'm getting speeds something like 1300 ms on float64 data. A hash table-based solution is faster (no big surprise here), about 800ms so in the ballpark of 40% faster. Whenever I get motivated enough I'm going to make a pull request on NumPy with something like khash.h and start fixing all the O(N log N) algorithms floating around that ought to be O(N). NumPy should really have a match function similar to R's and a lot of other things. - Wes ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] fast method to to count a particular value in a large matrix
On Saturday, February 4, 2012, Naresh Pai n...@uark.edu wrote: I am somewhat new to Python (been coding with Matlab mostly). I am trying to simplify (and expedite) a piece of code that is currently a bottleneck in a larger code. I have a large array (7000 rows x 4500 columns) titled say, abc, and I am trying to find a fast method to count the number of instances of each unique value within it. All unique values are stored in a variable, say, unique_elem. My current code is as follows: import numpy as np #allocate space for storing element count elem_count = zeros((len(unique_elem),1)) #loop through and count number of unique_elem for i in range(len(unique_elem)): elem_count[i]= np.sum(reduce(np.logical_or,(abc== x for x in [unique_elem[i]]))) This loop is bottleneck because I have about 850 unique elements and it takes about 9-10 minutes. Can you suggest a faster way to do this? Thank you, Naresh no.unique() can return indices and reverse indices. It would be trivial to histogram the reverse indices using np.histogram(). Does that help? Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] fast method to to count a particular value in a large matrix
On Sat, 4 Feb 2012 14:35:08 -0600 Benjamin Root ben.r...@ou.edu wrote: no.unique() can return indices and reverse indices. It would be trivial to histogram the reverse indices using np.histogram(). Even np.histogram(abc,unique_elem) or something like this. Works if unique_elem is ordered. np.histogram(abc,list(unique_elem)+[unique_elem[-1]+1])[0].reshape(-1,1) is 40x faster and gives the same result. -- Jérôme Kieffer Data analysis unit - ESRF ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] fast method to to count a particular value in a large matrix
On Sat, Feb 4, 2012 at 2:35 PM, Benjamin Root ben.r...@ou.edu wrote: On Saturday, February 4, 2012, Naresh Pai n...@uark.edu wrote: I am somewhat new to Python (been coding with Matlab mostly). I am trying to simplify (and expedite) a piece of code that is currently a bottleneck in a larger code. I have a large array (7000 rows x 4500 columns) titled say, abc, and I am trying to find a fast method to count the number of instances of each unique value within it. All unique values are stored in a variable, say, unique_elem. My current code is as follows: import numpy as np #allocate space for storing element count elem_count = zeros((len(unique_elem),1)) #loop through and count number of unique_elem for i in range(len(unique_elem)): elem_count[i]= np.sum(reduce(np.logical_or,(abc== x for x in [unique_elem[i]]))) This loop is bottleneck because I have about 850 unique elements and it takes about 9-10 minutes. Can you suggest a faster way to do this? Thank you, Naresh no.unique() can return indices and reverse indices. It would be trivial to histogram the reverse indices using np.histogram(). Instead of histogram(), you can use bincount() on the inverse indices: u, inv = np.unique(abc, return_inverse=True) n = np.bincount(inv) u will be an array of the unique elements, and n will be an array of the corresponding number of occurrences. Warren ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] fast method to to count a particular value in a large matrix
Warren Weckesser warren.weckesser at enthought.com writes: On Sat, Feb 4, 2012 at 2:35 PM, Benjamin Root ben.root at ou.edu wrote: On Saturday, February 4, 2012, Naresh Pai npai at uark.edu wrote: I am somewhat new to Python (been coding with Matlab mostly). I am trying to simplify (and expedite) a piece of code that is currently a bottleneck in a larger code. I have a large array (7000 rows x 4500 columns) titled say, abc, and I am trying to find a fast method to count the number of instances of each unique value within it. All unique values are stored in a variable, say, unique_elem. My current code is as follows: import numpy as np #allocate space for storing element count elem_count = zeros((len(unique_elem),1)) #loop through and count number of unique_elem for i in range(len(unique_elem)): elem_count[i]= np.sum(reduce(np.logical_or,(abc== x for x in [unique_elem[i]]))) This loop is bottleneck because I have about 850 unique elements and it takes about 9-10 minutes. Can you suggest a faster way to do this? Thank you, Naresh no.unique() can return indices and reverse indices. It would be trivial to histogram the reverse indices using np.histogram(). Instead of histogram(), you can use bincount() on the inverse indices:u, inv = np.unique(abc, return_inverse=True)n = np.bincount(inv)u will be an array of the unique elements, and n will be an array of the corresponding number of occurrences.Warren ___ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion The histogram() solution works perfect since unique_elem is ordered. I appreciate everyone's help. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion