On 17/03/15 08:39, Gael Varoquaux wrote:

> Being faster is a good thing. Do you have an idea of how much faster?

"It depends..."

- Typical NumPy code for GMM and c-means is strongly affected by the use
of temporary arrays, and both algorithms incur this NumPy overhead to a
similar degree. The relative speed difference between the NumPy
implementations will therefore be smaller than the relative speed
difference between equivalent C implementations.
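To make the temporary-array point concrete, here is a minimal sketch (function and variable names are mine, not from any scikit-learn code): a naive diagonal-covariance log-likelihood allocates a fresh array per arithmetic op, while passing `out=` reuses one scratch buffer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((100_000, 8))
mu = rng.standard_normal(8)
inv_var = rng.random(8) + 0.5

def log_gauss_naive(x, mu, inv_var):
    # Each binary op below allocates a full (n, d) temporary array.
    d = x - mu            # temporary 1
    q = d * d             # temporary 2
    q = q * inv_var       # temporary 3
    return -0.5 * q.sum(axis=1)

def log_gauss_inplace(x, mu, inv_var, scratch=None):
    # Same arithmetic, but all intermediate results land in one buffer.
    if scratch is None:
        scratch = np.empty_like(x)
    np.subtract(x, mu, out=scratch)
    np.multiply(scratch, scratch, out=scratch)
    np.multiply(scratch, inv_var, out=scratch)
    return -0.5 * scratch.sum(axis=1)
```

Both versions compute the same numbers; the second just avoids the allocation traffic that dominates this kind of NumPy code.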

- GNU libm needs roughly 65 cycles per exp(x), MKL about 5, and Yeppp!
fewer than 3. But the GMM code must be written to use vector math, i.e.
apply exp to whole arrays rather than one scalar at a time.
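What "written to use vector math" means in practice, as a sketch (function names are mine): batch the exp calls over the whole (n, k) log-likelihood array instead of looping per sample, so a vector math backend (MKL VML, Yeppp!, etc.) can stream through contiguous data.

```python
import numpy as np

rng = np.random.default_rng(1)
logp = rng.standard_normal((1000, 4))  # stand-in for per-component log-likelihoods

def resp_loop(logp):
    # One small exp() call per sample: no chance for a vector math library
    # to amortize its per-call cost.
    out = np.empty_like(logp)
    for i in range(logp.shape[0]):
        row = logp[i] - logp[i].max()
        e = np.exp(row)
        out[i] = e / e.sum()
    return out

def resp_batched(logp):
    # A single exp() over the whole array, which a vectorized libm
    # can process in long SIMD runs.
    shifted = logp - logp.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)
```

The two return identical responsibilities; only the call pattern differs.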

- In image segmentation we can often use lookup tables for the Gaussian
likelihood. With 8 bits per color channel and a GMM with diagonal
covariance, we get away with a lookup table of at most 768 elements
(3 channels x 256 values, far fewer than the number of pixels in a
typical image). This takes away the speed advantage of c-means.
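A minimal sketch of the lookup-table idea (helper names and parameter values are mine): with diagonal covariance the channel log-likelihoods are independent, so one (3, 256) table per component covers every possible pixel value, and evaluating an image is just three indexed lookups and adds, with no exp or multiply per pixel.

```python
import numpy as np

def build_loglik_lut(mu, var):
    # Table of log N(v | mu_c, var_c) for every 8-bit value v in each of
    # the 3 color channels: shape (3, 256) = 768 entries in total.
    v = np.arange(256, dtype=np.float64)
    d2 = (v[None, :] - mu[:, None]) ** 2
    return -0.5 * (d2 / var[:, None] + np.log(2 * np.pi * var[:, None]))

def image_loglik(img, lut):
    # img: (H, W, 3) uint8. Diagonal covariance => channel terms add.
    return (lut[0, img[..., 0]] +
            lut[1, img[..., 1]] +
            lut[2, img[..., 2]])

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)
mu = np.array([120.0, 80.0, 200.0])    # illustrative component mean
var = np.array([100.0, 50.0, 400.0])   # illustrative diagonal variances
lut = build_loglik_lut(mu, var)
ll = image_loglik(img, lut)            # pure table lookups, no exp()
```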

- For the purpose of clustering (as opposed to density estimation) we
can often use a short power-series approximation to exp(x), because the
exact Gaussian likelihood is not critical for the result.
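As an illustration of the power-series idea (the function and the truncation order are my choices, not a prescription): a truncated Maclaurin series for exp(x) is cheap and accurate near zero, which is where the argument sits after the usual subtract-the-max shift.

```python
import numpy as np

def exp_taylor(x, order=6):
    # Truncated Maclaurin series: 1 + x + x^2/2! + ... + x^order/order!
    # Accurate for small |x|; only a rough approximation further out,
    # which is acceptable when we only need cluster assignments.
    term = np.ones_like(x)
    acc = np.ones_like(x)
    for k in range(1, order + 1):
        term = term * x / k
        acc = acc + term
    return acc
```

The winning component has shifted log-likelihood 0, exactly where the series is most accurate, so assignments are barely affected by the truncation.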

c-means is in some sense a variation of the latter. If we replace the
Gaussian kernel with a triangular or a bisquare kernel, which can also
be seen as rough approximations, what we get is "fuzzy c-means".
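One way to read that remark in code, as a purely hypothetical sketch (this is my framing of the kernel substitution, not the textbook fuzzy c-means membership formula): swap the kernel applied to sample-to-center distances and normalize per sample.

```python
import numpy as np

def memberships(dist, kind="triangular", h=1.0):
    # dist: (n_samples, n_clusters) distances to cluster centers.
    # Compact-support kernels as rough stand-ins for the Gaussian.
    u = np.clip(dist / h, 0.0, 1.0)
    if kind == "triangular":
        w = 1.0 - u
    elif kind == "bisquare":
        w = (1.0 - u ** 2) ** 2
    else:  # gaussian
        w = np.exp(-0.5 * (dist / h) ** 2)
    s = w.sum(axis=1, keepdims=True)
    return w / np.maximum(s, 1e-12)   # rows sum to 1 (fuzzy memberships)
```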


Sturla



_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general