Hi Sturla and Yuan,
I looked into this yesterday and would like to share my two cents.



Yuan Luo wrote:
> Hi,
> Does anyone know how I can make GMM parallel the fitting of some moderately
> big matrix (say, 390,000 x 400) with 200 components?

Actually, you can't do this with scikit out of the box. By the way, your
question didn't actually mention scikit.
So I think Sturla's reply already covered almost all the options you have.

My only addition is to point you to this book [1], which includes an example
of how to parallelise EM using MapReduce.
--
[1]: http://lintool.github.io/MapReduceAlgorithms/MapReduce-book-final.pdf
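
To give a rough idea of the approach, here is a minimal sketch of a
MapReduce-style EM for a GMM. It is not taken from the book or from scikit:
all the names (`e_step_map`, `combine`, `m_step`, `fit_gmm_mapreduce`) are
made up for illustration, and I assume diagonal covariances and a random
initialisation to keep it short. Each "map" call computes the sufficient
statistics of one chunk of rows, the "reduce" sums them, and the M-step runs
once on the totals.

```python
from functools import reduce

import numpy as np


def e_step_map(chunk, weights, means, variances):
    """Map: sufficient statistics of one chunk (diagonal covariances assumed)."""
    # log N(x_i | mean_k, diag(var_k)) for every sample/component pair: (n, K)
    diff = chunk[:, None, :] - means[None, :, :]
    log_pdf = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                      + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_joint = np.log(weights) + log_pdf
    # normalise in log space to obtain the responsibilities
    log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    resp = np.exp(log_joint - log_norm)
    return (resp.sum(axis=0),       # N_k
            resp.T @ chunk,         # sum_i r_ik * x_i
            resp.T @ chunk ** 2,    # sum_i r_ik * x_i^2
            log_norm.sum())         # log-likelihood of the chunk


def combine(a, b):
    """Reduce: sufficient statistics of two chunks simply add up."""
    return tuple(x + y for x, y in zip(a, b))


def m_step(stats, n_samples, min_var=1e-6):
    """Update the parameters from the summed statistics."""
    nk, sx, sxx, _ = stats
    weights = nk / n_samples
    means = sx / nk[:, None]
    # min_var is a small floor to keep the variances positive
    variances = sxx / nk[:, None] - means ** 2 + min_var
    return weights, means, variances


def fit_gmm_mapreduce(X, n_components, n_chunks=4, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    # naive initialisation (an assumption of this sketch, not scikit's)
    weights = np.full(n_components, 1.0 / n_components)
    means = X[rng.choice(len(X), n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + 1e-6, (n_components, 1))
    chunks = np.array_split(X, n_chunks)
    log_likelihoods = []
    for _ in range(n_iter):
        # the built-in map is serial; a pool's map could take its place
        mapped = map(lambda c: e_step_map(c, weights, means, variances), chunks)
        stats = reduce(combine, mapped)
        log_likelihoods.append(stats[3])
        weights, means, variances = m_step(stats, len(X))
    return weights, means, variances, log_likelihoods
```

Since the statistics are plain sums over rows, the result should not depend
on how the data is split, which is what makes the E-step easy to distribute.
Only the per-component totals have to be exchanged between workers.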

Sturla Molden wrote:
> 
> I am not sure about GMM code in scikit-learn, but the EM-algorithm for GMMs
> is very easy to vectorise. 

Yes, and actually this seems to be the "problem" here :-)
The GMM implementation in scikit is highly vectorised and relies on
operations over whole numpy arrays.
You were right when you said that under the hood the main `for` loop iterates
over the number of components, but in scikit this is not done *explicitly*
via Python loops.

So, as far as the scikit implementation is concerned, parallelising it seems
to be quite difficult, since there is no explicit loop to hand out to workers.
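
To illustrate what I mean by "not explicitly", here is a small sketch of the
E-step written both ways. This is not scikit's actual code: the function
names are mine, and I assume diagonal covariances. The first version loops
over the components in Python; the second expands the quadratic form into
matrix products, so no Python-level loop is left.

```python
import numpy as np


def _normalise(log_joint):
    """Turn unnormalised log-probabilities into responsibilities (rows sum to 1)."""
    log_joint = log_joint - log_joint.max(axis=1, keepdims=True)
    resp = np.exp(log_joint)
    return resp / resp.sum(axis=1, keepdims=True)


def responsibilities_loop(X, weights, means, variances):
    """E-step with an explicit Python loop over the K components."""
    n_samples, n_components = X.shape[0], len(weights)
    log_joint = np.empty((n_samples, n_components))
    for k in range(n_components):
        # each iteration is itself vectorised over all samples
        log_joint[:, k] = np.log(weights[k]) - 0.5 * np.sum(
            (X - means[k]) ** 2 / variances[k]
            + np.log(2.0 * np.pi * variances[k]), axis=1)
    return _normalise(log_joint)


def responsibilities_vectorised(X, weights, means, variances):
    """Same E-step, with the loop over components folded into matrix products."""
    inv_var = 1.0 / variances                          # (K, d)
    # sum_d (x_d - m_kd)^2 / v_kd, expanded term by term: (n, K)
    quad = ((X ** 2) @ inv_var.T
            - 2.0 * X @ (means * inv_var).T
            + np.sum(means ** 2 * inv_var, axis=1))
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_joint = np.log(weights) - 0.5 * (quad + log_det)
    return _normalise(log_joint)
```

Both functions should return the same (n_samples, n_components) array. In the
second one, the iteration over components happens inside the matrix products,
so there is nothing left at the Python level to split among processes.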

That said, please note that in the initialisation step, one of the default
parameters (i.e., `k`) causes the initial means to be computed with the
KMeans algorithm.
The `KMeans` implementation in scikit supports multiprocessing (via the
`n_jobs` parameter), but unfortunately there seems to be no easy way to pass
this additional parameter to the `KMeans` constructor called inside the `fit`
function.

Maybe you could implement this yourself and try to make a PR :-)

Well, I know that this is not actually "parallelising GMM", but it is probably
one step towards it :-)

HTH,
Valerio





