Hi Sturla and Yuan,

I looked into this yesterday and would like to share my two cents.

Yuan Luo wrote:
> Hi,
> Does anyone know how I can make GMM parallel the fitting of some moderately
> big matrix (say, 390,000 x 400) with 200 components?

Actually, with scikit you can't do this out of the box. Btw, your question didn't actually involve scikit, so I think Sturla's reply already described almost all the options you have.

My additional contribution is to point you to this book [1], which provides an example of how to parallelise EM using MapReduce.

[1]: http://lintool.github.io/MapReduceAlgorithms/MapReduce-book-final.pdf

Sturla Molden wrote:
> I am not sure about GMM code in scikit-learn, but the EM-algorithm for GMMs
> is very easy to vectorise.

Yes, and actually this seems to be the "problem" here :-) The GMM implementation included in scikit is highly vectorised and exploits the vectorisation features of numpy arrays. You were right when you said that under the hood the main `for` loop iterates over the number of components, but in scikit this is not done *explicitly* via Python loops. Thus, as far as the scikit implementation is concerned, it seems quite difficult to parallelise it.

That said, please note that in the initialisation step, one of the default parameters (i.e., `k`) triggers the computation of the means of the data using the KMeans algorithm. The `KMeans` algorithm in scikit supports multiprocessing (via the `n_jobs` parameter), but unfortunately there seems to be no easy way to pass this additional parameter to the `KMeans` constructor that is called inside the `fit` function.
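
To make the MapReduce idea from [1] a bit more concrete, here is a minimal sketch in plain Python. It is only a toy under stated assumptions: it is neither scikit's implementation nor the book's code, it handles a 1-D mixture only, it uses the standard library instead of numpy, and the names `e_step_chunk`, `combine` and `em_iteration` are made up for illustration. The point is that the E-step statistics are additive across chunks of the data, so they can be computed per chunk (map) and then summed (reduce).

```python
import math
from functools import partial, reduce


def e_step_chunk(chunk, weights, means, variances):
    """Map step: sufficient statistics of one chunk under the current parameters.

    Toy 1-D version, not numerically hardened (no log-sum-exp).
    """
    k = len(weights)
    s0 = [0.0] * k  # sum of responsibilities per component
    s1 = [0.0] * k  # responsibility-weighted sum of x
    s2 = [0.0] * k  # responsibility-weighted sum of x**2
    loglik = 0.0
    for x in chunk:
        dens = [
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        ]
        total = sum(dens)
        loglik += math.log(total)
        for j in range(k):
            r = dens[j] / total  # responsibility of component j for x
            s0[j] += r
            s1[j] += r * x
            s2[j] += r * x * x
    return s0, s1, s2, loglik, len(chunk)


def combine(a, b):
    """Reduce step: the statistics of two chunks simply add up."""
    s0 = [p + q for p, q in zip(a[0], b[0])]
    s1 = [p + q for p, q in zip(a[1], b[1])]
    s2 = [p + q for p, q in zip(a[2], b[2])]
    return s0, s1, s2, a[3] + b[3], a[4] + b[4]


def em_iteration(chunks, weights, means, variances, map_fn=map):
    """One EM iteration: map over chunks, reduce, then a cheap serial M-step."""
    mapper = partial(e_step_chunk, weights=weights, means=means, variances=variances)
    s0, s1, s2, loglik, n = reduce(combine, map_fn(mapper, chunks))
    new_weights = [a / n for a in s0]
    new_means = [b / a for a, b in zip(s0, s1)]
    # Variance as E[x^2] - mean^2, with a small floor to avoid collapse.
    new_vars = [max(c / a - m * m, 1e-6) for a, c, m in zip(s0, s2, new_means)]
    return new_weights, new_means, new_vars, loglik


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    data = [rng.gauss(-2.0, 0.5) for _ in range(2000)]
    data += [rng.gauss(3.0, 1.0) for _ in range(2000)]
    chunks = [data[i::4] for i in range(4)]  # four chunks of the data
    params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
    for _ in range(30):
        *params, loglik = em_iteration(chunks, *params)
    print(params, loglik)
```

With the default `map_fn=map` everything runs serially. Since `e_step_chunk` is a top-level function, it should be possible to pass something like `multiprocessing.Pool(4).map` as `map_fn` to spread the chunks over processes. Whether that pays off depends on how expensive the per-chunk work is compared to shipping the chunks around.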

Maybe you could implement that `n_jobs` pass-through on your own and try to make a PR :-) I know that this is not actually "parallelising GMM", but it is probably one step towards it :-)

HTH,
Valerio

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
