2011/12/23 Benjamin Hepp <[email protected]>:
> Hi,
>
> I was wondering about the KMeans implementation in scikit-learn. From a
> quick scan of the code I see that the main stuff is implemented in
> Cython but it's spread in two different functions for the m- and the
> e-step and the main loop is in python. I'm using my own KMeans routine
> written as a Python C-module making use of OpenMP (it actually scales
> really well).

What kind of algorithm have you implemented? The original batch variant,
where all the data is assumed to fit in memory, or some streaming variant
such as the mini-batch k-means?

> For consistency I would like to switch to scikit-learn completely so I
> could either make a new KMeans classifier (there's already KMeans and
> MiniBatchKMeans), maybe call it KMeansParallel or so. Or I could modify
> the existing KMeans classifier. What do you think is more appropriate?

I would rather avoid a generic name such as KMeansParallel (or would
rather reserve it for a multiprocess / multi-machine implementation, using
joblib's Parallel or, for a distributed implementation, IPython parallel).

However, experimenting with the OpenMP support of Cython (prange) in the
default KMeans or MiniBatchKMeans implementations may be interesting.
Please feel free to motivate such work with comparative benchmarks (also
giving the final value of the inertia for each implementation, along with
the init method and the number of random inits used for each).

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
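
For illustration only, here is a minimal pure-Python sketch of the kind of comparison discussed in this thread: a batch k-means with separate e-step and m-step functions driven by a Python main loop, plus a small harness that reports the init method, the number of random inits, the final inertia, and the wall-clock time. This is not the scikit-learn code; the names `e_step`, `m_step`, `kmeans`, and `benchmark` are hypothetical, and the init is assumed to be plain random sampling of data points.

```python
import random
import time


def e_step(points, centers):
    """Assign each point to its nearest center; return (labels, inertia)."""
    labels, inertia = [], 0.0
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        best = min(range(len(centers)), key=dists.__getitem__)
        labels.append(best)
        inertia += dists[best]  # inertia = sum of squared distances
    return labels, inertia


def m_step(points, labels, centers):
    """Move each center to the mean of the points assigned to it."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            new_centers.append(old)  # leave an empty cluster's center as is
            continue
        new_centers.append(
            tuple(sum(col) / len(members) for col in zip(*members))
        )
    return new_centers


def kmeans(points, k, n_init=5, max_iter=100, seed=0):
    """Batch k-means with random init; returns (best_inertia, best_centers)."""
    rng = random.Random(seed)
    best = (float("inf"), None)
    for _ in range(n_init):
        centers = rng.sample(points, k)  # assumed init: k random data points
        for _ in range(max_iter):
            labels, _ = e_step(points, centers)
            new_centers = m_step(points, labels, centers)
            if new_centers == centers:  # converged: centers stopped moving
                break
            centers = new_centers
        _, inertia = e_step(points, centers)
        if inertia < best[0]:
            best = (inertia, centers)
    return best


def benchmark(fit, points, k, n_init):
    """Time one implementation and report what the comparison needs."""
    start = time.perf_counter()
    inertia, _ = fit(points, k, n_init=n_init)
    elapsed = time.perf_counter() - start
    return {"init": "random", "n_init": n_init,
            "inertia": inertia, "seconds": elapsed}
```

Any alternative implementation exposing the same `fit(points, k, n_init=...)` signature (for example an OpenMP-backed one) could be passed to `benchmark` on the same data, so that the timings and final inertia values are directly comparable.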
