2011/12/23 Benjamin Hepp <[email protected]>:
> Hi,
>
> I was wondering about the KMeans implementation in scikit-learn. From a
> quick scan of the code I see that the main stuff is implemented in
> Cython but it's spread in two different functions for the m- and the
> e-step and the main loop is in python. I'm using my own KMeans routine
> written as a Python C-module making use of OpenMP (it actually scales
> really well).

What kind of algorithm have you implemented? Is it the original batch
variant, where all the data is assumed to fit in memory, or a
streaming variant such as mini-batch k-means?

> For consistency I would like to switch to scikit-learn completely so I
> could either make a new KMeans classifier (there's already KMeans and
> MiniBatchKMeans), maybe call it KMeansParallel or so. Or I could modify
> the existing KMeans classifier. What do you think is more appropriate?

I would rather avoid a generic name such as KMeansParallel (or
would rather reserve it for a multiprocess / multimachine
implementation using either joblib parallel or, for a distributed
implementation, ipython parallel).

However, experimenting with the OpenMP support of Cython (prange) in
the default KMeans or MiniBatchKMeans implementations may be
interesting.

Please feel free to motivate such work with comparative benchmarks
(also giving the final value of the inertia for each implementation,
along with the init method and the number of random inits used for
each).
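
For what it's worth, here is a minimal sketch of the kind of report I
have in mind. It assumes a recent scikit-learn where KMeans and
MiniBatchKMeans accept init, n_init and random_state and expose
inertia_ after fit. The synthetic dataset, its size, the helper name
run_benchmark and all parameter values are placeholders of mine, not
a recommended setup; an OpenMP implementation would simply be added
as one more entry.

```python
import time

from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs

# Synthetic data: sizes and cluster count are arbitrary, for illustration only.
X, _ = make_blobs(n_samples=2000, n_features=10, centers=5, random_state=0)


def run_benchmark(X, n_clusters=5, init="k-means++", n_init=3, seed=0):
    """Fit each estimator once and record time, inertia, init and n_init."""
    # Hypothetical helper: a third implementation could be added to this dict.
    estimators = {
        "KMeans": KMeans(n_clusters=n_clusters, init=init,
                         n_init=n_init, random_state=seed),
        "MiniBatchKMeans": MiniBatchKMeans(n_clusters=n_clusters, init=init,
                                           n_init=n_init, random_state=seed),
    }
    results = []
    for name, est in estimators.items():
        start = time.perf_counter()
        est.fit(X)
        elapsed = time.perf_counter() - start
        results.append({
            "name": name,
            "init": init,
            "n_init": n_init,
            "seconds": elapsed,
            # Final sum of squared distances to the closest centroid.
            "inertia": float(est.inertia_),
        })
    return results


results = run_benchmark(X)
for row in results:
    print("%(name)-16s init=%(init)s n_init=%(n_init)d "
          "time=%(seconds).3fs inertia=%(inertia).1f" % row)
```

Running every implementation with the same init method, the same
n_init and the same seed is what makes the inertia values comparable;
timings from a single run like this are only indicative.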

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
