Minibatch K-means should work just fine. Alternatively, there are Hebbian K-means approaches, which are quite easy to implement and should be fast (though I think they basically boil down to minibatch K-means; I haven't looked at the details of minibatch K-means). There is an approach at http://www.iro.umontreal.ca/~memisevr/code.html that could be useful once the website is fixed...
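To make the two options concrete, here is a rough sketch rather than a tuned recipe. The first part uses scikit-learn's MiniBatchKMeans with partial_fit, feeding the data one chunk at a time. The second part is my own reading of "Hebbian K-means" as a winner-take-all online update (the closest center moves toward each sample with a 1/count step size); it is an assumption and not necessarily what the code at the link above does. The synthetic data, the chunk size, and the helper name online_kmeans are all made up for illustration; swap in real MNIST rows.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.RandomState(0)

# Synthetic stand-in for MNIST-shaped data (784 features per row).
# Assumption for illustration only: 10 noisy blobs of 200 points each.
true_centers = rng.rand(10, 784)
X = np.vstack([c + 0.05 * rng.randn(200, 784) for c in true_centers])
X = X.astype(np.float32)
rng.shuffle(X)  # shuffle rows in place

# Option 1: scikit-learn minibatch K-means, one chunk at a time.
# partial_fit lets you stream data that does not fit in memory.
mbk = MiniBatchKMeans(n_clusters=10, n_init=3, random_state=0)
chunk_size = 500  # arbitrary choice for this sketch
for start in range(0, len(X), chunk_size):
    mbk.partial_fit(X[start:start + chunk_size])
labels = mbk.predict(X)


# Option 2: a hand-rolled winner-take-all ("Hebbian"-style) online update.
def online_kmeans(X, n_clusters, n_epochs=2, seed=0):
    """Online K-means: move the winning center toward each sample."""
    local_rng = np.random.RandomState(seed)
    # Initialise centers from randomly chosen data points.
    init_idx = local_rng.choice(len(X), n_clusters, replace=False)
    centers = X[init_idx].astype(np.float64)
    counts = np.zeros(n_clusters)
    for _ in range(n_epochs):
        for i in local_rng.permutation(len(X)):
            x = X[i]
            # Winner: the center with the smallest squared distance to x.
            k = np.argmin(((centers - x) ** 2).sum(axis=1))
            counts[k] += 1
            # Per-center step size 1/count, so each center tracks the
            # running mean of the samples it has won so far.
            centers[k] += (x - centers[k]) / counts[k]
    return centers


centers = online_kmeans(X, n_clusters=10)
print(mbk.cluster_centers_.shape, centers.shape)
```

The per-center 1/count step is also the kind of update minibatch K-means applies within each batch, which is why I suspect the two end up being nearly the same thing in practice.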
I have run the Hebbian K-means approach over CIFAR10, so it should work for MNIST.

On Thu, Jun 18, 2015 at 8:47 AM, Vince Fernando <[email protected]> wrote:
> What is best routine in scikit-learn (or elsewhere) for clustering large
> data sets such as MNIST?
> I asked a similar question last year but would like to hear an update.
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
