Minibatch K-means should work just fine. Alternatively, there are Hebbian K-means approaches that are quite easy to implement and should be fast (though I think they basically boil down to minibatch K-means; I haven't looked at the details of minibatch K-means). There is an approach at http://www.iro.umontreal.ca/~memisevr/code.html that could be useful once the website is fixed...
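For the minibatch route, here is a rough sketch using scikit-learn's MiniBatchKMeans with partial_fit. The random array is only a stand-in with MNIST-like shape, and the cluster count and batch size are my assumptions rather than tuned values, so treat it as a starting point and not a recipe.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Stand-in data with an MNIST-like shape (784 features per sample).
# Real MNIST would be loaded separately and scaled, e.g. to [0, 1].
rng = np.random.RandomState(0)
X = rng.rand(5000, 784).astype(np.float32)

n_clusters = 10    # assumption: one cluster per digit class
batch_size = 500   # assumption: adjust to the memory you have

# n_init is passed explicitly only to avoid version-dependent defaults.
km = MiniBatchKMeans(n_clusters=n_clusters, batch_size=batch_size,
                     n_init=3, random_state=0)

# Feed the data in chunks; each partial_fit call updates the centroids
# incrementally, so the whole data set never needs to be in memory at once.
for start in range(0, X.shape[0], batch_size):
    km.partial_fit(X[start:start + batch_size])

labels = km.predict(X)          # cluster index for every sample
centers = km.cluster_centers_   # array of shape (n_clusters, n_features)
print(centers.shape, labels.shape)
```

If the data fits in memory, calling km.fit(X) once instead of the loop should be simpler; partial_fit is mainly useful when the data is streamed from disk.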
I have run the Hebbian K-means approach on CIFAR-10, so it should work for MNIST as well.

On Thu, Jun 18, 2015 at 8:47 AM, Vince Fernando <y...@vincefernando.co.uk> wrote:
> What is best routine in scikit-learn (or elsewhere) for clustering large
> data sets such as MNIST?
> I asked a similar question last year but would like to hear an update.
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general