2011/10/4 Peter Prettenhofer <[email protected]>:
> @alexandre: thanks; basically yes -> I use the SGD classifier from
> Bolt instead of sklearn because I had to patch it up a bit.
>
> @ogirsel: have you tried to run MiniBatchKMeans on the unlabeled data?
> I'm curious whether that scales...

I did run MiniBatchKMeans with a batch size of 1000 over the complete dataset (hence my recent check-ins on this class to skip the computation of the label assignments). It seems to work, albeit probably slower than the sofia-ml implementation (I would say at least two times slower). I need to run further checks on smaller datasets to profile the scikit-learn implementation.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
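
For readers unfamiliar with the technique discussed above, here is a minimal sketch of a mini-batch k-means loop in plain Python. It is not the scikit-learn or sofia-ml code; the function name `minibatch_kmeans`, its parameters, and the per-center 1/count learning rate are assumptions chosen for illustration. It returns only the centers and never computes label assignments for the full dataset, which is the kind of saving mentioned above.

```python
import random


def minibatch_kmeans(points, k, batch_size=1000, n_iter=100, seed=0):
    """Illustrative mini-batch k-means (hypothetical helper, not a library API).

    points: list of equal-length sequences of floats.
    Returns a list of k centers; full-dataset labels are never computed.
    """
    rng = random.Random(seed)
    # Initialise the centers with k distinct points drawn from the data.
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of samples each center has absorbed so far

    def nearest(x):
        # Index of the center with the smallest squared Euclidean distance.
        return min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])),
        )

    for _ in range(n_iter):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch first, then update the centers.
        assigned = [nearest(x) for x in batch]
        for x, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate (assumption)
            centers[j] = [(1.0 - eta) * c + eta * a
                          for c, a in zip(centers[j], x)]
    return centers
```

Calling `minibatch_kmeans(data, k=10, batch_size=1000)` on a list of feature vectors would return ten centers; each iteration touches only `batch_size` samples, so the cost per step does not grow with the size of the dataset.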
