2011/10/4 Peter Prettenhofer <[email protected]>:
> @alexandre: thanks; basically yes -> I use the SGD classifier from
> Bolt instead of sklearn because I had to patch it up a bit.
>
> @ogirsel: have you tried to run MiniBatchKMeans on the unlabeled data?
> I'm curious whether that scales...

I did run MiniBatchKMeans with a batch size of 1000 over the complete
dataset (hence my recent check-ins on this class to skip the
computation of the label assignments). It seems to work, albeit
probably slower than the sofia-ml implementation (I would say at least
twice as slow). I need to run further checks on smaller datasets to
profile the scikit-learn implementation.
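
For reference, a minimal sketch of that kind of run. It assumes a recent
scikit-learn release, where the relevant parameters are named n_clusters,
batch_size and compute_labels (the names in the version discussed in this
thread may differ). The random matrix, its shape and the number of
clusters are placeholders, not the actual dataset or settings:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Placeholder data standing in for the unlabeled dataset (assumption).
rng = np.random.RandomState(0)
X = rng.rand(10000, 20)

# batch_size=1000 as in the run described above; compute_labels=False
# skips the final label assignment over the whole dataset after fitting.
km = MiniBatchKMeans(
    n_clusters=50,          # placeholder value, not from the thread
    batch_size=1000,
    compute_labels=False,
    n_init=3,
    random_state=0,
)
km.fit(X)

centers = km.cluster_centers_   # array of shape (n_clusters, n_features)

# Labels can still be computed on demand for any subset of samples.
labels = km.predict(X[:5])
```

Skipping the label assignment matters mostly when only the centers are
needed, since it avoids one extra pass over the full dataset.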

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
