2013/4/26 Eustache DIEMERT <eusta...@diemert.fr>:
> I tried a very naive bootstrapping approach (learn RF on supervised data,
> predict on unsupervised, then learn RF on all) but with no luck either.

Sounds like you've now got a self-training algorithm with only one
iteration. You may have more luck with a proper self-training/EM
algorithm, e.g. DCEM [1] (works sometimes, especially with the error
modelling) or CVEM [2] (I haven't tried that one yet). Such algorithms
take the classifier's confidence into account when doing the next fit,
by weighting each sample with its predicted probability (in
scikit-learn terms that's sample_weight rather than class_weight; a
minimal sketch follows). You can also try learning extra features with
k-means [3], or just try the LabelPropagation algorithm in
scikit-learn; both are sketched after the links below.
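
For concreteness, here's a rough sketch of one confidence-weighted
self-training iteration. X_lab/y_lab/X_unlab are placeholder names for
your labelled and unlabelled arrays, and this only shows the weighting
idea, not the full DCEM/CVEM procedures from [1]/[2]:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Placeholder data: X_lab/y_lab are labelled, X_unlab is unlabelled.
    clf = RandomForestClassifier(n_estimators=100)
    clf.fit(X_lab, y_lab)

    # Pseudo-label the unlabelled data and keep the classifier's confidence.
    proba = clf.predict_proba(X_unlab)
    pseudo_y = clf.classes_[proba.argmax(axis=1)]
    confidence = proba.max(axis=1)

    # Refit on everything, weighting pseudo-labelled samples by confidence
    # instead of trusting them fully; repeat until the labels stabilise.
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, pseudo_y])
    weights = np.concatenate([np.ones(len(y_lab)), confidence])
    clf.fit(X_all, y_all, sample_weight=weights)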

[1] http://ama.liglab.fr/~amini/Publis/SemiSupImpSpr.pdf
[2] http://research.microsoft.com/en-us/um/people/xiaohe/nips08/paperaccepted/nips2008wsl1_02.pdf
[3] http://fastml.com/the-secret-of-the-big-guys/
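
And a quick sketch of the last two suggestions, reusing the same
placeholder arrays; the cluster count is arbitrary and LabelPropagation
is used with its defaults, so treat this as a starting point only:

    import numpy as np
    from sklearn.cluster import MiniBatchKMeans
    from sklearn.semi_supervised import LabelPropagation

    X_all = np.vstack([X_lab, X_unlab])

    # Extra features as in [3]: distances to k-means centroids
    # learned on all the data (labelled + unlabelled).
    km = MiniBatchKMeans(n_clusters=50).fit(X_all)
    X_lab_aug = np.hstack([X_lab, km.transform(X_lab)])

    # LabelPropagation: mark unlabelled samples with -1 and fit on everything.
    y_all = np.concatenate([y_lab, -np.ones(len(X_unlab), dtype=int)])
    lp = LabelPropagation().fit(X_all, y_all)
    inferred = lp.transduction_  # labels for all samples, incl. the unlabelled ones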

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
