On Sun, Jan 15, 2012 at 07:39:00PM +0100, Philipp Singer wrote:
> The problem is that my representation is very sparse so I have a huge
> amount of zeros.

That's actually good: some of our estimators are able to use a sparse
representation to speed up computation.

> Furthermore the dataset is skewed so one class takes a huge amount of
> labels and another one is also pretty high.

> I have successfully used logistic regression and I could achieve a
> recall of about (in the best case dataset) 65%. I am pretty happy with
> that result. But when looking at the confusion matrix the problem is
> that many examples get mapped to the large class.

Use "class_weight='auto'" in the logistic regression to counter the effect
of unbalanced classes. For SVMs, the following example shows the trick:

http://scikit-learn.org/stable/auto_examples/svm/plot_separating_hyperplane_unbalanced.html

HTH,

Gael

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
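
To make the class-weighting idea concrete, here is a minimal sketch of inverse-frequency weights in plain Python. It is an illustration, not the library's code: the helper name balanced_class_weights and the label set y are made up for this example, and the formula n_samples / (n_classes * count) is the one used by the 'balanced' option in newer scikit-learn releases (the option named 'auto' in this thread was later replaced by 'balanced', and its exact formula may have differed).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency.

    Hypothetical helper for illustration. Uses
    n_samples / (n_classes * count), so every class ends up
    with the same total weight across its samples.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}


# Made-up skewed label set: one dominant class, one mid-sized, one rare.
y = ["a"] * 70 + ["b"] * 25 + ["c"] * 5
weights = balanced_class_weights(y)

# Rare classes get larger weights, so mistakes on them cost more.
for cls in sorted(weights):
    print(cls, round(weights[cls], 3))
```

A dictionary like this can also be passed explicitly as class_weight to estimators that accept the parameter, which is useful when the automatic weights push too far towards the rare classes and you want to tune them by hand.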
