On Sun, Jan 15, 2012 at 07:39:00PM +0100, Philipp Singer wrote:
> The problem is that my representation is very sparse so I have a huge
> amount of zeros.

That's actually good: some of our estimators are able to use a sparse
representation to speed up computation.

> Furthermore the dataset is skewed so one class takes a huge amount of 
> labels and another one is also pretty high.

> I have successfully used logistic regression and I could achieve a 
> recall of about (in the best case dataset) 65%. I am pretty happy with 
> that result. But when looking at the confusion matrix the problem is 
> that many examples get mapped to the large class.

Pass class_weight='auto' to the logistic regression estimator to counter
the effect of unbalanced classes.
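
A minimal sketch of what that looks like, assuming a recent scikit-learn
release, where this option is spelled class_weight='balanced' (weights
n_samples / (n_classes * np.bincount(y))) rather than 'auto'. The toy data,
its sizes and the variable names are made up for illustration and are not
Philipp's dataset:

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

rng = np.random.RandomState(0)

# Hypothetical skewed toy data: 900 samples of class 0, 100 of class 1.
n_major, n_minor, n_features = 900, 100, 20
X_dense = np.vstack([
    rng.normal(0.0, 1.0, size=(n_major, n_features)),
    rng.normal(0.7, 1.0, size=(n_minor, n_features)),
])
# Zero out most entries so that a sparse format is worthwhile.
X_dense[rng.rand(*X_dense.shape) < 0.8] = 0.0
X = sparse.csr_matrix(X_dense)
y = np.array([0] * n_major + [1] * n_minor)

# Same model twice: once unweighted, once with per-class reweighting.
plain = LogisticRegression(max_iter=1000).fit(X, y)
weighted = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

# Rows are true classes, columns are predicted classes.
cm_plain = confusion_matrix(y, plain.predict(X))
cm_weighted = confusion_matrix(y, weighted.predict(X))
print("unweighted:\n", cm_plain)
print("balanced:\n", cm_weighted)
```

Comparing the two confusion matrices row by row shows how many examples
of the small class still get mapped to the large one in each setting.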

For SVMs, the following example shows the trick:
http://scikit-learn.org/stable/auto_examples/svm/plot_separating_hyperplane_unbalanced.html
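
Roughly, the idea there is to give the rare class a larger weight through
the class_weight parameter of SVC. The sketch below is only loosely
modelled on that example; the data, the weight of 10 and the variable names
are assumptions chosen for illustration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Hypothetical 2-D toy data: 1000 points of class 0, 100 of class 1.
X = np.vstack([
    rng.randn(1000, 2) * 1.5,
    rng.randn(100, 2) * 0.5 + [2, 2],
])
y = np.array([0] * 1000 + [1] * 100)

# Unweighted linear SVM versus one where errors on class 1 cost 10x more.
clf = SVC(kernel="linear", C=1.0).fit(X, y)
wclf = SVC(kernel="linear", C=1.0, class_weight={1: 10}).fit(X, y)

pred_plain = clf.predict(X)
pred_weighted = wclf.predict(X)
print("predicted as class 1, unweighted:", int((pred_plain == 1).sum()))
print("predicted as class 1, weighted:  ", int((pred_weighted == 1).sum()))
```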

HTH,

Gael

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
