2012/9/12 Christian Jauvin <[email protected]>:
>> May I ask why you think you need this?
>
> It was my naive assumption of how to tackle class imbalance with an
> SGD classifier, but as Olivier already suggested, using class_weight
> makes more sense for this. Is there another mechanism or strategy that
> I should be aware of, do you think?

For SGD you can sub-sample the over-represented classes (as a bonus,
training gets faster because you no longer use all the data). It
would be great to have that option built into the SGD models.
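
A minimal sketch of the sub-sampling idea, using NumPy only. The helper
name `undersample` and the toy data are made up for illustration; this
is not an existing scikit-learn option. It assumes you want every class
cut down to the size of the rarest one.

```python
import numpy as np

def undersample(X, y, seed=0):
    """Randomly drop rows so each class keeps as many samples as the rarest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    # For each class, pick n_keep row indices without replacement.
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    # SGD is sensitive to sample order, so mix the classes together.
    rng.shuffle(keep)
    return X[keep], y[keep]

# Toy data (hypothetical): 8 samples of class 0, 2 samples of class 1.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_small, y_small = undersample(X, y)
```

The balanced arrays can then be passed to the classifier's fit method
as usual.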

You can also oversample the under-represented classes (but without the
speed benefit of undersampling).
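
Oversampling could look roughly like the sketch below, again with
NumPy only. The helper name `oversample` and the toy data are
hypothetical, and it assumes plain duplication of randomly chosen
minority rows until every class matches the largest one.

```python
import numpy as np

def oversample(X, y, seed=0):
    """Duplicate random rows of the smaller classes until all classes match the largest."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_target = counts.max()
    parts = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        # Keep every original row and add random repeats to reach n_target.
        extra = rng.choice(members, size=n_target - n, replace=True)
        parts.append(np.concatenate([members, extra]))
    idx = np.concatenate(parts)
    # Shuffle so the duplicated rows are not grouped at the end of each class.
    rng.shuffle(idx)
    return X[idx], y[idx]

# Toy data (hypothetical): 8 samples of class 0, 2 samples of class 1.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_big, y_big = oversample(X, y)
```

Do the resampling on the training split only, otherwise duplicated
rows can leak into the held-out data.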

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
