Hi Sergey,

There is a sample_weight argument (not very well documented) to the fit method 
of the random forest classifier that might help. You might want to check out 
the SVC example to see the sample_weight format:
http://scikit-learn.org/stable/auto_examples/svm/plot_weighted_samples.html

You can give different weights to different classes (e.g., inversely 
proportional to the number of samples in each class).
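
Here is a minimal sketch of that idea, in case it helps. It assumes a recent 
scikit-learn and NumPy; the helper name inverse_frequency_weights and the toy 
data are made up for illustration, and the n_samples / (n_classes * count) 
scaling is just one reasonable choice of "inversely proportional".

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def inverse_frequency_weights(y):
    """Return one weight per sample, inversely proportional to its class size.

    Hypothetical helper, not part of scikit-learn.
    """
    classes, inverse, counts = np.unique(
        y, return_inverse=True, return_counts=True
    )
    # Scale so that the mean weight over all samples is 1 and every class
    # carries the same total weight.
    class_weights = len(y) / (len(classes) * counts.astype(float))
    # Map each sample to the weight of its class.
    return class_weights[inverse]


# Toy imbalanced data (illustrative only): 1000 samples of class 0,
# 100 samples of class 1.
rng = np.random.RandomState(0)
X = rng.randn(1100, 5)
y = np.array([0] * 1000 + [1] * 100)

weights = inverse_frequency_weights(y)

clf = RandomForestClassifier(n_estimators=10, random_state=0)
# sample_weight is passed to fit, one entry per row of X.
clf.fit(X, y, sample_weight=weights)
```

With these weights each class contributes the same total weight to the trees, 
so the big classes no longer dominate the split criterion just by their size.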

-Manish

On Jul 12, 2013, at 4:40 PM, Sergey Feldman <sergeyfeld...@gmail.com> wrote:

> I'm dealing with a 50-class classification problem with extremely unbalanced 
> classes.  The smallest class has about 1000 samples and the largest has 
> 500,000.  The random forest I've trained is being heavily skewed towards the 
> big classes.  
> 
> Is there a good way to deal with this kind of problem in sklearn as of now?  
> Or is there room to implement some kind of stratified bootstrap strategy or a 
> weighting strategy (as in here, for example)?
> 
> What other non-linear classifiers in sklearn would be good for this kind of 
> dataset?  It's about 2 million examples in 500+ dimensions.
> 
> Thanks,
> Sergey
> ------------------------------------------------------------------------------
> See everything from the browser to the database with AppDynamics
> Get end-to-end visibility with application monitoring from AppDynamics
> Isolate bottlenecks and diagnose root cause in seconds.
> Start your free trial of AppDynamics Pro today!
> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
