Fellow sklearners,

I am working on a classification problem with an imbalanced data set and
have had success using SVM classifiers with the class_weight parameter.
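For reference, here is a minimal sketch of that kind of SVM setup. It assumes synthetic data from make_classification as a stand-in for the real data set. It also assumes the current scikit-learn spelling class_weight="balanced"; an explicit dict such as {0: 1, 1: 10} works as well.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the real data: roughly 95% negatives, 5% positives.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# "balanced" sets the weight of class i to n_samples / (n_classes * count_i)
# and multiplies C by it, so minority-class errors cost more.
clf = SVC(kernel="rbf", class_weight="balanced")
clf.fit(X_train, y_train)

# decision_function gives a continuous score, which ROC AUC needs.
auc = roc_auc_score(y_test, clf.decision_function(X_test))
print(f"ROC AUC: {auc:.3f}")
```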

I have also tried Random Forests and am getting decent ROC performance,
but I hope to improve on it by using Weighted or Balanced Random Forests,
as proposed in this paper:
http://www.stat.berkeley.edu/tech-reports/666.pdf
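As I read the paper, a Balanced Random Forest grows each unpruned tree on a bootstrap sample that holds the same number of rows from every class, with a random feature subset tried at each split, and then aggregates the trees by majority vote. The following is a minimal sketch of that idea on top of DecisionTreeClassifier. BalancedRandomForest is a hypothetical class name, not an existing scikit-learn API, and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


class BalancedRandomForest:
    """Hypothetical minimal Balanced Random Forest (not a scikit-learn API).

    Each tree is grown, unpruned, on a bootstrap sample holding the same
    number of rows from every class (the size of the smallest class), with
    a random subset of features considered at each split.
    """

    def __init__(self, n_estimators=100, max_features="sqrt", random_state=None):
        self.n_estimators = n_estimators
        self.max_features = max_features
        self.random_state = random_state

    def fit(self, X, y):
        rng = np.random.RandomState(self.random_state)
        # Encode labels as 0..K-1 so every tree shares the same class order.
        self.classes_, y_enc = np.unique(y, return_inverse=True)
        per_class = [np.flatnonzero(y_enc == k) for k in range(len(self.classes_))]
        n_min = min(len(idx) for idx in per_class)
        self.estimators_ = []
        for _ in range(self.n_estimators):
            # Draw n_min rows with replacement from each class.
            sample = np.concatenate(
                [rng.choice(idx, size=n_min, replace=True) for idx in per_class]
            )
            tree = DecisionTreeClassifier(
                max_features=self.max_features,
                random_state=rng.randint(np.iinfo(np.int32).max),
            )
            tree.fit(X[sample], y_enc[sample])
            self.estimators_.append(tree)
        return self

    def predict_proba(self, X):
        # Fraction of trees voting for each class (one vote per tree per row).
        votes = np.zeros((X.shape[0], len(self.classes_)))
        rows = np.arange(X.shape[0])
        for tree in self.estimators_:
            votes[rows, tree.predict(X).astype(int)] += 1
        return votes / len(self.estimators_)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
brf = BalancedRandomForest(n_estimators=50, random_state=0).fit(X, y)
scores = brf.predict_proba(X)[:, 1]  # usable as a ranking score for ROC AUC
```

For the Weighted variant, current scikit-learn releases accept class_weight="balanced" or "balanced_subsample" on RandomForestClassifier. These weight the split criterion and the leaf class proportions, which is close in spirit to the paper's Weighted Random Forest. I have not checked that it matches the paper exactly.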

I don't see an implementation of either option in scikit-learn, but I
might be mistaken, so I wanted to ask the community. I am also willing to
write the code and contribute it back if it would be useful to others.

I have also thought about balancing the data by upsampling the minority
class or downsampling the majority class (with or without replacement), or
even by using SMOTE, but couldn't find implementations of those in
scikit-learn yet. According to the paper, the modified Random Forests seem
to outperform these methods, so I am interested in trying them first.
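For comparison, the resampling baselines can be sketched with NumPy plus sklearn.utils.resample and sklearn.neighbors.NearestNeighbors. The names random_resample and smote are hypothetical helpers, not library functions. The SMOTE here is a simplified version that assumes continuous features and at least k + 1 minority rows. It creates each synthetic row by interpolating between a minority row and one of its k nearest minority neighbours.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors
from sklearn.utils import resample


def random_resample(y, minority, mode="up", random_state=0):
    """Hypothetical helper: return row indices that balance a binary problem.

    mode="up": draw minority rows with replacement up to the majority size.
    mode="down": draw majority rows without replacement down to the minority size.
    """
    idx_min = np.flatnonzero(y == minority)
    idx_maj = np.flatnonzero(y != minority)
    if mode == "up":
        idx_min = resample(idx_min, replace=True, n_samples=len(idx_maj),
                           random_state=random_state)
    elif mode == "down":
        idx_maj = resample(idx_maj, replace=False, n_samples=len(idx_min),
                           random_state=random_state)
    else:
        raise ValueError("mode must be 'up' or 'down'")
    return np.concatenate([idx_min, idx_maj])


def smote(X_min, n_new, k=5, random_state=0):
    """Hypothetical minimal SMOTE: each synthetic row lies on the segment
    between a random minority row and one of its k nearest minority neighbours."""
    rng = np.random.RandomState(random_state)
    # Ask for k + 1 neighbours because each row's nearest neighbour is itself.
    _, neigh = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    base = rng.randint(len(X_min), size=n_new)
    partner = neigh[base, rng.randint(1, k + 1, size=n_new)]
    gap = rng.uniform(size=(n_new, 1))
    return X_min[base] + gap * (X_min[partner] - X_min[base])


X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

idx_up = random_resample(y, minority=1, mode="up")
idx_down = random_resample(y, minority=1, mode="down")
X_down, y_down = X[idx_down], y[idx_down]

X_min = X[y == 1]
X_syn = smote(X_min, n_new=int((y == 0).sum()) - len(X_min))
X_smote = np.vstack([X, X_syn])
y_smote = np.concatenate([y, np.ones(len(X_syn), dtype=y.dtype)])
```

Any of these should be applied to the training split only, so that the ROC estimate on held-out data is not inflated by duplicated or synthetic rows.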

-Manish
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
