Fellow sklearners, I am working on a classification problem with an unbalanced data set and have been successful using SVM classifiers with the class_weight option.
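For reference, a minimal sketch of the class_weight setup mentioned above. The synthetic data, the 95/5 class split, and the RBF kernel are assumptions made purely for illustration, and the value "balanced" is the spelling used by recent scikit-learn releases (older releases called it "auto").

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class problem with roughly 5% positives (illustrative only).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# class_weight="balanced" scales the penalty C for each class inversely
# to that class's frequency in the training labels.
clf = SVC(kernel="rbf", class_weight="balanced")
clf.fit(X_train, y_train)

# ROC AUC only needs a ranking score, so decision_function is sufficient.
scores = clf.decision_function(X_test)
auc = roc_auc_score(y_test, scores)
print("ROC AUC: %.3f" % auc)
```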
I have also tried Random Forests and am getting decent ROC performance, but I am hoping to improve on it by using Weighted or Balanced Random Forests, as suggested in this paper: http://www.stat.berkeley.edu/tech-reports/666.pdf

I don't see any implementation of these options in scikit-learn, but I might be mistaken, so I wanted to ask the community. I am also willing to write the code and contribute it back if it would be useful to other folks.

I have also thought about balancing the data by upsampling the minority class or downsampling the majority class (with or without replacement), or even using SMOTE, but I couldn't find implementations of those in the scikit-learn library either. According to the paper, the modified Random Forests seem to outperform these methods, hence I am interested in trying those first.

-Manish
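
P.S. For concreteness, a rough sketch of the Balanced Random Forest idea as the paper describes it: each tree is grown on a bootstrap sample of the minority class plus an equally sized sample, drawn with replacement, from the majority class, and the trees' predictions are then aggregated. This is only an illustration under stated assumptions, not an existing scikit-learn API: the helper names fit_balanced_forest and predict_proba_balanced_forest are hypothetical, the problem is assumed to be binary, and class probabilities are averaged across trees rather than majority-voted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_balanced_forest(X, y, n_trees=100, seed=0):
    """Hypothetical helper: grow each tree on a class-balanced bootstrap."""
    rng = np.random.RandomState(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    idx_min = np.flatnonzero(y == minority)
    idx_maj = np.flatnonzero(y != minority)
    trees = []
    for _ in range(n_trees):
        # Bootstrap the minority class, then draw the same number of
        # majority-class cases, also with replacement.
        boot_min = rng.choice(idx_min, size=len(idx_min), replace=True)
        boot_maj = rng.choice(idx_maj, size=len(idx_min), replace=True)
        idx = np.concatenate([boot_min, boot_maj])
        # Unpruned tree with a random feature subset tried at each split.
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=rng.randint(2**31 - 1))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_proba_balanced_forest(trees, X):
    """Hypothetical helper: average class probabilities over all trees."""
    return np.mean([t.predict_proba(X) for t in trees], axis=0)


# Illustrative data only: roughly 5% of samples in the minority class.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
trees = fit_balanced_forest(X, y, n_trees=25, seed=0)
proba = predict_proba_balanced_forest(trees, X)
```

The second column of proba can be passed to a ROC routine in the same way as the scores from any other classifier.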