I have a training set, a validation set and a test set. I build a random forest using RandomForestClassifier on the training set. However, I would like to tune it by scoring on the validation set. I find that the cross-validation score on the training set is a lot better than the score on the validation set.
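
The setup described above can be sketched roughly as follows. This is a minimal illustration only: the synthetic data from `make_classification`, the split sizes, `n_estimators=100`, and 5-fold cross-validation are all assumptions standing in for the real data and settings, which are not given here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (assumption): 600 samples, 30 features, 5 informative.
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=5, random_state=0
)

# Carve out separate training, validation and test sets (sizes are illustrative).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=240, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=120, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Cross-validation score computed on the training set only.
cv_score = cross_val_score(clf, X_train, y_train, cv=5).mean()

# Score of a forest fitted on the full training set, evaluated on the validation set.
clf.fit(X_train, y_train)
val_score = clf.score(X_val, y_val)

print(f"CV accuracy on training set: {cv_score:.3f}")
print(f"Accuracy on validation set:  {val_score:.3f}")
```

With real data, the gap between these two numbers is the symptom in question; the test set stays untouched until tuning is finished.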
To reduce the overfitting, I would like to use [RFE][1] for feature selection. I have tried removing features by hand, and in some cases that does improve the score on the validation set. This [question and answer][2] shows how to use RFE with RandomForestClassifier, but I don't understand how to do this when scoring on a separate validation set. Can this sort of feature selection be done using RFE or some other scikit-learn method?

[1]: http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html
[2]: https://stackoverflow.com/questions/24123498/recursive-feature-elimination-on-random-forest-using-scikit-learn

Raphael

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general