[Scikit-learn-general] How to optimize a random forest for out of sample prediction

Raphael C Wed, 07 Oct 2015 00:17:38 -0700

I have a training set, a validation set and a test set. I build a
random forest using RandomForestClassifier on the training set.
However, I would like to tune it by scoring on the validation set.
I find that the cross-validation score on the training set is a lot
better than the score on the validation set.
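
For reference, a minimal sketch of how the two numbers being compared could be computed. The data here is synthetic (make_classification), and the split sizes, the names X_train/X_val and the forest settings are assumptions for illustration only, with import paths from recent scikit-learn releases. On i.i.d. synthetic data the two scores will usually be close; the point is only to show which score is which.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (assumption: the real data set is not shown).
X, y = make_classification(n_samples=600, n_features=30,
                           n_informative=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Score 1: mean 5-fold cross-validation accuracy within the training set.
cv_score = cross_val_score(clf, X_train, y_train, cv=5).mean()

# Score 2: accuracy on the held-out validation set after fitting on
# the whole training set.
val_score = clf.fit(X_train, y_train).score(X_val, y_val)

print("CV score on training set: %.3f" % cv_score)
print("Score on validation set:  %.3f" % val_score)
```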


To improve this I would like to use [RFE][1] for feature selection to
deal with overfitting. I have tried removing features by hand and in
some cases it does improve the score on the validation set. This
[question and answer][2] shows how to use RFE with
RandomForestClassifier, but I don't understand how to do this when you
score on a separate validation set.

Can this sort of feature selection be done using RFE or some other
scikit-learn method?
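
One possible approach, sketched under assumptions rather than offered as a definitive answer: stack the training and validation rows and hand RFECV a PredefinedSplit, so that every candidate feature subset is fit on the training rows and scored on the validation rows. The synthetic data, the variable names and the forest settings below are illustrative, and the import paths are those of recent scikit-learn releases.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import PredefinedSplit, train_test_split

# Synthetic stand-in data (assumption: the real data set is not shown).
X, y = make_classification(n_samples=400, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Stack both sets and label each row: -1 means "never used for testing"
# (training rows), 0 means "belongs to test fold 0" (validation rows).
X_all = np.vstack([X_train, X_val])
y_all = np.concatenate([y_train, y_val])
test_fold = np.concatenate([np.full(len(X_train), -1),
                            np.zeros(len(X_val), dtype=int)])
split = PredefinedSplit(test_fold)  # exactly one train/validation split

# RFECV ranks features by the forest's feature_importances_, drops one
# per step, and scores each subset size on the validation fold.
selector = RFECV(RandomForestClassifier(n_estimators=50, random_state=0),
                 step=1, cv=split, scoring="accuracy")
selector.fit(X_all, y_all)

print("Number of features kept:", selector.n_features_)
print("Selected feature mask:", selector.support_)

# Refit on the training rows only, restricted to the selected features.
final_clf = RandomForestClassifier(n_estimators=50, random_state=0)
final_clf.fit(X_train[:, selector.support_], y_train)
```

Note that once RFECV has chosen the number of features, it reruns the elimination on all rows passed to fit, validation rows included, so the final mask is not based on the training set alone. Because the validation set is also used to choose the subset size, the validation score is optimistic and the test set remains the only unbiased estimate.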


  [1]: http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html
  [2]: https://stackoverflow.com/questions/24123498/recursive-feature-elimination-on-random-forest-using-scikit-learn

Raphael

