Hello Paul,

> Do fully developed trees make sense for rather small datasets? Overall, I
> have 622 samples with 177 features each. Isn't there the risk of
> overfitting?
Yes, overfitting might happen, but it should be limited since you are
building randomized trees and averaging them together.

> Do you mean by "tune min_sample_split" the training/test set split?
> Or rather the ensemble method ExtraTreesRegressor:
> http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesRegressor.html

No, I meant the `min_samples_split` parameter of the
RandomForestClassifier itself, as you did in the example in your next
e-mail. Note also that in your grid search, you should instead evaluate
values between 1 and the number of samples in your dataset (= 622 if you
train your model on your entire data).

Please also use more than 10 trees (the default number of estimators) if
you want better results. In your grid search, you invoke
RandomForestClassifier() without setting the number of estimators.

ExtraTreesClassifier might also yield better results than
RandomForestClassifier. You should evaluate both algorithms.

Hope this helps,
Gilles

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
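For concreteness, a grid search along these lines might look as follows. This is a sketch, not code from the thread: the parameter values are illustrative assumptions, the data is synthetic with the same shape as Paul's (622 samples, 177 features), and recent scikit-learn versions require `min_samples_split >= 2`.

```python
# Sketch of the suggested grid search over min_samples_split and
# n_estimators, comparing RandomForestClassifier and ExtraTreesClassifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in for the real data: 622 samples with 177 features each.
X, y = make_classification(n_samples=622, n_features=177, random_state=0)

# Illustrative grids: more trees than the 10 mentioned in the thread,
# and a few min_samples_split values up to a sizable fraction of n_samples.
param_grid = {
    "n_estimators": [25, 50],
    "min_samples_split": [2, 20, 100],
}

# Evaluate both algorithms, as suggested.
for Model in (RandomForestClassifier, ExtraTreesClassifier):
    search = GridSearchCV(Model(random_state=0), param_grid, cv=3)
    search.fit(X, y)
    print(Model.__name__, search.best_params_, round(search.best_score_, 3))
```

With real data you would replace `make_classification` with your own `X` and `y`, and could widen the grids once you have a feel for the useful range.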
