Hello Paul,

> Do fully developed trees make sense for rather small datasets? Overall, I
> have 622 samples with 177 features each. Isn't there the risk of
> overfitting?

Yes, overfitting might happen, but it should be limited since you are
building randomized trees and averaging them together.

>
> Do you mean by "tune min_sample_split" the training/test set split?
> Or rather the ensemble method ExtraTreesRegressor:
> http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesRegressor.html

No, I meant the `min_samples_split` parameter of the
RandomForestClassifier itself, as you did in the example of your next
e-mail.

Note also that in your grid search, you should instead evaluate values
between 1 and the number of samples in your dataset (i.e., 622 if you
train your model on your entire data).
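As a minimal sketch of such a grid search (using the current scikit-learn API; `make_classification` only stands in for your 622-sample, 177-feature dataset, and the particular value grid is just an illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for your data: 622 samples, 177 features.
X, y = make_classification(n_samples=622, n_features=177, random_state=0)

# Candidate values for min_samples_split, ranging up toward n_samples.
param_grid = {"min_samples_split": [2, 5, 10, 50, 100, 200]}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=25, random_state=0),
    param_grid,
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```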

Please also use more than 10 trees (the default number of estimators)
if you want better results. In your grid search, you invoke
RandomForestClassifier() without specifying `n_estimators`.
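For instance, pass `n_estimators` explicitly when constructing the forest (100 here is just an illustrative value, not a recommendation for your data):

```python
from sklearn.ensemble import RandomForestClassifier

# Pass n_estimators explicitly instead of relying on the default.
clf = RandomForestClassifier(n_estimators=100)
```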

ExtraTreesClassifier might also yield better results than
RandomForestClassifier. You should evaluate both algorithms.
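A quick way to compare the two is cross-validation on the same data; a sketch (again with synthetic data standing in for yours, and arbitrary settings):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for your 622 x 177 dataset.
X, y = make_classification(n_samples=622, n_features=177, random_state=0)

# Score both ensembles with the same cross-validation splits.
for Clf in (RandomForestClassifier, ExtraTreesClassifier):
    scores = cross_val_score(Clf(n_estimators=50, random_state=0), X, y, cv=3)
    print(Clf.__name__, scores.mean())
```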

Hope this helps,

Gilles

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general