Re: [Scikit-learn-general] optimizing ensemble method based classifier

2012-09-12 Thread Brian Holt
Random forest draws n_samples with replacement from the n_samples of data (a bootstrap sample), so on average about a third of the data (≈ 1 - 1/e ≈ 37%) is left out per estimator. Breiman's 2001 paper describes it in more detail. Cheers, Brian
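[Editor's note: the ≈37% out-of-bag fraction mentioned above can be checked with a quick simulation. This sketch is not from the original thread; the sample sizes and seed are arbitrary.]

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000
n_trials = 100

# Draw bootstrap samples (n_samples indices with replacement) and measure
# what fraction of the original indices never appears in each draw.
oob_fractions = []
for _ in range(n_trials):
    sample = rng.integers(0, n_samples, size=n_samples)
    n_oob = n_samples - len(np.unique(sample))
    oob_fractions.append(n_oob / n_samples)

# The mean should be close to 1 - 1/e ~ 0.368.
print(np.mean(oob_fractions))
```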

Re: [Scikit-learn-general] optimizing ensemble method based classifier

2012-09-12 Thread Sheila the angel
So I think it should be as:

# for n_estimators=10
clf = RandomForestClassifier(n_estimators=10, oob_score=True)
clf.fit(X, y)
print clf.oob_score_

clf.oob_score_ will give the OOB accuracy. But I would also like to know what percent of the data is used to calculate this score?

Re: [Scikit-learn-general] optimizing ensemble method based classifier

2012-09-12 Thread Brian Holt
You're absolutely right: you can simply use the OOB estimate as your measure of generalisability, so there's no need for GridSearchCV.
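[Editor's note: a minimal sketch of the OOB-based tuning Brian suggests, not from the original thread. The grid values are illustrative, and make_classification stands in for the poster's own X, y.]

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)  # placeholder data

best_score, best_params = -1.0, None
for n_estimators in (10, 50, 100):
    for max_features in ("sqrt", "log2", None):
        clf = RandomForestClassifier(
            n_estimators=n_estimators,
            max_features=max_features,
            oob_score=True,   # score each sample with trees that never saw it
            bootstrap=True,   # OOB estimates require bootstrap sampling
            random_state=0,
        )
        clf.fit(X, y)
        if clf.oob_score_ > best_score:
            best_score = clf.oob_score_
            best_params = {"n_estimators": n_estimators,
                           "max_features": max_features}

print(best_params, best_score)
```

Because each tree is scored only on the samples it never saw, the whole dataset can be used for both fitting and model selection, which is what makes the separate cross-validation loop unnecessary here.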

[Scikit-learn-general] optimizing ensemble method based classifier

2012-09-12 Thread Sheila the angel
Hello all,

I want to optimize n_estimators and max_features for ensemble methods (say for RandomForestClassifier). Usually I use GridSearchCV() with cv=4, which does 4-fold cross-validation on the data and gives the best parameters/model. In the documentation section 'out-of-bag-estimates' http://scikit-learn.org/
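[Editor's note: for reference, the GridSearchCV approach Sheila describes looks roughly like this. The grid values and dataset are illustrative placeholders, and the sketch uses the modern sklearn.model_selection import path rather than the sklearn.grid_search module that existed in 2012.]

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)  # placeholder data

param_grid = {
    "n_estimators": [10, 50],
    "max_features": ["sqrt", None],
}

# cv=4 -> 4-fold cross-validation, as in the original message
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=4)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```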