Random forest draws n_samples samples with replacement (a bootstrap sample)
from the training data, so on average about 37% of the data is left out of
each estimator. Breiman's 2001 paper describes it in more detail.
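A quick simulation makes the fraction concrete. This is a sketch, not scikit-learn internals: it draws one bootstrap sample the way each tree does and measures how much of the data never appears in it. The expected out-of-bag fraction is (1 - 1/n)^n, which tends to 1/e ≈ 0.368 for large n.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000

# One bootstrap draw: n_samples indices sampled with replacement,
# as each estimator in a random forest does.
indices = rng.integers(0, n_samples, size=n_samples)

# Points whose index was never drawn are "out of bag" for this tree.
oob_fraction = 1 - len(np.unique(indices)) / n_samples

print(f"simulated OOB fraction: {oob_fraction:.3f}")
print(f"theoretical limit 1/e:  {np.exp(-1):.3f}")
```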
Cheers
Brian
On Sep 12, 2012 1:21 PM, "Sheila the angel" wrote:
So I think it should be as
# for n_estimators=10
clf = RandomForestClassifier(n_estimators=10, oob_score=True)
clf.fit(X, y)
print(clf.oob_score_)
clf.oob_score_ will give oob accuracy.
But I would also like to know: what percentage of the data is used to
calculate this score?
On Wed, Sep 12, 2012 at 1
You're absolutely right, you can simply use the oob estimate as your
measure of generalisability. No need for GridSearchCV...
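The suggestion above can be sketched as a small loop: fit one forest per candidate hyper-parameter value with oob_score=True and keep the value with the best OOB score, with no cross-validation needed. The dataset here is synthetic and the candidate values are illustrative; it uses the current scikit-learn API.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; in practice use your own X, y.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

best_score, best_mf = -1.0, None
for max_features in ["sqrt", "log2", None]:  # illustrative candidates
    clf = RandomForestClassifier(
        n_estimators=100,
        max_features=max_features,
        oob_score=True,
        random_state=0,
    )
    clf.fit(X, y)
    # clf.oob_score_ is the accuracy on out-of-bag samples.
    if clf.oob_score_ > best_score:
        best_score, best_mf = clf.oob_score_, max_features

print("best max_features:", best_mf, "oob score:", best_score)
```

Each setting costs one forest fit instead of k cross-validated fits, which is the appeal of the OOB estimate here.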
On Sep 12, 2012 12:09 PM, "Sheila the angel" wrote:
> Hello all,
> I want to optimize n_estimators and max_features for ensemble methods (say
> for RandomForestClassifier)
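For completeness, the grid-search route the question had in mind looks roughly like this. It is a sketch using the modern scikit-learn import path (sklearn.model_selection; in 2012 this lived in sklearn.grid_search), with a synthetic dataset and an arbitrary small grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; in practice use your own X, y.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Illustrative grid over the two parameters asked about.
param_grid = {
    "n_estimators": [50, 100],
    "max_features": ["sqrt", "log2"],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3
)
search.fit(X, y)
print(search.best_params_)
```

As the reply above notes, the OOB score usually makes this cross-validated search unnecessary for random forests.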