With all the parameters set to their defaults (especially bootstrap and max_samples), the number of samples passed to each estimator is X.shape[0]. Doesn't that account for every instance in the dataset, together with the computed number of features? If so, how is it that only a subset is given to each estimator?
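One point that may resolve this: with bootstrap=True, the X.shape[0] samples are drawn with replacement, so some rows repeat and others are left out of a given tree's sample. The sketch below only simulates that draw with the standard library; it is not scikit-learn's own code, and the helper name `bootstrap_unique_fraction` and the sample size of 10,000 are assumptions made for illustration.

```python
import random


def bootstrap_unique_fraction(n_samples, seed=0):
    """Draw n_samples row indices with replacement and return the
    fraction of distinct rows that ended up in the draw."""
    rng = random.Random(seed)
    drawn = [rng.randrange(n_samples) for _ in range(n_samples)]
    return len(set(drawn)) / n_samples


# Hypothetical dataset size; any reasonably large n behaves similarly.
fraction = bootstrap_unique_fraction(10_000)
print(f"unique fraction: {fraction:.3f}")
```

For large n the expected fraction of distinct rows tends to 1 - 1/e, roughly 0.63, so each tree sees a sample of size X.shape[0] that still covers only part of the dataset.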
On Wed, Mar 11, 2020 at 10:58 AM Brown J.B. via scikit-learn <scikit-learn@python.org> wrote:

> Regardless of the number of features, each DT estimator is given only a
> subset of the data.
> Each DT estimator then uses the features to derive decision rules for the
> samples it was given.
> With more trees and few examples, you might get similar or identical
> trees, but that is not the norm.
>
> Pardon brevity.
> J.B.
>
> On Wed, Mar 11, 2020 at 14:11, aditya aggarwal <adityaselfeffici...@gmail.com> wrote:
>
>> For RandomForestClassifier in sklearn
>>
>> max_features parameter gives the max no of features for split in random
>> forest which is sqrt(n_features) as default. If m is sqrt of n, then no of
>> combinations for DT formation is nCm. What if nCm is less than n_estimators
>> (no of decision trees in random forest)?
>>
>> *example:* For n = 7, max_features is 3, so nCm is 35, meaning 35 unique
>> combinations of features for decision trees. Now for n_estimators = 100,
>> will the remaining 65 trees have repeated combination of features? If so,
>> won't trees be correlated introducing bias in the answer?
>>
>>
>> Thanks
>>
>> Aditya Aggarwal
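On the nCm question quoted above: the scikit-learn documentation describes max_features as the number of features considered when looking for the best split, so the candidate subset is redrawn at every split and a tree is not tied to a single one of the 35 combinations. A rough standard-library sketch of that idea follows; it is not scikit-learn's implementation, and the helper `features_seen_by_tree` and the choice of 10 splits are hypothetical.

```python
import random
from math import comb


def features_seen_by_tree(n_features, max_features, n_splits, seed):
    """Simulate one tree: at each split, draw a fresh candidate subset of
    max_features feature indices (without replacement) and record them."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(n_splits):
        seen.update(rng.sample(range(n_features), max_features))
    return seen


# Number of distinct 3-feature subsets out of 7 features, as in the example.
print(comb(7, 3))  # 35

# Candidate features one simulated tree considers across 10 splits.
seen = features_seen_by_tree(n_features=7, max_features=3, n_splits=10, seed=0)
print(sorted(seen))
```

Because the subset changes from split to split, a single tree can consider more than max_features distinct features overall. Note that "seen" here means offered as a candidate, not necessarily chosen for the split.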
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn