In sklearn's RandomForestClassifier, the max_features parameter sets the maximum number of features considered for a split, and it defaults to sqrt(n_features). If m = sqrt(n), then the number of possible feature combinations for building a decision tree is nCm. What happens if nCm is less than n_estimators (the number of decision trees in the random forest)?
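To make the counting concrete, here is a minimal sketch of the arithmetic for the numbers used in the example that follows (n = 7, m = 3, n_estimators = 100). The helper name count_feature_subsets is only illustrative, not part of scikit-learn, and the sketch assumes, as the question does, that each tree would be tied to one fixed subset of m features. It only counts subsets and says nothing about how scikit-learn actually draws features.

```python
from math import comb


def count_feature_subsets(n_features: int, max_features: int) -> int:
    """Return nCm: the number of distinct feature subsets of size max_features."""
    return comb(n_features, max_features)


n_features = 7      # n in the question
max_features = 3    # m in the question
n_estimators = 100  # number of trees in the forest

n_subsets = count_feature_subsets(n_features, max_features)

# Assumption from the question: each tree uses exactly one fixed subset.
# Under that assumption, this many trees could not receive a unique subset.
n_repeats = max(0, n_estimators - n_subsets)

print(n_subsets, n_repeats)  # prints: 35 65
```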
*Example:* For n = 7, max_features is 3, so nCm is 35, meaning there are 35 unique combinations of features for the decision trees. Now, for n_estimators = 100, will the remaining 65 trees reuse combinations of features? If so, won't the trees be correlated, introducing bias in the answer?

Thanks,
Aditya Aggarwal
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn