With all the parameters set to default (especially bootstrap and
max_samples), the number of samples passed to each estimator is X.shape[0].
Doesn't that mean each estimator sees all the instances in the dataset,
along with the calculated number of features? Then how come only a subset
is given to each estimator?
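
For reference, a minimal sketch (not scikit-learn's internal code) of what
the defaults bootstrap=True and max_samples=None imply: each tree receives
a resample of size X.shape[0] drawn with replacement, so it typically
contains only about 63% of the unique instances.

    import numpy as np

    rng = np.random.default_rng(0)
    n_samples = 1000  # hypothetical dataset size

    # A bootstrap sample: n_samples indices drawn *with replacement*
    # from the training set.
    indices = rng.integers(0, n_samples, size=n_samples)

    unique_fraction = len(np.unique(indices)) / n_samples
    print(f"unique instances seen by one tree: {unique_fraction:.2%}")
    # ~63% on average (1 - 1/e); the rest are out-of-bag for that tree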

On Wed, Mar 11, 2020 at 10:58 AM Brown J.B. via scikit-learn <
scikit-learn@python.org> wrote:

> Regardless of the number of features, each DT estimator is given only a
> subset of the data.
> Each DT estimator then uses the features to derive decision rules for the
> samples it was given.
> With many trees and few samples, you might get similar or identical
> trees, but that is not the norm.
>
> Pardon brevity.
> J.B.
>
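
For reference, a minimal sketch of the point above (the dataset and
parameter values here are illustrative, not from this thread): each fitted
tree in forest.estimators_ is an ordinary DecisionTreeClassifier, and the
rules it derived from its own bootstrap sample can be inspected directly.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import export_text

    X, y = load_iris(return_X_y=True)
    forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

    # Print the decision rules learned by the first tree in the ensemble.
    print(export_text(forest.estimators_[0]))
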
> On Wed, Mar 11, 2020 at 14:11 aditya aggarwal <adityaselfeffici...@gmail.com> wrote:
>
>> For RandomForestClassifier in sklearn
>>
>> The max_features parameter gives the maximum number of features
>> considered for a split in a random forest, which is sqrt(n_features) by
>> default. If m is sqrt(n), then the number of feature combinations
>> available for forming a DT is nCm. What if nCm is less than n_estimators
>> (the number of decision trees in the random forest)?
>>
>> *example:* For n = 7, max_features is 3, so nCm is 35, meaning 35 unique
>> combinations of features for the decision trees. Now, for n_estimators =
>> 100, will the remaining 65 trees have repeated combinations of features?
>> If so, won't the trees be correlated, introducing bias into the answer?
>>
>>
>> Thanks
>>
>> Aditya Aggarwal
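
As a quick check of the arithmetic in the example above (the numbers are
taken from the example itself), a sketch:

    from math import comb

    n_features = 7      # n in the example
    max_features = 3    # m, roughly sqrt(7)
    n_estimators = 100

    n_combinations = comb(n_features, max_features)   # 7C3
    print(n_combinations)                  # 35
    print(n_combinations < n_estimators)   # True: fewer combinations than trees

Note that the scikit-learn documentation describes max_features as the
number of features considered when looking for the best split, i.e. a fresh
random subset of features is drawn at each split rather than one fixed
feature subset per tree.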
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
