Re: [scikit-learn] Understanding max_features parameter in RandomForestClassifier
With all the parameters set to default (especially bootstrap and max_samples), the number of samples passed to each estimator is X.shape[0]. Doesn't that account for all the instances in the dataset, with the calculated number of features? Then how come only a subset is given to each estimator?

On Wed, Mar 11, 2020 at 10:58 AM Brown J.B. via scikit-learn <scikit-learn@python.org> wrote:

> Regardless of the number of features, each DT estimator is given only a
> subset of the data. Each DT estimator then uses the features to derive
> decision rules for the samples it was given. With more trees and few
> examples, you might get similar or identical trees, but that is not the
> norm.
>
> Pardon brevity.
> J.B.
>
> On Wed, Mar 11, 2020 at 14:11, aditya aggarwal wrote:
>
>> For RandomForestClassifier in sklearn, the max_features parameter gives
>> the maximum number of features considered for a split in a random
>> forest, which is sqrt(n_features) by default. If m is the square root of
>> n, then the number of feature combinations for building a DT is nCm.
>> What if nCm is less than n_estimators (the number of decision trees in
>> the random forest)?
>>
>> Example: for n = 7, max_features is 3, so nCm is 35, meaning 35 unique
>> combinations of features for decision trees. Now, for n_estimators =
>> 100, will the remaining 65 trees have repeated combinations of features?
>> If so, won't the trees be correlated, introducing bias in the answer?
>>
>> Thanks
>>
>> Aditya Aggarwal

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
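[Editor's note: a minimal sketch of what the default bootstrap sampling does, assuming RandomForestClassifier's defaults (bootstrap=True, max_samples=None). This is illustrative only, not scikit-learn's internal code: each tree receives X.shape[0] indices drawn with replacement, so the index array is full-length but the distinct rows it covers are a subset.]

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples = 1000  # stand-in for X.shape[0]

# With bootstrap=True and max_samples=None (the defaults), each tree is fit
# on n_samples indices drawn WITH replacement from the training set.
indices = rng.randint(0, n_samples, size=n_samples)

# The index array has length X.shape[0], but repeated draws mean only a
# subset of distinct rows (about 63% on average) reaches any one tree.
n_distinct = len(np.unique(indices))
print(len(indices), n_distinct)
```

So "the number of samples passed" and "the number of distinct instances seen" differ: the first equals X.shape[0], the second is the subset J.B. refers to.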
Re: [scikit-learn] Understanding max_features parameter in RandomForestClassifier
Regardless of the number of features, each DT estimator is given only a subset of the data. Each DT estimator then uses the features to derive decision rules for the samples it was given. With more trees and few examples, you might get similar or identical trees, but that is not the norm.

Pardon brevity.
J.B.

On Wed, Mar 11, 2020 at 14:11, aditya aggarwal wrote:

> For RandomForestClassifier in sklearn, the max_features parameter gives
> the maximum number of features considered for a split in a random
> forest, which is sqrt(n_features) by default. If m is the square root of
> n, then the number of feature combinations for building a DT is nCm.
> What if nCm is less than n_estimators (the number of decision trees in
> the random forest)?
>
> Example: for n = 7, max_features is 3, so nCm is 35, meaning 35 unique
> combinations of features for decision trees. Now, for n_estimators =
> 100, will the remaining 65 trees have repeated combinations of features?
> If so, won't the trees be correlated, introducing bias in the answer?
>
> Thanks
>
> Aditya Aggarwal
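[Editor's note: one detail worth adding to this answer. In scikit-learn, max_features limits the candidate features considered at each split, and a fresh random subset is drawn per split, not once per tree, so counting nCm whole-tree feature subsets does not describe what the estimator does. A rough illustration of that per-split sampling, not the library's actual code:]

```python
import numpy as np

rng = np.random.RandomState(42)
n_features, max_features = 7, 3

# Illustrative sketch: max_features candidate features are re-drawn at
# EVERY split, so even a single tree can consider different features at
# different nodes. (Hypothetical helper, not a scikit-learn function.)
def candidates_for_split():
    return sorted(rng.choice(n_features, size=max_features, replace=False))

# Consecutive splits within one tree can see different candidate sets:
print([candidates_for_split() for _ in range(4)])
```

Combined with bootstrap resampling of the rows, this per-split randomization is what keeps trees decorrelated even when n_estimators far exceeds the number of distinct feature subsets.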
[scikit-learn] Threshold for roc_curve in binary classification
Hello,

I was going through the logic used to calculate the thresholds for plotting roc_curve. As far as I can tell, fps, tps, and thresholds are calculated in sklearn.metrics._binary_clf_curve. How are multiple threshold values calculated for binary classification? Also, what is happening in the following lines?

    distinct_value_indices = np.where(np.diff(y_score))[0]
    threshold_idxs = np.r_[distinct_value_indices, y_true.size - 1]

Thanks

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
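[Editor's note: the short answer is that roc_curve uses each distinct predicted score as a candidate threshold. The two lines quoted above find the index of the last occurrence of each distinct score in the sorted score array. A small worked example with toy data (the arrays are made up for illustration):]

```python
import numpy as np

# Toy scores, already sorted in decreasing order as _binary_clf_curve
# arranges them; ties (0.8, 0.4) are deliberate.
y_score = np.array([0.9, 0.8, 0.8, 0.6, 0.4, 0.4, 0.1])
y_true  = np.array([1,   1,   0,   1,   0,   0,   0])

# np.diff is non-zero wherever consecutive scores differ, so np.where
# returns the index of the LAST occurrence of each distinct score value.
distinct_value_indices = np.where(np.diff(y_score))[0]

# np.r_ appends the final index so the lowest score is also a threshold.
threshold_idxs = np.r_[distinct_value_indices, y_true.size - 1]

print(threshold_idxs)           # [0 2 3 5 6]
print(y_score[threshold_idxs])  # [0.9 0.8 0.6 0.4 0.1]
```

Collapsing tied scores this way gives one (fps, tps) point per distinct threshold, which is why a binary classifier with continuous scores yields many points on the ROC curve.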
[scikit-learn] Understanding max_features parameter in RandomForestClassifier
For RandomForestClassifier in sklearn, the max_features parameter gives the maximum number of features considered for a split in a random forest, which is sqrt(n_features) by default. If m is the square root of n, then the number of feature combinations for building a DT is nCm. What if nCm is less than n_estimators (the number of decision trees in the random forest)?

Example: for n = 7, max_features is 3, so nCm is 35, meaning 35 unique combinations of features for decision trees. Now, for n_estimators = 100, will the remaining 65 trees have repeated combinations of features? If so, won't the trees be correlated, introducing bias in the answer?

Thanks

Aditya Aggarwal

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
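[Editor's note: the counting in the question can be checked directly. Whether trees actually repeat feature subsets is addressed in the replies above; the arithmetic itself is:]

```python
from math import comb

n_features, m = 7, 3
n_subsets = comb(n_features, m)   # number of distinct 3-feature subsets
print(n_subsets)                  # 35

n_estimators = 100
# If each tree really used one fixed feature subset, the pigeonhole
# principle would force at least this many trees onto a reused subset:
print(n_estimators - n_subsets)   # 65
```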