Re: [scikit-learn] Why the default max_samples of Random Forest is X.shape[0]?

2020-05-10 Thread Fernando Marcos Wittmann
Ohh, I can see my mistake now after reviewing the concept of bootstrapping and sampling with replacement. I was assuming that the "replacement" happened only after finishing each tree (i.e. if I were sampling 2/3 of the data, the very same data could be selected again for each tree, but no element would

Re: [scikit-learn] Why the default max_samples of Random Forest is X.shape[0]?

2020-05-10 Thread Fernando Marcos Wittmann
My question is why the full dataset is used by default when building each tree. That's not random forest. The main point of RF is to build each tree with a subsample of the full dataset.

On Sun, May 10, 2020, 09:50 Joel Nothman wrote:
> A bootstrap is very commonly a random draw with

Re: [scikit-learn] Why the default max_samples of Random Forest is X.shape[0]?

2020-05-10 Thread Joel Nothman
A bootstrap is very commonly a random draw with replacement of equal size to the original sample.
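
A quick way to check the point about drawing with replacement is to simulate it. The sketch below is only an illustration using Python's standard library, not scikit-learn's own code; the helper name `bootstrap_unique_fraction` is hypothetical. It draws a bootstrap of the same size as the original sample and reports what fraction of the original rows actually appear in it.

```python
import random


def bootstrap_unique_fraction(n_samples, seed=0):
    """Draw n_samples indices with replacement; return the share of distinct indices."""
    rng = random.Random(seed)
    # Sampling with replacement: the same index may be drawn more than once.
    drawn = rng.choices(range(n_samples), k=n_samples)
    return len(set(drawn)) / n_samples


n = 10_000
observed = bootstrap_unique_fraction(n)
# Probability that a given row is drawn at least once in n draws.
expected = 1 - (1 - 1 / n) ** n

print(f"observed unique fraction: {observed:.3f}")
print(f"expected unique fraction: {expected:.3f}")
```

The expected fraction approaches 1 - 1/e (about 0.632) as n grows, so a bootstrap of size `X.shape[0]` still leaves roughly a third of the rows out of any given draw.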

[scikit-learn] Why the default max_samples of Random Forest is X.shape[0]?

2020-05-08 Thread Fernando Marcos Wittmann
When reading the documentation of Random Forest, I got the following:

```
max_samples : int or float, default=None
    If bootstrap is True, the number of samples to draw from X to train each base estimator.

    - *If None (default), then draw `X.shape[0]` samples.*
    - If int, then draw `max_samples`
```
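
To make the quoted rule concrete, here is a minimal sketch of how a `max_samples` value could be turned into a number of rows to draw. The function `resolve_n_samples_bootstrap` is a hypothetical helper written for this illustration and is not scikit-learn's implementation. The `None` and `int` branches follow the excerpt above; the `float` branch is not visible in the excerpt and is an assumption based on my reading of the scikit-learn documentation (a fraction of `X.shape[0]`).

```python
def resolve_n_samples_bootstrap(n_samples, max_samples=None):
    """Return how many rows to draw per tree for a dataset with n_samples rows."""
    if max_samples is None:
        # Default from the quoted docs: draw X.shape[0] samples.
        return n_samples
    if isinstance(max_samples, int):
        # From the quoted docs: an int is used as the draw count directly.
        return max_samples
    if isinstance(max_samples, float):
        # Assumption (not in the excerpt): a float is a fraction of n_samples.
        return max(round(n_samples * max_samples), 1)
    raise TypeError("max_samples must be None, int or float")


print(resolve_n_samples_bootstrap(150))         # default: all 150 rows are drawn
print(resolve_n_samples_bootstrap(150, 100))    # explicit count
print(resolve_n_samples_bootstrap(150, 2 / 3))  # fraction of the dataset
```

In scikit-learn itself the value would be passed as the `max_samples` argument of the forest estimator together with `bootstrap=True`. Whatever count is chosen, the rows are drawn with replacement.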