Hi Peter. Re. looping over features for RandomForests in _tree.pyx:Tree.find_best_split - yes, I see it now, thanks.
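For my own notes, the loop-and-break logic reads to me like this rough Python sketch (made-up helper name, not the actual Cython in _tree.pyx):

```python
import math

def n_features_visited(n_features):
    # Hypothetical helper mimicking my reading of
    # _tree.pyx:Tree.find_best_split with max_features="auto".
    max_features = math.sqrt(n_features)  # "auto" -> sqrt(n_features)
    visited_features = 0
    for feature in range(n_features):
        # ... evaluate candidate splits on this feature ...
        visited_features += 1
        if visited_features >= max_features:
            break
    return visited_features

print(n_features_visited(2))  # 2 -> sqrt(2) ~ 1.41, so both features checked
print(n_features_visited(9))  # 3 -> only 3 of the 9 features checked
```

(As I understand it the feature order is randomised before the loop in the real code, which is where the "random subset" comes from; the sketch leaves that out.)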
Re. seeing where max_depth is used - cool, I see that now too in base.py._make_estimator(), thanks.

Re. my question:
>> I'm interested to learn the lower bound of the number of random
>> features that can be chosen.
> could you elaborate on that?

I was wondering whether, given just 2 features as we have in the iris demo:
http://scikit-learn.org/dev/auto_examples/ensemble/plot_forest_iris.html
we'd visit a subset of potentially just 1 of the features, or always both, when building the RF DecisionTrees. The descriptions I'd read in several books talked about selecting a random subset of the features, but not what the minimum number of features might be.

As I understand it in sklearn: in _tree.pyx:Tree.find_best_split we break out of the feature-testing loop once we've visited enough features (visited_features is 0 at the start of the for loop):

    if visited_features >= max_features:
        break

With the iris dataset in the demo we're limiting the classifiers to 2 features per row in the plot. For RandomForest, max_features is set in __init__ to "auto":

    If "auto", then `max_features=sqrt(n_features)`.

so we only break once we've tested at least sqrt(2) (~1.41) features, which in practice means we check all (both) features at each step of the tree-building process. As such the RandomForest process tests all (not a random subset) of the features in plot_forest_iris for the RandomForest and ExtraTrees example columns. That's cool, I just wanted it clear in my mind. With more than 2 features (e.g. 9, where sqrt(9) = 3) we'd start to sample a random subset of the features.

I'm making notes of the things that weren't clear; I'll probably tidy them into a bug report for the docs with suggested new wording.

Cheers, i.

On 7 July 2013 19:49, Peter Prettenhofer <peter.prettenho...@gmail.com> wrote:
> Hi Ian,
>
> 2013/7/7 Ian Ozsvald <i...@ianozsvald.com>
>>
>> Hi all. I'm following the RandomForest code (in dev from a 1 week old
>> checkout).
>> As I understand it (and similar to the previous post - I have some RF
>> usage experience but nothing fundamental), RF uses a weighted sample
>> of examples to learn *and* a random subset of features when building
>> its decision trees.
>
> correct - although weighted samples are optional - usually, RF takes a
> bootstrap sample and this is implemented via sample_weights (e.g. a
> sample that is picked two times for the bootstrap has weight 2.0)
>
>> Does the scikit-learn implementation use a random subset of features?
>> I've followed the code in forest.py and I can't find where the choice
>> might be made. I haven't looked at the C code for the DecisionTree.
>
> It's in the implementation of DecisionTree - see sklearn/tree/_tree.pyx -
> look for the for loop over ``features``.
>
>> I'm interested to learn the lower bound of the number of random
>> features that can be chosen.
>
> could you elaborate on that?
>
>> I'm also curious to understand where we can restrict the depth of the
>> RandomForest classifier. All I can see is that in forest.py the
>> constructor takes but ignores the max_depth argument:
>>
>>     class RandomForestClassifier(ForestClassifier):
>>         ...
>>         def __init__(self,
>>                      n_estimators=10,
>>                      criterion="gini",
>>                      max_depth=None,
>>                      ...
>>             super(RandomForestClassifier, self).__init__(
>>                 base_estimator=DecisionTreeClassifier(),
>>                 ...
>>
>> base.py._make_estimator just clones the existing base_estimator. Am I
>> missing something?
>
> after cloning it calls ``set_params`` with ``estimator_params`` -
> ``'max_depth'`` is one of those.
>
> best,
> Peter
>
>> Thanks for listening,
>> Ian.
>>
>> --
>> Ian Ozsvald (A.I.
>> researcher)
>> i...@ianozsvald.com
>>
>> http://IanOzsvald.com
>> http://MorConsulting.com/
>> http://Annotate.IO
>> http://SocialTiesApp.com/
>> http://TheScreencastingHandbook.com
>> http://FivePoundApp.com/
>> http://twitter.com/IanOzsvald
>> http://ShowMeDo.com
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
> --
> Peter Prettenhofer
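P.S. Peter's point about the bootstrap being implemented via sample_weights clicked for me once I sketched it in numpy - roughly this (my own sketch, not the actual forest.py code):

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples = 10

# Draw a bootstrap sample: n_samples indices drawn with replacement
indices = rng.randint(0, n_samples, n_samples)

# Re-express the bootstrap as per-sample weights: a sample picked
# twice gets weight 2.0, a sample never picked gets weight 0.0,
# so fitting with these weights is equivalent to fitting on the
# bootstrap sample itself.
sample_weight = np.bincount(indices, minlength=n_samples).astype(np.float64)

print(sample_weight)  # the weights always sum to n_samples
```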