I came across https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/tree.py. Does its Tree object hold the randomly sampled features used to build a single tree? Judging from its description, the attributes 'feature' and 'value' seem to be what I am looking for. But when I test it, the result looks like the following, which does not match the features in my training-set header. Is there any place where I can find which features were sampled to build a tree?
feature: [ 88 0 194 23 80 0 119 32 90 151 64 64 73 23 208 66 23 124 ...]
value: [[[ 1757. 952.]]
 [[ 1418. 874.]]
 [[ 206. 184.]]
 ...
 [[ 0. 1.]]]

Thanks

----- Original Message -----
From: Aaron Jacques <[email protected]>
To: Gilles Louppe <[email protected]>; "[email protected]" <[email protected]>
Cc:
Sent: Thursday, 29 August 2013, 4:02
Subject: Re: [Scikit-learn-general] sample_weight and features in a single tree

Some more questions.

Is it possible to know which features are selected for building a tree? According to the documentation [1], max_features sets the number of features to select at random, but it is still not clear which features are used to build a single tree. Or can feature_importances_ be used to check which features were selected when building a tree, i.e. those whose value is larger than 0? My header has around 200 columns, and sqrt(200) is about 14, but the number of features with an importance larger than 0 shows that this is not the case.

[2] explains that all trees are equal and that there is no tree weighting in a random forest. Can I say that this is why the result of predict() is obtained by majority vote, i.e. because all trees are equal, no tree's vote counts more than the others? And does the same apply to predict_proba(), with the forest's output being the mean of all the trees' probabilities?

Thanks again for your help.

[1]. http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier.max_features
[2]. http://stackoverflow.com/questions/17057139/how-to-find-key-trees-features-from-a-trained-random-forest

----- Original Message -----
From: Gilles Louppe <[email protected]>
To: Aaron Jacques <[email protected]>; "[email protected]" <[email protected]>
Cc:
Sent: Wednesday, 28 August 2013, 7:10
Subject: Re: [Scikit-learn-general] sample_weight and features in a single tree

Hi Aaron,

Assume that X is your data and y holds the labels for X.
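A minimal sketch of class-balancing sample weights, assuming a recent scikit-learn. balance_weights is no longer shipped in current releases, so this swaps in sklearn.utils.class_weight.compute_sample_weight with its "balanced" heuristic, n_samples / (n_classes * count of the row's class), and checks it against a hand-written version. The toy X and y are made up for illustration. In either case, sample_weight is a 1-D array with one weight per row of X, derived from the labels rather than from the feature columns.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_sample_weight

# Made-up toy data: 6 rows, 2 numeric features, imbalanced labels (4 "a" vs 2 "b").
X = np.array([[182, 0], [12, 1], [3342, 0], [50, 1], [7, 0], [900, 1]])
y = np.array(["a", "a", "a", "a", "b", "b"])

# One weight per row of X, computed from y only.
weights = compute_sample_weight("balanced", y)

# The same heuristic by hand: n_samples / (n_classes * count of the row's class).
classes, counts = np.unique(y, return_counts=True)
per_class = len(y) / (len(classes) * counts)
manual = per_class[np.searchsorted(classes, y)]

forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X, y, sample_weight=weights)
```

Here each class ends up with the same total weight (4 x 0.75 = 2 x 1.5 = 3.0). The TypeError quoted further down in this thread comes from Python itself rather than from balance_weights: with no comma between the inner lists, [1,2,3,4,5][1,2,3,4,4] indexes the first list with the tuple (1, 2, 3, 4, 4).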
If the classes in y are not balanced and you want to fix that, you can indeed use sample weights to simulate class weights. Basically, you can simply do:

    from sklearn.preprocessing import balance_weights
    forest.fit(X, y, sample_weight=balance_weights(y))

> In addition, how can I know which features are used for each tree
> (RandomForestClassifier.estimators_)? Or does RandomForestClassifier use all
> features for each tree? For example, given a DataFrame with features f=[Age, Job,
> Title, ...], will each tree use all features in f when calling fit()? Or is there any
> way to know which features are used for a single tree?

Both random forests and single decision trees are built on *all* the features that you provide in X. If you want to know which ones were the most helpful/important to build the forest, then you can check the `feature_importances_` attribute, which gives you a score for each feature (the higher, the more important).

Hope this helps,

Gilles

On 28 August 2013 12:41, Aaron Jacques <[email protected]> wrote:
>
> On SO [1], a thread states that class weights for a random forest can be
> achieved with the sample_weight argument of fit(). If I have a
> two-dimensional dataset with the following format
>
>         categorical_1  numeric  categorical_2  ...
> row 1   string_a       182      string_x       ...
> row 2   string_b       12       string_y       ...
> row 3   string_a       3342     string_z       ...
> ...
>
> how can I pass in sample_weight as class weights in such cases? Passing in
> a multi-dimensional sample_weight leads to the following error:
>
> preprocessing.balance_weights([[1,2,3,4,5][1,2,3,4,4]])
>
> TypeError: list indices must be integers, not tuple
>
> Or should I pass in a format like [string_a, string_b, string_a, 182, 12,
> 3342, string_x ...], i.e. all classes as a flat list where string_a is the
> factor of all classes? What is the right way to do that? Or can I just
> pass in a weight for a single tree?
>
> In addition, how can I know which features are used for each tree
> (RandomForestClassifier.estimators_)?
> Or does RandomForestClassifier use all
> features for each tree? For example, given a DataFrame with features f=[Age, Job,
> Title, ...], will each tree use all features in f when calling fit()? Or is there any
> way to know which features are used for a single tree?
>
> Thanks
>
> [1]. http://stackoverflow.com/questions/17688147/how-to-weight-classes-in-a-randomforest-implementation
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
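Regarding the 'feature' array shown at the top of this thread: in current scikit-learn, tree_.feature holds, for every node, the column index in X that the node splits on (not a feature name), and leaf nodes hold a negative placeholder (-2). The sketch below assumes a recent scikit-learn release and a synthetic 200-column dataset with a hypothetical header (col_0, col_1, ...), and maps those indices back to names for each tree in estimators_. Note that max_features limits the candidate features considered at each split rather than per tree, so a single tree can end up splitting on many more than sqrt(200), or about 14, distinct columns.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a 200-column training set; the names are hypothetical.
X, y = make_classification(n_samples=500, n_features=200, n_informative=20,
                           random_state=0)
feature_names = [f"col_{i}" for i in range(X.shape[1])]

# max_features="sqrt": int(sqrt(200)) = 14 candidate columns are drawn at each split.
forest = RandomForestClassifier(n_estimators=5, max_features="sqrt",
                                random_state=0).fit(X, y)


def features_used(tree_estimator):
    """Sorted column indices of X that a fitted tree actually splits on."""
    node_features = tree_estimator.tree_.feature
    # Leaf nodes store a negative placeholder (-2) instead of a column index.
    return np.unique(node_features[node_features >= 0])


for i, est in enumerate(forest.estimators_):
    used = features_used(est)
    names = [feature_names[j] for j in used]
    print(f"tree {i}: {len(used)} distinct columns, first few: {names[:5]}")

# Columns with non-zero importance across the whole forest.
print("non-zero importances:", np.count_nonzero(forest.feature_importances_))
```

Because each split draws a fresh random subset, the union of columns used by a whole tree, and even more so by the forest, usually exceeds max_features, which would be consistent with the count of non-zero importances reported above.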
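On the predict()/predict_proba() question: in current scikit-learn, RandomForestClassifier averages the trees' predict_proba outputs with equal weight, and predict() returns the class with the highest averaged probability. Strictly speaking, that is a probability-weighted (soft) vote rather than a count of per-tree class votes. The sketch below assumes a recent release and synthetic data; it reproduces both outputs from estimators_ and, for comparison only, computes a hard majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
# min_samples_leaf=5 keeps leaves impure, so per-tree probabilities are not just 0/1.
forest = RandomForestClassifier(n_estimators=25, min_samples_leaf=5,
                                random_state=0).fit(X, y)

# Equal weights: the forest probability is the plain mean over trees.
per_tree = np.stack([est.predict_proba(X) for est in forest.estimators_])
mean_proba = sum(per_tree) / len(forest.estimators_)

# predict() returns the class with the highest mean probability (a soft vote).
soft_vote = forest.classes_[np.argmax(mean_proba, axis=1)]

# A hard majority vote over per-tree labels, for comparison (ties -> lowest index).
tree_labels = np.argmax(per_tree, axis=2)  # shape (n_trees, n_samples)
counts = np.apply_along_axis(np.bincount, 0, tree_labels,
                             minlength=len(forest.classes_))
hard_vote = forest.classes_[np.argmax(counts, axis=0)]

print("soft vote equals predict():", np.array_equal(soft_vote, forest.predict(X)))
print("samples where hard and soft votes differ:",
      np.count_nonzero(hard_vote != soft_vote))
```

The two votes often agree, but they can diverge when trees are confident in different directions. So "all trees are equal" holds for the averaging, while the final label comes from the averaged probabilities rather than a head count.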
