Hi Aaron,

Assume that X is your data and y holds the labels for X. If the classes in y are unbalanced and you want to correct for that, you can indeed use sample weights to simulate class weights. Basically, you can simply do:
    from sklearn.preprocessing import balance_weights
    forest.fit(X, y, sample_weight=balance_weights(y))

Here sample_weight is a 1-D array with one weight per sample (i.e. per row of X), computed from the labels y alone, not from the feature columns.

> In addition, how can I know what features are used for each tree
> (RandomForestClassifier.estimators_)? Or RandomForestClassifier uses all
> features for each tree? For example, a DataFrame with features f=[Age, Job,
> Title, ...], when calling fit(), each tree will use all features in f? Or any
> way we can know which features are used for a single tree?

Both random forests and single decision trees are built on *all* the features that you provide in X. (In a forest, each split only considers a random subset of them, whose size is set by max_features, but no feature is excluded from a tree upfront.) If you want to know which ones were the most helpful/important for building the forest, you can check the `feature_importances_` attribute, which gives you a score for each feature (the higher, the more important).

Hope this helps,
Gilles

On 28 August 2013 12:41, Aaron Jacques <[email protected]> wrote:
>
>
> In SO[1] a thread states that class weighting for a random forest can be achieved
> by sample_weight when executing the fit() function. If I have a dataset
> with format (2 dimension)
>
>
>              categorical_1   numeric   categorical_2   ...
>     row 1    string_a        182       string_x        ...
>     row 2    string_b        12        string_y        ...
>     row 3    string_a        3342      string_z        ...
>     ...
>
> How can I pass in sample_weight as class weights for such cases? Passing in
> sample_weight as multiple dimension leads to the following error
> preprocessing.balance_weights([[1,2,3,4,5][1,2,3,4,4]])
>
> TypeError: list indices must be integers, not tuple
>
>
> Or should I pass in a format like [string_a, string_b, string_a, 182, 12,
> 3342, string_x ...] with all classes as a flat list where string_a is the
> factor of all classes? Or what is the right way to do that? Or can I just
> pass in weight for a single tree?
>
> In addition, how can I know what features are used for each tree
> (RandomForestClassifier.estimators_)? Or RandomForestClassifier uses all
> features for each tree? For example, a DataFrame with features f=[Age, Job,
> Title, ...], when calling fit(), each tree will use all features in f?
> Or any way we can know which features are used for a single tree?
>
> Thanks
>
> [1] http://stackoverflow.com/questions/17688147/how-to-weight-classes-in-a-randomforest-implementation
>
> ------------------------------------------------------------------------------
> Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more!
> Discover the easy way to master current and previous Microsoft technologies
> and advance your career. Get an incredible 1,500+ hours of step-by-step
> tutorial videos with LearnDevNow. Subscribe today and save!
> http://pubads.g.doubleclick.net/gampad/clk?id=58040911&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
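To make the class-weighting suggestion in the reply above concrete, here is a minimal sketch on toy data. balance_weights may not be available in recent scikit-learn releases, so the sketch uses a hypothetical helper, inverse_frequency_weights, that follows the same idea: one weight per sample (row of X), inversely proportional to how frequent that sample's class is in y. The data and helper name are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def inverse_frequency_weights(y):
    """Hypothetical helper: one weight per sample, inversely proportional
    to the frequency of that sample's class. The rarest class gets weight 1,
    so every class ends up with the same total weight."""
    y = np.asarray(y)
    # Map each label (strings work too) to an integer index 0..n_classes-1.
    _, y_idx = np.unique(y, return_inverse=True)
    counts = np.bincount(y_idx)
    return counts.min() / counts[y_idx].astype(float)


# Toy data: X is 2-D (one row per sample), y is 1-D (one label per row).
rng = np.random.RandomState(0)
X = rng.rand(8, 3)
y = np.array(["a", "a", "a", "a", "a", "a", "b", "b"])

sample_weight = inverse_frequency_weights(y)  # shape (8,), one weight per row
forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X, y, sample_weight=sample_weight)
```

With these weights both classes contribute the same total weight to the fit. On newer scikit-learn releases, sklearn.utils.class_weight.compute_sample_weight("balanced", y) returns weights proportional to these, and RandomForestClassifier(class_weight="balanced") applies the same idea without building sample_weight by hand.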
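On the question of which features a single tree uses: every tree is given all the columns of X, but a fitted tree does not necessarily split on each of them. A minimal sketch, assuming toy data whose columns are already numerically encoded (the column names and the helper features_used_by_tree are hypothetical), reads the split feature of every node from estimators_[i].tree_.feature, where leaf nodes hold a negative placeholder instead of a column index.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy data; the column names are hypothetical and assumed to be
# numerically encoded already (scikit-learn trees need numeric input).
feature_names = np.array(["age", "job", "title"])
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)

forest = RandomForestClassifier(n_estimators=5, random_state=0)
forest.fit(X, y)


def features_used_by_tree(tree_estimator):
    """Sorted indices of the columns a fitted tree actually splits on."""
    split_features = tree_estimator.tree_.feature
    # Leaf nodes store a negative placeholder rather than a column index.
    return np.unique(split_features[split_features >= 0])


for i, est in enumerate(forest.estimators_):
    used = features_used_by_tree(est)
    print(i, list(feature_names[used]))
```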
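For the feature_importances_ attribute mentioned in the reply, a minimal sketch on toy data (again with hypothetical, already numerically encoded column names) that ranks the features from most to least important:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical, already numerically encoded columns.
feature_names = ["age", "job", "title"]
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
# By construction, only the first column carries signal.
y = (X[:, 0] > 0.5).astype(int)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# One score per column of X; the scores are normalized to sum to 1.
importances = forest.feature_importances_
ranking = np.argsort(importances)[::-1]  # most important first
for idx in ranking:
    print(f"{feature_names[idx]}: {importances[idx]:.3f}")
```

These are impurity-based scores averaged over all trees: they tell you how much each feature helped the forest overall, not which features one particular tree split on.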
