On SO [1], a thread states that class weights for a random forest can be set through the sample_weight argument of fit(). Suppose I have a two-dimensional dataset in the following format:

        categorical_1   numeric   categorical_2   ...
row 1   string_a        182       string_x        ...
row 2   string_b        12        string_y        ...
row 3   string_a        3342      string_z        ...
...

How can I pass sample_weight as class weights in such a case? Passing sample_weight as a multi-dimensional list leads to the following error:

    preprocessing.balance_weights([[1,2,3,4,5][1,2,3,4,4]])
    TypeError: list indices must be integers, not tuple

Or should I pass it in a format like

    [string_a, string_b, string_a, 182, 12, 3342, string_x ...]

with all classes as a flat list, where string_a is the factor of all classes? Or what is the right way to do that? Or can I just pass in a weight for a single tree?

In addition, how can I know which features are used for each tree (RandomForestClassifier.estimators_)? Or does RandomForestClassifier use all features for each tree? For example, given a DataFrame with features f=[Age, Job, Title, ...], will each tree use all features in f when fit() is called? Or is there any way to know which features are used by a single tree?

Thanks

[1] http://stackoverflow.com/questions/17688147/how-to-weight-classes-in-a-randomforest-implementation

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
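
For what it is worth, here is a minimal sketch of the sample_weight question. It assumes that the "classes" to be weighted are the target labels y, and that sample_weight is a flat array with one weight per row (shape (n_samples,)), not one per feature cell. The toy rows, the labels y, and the inverse-frequency weighting scheme are assumptions made up for illustration, not something from the thread in [1]. The TypeError quoted above looks like a side effect of the missing comma between the two inner lists, which makes Python index the first list with a tuple.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data mirroring the table in the question; the labels y are assumed.
cat1 = np.array(["string_a", "string_b", "string_a", "string_b"])
num = np.array([182.0, 12.0, 3342.0, 7.0])
cat2 = np.array(["string_x", "string_y", "string_z", "string_x"])
y = np.array([0, 1, 0, 0])

# The forest needs numeric input, so map each string column to integer codes.
cat1_codes = np.unique(cat1, return_inverse=True)[1]
cat2_codes = np.unique(cat2, return_inverse=True)[1]
X = np.column_stack([cat1_codes, num, cat2_codes])

# Assumed weighting scheme: inverse class frequency, so each class
# contributes the same total weight.
classes, counts = np.unique(y, return_counts=True)
class_weight = {c: len(y) / (len(classes) * n) for c, n in zip(classes, counts)}

# One weight per row, looked up from that row's class label.
sample_weight = np.array([class_weight[label] for label in y])

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y, sample_weight=sample_weight)
```

The same sample_weight array is applied to every tree in the forest, so there is no separate per-tree weight to pass in this sketch.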

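
On the second question, a rough sketch of how one might inspect which features each fitted tree actually split on. As far as I understand it, the forest draws a random subset of candidate features (max_features) at every split and not once per tree, so any feature can appear in any tree. The data, the feature names, and the variable used_per_tree below are hypothetical. The sketch relies on the tree_.feature array of each fitted tree, where leaf nodes carry a negative placeholder value.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: 200 rows, 5 numeric features, label depends on columns 0 and 3.
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)
feature_names = ["Age", "Job", "Title", "f3", "f4"]  # assumed names

forest = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

used_per_tree = []
for tree in forest.estimators_:
    # tree_.feature holds the feature index tested at each node;
    # leaves are marked with a negative value, so filter them out.
    node_features = tree.tree_.feature
    used = np.unique(node_features[node_features >= 0])
    used_per_tree.append([feature_names[i] for i in used])

for i, names in enumerate(used_per_tree):
    print("tree", i, "splits on", names)
```

Each tree also exposes feature_importances_, which gives a weighted view of the same information.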