A thread on SO [1] states that class weights for a random forest can be 
achieved by passing sample_weight to the fit() function. Suppose I have a 
two-dimensional dataset in the following format:

 
       categorical_1  numeric  categorical_2  ...
row 1  string_a       182      string_x       ...
row 2  string_b       12       string_y       ...
row 3  string_a       3342     string_z       ...
...

How can I pass in sample_weight as class weights in this case? Passing in a 
multi-dimensional sample_weight leads to the following error:
  preprocessing.balance_weights([[1,2,3,4,5][1,2,3,4,4]])

  TypeError: list indices must be integers, not tuple


Or should I pass in a flat list like [string_a, string_b, string_a, 182, 12, 
3342, string_x, ...], with all values flattened into one list, where string_a 
is one level of a categorical factor? What is the right way to do this? Or can 
I only pass in weights for a single tree?
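
To make the question concrete, here is a minimal sketch of the 
one-weight-per-row interpretation of sample_weight: a 1-D array with one entry 
per training row, derived from the class label of that row. The toy data, the 
inverse-frequency weighting formula, and the variable names are assumptions 
for illustration only, and the categorical columns are assumed to be already 
encoded as numbers. Separately, the TypeError shown above appears to come from 
the missing comma between the two inner lists, which makes Python index the 
first list with a tuple.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy data (assumed): 6 rows, 2 features; the second column stands in for
# a categorical feature that has already been encoded as integers.
X = np.array([[182, 0], [12, 1], [3342, 0], [5, 1], [77, 0], [901, 0]],
             dtype=float)
y = np.array([0, 0, 0, 0, 1, 1])  # imbalanced labels: four 0s, two 1s

# Assumed weighting scheme: weight each class inversely to its frequency.
classes, counts = np.unique(y, return_counts=True)
class_weight = {c: len(y) / (len(classes) * n)
                for c, n in zip(classes, counts)}

# sample_weight is 1-D: one weight per row, looked up from the row's label.
sample_weight = np.array([class_weight[label] for label in y])

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y, sample_weight=sample_weight)
```

With this scheme the weights of each class sum to the same total, so the 
minority class is not outweighed by the majority class during fitting.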

In addition, how can I find out which features are used by each tree 
(RandomForestClassifier.estimators_)? Or does RandomForestClassifier use all 
features for each tree? For example, given a DataFrame with features 
f = [Age, Job, Title, ...], will each tree use all features in f when fit() is 
called? Or is there any way to know which features are used by a single tree?
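
A minimal sketch of one way to inspect this, assuming a fitted forest: each 
entry of estimators_ is a decision tree whose tree_.feature array holds the 
feature index used at every node, with negative values marking leaves. The 
synthetic data and the feature names f0..f5 are hypothetical. Note that 
max_features limits how many features are considered at each split, not per 
tree, so a single tree can still end up using many of them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (assumed) with six features named f0..f5 for illustration.
X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           random_state=0)
feature_names = ["f0", "f1", "f2", "f3", "f4", "f5"]

# max_features=2: two candidate features are drawn at each split.
forest = RandomForestClassifier(n_estimators=5, max_features=2,
                                random_state=0)
forest.fit(X, y)

used_per_tree = []
for tree in forest.estimators_:
    node_features = tree.tree_.feature  # feature index per node
    # Leaves carry a negative placeholder index, so keep only split nodes.
    used = np.unique(node_features[node_features >= 0])
    used_per_tree.append([feature_names[i] for i in used])

for i, names in enumerate(used_per_tree):
    print("tree %d uses: %s" % (i, ", ".join(names)))
```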

Thanks

[1] http://stackoverflow.com/questions/17688147/how-to-weight-classes-in-a-randomforest-implementation
 

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
