[Scikit-learn-general] sample_weight and features in a single tree

Aaron Jacques Wed, 28 Aug 2013 03:43:41 -0700


A Stack Overflow thread [1] states that class weighting for a random forest
can be achieved by passing sample_weight to fit(). Suppose I have a
two-dimensional dataset in the following format:

categorical_1 numeric categorical_2 ...
row 1 string_a 182 string_x ...
row 2 string_b 12 string_y ...
row 3 string_a 3342 string_z ...
...

How can I pass sample_weight as class weights in such a case? Passing a
multi-dimensional list leads to the following error:
preprocessing.balance_weights([[1,2,3,4,5][1,2,3,4,4]])

TypeError: list indices must be integers, not tuple

Or should I pass the values in a format like [string_a, string_b, string_a,
182, 12, 3342, string_x ...], flattened into a single list, where string_a is
the factor of all classes? Or what is the right way to do this? Or can I just
pass in weights for a single tree?
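
For illustration, here is a minimal sketch of the shape fit() expects for
sample_weight: a flat, one-dimensional array with one weight per row, derived
from the class labels rather than from the feature columns. The toy data, the
integer encoding of the categorical columns, and the "balanced" weighting
formula are assumptions made for this example, not something taken from the
Stack Overflow thread.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: categorical columns are assumed to be already
# encoded as integers (string_a -> 0, string_b -> 1, ...).
X = np.array([
    [0, 182, 0],
    [1, 12, 1],
    [0, 3342, 2],
    [1, 7, 0],
    [0, 55, 1],
    [0, 90, 2],
])
y = np.array([0, 1, 0, 1, 0, 0])  # class label for each row

# One weight per class, inversely proportional to class frequency
# (an assumed weighting scheme; any per-class weights would work).
classes, counts = np.unique(y, return_counts=True)
weight_per_class = len(y) / (len(classes) * counts)

# Expand to one weight per sample: a 1-D array of length n_samples.
sample_weight = weight_per_class[np.searchsorted(classes, y)]

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y, sample_weight=sample_weight)
```

The same sample_weight array is applied to every tree in the forest, so the
weights are attached to rows, not to individual trees or feature values.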

In addition, how can I find out which features are used by each tree
(RandomForestClassifier.estimators_)? Or does RandomForestClassifier use all
features for each tree? For example, given a DataFrame with features f=[Age,
Job, Title, ...], will each tree use all the features in f when fit() is
called? Or is there any way to know which features are used by a single tree?
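
As a rough sketch of one way to inspect this, each fitted tree in
estimators_ exposes a tree_ attribute whose feature array holds the column
index tested at each node, with negative values marking leaves. The synthetic
dataset and the variable names below are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical synthetic data with 6 feature columns.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)

clf = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

features_per_tree = []
for tree in clf.estimators_:
    node_features = tree.tree_.feature  # column index per node, negative for leaves
    used = np.unique(node_features[node_features >= 0])
    features_per_tree.append(used)

for i, used in enumerate(features_per_tree):
    print("tree %d splits on feature columns %s" % (i, used.tolist()))
```

Every tree is given all columns of X; the max_features parameter only limits
how many candidate features are considered at each split, so the set of
columns a tree actually splits on can differ from tree to tree.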

Thanks

[1] http://stackoverflow.com/questions/17688147/how-to-weight-classes-in-a-randomforest-implementation
