Re: [Scikit-learn-general] sample_weight and features in a single tree

Gilles Louppe Wed, 28 Aug 2013 04:12:27 -0700

Hi Aaron,

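Assume that X is your data and y holds the labels for X. If the classes
in y are not balanced and you want to correct for that, you can indeed
use sample weights to simulate class weights. Note that sample_weight is
a flat array with one weight per row of X, computed from y alone, so the
layout of your feature columns does not matter. You can simply do: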


from sklearn.preprocessing import balance_weights
forest.fit(X, y, sample_weight=balance_weights(y))
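
For releases where balance_weights is not available, here is a minimal
sketch of the same idea. It assumes that compute_sample_weight from
sklearn.utils.class_weight can play a similar role; that substitution and
the toy X and y are illustrative assumptions, not part of the original
suggestion. The point it shows is that the weights are a 1-D array derived
from y only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_sample_weight

# Toy data (made-up values): 6 rows, 2 features; class 1 is the minority.
X = np.array([[0.0, 1.0], [0.1, 0.9], [0.2, 1.1],
              [0.3, 0.8], [2.0, 3.0], [2.1, 3.2]])
y = np.array([0, 0, 0, 0, 1, 1])

# One weight per row, computed from the labels y only (never from X).
# "balanced" weights each class inversely to its frequency.
sample_weight = compute_sample_weight("balanced", y)

forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X, y, sample_weight=sample_weight)
```

With these weights, each class contributes the same total weight to the
fit, even though class 0 has twice as many rows.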

> In addition, how can I know what features are used for each tree 
> (RandomForestClassifier.estimators_)? Or does RandomForestClassifier use all 
> features for each tree?   For example, given a DataFrame with features f=[Age, Job, 
> Title, ...], when calling fit(), will each tree use all features in f? Or is there any 
> way we can know which features are used for a single tree?

Both random forests and single decision trees are built on *all* the
features that you provide in X.

If you want to know which ones were the most helpful/important to
build the forest, then you can check the `feature_importances_`
attribute which will give you a score for each feature (the higher,
the more important).
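
As a rough sketch of how this can be inspected (the data below is made up,
and reading the per-tree split features from tree_.feature is an
illustration under that assumption, not something stated above):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up data: only the first of three features determines the label.
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = (X[:, 0] > 0.5).astype(int)

forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# One score per column of X; higher means more important.
importances = forest.feature_importances_

# Features that one fitted tree actually split on. Leaf nodes carry a
# negative placeholder in tree_.feature, so they are filtered out.
first_tree = forest.estimators_[0]
split_features = first_tree.tree_.feature
used = np.unique(split_features[split_features >= 0])
```

Every tree has access to all columns of X; `used` only lists the columns
that a particular tree ended up splitting on.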

Hope this helps,

Gilles

On 28 August 2013 12:41, Aaron Jacques <[email protected]> wrote:
>
>
> A thread on SO[1] states that class weighting for a random forest can be 
> achieved with the sample_weight argument when calling fit(). If I have a 
> two-dimensional dataset of the form
>
>
>         categorical_1  numeric  categorical_2  ...
> row 1   string_a       182      string_x       ...
> row 2   string_b       12       string_y       ...
> row 3   string_a       3342     string_z       ...
> ...
>
> How can I pass in sample_weight as class weights for such cases?  Passing in 
> sample_weight with multiple dimensions leads to the following error:
>   preprocessing.balance_weights([[1,2,3,4,5][1,2,3,4,4]])
>
>   TypeError: list indices must be integers, not tuple
>
>
> Or should I pass in a format like [string_a, string_b, string_a, 182, 12, 
> 3342, string_x ...] with all classes as a flat list, where string_a is a 
> factor level among the classes? Or what is the right way to do that? Or can I 
> just pass in weights for a single tree?
>
> In addition, how can I know what features are used for each tree 
> (RandomForestClassifier.estimators_)? Or does RandomForestClassifier use all 
> features for each tree?   For example, given a DataFrame with features f=[Age, Job, 
> Title, ...], when calling fit(), will each tree use all features in f? Or is there any 
> way we can know which features are used for a single tree?
>
> Thanks
>
> [1]. 
> http://stackoverflow.com/questions/17688147/how-to-weight-classes-in-a-randomforest-implementation
>
> ------------------------------------------------------------------------------
> Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more!
> Discover the easy way to master current and previous Microsoft technologies
> and advance your career. Get an incredible 1,500+ hours of step-by-step
> tutorial videos with LearnDevNow. Subscribe today and save!
> http://pubads.g.doubleclick.net/gampad/clk?id=58040911&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

