Huh, cool, I didn't know about balance_weights before. I'm also having a
hard time finding any documentation on it besides this deprecation notice:
"DEPRECATED: balance_weights is an internal function and will be removed in
0.16"
What will it be replaced by in 0.16?
Thanks,
sf
On Wed, Aug 28, 2013 at 4:10 AM, Gilles Louppe <[email protected]> wrote:
> Hi Aaron,
>
> Assume that X is your data and y is the labels for X. If classes in y
> are not balanced and you want to fix that, you can indeed use sample
> weights to simulate class weights. Basically you can simply do:
>
> forest.fit(X, y, sample_weight=balance_weights(y))
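
For what it's worth, here is a minimal sketch of what such per-sample weights could look like, assuming the common inverse-frequency heuristic n_samples / (n_classes * count(class)). The helper name balanced_sample_weights is hypothetical, and this is not necessarily what balance_weights computes internally. It only illustrates that the weights are a flat sequence with one entry per row of X, derived from y alone.

```python
from collections import Counter


def balanced_sample_weights(y):
    """Return one weight per label so that every class gets the same total weight.

    Hypothetical helper for illustration; not scikit-learn's implementation.
    """
    counts = Counter(y)          # number of samples per class
    n_samples = len(y)
    n_classes = len(counts)
    # Rare classes get large weights, frequent classes get small ones.
    return [n_samples / (n_classes * counts[label]) for label in y]


y = ["a", "a", "a", "b"]         # imbalanced labels: three "a", one "b"
weights = balanced_sample_weights(y)

# The weights would then be passed alongside the data, e.g.:
# forest.fit(X, y, sample_weight=weights)   # X must have len(y) rows
```

With these weights the three "a" samples together count as much as the single "b" sample.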
>
> > In addition, how can I know which features are used for each tree
> > (RandomForestClassifier.estimators_)? Or does RandomForestClassifier use all
> > features for each tree? For example, given a DataFrame with features f=[Age,
> > Job, Title, ...], will each tree use all features in f when calling fit()?
> > Or is there any way to know which features are used for a single tree?
>
> Both random forests and single decision trees are built on *all* the
> features that you provide in X.
>
> If you want to know which ones were the most helpful/important to
> build the forest, then you can check the `feature_importances_`
> attribute which will give you a score for each feature (the higher,
> the more important).
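
To see which features each individual tree actually ended up splitting on, one option is to look at the fitted trees directly. The sketch below uses synthetic data and a hypothetical helper, features_used. It relies on the tree_.feature array of each fitted estimator, where negative entries mark leaf nodes. Every tree has access to every column of X, but a given tree may not split on all of them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption for illustration): only columns 0 and 2 carry signal.
rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)

forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)


def features_used(tree):
    """Return the sorted column indices a fitted tree splits on (hypothetical helper)."""
    feats = tree.tree_.feature   # one entry per node; negative values mark leaves
    return sorted(set(int(f) for f in feats if f >= 0))


# One list of column indices per tree in the forest.
used_per_tree = [features_used(t) for t in forest.estimators_]

# Forest-level scores, one per column of X (higher means more important).
importances = forest.feature_importances_
```

If X comes from a DataFrame, the column indices can be mapped back to names through the DataFrame's column list.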
>
> Hope this helps,
>
> Gilles
>
> On 28 August 2013 12:41, Aaron Jacques <[email protected]> wrote:
> >
> >
> > A thread on SO [1] states that class weighting for a random forest can be
> > achieved through the sample_weight argument of fit(). If I have
> > a two-dimensional dataset of the form
> >
> >
> >          categorical_1   numeric   categorical_2   ...
> > row 1    string_a        182       string_x        ...
> > row 2    string_b        12        string_y        ...
> > row 3    string_a        3342      string_z        ...
> > ...
> >
> > How can I pass in sample_weight as class weights in such a case?
> > Passing in a multi-dimensional sample_weight leads to the following error:
> > preprocessing.balance_weights([[1,2,3,4,5][1,2,3,4,4]])
> >
> > TypeError: list indices must be integers, not tuple
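
The traceback above seems to come from the list literal itself rather than from balance_weights: without a comma between the two inner lists, Python reads the second bracket pair as an index into the first list, and that index is a tuple. A small sketch reproducing this (variable names are illustrative only, and the exact message wording differs between Python versions):

```python
first = [1, 2, 3, 4, 5]

# Equivalent to the inner expression [1,2,3,4,5][1,2,3,4,4]:
# the second bracket pair is an index, and 1,2,3,4,4 is a tuple.
try:
    first[1, 2, 3, 4, 4]
except TypeError as exc:
    error_message = str(exc)

# With the comma in place this is an ordinary nested list.
nested = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 4]]
```

Even with the comma in place, the call in Gilles' reply passes the label vector y, one entry per row, rather than a two-dimensional structure.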
> >
> >
> > Or should I pass in a format like [string_a, string_b, string_a, 182,
> > 12, 3342, string_x ...], with all classes as a flat list where string_a is the
> > factor of all classes? Or what is the right way to do that? Or can I just
> > pass in a weight for a single tree?
> >
> > In addition, how can I know which features are used for each tree
> > (RandomForestClassifier.estimators_)? Or does RandomForestClassifier use all
> > features for each tree? For example, given a DataFrame with features f=[Age,
> > Job, Title, ...], will each tree use all features in f when calling fit()?
> > Or is there any way to know which features are used for a single tree?
> >
> > Thanks
> >
> > [1] http://stackoverflow.com/questions/17688147/how-to-weight-classes-in-a-randomforest-implementation
> >
> >
> ------------------------------------------------------------------------------
> > Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more!
> > Discover the easy way to master current and previous Microsoft
> technologies
> > and advance your career. Get an incredible 1,500+ hours of step-by-step
> > tutorial videos with LearnDevNow. Subscribe today and save!
> >
> http://pubads.g.doubleclick.net/gampad/clk?id=58040911&iu=/4140/ostg.clktrk
> > _______________________________________________
> > Scikit-learn-general mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>