- Original message -
From: Olivier Grisel
To: Aaron Jacques; scikit-learn-general
Cc:
Sent: Thursday, 29 August 2013, 10:02
Subject: Re: [Scikit-learn-general] sample_weight and features in a single tree
In general all the features are used by the DT algorithm. The
max_features parameter is just a way to control the amount of
randomization injected at each stage during the learning process of
the trees used by the ExtraTrees* or RandomForest* classes. But on
average all features end up selected at
ge.net"
Cc:
Sent: Thursday, 29 August 2013, 4:02
Subject: Re: [Scikit-learn-general] sample_weight and features in a single tree
Some more questions.
Is it possible to know which features are selected for building a tree? According to
the documentation [1], max_features can be specified for tell
ackoverflow.com/questions/17057139/how-to-find-key-trees-features-from-a-trained-random-forest
- Original message -
From: Gilles Louppe
To: Aaron Jacques;
"[email protected]"
Cc:
Sent: Wednesday, 28 August 2013, 7:10
Subject: Re: [Scikit-learn-general] s
Huh, cool I didn't know about balance_weights before. I'm also having a
hard time finding documentation on it besides:
"DEPRECATED: balance_weights is an internal function and will be removed in
0.16"
What will it be replaced by in 0.16?
Thanks,
sf
On Wed, Aug 28, 2013 at 4:10 AM, Gilles Loup
Hi Aaron,
Assume that X is your data and y is the labels for X. If classes in y
are not balanced and you want to fix that, you can indeed use sample
weights to simulate class weights. Basically you can simply do:
forest.fit(X, y, sample_weight=balance_weights(y))
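To make the idea concrete, here is a minimal sketch of how per-sample weights can simulate class weights: each sample is weighted inversely to the size of its class, so every class ends up with the same total weight. The helper name inverse_frequency_weights is hypothetical and uses only the standard library; it illustrates the general technique and is not claimed to match balance_weights exactly.

```python
from collections import Counter


def inverse_frequency_weights(y):
    """Return one weight per sample, inversely proportional to class size.

    Hypothetical helper for illustration only. Weights are scaled so that
    samples of the rarest class get weight 1.0.
    """
    counts = Counter(y)               # number of samples per class label
    smallest = min(counts.values())   # size of the rarest class
    return [smallest / counts[label] for label in y]


# Toy, imbalanced labels: three samples of class "a", one of class "b".
y = ["a", "a", "a", "b"]
weights = inverse_frequency_weights(y)

# Sum the weights per class: both classes carry the same total weight.
totals = {}
for label, w in zip(y, weights):
    totals[label] = totals.get(label, 0.0) + w
```

A list like this could then be passed in place of balance_weights(y), i.e. forest.fit(X, y, sample_weight=inverse_frequency_weights(y)), assuming forest is an estimator whose fit() accepts sample_weight.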
> In addition, how can I know wh
A thread on SO [1] states that class weights for a random forest can be achieved
by passing the sample_weight argument to fit(). If I have a two-dimensional
dataset of the form
         categorical_1   numeric   categorical_2   ...
row 1    string_a        182       string_x        ...
r