Re: [Scikit-learn-general] sample_weight and features in a single tree

2013-08-29 Thread Aaron Jacques
ginal - From: Olivier Grisel To: Aaron Jacques; scikit-learn-general Cc: Sent: Thursday, 29 August 2013, 10:02 Subject: Re: [Scikit-learn-general] sample_weight and features in a single tree In general all the features are used by the DT algorithm. The max_features parameter is just a way to co

Re: [Scikit-learn-general] sample_weight and features in a single tree

2013-08-29 Thread Olivier Grisel
In general all the features are used by the DT algorithm. The max_features parameter is just a way to control the amount of randomization injected at each stage during the learning process of the trees used by the ExtraTrees* or RandomForest* classes. But on average all features end up selected at
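The point about max_features can be illustrated with a small simulation. This is only a simplified sketch, not scikit-learn's implementation: it assumes each split draws max_features candidate features uniformly at random without replacement, and the names candidate_counts, n_features and n_splits are made up for the illustration.

```python
import random
from collections import Counter


def candidate_counts(n_features, max_features, n_splits, seed=0):
    """Count how often each feature index is drawn as a split candidate.

    Simplified model (an assumption, not scikit-learn's code): every split
    draws `max_features` distinct feature indices uniformly at random.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_splits):
        # One random subset of candidate features per split.
        counts.update(rng.sample(range(n_features), max_features))
    return counts


counts = candidate_counts(n_features=10, max_features=3, n_splits=200)
# Number of distinct features that were a candidate at least once.
print(len(counts), "of 10 features were candidates at least once")
```

Under this model, a small max_features only limits the candidates at each individual split; over the many splits of a tree or forest, every feature is likely to be considered somewhere.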

Re: [Scikit-learn-general] sample_weight and features in a single tree

2013-08-29 Thread Aaron Jacques
ge.net" Cc: Sent: Thursday, 29 August 2013, 4:02 Subject: Re: [Scikit-learn-general] sample_weight and features in a single tree Some more questions. Is it possible to know which features are selected for building a tree? According to the documentation [1], max_features can be specified for tell

Re: [Scikit-learn-general] sample_weight and features in a single tree

2013-08-29 Thread Aaron Jacques
ackoverflow.com/questions/17057139/how-to-find-key-trees-features-from-a-trained-random-forest - Original message - From: Gilles Louppe To: Aaron Jacques; "[email protected]" Cc: Sent: Wednesday, 28 August 2013, 7:10 Subject: Re: [Scikit-learn-general] s

Re: [Scikit-learn-general] sample_weight and features in a single tree

2013-08-28 Thread Sergey Feldman
Huh, cool I didn't know about balance_weights before. I'm also having a hard time finding documentation on it besides: "DEPRECATED: balance_weights is an internal function and will be removed in 0.16" What will it be replaced by in 0.16? Thanks, sf On Wed, Aug 28, 2013 at 4:10 AM, Gilles Loup

Re: [Scikit-learn-general] sample_weight and features in a single tree

2013-08-28 Thread Gilles Louppe
Hi Aaron, Assume that X is your data and y is the labels for X. If classes in y are not balanced and you want to fix that, you can indeed use sample weights to simulate class weights. Basically you can simply do: forest.fit(X, y, sample_weight=balance_weights(y)) > In addition, how can I know wh
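The call above relies on balance_weights, which the thread notes is deprecated. As a rough sketch of the underlying idea in plain Python: give each sample a weight inversely proportional to the frequency of its class, so that every class carries the same total weight. The helper name balanced_sample_weights is hypothetical, and the normalisation used here, n_samples / (n_classes * class_count), is an assumption; it is not claimed to reproduce what balance_weights returned.

```python
from collections import Counter


def balanced_sample_weights(y):
    """Return one weight per label so that each class has equal total weight.

    Hypothetical helper for illustration only:
    weight = n_samples / (n_classes * count_of_the_sample's_class).
    """
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return [n_samples / (n_classes * counts[label]) for label in y]


y = ["a", "a", "a", "b"]
weights = balanced_sample_weights(y)
print(weights)

# The resulting list could then be passed as the sample_weight argument, e.g.
# forest.fit(X, y, sample_weight=weights)
```

More recent scikit-learn releases provide sklearn.utils.class_weight.compute_sample_weight("balanced", y) for weights of this kind; it is worth checking the documentation of the installed version before relying on it.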

[Scikit-learn-general] sample_weight and features in a single tree

2013-08-28 Thread Aaron Jacques
A thread on SO [1] states that class weights for a random forest can be emulated through the sample_weight argument of fit(). If I have a two-dimensional dataset of the form
         categorical_1  numeric  categorical_2  ...
  row 1  string_a       182      string_x       ...
  r