Hi Zsofi.
Could you clarify your first question?
What do you mean by "last optimized tree"?
The values in the leaves are computed from weighted samples. Random Forests use
bootstrapping to resample the dataset for each tree, and the resampling is
represented using sample weights. I think n_samples is the number of actual
(distinct) samples, whereas value is the weighted number of samples
(can a tree-grower confirm?)
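If that reading is right, it should be visible at the root of any tree in the forest. The sketch below assumes the iris data and a recent scikit-learn; the variable names are illustrative. With bootstrap=True each sample's weight is the number of times it was drawn, so n_node_samples (distinct samples reaching the node) and weighted_n_node_samples (total bootstrap draws) should differ:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target

# bootstrap=True (the default) resamples the training set for every tree
forest = RandomForestClassifier(n_estimators=10, bootstrap=True, random_state=0)
forest.fit(X, y)

tree = forest.estimators_[0].tree_
# Distinct training samples at the root (samples never drawn get weight 0 and are skipped)
n_distinct = tree.n_node_samples[0]
# Sum of the bootstrap weights at the root, i.e. the total number of draws
n_weighted = tree.weighted_n_node_samples[0]
print(n_distinct, n_weighted)
```

On iris one would expect roughly 95 distinct samples against 150 weighted ones, which matches the "samples = 98" root in the dot output quoted below.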
Cheers,
Andy
On 03/05/2015 04:04 AM, Zsófia Koma wrote:
Dear list,
I have a question related to the scikit-learn Python library's
RandomForestClassifier. I have built a Python script using this tool to
apply the Random Forest classification method to my dataset.
First of all, I would like to get from the RandomForestClassifier the
last optimized tree, which is used to predict the data (for
visualization purposes). Is there a way to extract it from the
sklearn structure?
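As far as I know, a random forest has no single "last optimized" tree: RandomForestClassifier averages the class probabilities of every tree in estimators_. A minimal sketch, assuming the iris data and illustrative variable names, that checks this averaging and exports one tree to dot format for visualization:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_graphviz

iris = load_iris()
X, y = iris.data, iris.target
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# The forest's probabilities are the mean of the per-tree probabilities
per_tree = np.mean([est.predict_proba(X) for est in forest.estimators_], axis=0)
forest_proba = forest.predict_proba(X)

# Any single tree can be exported for visualization, e.g. the first one
dot_source = export_graphviz(
    forest.estimators_[0],
    out_file=None,  # return the dot source as a string instead of writing a file
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True,
)
print(dot_source[:200])
```

Picking estimators_[0] is arbitrary; every tree contributes equally to the prediction.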
The only information I can find for each fitted tree is in
clf.estimators_[tree number].tree_:
forest.estimators_[tree number].tree_.feature == Column index of the feature used for the split
iris.feature_names[forest.estimators_[tree number].tree_.feature] == Column name of the feature used for the split
forest.estimators_[tree number].tree_.threshold == Splitting value
forest.estimators_[tree number].tree_.impurity == Gini index value
forest.estimators_[tree number].tree_.n_node_samples == Number of samples reaching each node
forest.estimators_[tree number].tree_.children_left == Information about the tree structure
forest.estimators_[tree number].tree_.children_right == Information about the tree structure
forest.estimators_[tree number].tree_.value == Number of samples of each class in the end node
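These arrays describe the tree in parallel, indexed by node id: node i splits on feature[i] at threshold[i], and a value of -1 in children_left/children_right marks a leaf. A minimal sketch, assuming the iris data and illustrative variable names, that walks the arrays and prints each node:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(iris.data, iris.target)
estimator = forest.estimators_[0]
tree = estimator.tree_

TREE_LEAF = -1  # sentinel stored in children_left/children_right for leaves
leaves = []
for node in range(tree.node_count):
    left, right = tree.children_left[node], tree.children_right[node]
    if left == TREE_LEAF:
        # Leaf: no split, only sample counts and the class distribution
        leaves.append(node)
        print(f"node {node}: leaf, samples={tree.n_node_samples[node]}, value={tree.value[node][0]}")
    else:
        # Internal node: samples with feature <= threshold go left
        name = iris.feature_names[tree.feature[node]]
        print(f"node {node}: if {name} <= {tree.threshold[node]:.4f} go to {left}, else {right}")
```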
My second problem is that I do not understand exactly what the value
matrix represents for each fitted tree. If I export the dot format file,
the number of samples and the per-class values in the end node
disagree with each other.
Do you know exactly what the value matrix in
clf.estimators_[tree number].tree_.value represents?
For example, the dot format looks like this:
    digraph Tree {
    0 [label="petal width (cm) <= 0.7500\nimpurity = 0.666044444444\nsamples = 98", shape="box"] ;
    1 [label="impurity = 0.0000\nsamples = 29\nvalue = [ 49. 0. 0.]", shape="box"] ;
    0 -> 1 ;
Here node 1 is an end node, and in this node we have 29
samples, but value = [49, 0, 0], which says that the first class has
49 samples and the other two classes have 0. If this is a predicted
value for the whole dataset, then how do we know which class is at
the end node?
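If the weighted-samples explanation holds, value stores weighted (bootstrap) class totals, so its entries can exceed the distinct sample count, as in "samples = 29" versus [49, 0, 0], and the class predicted at a leaf is the one with the largest entry. A sketch under those assumptions (iris data, illustrative variable names). As far as I know, newer scikit-learn releases (1.4 and later) store value as class fractions rather than counts, so the code normalizes first and rebuilds weighted counts from weighted_n_node_samples:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
estimator = forest.estimators_[0]
tree = estimator.tree_

# Per-node class distribution, normalized so this works whether value holds
# weighted counts (older releases) or fractions (newer releases)
value = tree.value[:, 0, :]
proportions = value / value.sum(axis=1, keepdims=True)

# Weighted per-class totals: proportions times the bootstrap-weighted node size
weighted_counts = proportions * tree.weighted_n_node_samples[:, np.newaxis]
print(tree.n_node_samples[0], weighted_counts[0])

# The class predicted at a leaf is the one with the largest entry in value
leaf_ids = estimator.apply(X)  # leaf id reached by each sample
leaf_class = estimator.classes_[np.argmax(value[leaf_ids], axis=1)]
print(np.array_equal(leaf_class, estimator.predict(X)))
```

So value is not a prediction for the whole dataset; it is the (weighted) class make-up of the training samples that fell into that node, and its argmax gives the leaf's class.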
Thank you in advance for your help.
Best regards,
Zsofi
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general