[Scikit-learn-general] Question related to the scikit-learn python library RandomForestClassifier

Zsófia Koma Thu, 05 Mar 2015 01:06:07 -0800

Dear list,

I have a question about the RandomForestClassifier in the scikit-learn
Python library. I have built a Python script that uses this tool to
apply the Random Forest classification method to my dataset.


First of all, I would like to get from the RandomForestClassifier the final
optimized tree that is used to predict the data (for visualization
purposes). Do you know a way to obtain it from the sklearn structure?

I cannot find it anywhere. I only find the following information for each
fitted tree in clf.estimators_[tree number].tree_:

forest.estimators_[tree number].tree_.feature == Column index of the feature used for the split
iris.feature_names[forest.estimators_[tree number].tree_.feature] == Column name of the feature used for the split
forest.estimators_[tree number].tree_.threshold == Splitting value
forest.estimators_[tree number].tree_.impurity == Gini index value
forest.estimators_[tree number].tree_.n_node_samples == Number of samples in the parent nodes
forest.estimators_[tree number].tree_.children_left == Information about the tree structure
forest.estimators_[tree number].tree_.children_right == Information about the tree structure
forest.estimators_[tree number].tree_.value == Number of samples of each class in the end node
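
To make the list above concrete, here is a minimal sketch that reads these attributes from one fitted tree. It assumes the iris dataset and a forest of 10 trees with a fixed random_state; those choices, and the variable names forest and tree, are only for illustration and are not taken from my actual script.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Assumed setup for illustration: iris data, 10 trees, fixed seed.
iris = load_iris()
forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(iris.data, iris.target)

# Low-level array representation of the first fitted tree.
tree = forest.estimators_[0].tree_

for node in range(tree.node_count):
    left = tree.children_left[node]
    right = tree.children_right[node]
    if left == -1:
        # A leaf has no children; both child indices are -1.
        print(f"node {node}: leaf, "
              f"n_node_samples={tree.n_node_samples[node]}, "
              f"value={tree.value[node, 0]}")
    else:
        # Internal node: feature index, threshold, and child node ids.
        name = iris.feature_names[tree.feature[node]]
        print(f"node {node}: {name} <= {tree.threshold[node]:.4f}, "
              f"impurity={tree.impurity[node]:.4f}, "
              f"n_node_samples={tree.n_node_samples[node]}, "
              f"children=({left}, {right})")
```

All of these arrays are indexed by node id, so the same loop can be run for any forest.estimators_[i].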

My second problem is that I do not understand exactly what the value
matrix represents for each fitted tree. If I export the tree to a dot
file, the number of samples and the per-class values in the end node
disagree with each other.
Do you know exactly what the value matrix in
clf.estimators_[tree number].tree_.value represents?

For example, the dot file looks like this:

"digraph Tree {
0 [label="petal width (cm) <= 0.7500\nimpurity = 0.666044444444\nsamples = 98", shape="box"] ;
1 [label="impurity = 0.0000\nsamples = 29\nvalue = [ 49.   0.   0.]", shape="box"] ;
0 -> 1 ;"

Here node 1 is an end node. It contains 29 samples, yet value = [49, 0, 0],
which says that 49 samples came from the first class and 0 samples from
each of the other two classes. If this is a predicted value for the whole
dataset, then how do we know which class the end node belongs to?
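
A minimal sketch for checking this is below. It assumes the iris dataset and a forest with bootstrap=True; the variable names are only for illustration. As far as I can tell, and this is an assumption rather than something confirmed here, bootstrap resampling is applied through sample weights, so n_node_samples counts the distinct training rows in a node while weighted_n_node_samples and value reflect the repeated draws. Depending on the scikit-learn version, value may hold weighted class counts or class fractions, but in both cases the largest entry marks the class of the end node.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Assumed setup for illustration: iris data, bootstrapped forest, fixed seed.
iris = load_iris()
forest = RandomForestClassifier(n_estimators=10, bootstrap=True, random_state=0)
forest.fit(iris.data, iris.target)

estimator = forest.estimators_[0]
tree = estimator.tree_

# Compare the unweighted and weighted sample counts in every end node.
is_leaf = tree.children_left == -1
for node in np.flatnonzero(is_leaf):
    print(f"leaf {node}: "
          f"n_node_samples={tree.n_node_samples[node]}, "
          f"weighted_n_node_samples={tree.weighted_n_node_samples[node]}, "
          f"value={tree.value[node, 0]}, "
          f"class index={np.argmax(tree.value[node, 0])}")

# The class of an end node is the position of the largest entry in value.
leaf_ids = estimator.apply(iris.data)  # end node reached by each row
leaf_class = np.argmax(tree.value[leaf_ids, 0, :], axis=1)

# The sub-estimator predicts encoded class indices; for iris these
# coincide with the original labels 0, 1, 2.
predicted = estimator.predict(iris.data).astype(int)
print("argmax of value matches predict:", np.array_equal(leaf_class, predicted))
```

If the two sample counts differ for a node in the same way as in the dot output above, that would be consistent with the sample-weight explanation.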

Thank you in advance for your help.

Best regards, Zsofi
