Hi Zsofi.
Could you clarify your first question?
What do you mean by "last optimized tree"?
The values in the leaves are weighted sample counts. Random forests use bootstrapping to resample the dataset for each tree, and the resampling is represented using sample weights. I think n_samples is the number of actual (distinct) samples, whereas value holds the weighted number of samples (can a tree-grower confirm?)
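A minimal sketch of the weighting described above, assuming the iris data used in this thread, the default bootstrap=True, no class_weight, and arbitrary choices of n_estimators=10 and random_state=0. It compares the distinct samples reaching the root (n_node_samples, which the dot export shows as "samples") with the total bootstrap weight there (weighted_n_node_samples). Depending on the scikit-learn version, value may hold weighted class counts or per-class fractions, so the sketch rescales it to weighted counts either way.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target

# bootstrap=True is the default; random_state only makes the run repeatable.
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
tree = forest.estimators_[0].tree_

# Root node (index 0): distinct samples drawn vs. total bootstrap weight.
n_distinct = tree.n_node_samples[0]
n_weighted = tree.weighted_n_node_samples[0]
print("samples (distinct):", n_distinct)
print("weighted samples:  ", n_weighted)

# value has shape (node_count, n_outputs, n_classes). Normalizing each row and
# scaling by the node's weighted size gives weighted class counts whether the
# installed version stores counts or fractions in value.
value = tree.value[:, 0, :]
weighted_counts = (
    value / value.sum(axis=1, keepdims=True) * tree.weighted_n_node_samples[:, None]
)
print("root weighted class counts:", weighted_counts[0])
```

With the default max_samples=None, each bootstrap draws len(X) samples with replacement, so the root's weighted size should equal len(X) while the distinct count is smaller. That gap is why a leaf can show samples = 29 next to value = [49, 0, 0].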

Cheers,
Andy


On 03/05/2015 04:04 AM, Zsófia Koma wrote:
Dear list,

I have a question about the RandomForestClassifier in the scikit-learn Python library. I have built a Python script that uses it to apply Random Forest classification to my dataset.

First of all, I would like to get from the RandomForestClassifier the last optimized tree, the one used to predict the data (for visualization purposes). Do you know a way to extract it from the sklearn structure?
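For the first question, a sketch assuming the iris data and arbitrary forest settings (n_estimators=10, random_state=0). A RandomForestClassifier does not keep a single "last optimized" tree: its predict_proba averages the class probabilities of all trees in estimators_, which the first part of the sketch checks numerically. Any individual tree can still be exported with sklearn.tree.export_graphviz for visualization.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_graphviz

iris = load_iris()
X, y = iris.data, iris.target
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# The forest does not pick one "final" tree: predict_proba averages the
# class probabilities of every tree in estimators_.
per_tree = np.stack([est.predict_proba(X) for est in forest.estimators_])
averaged = per_tree.mean(axis=0)
print(np.allclose(averaged, forest.predict_proba(X)))

# Any single tree can still be exported for visualization.
# out_file=None returns the dot source as a string instead of writing a file.
dot_source = export_graphviz(
    forest.estimators_[0],
    out_file=None,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
)
print(dot_source.splitlines()[0])
```

The dot string can be rendered with Graphviz; exporting several trees side by side gives a better picture of the forest than any one of them.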

The only information I can find for each fitted tree is in clf.estimators_[tree number].tree_:

forest.estimators_[tree number].tree_.feature == Column index of the feature used for the split
iris.feature_names[forest.estimators_[tree number].tree_.feature] == Column name of the feature used for the split
forest.estimators_[tree number].tree_.threshold == Splitting value
forest.estimators_[tree number].tree_.impurity == Gini index value
forest.estimators_[tree number].tree_.n_node_samples == Number of samples reaching each node
forest.estimators_[tree number].tree_.children_left == Index of each node's left child (-1 for a leaf)
forest.estimators_[tree number].tree_.children_right == Index of each node's right child (-1 for a leaf)
forest.estimators_[tree number].tree_.value == Number of samples of each class in the end node
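The arrays listed above can be combined to print the rules of one tree. This is a sketch on iris with arbitrary forest settings (n_estimators=10, random_state=0); describe is a hypothetical helper, not part of scikit-learn. It relies on leaves having children_left == -1 and on samples with feature value <= threshold going to the left child.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
estimator = forest.estimators_[0]
tree = estimator.tree_


def describe(node=0, depth=0, lines=None):
    """Hypothetical helper: one text line per node, pre-order, indented by depth."""
    if lines is None:
        lines = []
    indent = "  " * depth
    left, right = tree.children_left[node], tree.children_right[node]
    if left == -1:  # leaves have no children
        lines.append(f"{indent}leaf {node}: samples={tree.n_node_samples[node]}, "
                     f"impurity={tree.impurity[node]:.3f}")
    else:
        name = iris.feature_names[tree.feature[node]]
        lines.append(f"{indent}node {node}: {name} <= {tree.threshold[node]:.3f} "
                     f"(samples={tree.n_node_samples[node]})")
        describe(left, depth + 1, lines)   # branch taken when the test is true
        describe(right, depth + 1, lines)  # branch taken when the test is false
    return lines


rules = describe()
print("\n".join(rules))
```

Because every sample at an internal node goes either left or right, n_node_samples of a node equals the sum over its two children, which is a quick sanity check when reading these arrays.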

My second problem: I do not understand exactly what the value matrix represents for each fitted tree. If I export the dot format file, the n_samples count and the per-class values in an end node disagree with each other. Do you know exactly what the value matrix in clf.estimators_[tree number].tree_.value represents?

For example:
The dot format looks like this:

"digraph Tree {
0 [label="petal width (cm) <= 0.7500\nimpurity = 0.666044444444\nsamples = 98", shape="box"] ;
1 [label="impurity = 0.0000\nsamples = 29\nvalue = [ 49. 0. 0.]", shape="box"] ;
0 -> 1 ;"

Here node 1 is an end node with 29 samples, yet its value is [49, 0, 0], which says the first class has 49 samples and the other two classes have 0. If this is a predicted value for the whole dataset, then how do we know which class is at the end node?
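On the last question: the class at an end node is the one with the largest entry in that node's value row. A sketch assuming iris and arbitrary forest settings (n_estimators=10, random_state=0): apply() finds the leaf each sample lands in, and argmax over value reproduces the tree's own predict(). Trees inside a forest are trained on labels encoded as 0..n_classes-1, so forest.classes_ maps the index back to the original label.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
estimator = forest.estimators_[0]
tree = estimator.tree_

# apply() returns the index of the leaf each sample lands in.
leaves = estimator.apply(X)

# The class predicted at a leaf is the largest entry in that leaf's value row;
# argmax gives the same answer whether value holds counts or fractions.
leaf_class_index = np.argmax(tree.value[leaves, 0, :], axis=1)

# estimator.classes_ holds the encoded labels the tree was trained on;
# forest.classes_ holds the original labels.
manual = estimator.classes_[leaf_class_index]
print(np.array_equal(manual, estimator.predict(X)))
print(iris.target_names[forest.classes_[leaf_class_index][:5]])
```

So in the quoted example, value = [49, 0, 0] means the leaf predicts the first class, whatever the distinct sample count of 29 says.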

Thank you in advance for your help.

Best regards,
Zsofi


------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
