Re: [Scikit-learn-general] Question related to the skicit-learn python library RandomForestClassifier

Andreas Mueller Thu, 05 Mar 2015 09:15:49 -0800

Hi Zsofi.
Could you clarify your first question?
What do you mean by "last optimized tree"?

The values in the leaves use weighted samples. Random Forests use bootstrapping to resample the dataset for each tree. The resampling is represented using sample weights. I think n_samples is the number of actual samples, whereas value is the weighted number of samples (can a tree-grower confirm?)
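
To illustrate why the two numbers can differ, here is a minimal sketch in plain Python. It is not scikit-learn's actual code; the helper name bootstrap_weights and the seed are made up for the illustration. It draws a bootstrap sample and stores it as one integer weight per original sample, then compares the count of distinct samples with the sum of the weights.

```python
import random
from collections import Counter


def bootstrap_weights(n_samples, seed=0):
    """Hypothetical helper: draw n_samples indices with replacement and
    return, for each original sample, how many times it was drawn."""
    rng = random.Random(seed)
    drawn = [rng.randrange(n_samples) for _ in range(n_samples)]
    counts = Counter(drawn)
    # weight[i] is 0 when sample i never made it into the bootstrap sample
    return [counts.get(i, 0) for i in range(n_samples)]


weights = bootstrap_weights(150)  # 150 rows, as in the iris dataset

# Distinct samples that were drawn at least once
n_distinct = sum(1 for w in weights if w > 0)

# Weighted count: every draw counts, including repeated draws
n_weighted = sum(weights)

print("distinct samples:", n_distinct)
print("weighted samples:", n_weighted)
```

The weighted count always equals the number of draws, while the distinct count is smaller whenever a sample is drawn more than once (on average about 63% of the samples are distinct). If the reading above is right, "samples" in the dot output corresponds to the distinct count and "value" to the weighted counts per class. In a fitted tree, the two quantities should be available as tree_.n_node_samples and tree_.weighted_n_node_samples.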


Cheers,
Andy


On 03/05/2015 04:04 AM, Zsófia Koma wrote:

Dear list,
I have a question related to the scikit-learn Python library's RandomForestClassifier. I have built a Python script using this tool to apply the Random Forest classification method to my dataset.
First of all, I would like to get from the RandomForestClassifier the last optimized tree, the one used to predict the data (for visualization purposes). Do you know a way to retrieve it from the sklearn structure?
I cannot find it anywhere; I only find this information for each fitted tree in clf.estimators_[tree number].tree_:
forest.estimators_[tree number].tree_.feature == Column index of the feature which is used for the split
iris.feature_names[forest.estimators_[tree number].tree_.feature] == Column name of the feature which is used for the split
forest.estimators_[tree number].tree_.threshold == Splitting value
forest.estimators_[tree number].tree_.impurity == Gini index value
forest.estimators_[tree number].tree_.n_node_samples == Number of the samples on parent nodes
forest.estimators_[tree number].tree_.children_left == Information about tree structure
forest.estimators_[tree number].tree_.children_right == Information about tree structure
forest.estimators_[tree number].tree_.value == Number of the samples of each class in the end node
My second problem: I do not understand exactly what the value matrix represents for each fitted tree. If I export the dot format file, we can see that n_samples and the per-class values in the end node disagree with each other. Do you know exactly what the value matrix in clf.estimators_[tree number].tree_.value represents?
For example, the dot format looks like this:

"digraph Tree {
0 [label="petal width (cm) <= 0.7500\nimpurity = 0.666044444444\nsamples = 98", shape="box"] ;
1 [label="impurity = 0.0000\nsamples = 29\nvalue = [ 49. 0. 0.]", shape="box"] ;
0 -> 1 ;"
Here node 1 is an end node, and in this node we have 29 samples, yet value is [49, 0, 0], which says that we had 49 samples from the first class and 0 samples from each of the other two classes. If it is a predicted value for the whole dataset, then how do we know which class is at the end node?
Thank you in advance for your help.

Best regards: Zsofi


------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

