Hi,

The simplest way to get what you are looking for is to re-propagate the training samples through the tree and keep track of the nodes they traverse. You should have a look at the implementation of `apply` to get started.
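For instance, here is a minimal sketch of that idea, not the only way to do it. It assumes a scikit-learn release that provides `decision_path` (newer than the version discussed in this thread). The iris data and class weights come from the example further down the thread; `random_state=0` and the variable names are only illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Same class weights as in the example quoted in this thread.
clf = DecisionTreeClassifier(class_weight={0: 0.3, 1: 0.3, 2: 0.4},
                             random_state=0)
clf.fit(X, y)

# Sparse (n_samples, n_nodes) indicator: entry (i, j) is 1 when
# training sample i passes through node j on its way to a leaf.
node_indicator = clf.decision_path(X)

# One-hot encoding of the labels, shape (n_samples, n_classes).
classes = clf.classes_
onehot = (y[:, None] == classes[None, :]).astype(np.int64)

# counts[j, k] = unweighted number of training samples of class k
# that reach node j, whatever sample_weight / class_weight was used.
counts = np.asarray(node_indicator.T @ onehot)

# Leaf-only variant based on apply(): one leaf id per sample.
leaf_ids = clf.apply(X)
class_idx = np.searchsorted(classes, y)
leaf_counts = np.zeros_like(counts)
np.add.at(leaf_counts, (leaf_ids, class_idx), 1)

print(counts[0])  # per-class counts at the root node
```

Because the samples are only counted, never weighted, the row sums of `counts` should match `clf.tree_.n_node_samples`, which is a convenient sanity check.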
Hope this helps,
Gilles

On 30 August 2015 at 21:55, Rex X <[email protected]> wrote:
> Hi Trevor,
>
> Yes, unfortunately I am using sample_weight. So clf.tree_.value is a
> weighted sum. We cannot do such a division to get the number of samples.
> Because we can't track which samples belonging to one node, there is no way
> to get the sample_weights. If we know, we don't need to get into this step.
>
> It is good to know the apply method for DecisionTreeClassifier. But when we
> used sample_weight, can this way work?
>
> Thank you.
>
> Rex
>
> On Sun, Aug 30, 2015 at 12:41 PM, Trevor Stephens <[email protected]>
> wrote:
>>
>> As Jacob mentions, the tree object is written in cython, and is pretty
>> heavy going.
>>
>> However,
>>
>> clf.tree_.value / clf.class_weight.values()
>>
>> might work for you?
>>
>> If using the sample_weight as well, you would need to additionally scale
>> along the other axis too.
>>
>> Alternatively, if only interested in the leaf nodes, the
>> DecisionTreeClassifier has an apply() method which returns the leaf ID for
>> any data passed to it. Use the original data, and then some light Pandas
>> pivoting should get you to what you need.
>>
>> On Sun, Aug 30, 2015 at 11:54 AM, Jacob Schreiber
>> <[email protected]> wrote:
>>>
>>> You would have to modify sklearn/tree/_tree.pyx. See the Tree class near
>>> the bottom, and its list of properties. An issue may be that you would have
>>> to extensively modify the code, as you would need to modify both splitter
>>> and criterion objects as well. If you are doing this for your own personal
>>> use, it may be easier to write a small script which successively applies the
>>> rules of the tree to your data to see how many points from each class are
>>> present.
>>>
>>> On Sun, Aug 30, 2015 at 10:50 AM, Rex X <[email protected]> wrote:
>>>>
>>>> Hi Jacob and Trevor,
>>>>
>>>> Which part of the source code we can modify to add a new attribute to
>>>> DecisionTreeClassifier.tree_, to count the number of samples of each class
>>>> within each node?
>>>>
>>>> Could you point me the right direction?
>>>>
>>>> Best,
>>>> Rex
>>>>
>>>> On Sun, Aug 30, 2015 at 8:12 AM, Jacob Schreiber
>>>> <[email protected]> wrote:
>>>>>
>>>>> This value is computed while building the tree, but is not kept in the
>>>>> tree.
>>>>>
>>>>> On Sun, Aug 30, 2015 at 7:02 AM, Rex X <[email protected]> wrote:
>>>>>>
>>>>>> DecisionTreeClassifier.tree_.n_node_samples is the total number of
>>>>>> samples in all classes of one node, and
>>>>>> DecisionTreeClassifier.tree_.value is the computed weight for each
>>>>>> class of one node. Only if the sample_weight and class_weight of this
>>>>>> DecisionTreeClassifier is one, then this attribute equals the number
>>>>>> of samples of each class of one node.
>>>>>>
>>>>>> But for the general case with a given sample_weight and class_weight,
>>>>>> is there any attribute telling us the number of samples of each class
>>>>>> within one node?
>>>>>>
>>>>>> import pandas as pd
>>>>>> from sklearn.datasets import load_iris
>>>>>> from sklearn import tree
>>>>>> import sklearn
>>>>>>
>>>>>> iris = sklearn.datasets.load_iris()
>>>>>> clf = tree.DecisionTreeClassifier(class_weight={0 : 0.30, 1: 0.3,
>>>>>> 2:0.4}, max_features="auto")
>>>>>> clf.fit(iris.data, iris.target)
>>>>>>
>>>>>> # the total number of samples in all classes of each node
>>>>>> clf.tree_.n_node_samples
>>>>>>
>>>>>> # the computed weight for each class of each node
>>>>>> clf.tree_.value
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> _______________________________________________
>>>>>> Scikit-learn-general mailing list
>>>>>> [email protected]
>>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
