(This can also be done in pure Python, using the interface we expose on the tree_ object.)
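A minimal sketch of what that could look like, assuming an already fitted DecisionTreeClassifier and no missing values in the data. The helper name class_counts_per_node is made up for this example and is not part of scikit-learn; it only reads the children_left, children_right, feature and threshold arrays of tree_. Each training sample is walked from the root to its leaf, and every node it passes through gets one more (unweighted) count for that sample's class, so sample_weight and class_weight do not affect the result.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def class_counts_per_node(clf, X, y):
    """Count unweighted samples of each class in every node (illustrative helper)."""
    tree = clf.tree_
    # The tree stores and compares feature values as float32.
    X = np.asarray(X, dtype=np.float32)
    # Map each label to its column index, in the order of clf.classes_.
    class_index = np.searchsorted(clf.classes_, y)
    counts = np.zeros((tree.node_count, len(clf.classes_)), dtype=np.int64)

    for row, k in zip(X, class_index):
        node = 0  # node 0 is the root
        while True:
            counts[node, k] += 1
            left = tree.children_left[node]
            if left == -1:  # a leaf has no children (both are -1)
                break
            if row[tree.feature[node]] <= tree.threshold[node]:
                node = left
            else:
                node = tree.children_right[node]
    return counts


iris = load_iris()
clf = DecisionTreeClassifier(class_weight={0: 0.3, 1: 0.3, 2: 0.4},
                             random_state=0)
clf.fit(iris.data, iris.target)

# One row per node, one column per class.
counts = class_counts_per_node(clf, iris.data, iris.target)
print(counts)
```

The row sums of counts should agree with clf.tree_.n_node_samples, which is a quick sanity check that the traversal follows the same rules as the fitted tree. The Python loop is slow on large data sets; it is meant to show the idea, not to be fast.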

On 30 August 2015 at 22:22, Gilles Louppe <g.lou...@gmail.com> wrote:
> Hi,
>
> The simplest way to get what you are looking for is to re-propagate the
> training samples through the tree and keep track of the nodes they
> traverse. You should have a look at the implementation of `apply` to
> get started.
>
> Hope this helps,
> Gilles
>
> On 30 August 2015 at 21:55, Rex X <dnsr...@gmail.com> wrote:
>> Hi Trevor,
>>
>> Yes, unfortunately I am using sample_weight, so clf.tree_.value is a
>> weighted sum. We cannot do such a division to get the number of
>> samples. Because we can't track which samples belong to a node, there
>> is no way to get their sample_weights. If we knew that, we would not
>> need this step at all.
>>
>> It is good to know about the apply method of DecisionTreeClassifier.
>> But does that approach still work when sample_weight is used?
>>
>> Thank you.
>>
>> Rex
>>
>> On Sun, Aug 30, 2015 at 12:41 PM, Trevor Stephens
>> <trev.steph...@gmail.com> wrote:
>>>
>>> As Jacob mentions, the tree object is written in Cython, and is
>>> pretty heavy going.
>>>
>>> However,
>>>
>>> clf.tree_.value / clf.class_weight.values()
>>>
>>> might work for you?
>>>
>>> If you are using sample_weight as well, you would additionally need
>>> to scale along the other axis.
>>>
>>> Alternatively, if you are only interested in the leaf nodes,
>>> DecisionTreeClassifier has an apply() method which returns the leaf
>>> ID for any data passed to it. Use the original data, and then some
>>> light Pandas pivoting should get you what you need.
>>>
>>> On Sun, Aug 30, 2015 at 11:54 AM, Jacob Schreiber
>>> <jmschreibe...@gmail.com> wrote:
>>>>
>>>> You would have to modify sklearn/tree/_tree.pyx. See the Tree class
>>>> near the bottom, and its list of properties. One issue is that you
>>>> would have to modify the code extensively, since both the splitter
>>>> and criterion objects would need changes as well. If you are doing
>>>> this for your own personal use, it may be easier to write a small
>>>> script which successively applies the rules of the tree to your
>>>> data to see how many points from each class are present.
>>>>
>>>> On Sun, Aug 30, 2015 at 10:50 AM, Rex X <dnsr...@gmail.com> wrote:
>>>>>
>>>>> Hi Jacob and Trevor,
>>>>>
>>>>> Which part of the source code can we modify to add a new attribute
>>>>> to DecisionTreeClassifier.tree_ that counts the number of samples
>>>>> of each class within each node?
>>>>>
>>>>> Could you point me in the right direction?
>>>>>
>>>>> Best,
>>>>> Rex
>>>>>
>>>>> On Sun, Aug 30, 2015 at 8:12 AM, Jacob Schreiber
>>>>> <jmschreibe...@gmail.com> wrote:
>>>>>>
>>>>>> This value is computed while building the tree, but is not kept
>>>>>> in the tree.
>>>>>>
>>>>>> On Sun, Aug 30, 2015 at 7:02 AM, Rex X <dnsr...@gmail.com> wrote:
>>>>>>>
>>>>>>> DecisionTreeClassifier.tree_.n_node_samples is the total number
>>>>>>> of samples in all classes of one node, and
>>>>>>> DecisionTreeClassifier.tree_.value is the computed weight for
>>>>>>> each class of one node. Only if the sample_weight and
>>>>>>> class_weight of this DecisionTreeClassifier are all one does
>>>>>>> this attribute equal the number of samples of each class in one
>>>>>>> node.
>>>>>>>
>>>>>>> But for the general case with a given sample_weight and
>>>>>>> class_weight, is there any attribute telling us the number of
>>>>>>> samples of each class within one node?
>>>>>>>
>>>>>>> from sklearn.datasets import load_iris
>>>>>>> from sklearn import tree
>>>>>>>
>>>>>>> iris = load_iris()
>>>>>>> # max_features was "auto" originally; for classifiers that
>>>>>>> # meant "sqrt", and newer releases only accept "sqrt"
>>>>>>> clf = tree.DecisionTreeClassifier(
>>>>>>>     class_weight={0: 0.3, 1: 0.3, 2: 0.4},
>>>>>>>     max_features="sqrt")
>>>>>>> clf.fit(iris.data, iris.target)
>>>>>>>
>>>>>>> # the total number of samples in all classes of each node
>>>>>>> clf.tree_.n_node_samples
>>>>>>>
>>>>>>> # the computed weight for each class of each node
>>>>>>> clf.tree_.value
>>>>>>>
>>>>>>> ----------------------------------------------------------------
>>>>>>> _______________________________________________
>>>>>>> Scikit-learn-general mailing list
>>>>>>> Scikit-learn-general@lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general