Hi Gilles,

For all leaf nodes, I can see how this can be done with tree_.apply() and a pandas pivot, as Trevor mentioned.
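For reference, the leaf-only approach (the estimator's apply() method plus a pandas pivot) might look roughly like this. It is a minimal sketch, not code from the thread. It assumes a recent scikit-learn and pandas, uses pd.crosstab as the "pivot", and reuses the class weights from the example further down the thread with an arbitrary random_state=0. The cells are raw sample counts, so sample_weight and class_weight do not change them.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(class_weight={0: 0.3, 1: 0.3, 2: 0.4},
                             random_state=0)
clf.fit(iris.data, iris.target)

# apply() returns, for every sample, the id of the leaf it ends up in.
leaf_ids = clf.apply(iris.data)

# One row per leaf, one column per class; each cell is a raw
# (unweighted) count of training samples.
leaf_counts = pd.crosstab(pd.Series(leaf_ids, name="leaf"),
                          pd.Series(iris.target, name="class"))
print(leaf_counts)
```

Only leaf ids show up in the index, which is why internal nodes need a different approach.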
But for internal nodes, could you explain how this can be done with the tree_ object?

Best,
Rex

On Sun, Aug 30, 2015 at 1:24 PM, Gilles Louppe <g.lou...@gmail.com> wrote:
> (Also, this can be done in Python code, by using the interface we
> provide for the tree_ object)
>
> On 30 August 2015 at 22:22, Gilles Louppe <g.lou...@gmail.com> wrote:
> > Hi,
> >
> > The simplest method to get what you are looking for is to re-propagate
> > the training samples into the tree and keep track of the nodes they
> > traverse. You should have a look at the implementation of `apply` to
> > get started.
> >
> > Hope this helps,
> > Gilles
> >
> > On 30 August 2015 at 21:55, Rex X <dnsr...@gmail.com> wrote:
> >> Hi Trevor,
> >>
> >> Yes, unfortunately I am using sample_weight, so clf.tree_.value is a
> >> weighted sum and we cannot do such a division to get the number of
> >> samples. Because we can't track which samples belong to each node,
> >> there is no way to recover their sample_weights. If we could, we
> >> wouldn't need this step in the first place.
> >>
> >> It is good to know about the apply method of DecisionTreeClassifier.
> >> But does this approach still work when we use sample_weight?
> >>
> >> Thank you.
> >>
> >> Rex
> >>
> >> On Sun, Aug 30, 2015 at 12:41 PM, Trevor Stephens <trev.steph...@gmail.com>
> >> wrote:
> >>>
> >>> As Jacob mentions, the tree object is written in Cython, and is pretty
> >>> heavy going.
> >>>
> >>> However,
> >>>
> >>>     clf.tree_.value / clf.class_weight.values()
> >>>
> >>> might work for you?
> >>>
> >>> If you are also using sample_weight, you would additionally need to
> >>> scale along the other axis.
> >>>
> >>> Alternatively, if you are only interested in the leaf nodes,
> >>> DecisionTreeClassifier has an apply() method which returns the leaf ID
> >>> for any data passed to it. Use the original data, and then some light
> >>> Pandas pivoting should get you to what you need.
> >>>
> >>> On Sun, Aug 30, 2015 at 11:54 AM, Jacob Schreiber
> >>> <jmschreibe...@gmail.com> wrote:
> >>>>
> >>>> You would have to modify sklearn/tree/_tree.pyx. See the Tree class
> >>>> near the bottom, and its list of properties. One issue is that you
> >>>> would have to modify the code extensively, since the splitter and
> >>>> criterion objects would need changes as well. If you are doing this
> >>>> for your own personal use, it may be easier to write a small script
> >>>> that successively applies the rules of the tree to your data to see
> >>>> how many points from each class are present.
> >>>>
> >>>> On Sun, Aug 30, 2015 at 10:50 AM, Rex X <dnsr...@gmail.com> wrote:
> >>>>>
> >>>>> Hi Jacob and Trevor,
> >>>>>
> >>>>> Which part of the source code could we modify to add a new attribute
> >>>>> to DecisionTreeClassifier.tree_ that counts the number of samples of
> >>>>> each class within each node?
> >>>>>
> >>>>> Could you point me in the right direction?
> >>>>>
> >>>>> Best,
> >>>>> Rex
> >>>>>
> >>>>> On Sun, Aug 30, 2015 at 8:12 AM, Jacob Schreiber
> >>>>> <jmschreibe...@gmail.com> wrote:
> >>>>>>
> >>>>>> This value is computed while building the tree, but is not kept in
> >>>>>> the tree.
> >>>>>>
> >>>>>> On Sun, Aug 30, 2015 at 7:02 AM, Rex X <dnsr...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> DecisionTreeClassifier.tree_.n_node_samples is the total number of
> >>>>>>> samples, over all classes, in each node, and
> >>>>>>> DecisionTreeClassifier.tree_.value is the computed weight of each
> >>>>>>> class in each node. Only when every sample_weight and class_weight
> >>>>>>> is one does this attribute equal the number of samples of each
> >>>>>>> class in a node.
> >>>>>>>
> >>>>>>> But in the general case, with a given sample_weight and
> >>>>>>> class_weight, is there any attribute telling us the number of
> >>>>>>> samples of each class within a node?
> >>>>>>>
> >>>>>>> from sklearn.datasets import load_iris
> >>>>>>> from sklearn.tree import DecisionTreeClassifier
> >>>>>>>
> >>>>>>> iris = load_iris()
> >>>>>>> # max_features="auto" (sqrt(n_features) for classifiers) was later
> >>>>>>> # removed from scikit-learn; "sqrt" is the equivalent spelling.
> >>>>>>> clf = DecisionTreeClassifier(class_weight={0: 0.30, 1: 0.3, 2: 0.4},
> >>>>>>>                              max_features="sqrt")
> >>>>>>> clf.fit(iris.data, iris.target)
> >>>>>>>
> >>>>>>> # the total number of samples in all classes of each node
> >>>>>>> clf.tree_.n_node_samples
> >>>>>>>
> >>>>>>> # the computed weight for each class of each node
> >>>>>>> clf.tree_.value
> >>>>>>>
> >>>>>>> ------------------------------------------------------------------------------
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> Scikit-learn-general mailing list
> >>>>>>> Scikit-learn-general@lists.sourceforge.net
> >>>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
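Following Gilles's suggestion to re-propagate the training samples and record the nodes they traverse, here is a rough Python sketch that uses only the public arrays of the tree_ object (children_left, children_right, feature, threshold, node_count). It mirrors the rule apply uses (go left when the feature value is <= the threshold, on float32 input) and increments a counter at every node visited, so internal nodes are counted as well as leaves. The helper name node_class_counts is hypothetical, and the sketch assumes a single fitted DecisionTreeClassifier with one output and no missing values.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def node_class_counts(clf, X, y):
    """Hypothetical helper: raw (unweighted) per-class sample counts
    for every node of a fitted tree, internal nodes included."""
    tree = clf.tree_
    # scikit-learn evaluates splits on float32 input; mirror that here.
    X = np.asarray(X, dtype=np.float32)
    # Column index of each label in clf.classes_ (which is sorted).
    class_index = np.searchsorted(clf.classes_, y)
    counts = np.zeros((tree.node_count, len(clf.classes_)), dtype=np.int64)
    for row, k in zip(X, class_index):
        node = 0  # node 0 is the root
        while True:
            counts[node, k] += 1  # this sample reaches `node`
            left = tree.children_left[node]
            if left == -1:  # -1 marks a leaf: stop descending
                break
            if row[tree.feature[node]] <= tree.threshold[node]:
                node = left
            else:
                node = tree.children_right[node]
    return counts


iris = load_iris()
clf = DecisionTreeClassifier(class_weight={0: 0.3, 1: 0.3, 2: 0.4},
                             random_state=0)
clf.fit(iris.data, iris.target)
counts = node_class_counts(clf, iris.data, iris.target)
print(counts[0])  # root node: [50 50 50]
```

To get weighted sums instead of raw counts, one could add each sample's weight rather than 1 at every visited node. Newer scikit-learn releases also provide clf.decision_path(X), a sparse indicator matrix of the nodes each sample passes through; multiplying its transpose by a one-hot encoding of y should give the same table without the Python loop.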