Hi Trevor,

Yes, unfortunately I am using sample_weight, so clf.tree_.value is a weighted sum and we cannot simply divide by the class weights to recover the number of samples. Because we can't track which samples belong to a given node, there is no way to know which sample_weights to divide by. And if we did know which samples are in each node, we could count them directly and would not need this step at all.

It is good to know about the apply method of DecisionTreeClassifier. But does this approach still work when sample_weight is used?

Thank you.
Rex

On Sun, Aug 30, 2015 at 12:41 PM, Trevor Stephens <trev.steph...@gmail.com> wrote:

> As Jacob mentions, the tree object is written in cython, and is pretty
> heavy going.
>
> However,
>
> clf.tree_.value / clf.class_weight.values()
>
> might work for you?
>
> If using the sample_weight as well, you would need to additionally scale
> along the other axis too.
>
> Alternatively, if only interested in the leaf nodes, the
> DecisionTreeClassifier has an apply() method which returns the leaf ID for
> any data passed to it. Use the original data, and then some light Pandas
> pivoting should get you to what you need.
>
> On Sun, Aug 30, 2015 at 11:54 AM, Jacob Schreiber <jmschreibe...@gmail.com>
> wrote:
>
>> You would have to modify sklearn/tree/_tree.pyx. See the Tree class near
>> the bottom, and its list of properties. An issue may be that you would have
>> to extensively modify the code, as you would need to modify both splitter
>> and criterion objects as well. If you are doing this for your own personal
>> use, it may be easier to write a small script which successively applies
>> the rules of the tree to your data to see how many points from each class
>> are present.
>>
>> On Sun, Aug 30, 2015 at 10:50 AM, Rex X <dnsr...@gmail.com> wrote:
>>
>>> Hi Jacob and Trevor,
>>>
>>> Which part of the source code can we modify to add a new attribute to
>>> DecisionTreeClassifier.tree_, to count the number of samples of each
>>> class within each node?
>>>
>>> Could you point me in the right direction?
>>>
>>> Best,
>>> Rex
>>>
>>> On Sun, Aug 30, 2015 at 8:12 AM, Jacob Schreiber <
>>> jmschreibe...@gmail.com> wrote:
>>>
>>>> This value is computed while building the tree, but is not kept in the
>>>> tree.
>>>>
>>>> On Sun, Aug 30, 2015 at 7:02 AM, Rex X <dnsr...@gmail.com> wrote:
>>>>
>>>>> DecisionTreeClassifier.tree_.n_node_samples is the total number of
>>>>> samples in all classes of one node, and
>>>>> DecisionTreeClassifier.tree_.value is the computed weight for each
>>>>> class of one node. Only if the sample_weight and class_weight of this
>>>>> DecisionTreeClassifier are all one does this attribute equal the
>>>>> number of samples of each class of one node.
>>>>>
>>>>> But for the general case with a given sample_weight and class_weight,
>>>>> is there any attribute telling us the number of samples of each class
>>>>> within one node?
>>>>>
>>>>>
>>>>> import pandas as pd
>>>>> from sklearn.datasets import load_iris
>>>>> from sklearn import tree
>>>>> import sklearn
>>>>>
>>>>> iris = sklearn.datasets.load_iris()
>>>>> clf = tree.DecisionTreeClassifier(class_weight={0 : 0.30, 1: 0.3,
>>>>> 2:0.4}, max_features="auto")
>>>>> clf.fit(iris.data, iris.target)
>>>>>
>>>>>
>>>>> # the total number of samples in all classes of each node
>>>>> clf.tree_.n_node_samples
>>>>>
>>>>> # the computed weight for each class of each node
>>>>> clf.tree_.value

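For reference, here is a minimal sketch of the apply() suggestion quoted above. It is not code from any of the posters. It relies on the fact that apply() only routes each sample through the fitted split rules, so the leaf assignment, and therefore the raw per-class count, does not depend on whether sample_weight or class_weight was passed to fit(). The random sample weights, the random_state values, and the variable names are assumptions made purely for illustration. The second half uses decision_path(), which exists in newer scikit-learn releases than the one discussed in this thread, to get counts for internal nodes as well.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Arbitrary positive per-sample weights, for illustration only (assumption).
rng = np.random.RandomState(0)
sample_weight = rng.uniform(0.5, 2.0, size=len(y))

clf = DecisionTreeClassifier(class_weight={0: 0.3, 1: 0.3, 2: 0.4},
                             random_state=0)
clf.fit(X, y, sample_weight=sample_weight)

# Leaf nodes only: apply() returns the leaf ID each training sample ends in.
leaf_ids = clf.apply(X)
# Rows are leaf node IDs, columns are classes, cells are raw sample counts.
leaf_counts = pd.crosstab(leaf_ids, y, rownames=["node_id"], colnames=["class"])
print(leaf_counts)

# All nodes (newer scikit-learn): decision_path() returns a sparse
# (n_samples, n_nodes) indicator of every node each sample passes through.
node_indicator = clf.decision_path(X)
one_hot = np.eye(clf.n_classes_, dtype=int)[y]        # (n_samples, n_classes)
node_counts = np.asarray(node_indicator.T @ one_hot)  # (n_nodes, n_classes)
print(node_counts)
```

Each row of node_counts should sum to the matching entry of clf.tree_.n_node_samples as long as every sample has a positive weight, which gives a quick sanity check on the result.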
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general