Jacob, this modification doesn't look easy. After fetching the decision rules that lead to the node of interest, a follow-up Pandas groupby script can compute these counts. Thank you. :)
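Something like the following is what I have in mind. It is only a rough sketch, assuming a scikit-learn version that exposes DecisionTreeClassifier.apply() for mapping samples to leaf ids (older releases expose the equivalent clf.tree_.apply() on a float32 array), and it counts unweighted samples per class for each leaf of the iris tree:

import numpy as np
import pandas as pd
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier(class_weight={0: 0.3, 1: 0.3, 2: 0.4})
clf.fit(iris.data, iris.target)

# map each training sample to the id of the leaf node it falls into
# (on older scikit-learn: clf.tree_.apply(iris.data.astype(np.float32)))
leaf_id = clf.apply(iris.data)

# unweighted count of samples per (leaf node, class), independent of the
# sample_weight / class_weight used during fitting
counts = (pd.DataFrame({"node": leaf_id, "class": iris.target})
            .groupby(["node", "class"])
            .size()
            .unstack(fill_value=0))
print(counts)

For an internal node, the same idea applies after filtering the rows with the quoted decision rules before the groupby, which is essentially the small script you suggested below.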
On Sun, Aug 30, 2015 at 11:54 AM, Jacob Schreiber <jmschreibe...@gmail.com> wrote:

> You would have to modify sklearn/tree/_tree.pyx. See the Tree class near
> the bottom, and its list of properties. One issue is that you would have to
> modify the code extensively, since you would need to change both the
> splitter and criterion objects as well. If you are doing this for your own
> personal use, it may be easier to write a small script which successively
> applies the rules of the tree to your data to see how many points from each
> class are present.
>
> On Sun, Aug 30, 2015 at 10:50 AM, Rex X <dnsr...@gmail.com> wrote:
>
>> Hi Jacob and Trevor,
>>
>> Which part of the source code can we modify to add a new attribute to
>> DecisionTreeClassifier.tree_ that counts the number of samples of each
>> class within each node?
>>
>> Could you point me in the right direction?
>>
>> Best,
>> Rex
>>
>> On Sun, Aug 30, 2015 at 8:12 AM, Jacob Schreiber <jmschreibe...@gmail.com> wrote:
>>
>>> This value is computed while building the tree, but is not kept in the
>>> tree.
>>>
>>> On Sun, Aug 30, 2015 at 7:02 AM, Rex X <dnsr...@gmail.com> wrote:
>>>
>>>> DecisionTreeClassifier.tree_.n_node_samples is the total number of
>>>> samples across all classes in one node, and
>>>> DecisionTreeClassifier.tree_.value is the computed weight for each
>>>> class in one node. Only when the sample_weight and class_weight of this
>>>> DecisionTreeClassifier are one does this attribute equal the number of
>>>> samples of each class in one node.
>>>>
>>>> But for the general case with a given sample_weight and class_weight,
>>>> is there any attribute telling us the number of samples of each class
>>>> within one node?
>>>>
>>>> import pandas as pd
>>>> from sklearn.datasets import load_iris
>>>> from sklearn import tree
>>>> import sklearn
>>>>
>>>> iris = sklearn.datasets.load_iris()
>>>> clf = tree.DecisionTreeClassifier(class_weight={0: 0.3, 1: 0.3, 2: 0.4},
>>>>                                   max_features="auto")
>>>> clf.fit(iris.data, iris.target)
>>>>
>>>> # the total number of samples across all classes in each node
>>>> clf.tree_.n_node_samples
>>>>
>>>> # the computed weight for each class in each node
>>>> clf.tree_.value