Jacob, this modification does not seem easy. After fetching the decision rules
leading to the node of interest, a follow-up pandas groupby script can
compute these numbers. Thank you. :)
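
For example, something along these lines should work (just a rough, untested
sketch: clf.apply gives the leaf that each training sample ends up in, and a
pandas groupby counts the raw samples per class in each leaf, independent of
sample_weight / class_weight; for internal nodes I would walk the split rules
in clf.tree_ instead):

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(class_weight={0: 0.3, 1: 0.3, 2: 0.4})
clf.fit(iris.data, iris.target)

# leaf node index that each training sample falls into
# (clf.apply; older versions expose the same via clf.tree_.apply on float32 data)
leaf_id = clf.apply(iris.data)

# raw per-leaf, per-class sample counts via a groupby,
# unaffected by sample_weight or class_weight
counts = (pd.DataFrame({"node": leaf_id, "target": iris.target})
          .groupby(["node", "target"])
          .size()
          .unstack(fill_value=0))
print(counts)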


On Sun, Aug 30, 2015 at 11:54 AM, Jacob Schreiber <jmschreibe...@gmail.com>
wrote:

> You would have to modify sklearn/tree/_tree.pyx. See the Tree class near
> the bottom, and its list of properties. One issue is that you may have to
> modify the code extensively, since you would need to change both the
> splitter and criterion objects as well. If you are doing this for your own personal
> use, it may be easier to write a small script which successively applies
> the rules of the tree to your data to see how many points from each class
> are present.
>
> On Sun, Aug 30, 2015 at 10:50 AM, Rex X <dnsr...@gmail.com> wrote:
>
>> Hi Jacob and Trevor,
>>
>> Which part of the source code can we modify to add a new attribute to
>> DecisionTreeClassifier.tree_ that counts the number of samples of each
>> class within each node?
>>
>> Could you point me in the right direction?
>>
>> Best,
>> Rex
>>
>>
>>
>>
>> On Sun, Aug 30, 2015 at 8:12 AM, Jacob Schreiber <jmschreibe...@gmail.com
>> > wrote:
>>
>>> This value is computed while building the tree, but is not kept in the
>>> tree.
>>>
>>> On Sun, Aug 30, 2015 at 7:02 AM, Rex X <dnsr...@gmail.com> wrote:
>>>
>>>> DecisionTreeClassifier.tree_.n_node_samples is the total number of
>>>> samples across all classes in one node, and
>>>> DecisionTreeClassifier.tree_.value is the computed weight for each
>>>> class in one node. Only when the sample_weight and class_weight of this
>>>> DecisionTreeClassifier are all one does this attribute equal the number
>>>> of samples of each class in one node.
>>>>
>>>> But in the general case, with arbitrary sample_weight and class_weight,
>>>> is there any attribute that tells us the number of samples of each class
>>>> within one node?
>>>>
>>>>
>>>> from sklearn.datasets import load_iris
>>>> from sklearn import tree
>>>>
>>>> iris = load_iris()
>>>> clf = tree.DecisionTreeClassifier(class_weight={0: 0.3, 1: 0.3, 2: 0.4},
>>>>                                   max_features="auto")
>>>> clf.fit(iris.data, iris.target)
>>>>
>>>>
>>>> # the total number of samples across all classes in each node
>>>> clf.tree_.n_node_samples
>>>>
>>>> # the computed weight for each class in each node
>>>> clf.tree_.value
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> Scikit-learn-general@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>
>>
>> ------------------------------------------------------------------------------
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to