Hi Trevor,

Yes, unfortunately I am using sample_weight, so clf.tree_.value is a
weighted sum and a simple division cannot recover the number of samples.
To undo the weighting we would need the sample_weight of every sample in a
node, but we cannot track which samples belong to which node. If we could,
we would be able to count them directly and would not need this step.

It is good to know about the apply method of DecisionTreeClassifier. Does
this approach still work when sample_weight is used?
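
To make the question concrete, here is a minimal sketch of how I understand
the apply() suggestion. As far as I can tell, apply() only routes rows
through the fitted splits, so the counts should be plain unweighted counts
even when the tree was fit with weights. The random sample_weight values,
max_depth=3, random_state=0 and the names leaf_ids and counts are my own
assumptions for illustration, and I use NumPy instead of a Pandas pivot.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Arbitrary positive per-sample weights, purely for illustration (assumption).
rng = np.random.RandomState(0)
sample_weight = rng.uniform(0.5, 2.0, size=X.shape[0])

clf = DecisionTreeClassifier(
    class_weight={0: 0.3, 1: 0.3, 2: 0.4}, max_depth=3, random_state=0
)
clf.fit(X, y, sample_weight=sample_weight)

# apply() returns, for each row, the id of the leaf node it ends up in.
leaf_ids = clf.apply(X)

# counts[node, k] = unweighted number of training rows of class k in that leaf.
# The iris targets are already 0, 1, 2, so they can index the class axis directly.
n_classes = len(clf.classes_)
counts = np.zeros((clf.tree_.node_count, n_classes), dtype=int)
np.add.at(counts, (leaf_ids, y), 1)

# Print the per-class counts for each leaf that received samples.
for node_id in np.unique(leaf_ids):
    print(node_id, counts[node_id])
```

Rows of counts that belong to internal nodes stay at zero, because apply()
only reports leaves. So this would cover the leaf nodes but not the
intermediate ones.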


Thank you.

Rex


On Sun, Aug 30, 2015 at 12:41 PM, Trevor Stephens <trev.steph...@gmail.com>
wrote:

> As Jacob mentions, the tree object is written in Cython, and is pretty
> heavy going.
>
> However,
>
>     clf.tree_.value / np.array([clf.class_weight[c] for c in clf.classes_])
>
> might work for you?
>
> If you are using sample_weight as well, you would additionally need to
> scale along the other axis.
>
> Alternatively, if only interested in the leaf nodes, the
> DecisionTreeClassifier has an apply() method which returns the leaf ID for
> any data passed to it. Use the original data, and then some light Pandas
> pivoting should get you to what you need.
>
>
>
> On Sun, Aug 30, 2015 at 11:54 AM, Jacob Schreiber <jmschreibe...@gmail.com>
> wrote:
>
>> You would have to modify sklearn/tree/_tree.pyx. See the Tree class near
>> the bottom, and its list of properties. An issue may be that you would have
>> to extensively modify the code, as you would need to modify both splitter
>> and criterion objects as well. If you are doing this for your own personal
>> use, it may be easier to write a small script which successively applies
>> the rules of the tree to your data to see how many points from each class
>> are present.
>>
>> On Sun, Aug 30, 2015 at 10:50 AM, Rex X <dnsr...@gmail.com> wrote:
>>
>>> Hi Jacob and Trevor,
>>>
>>> Which part of the source code can we modify to add a new attribute to
>>> DecisionTreeClassifier.tree_ that counts the number of samples of each
>>> class within each node?
>>>
>>> Could you point me in the right direction?
>>>
>>> Best,
>>> Rex
>>>
>>>
>>>
>>>
>>> On Sun, Aug 30, 2015 at 8:12 AM, Jacob Schreiber <
>>> jmschreibe...@gmail.com> wrote:
>>>
>>>> This value is computed while building the tree, but is not kept in the
>>>> tree.
>>>>
>>>> On Sun, Aug 30, 2015 at 7:02 AM, Rex X <dnsr...@gmail.com> wrote:
>>>>
>>>>> DecisionTreeClassifier.tree_.n_node_samples is the total number of
>>>>> samples, across all classes, in each node, and
>>>>> DecisionTreeClassifier.tree_.value is the computed weight for each
>>>>> class in each node. This attribute equals the number of samples of
>>>>> each class in a node only if every sample_weight and class_weight of
>>>>> the DecisionTreeClassifier is one.
>>>>>
>>>>> But for the general case with a given sample_weight and class_weight,
>>>>> is there any attribute telling us the number of samples of each class
>>>>> within one node?
>>>>>
>>>>>
>>>>> from sklearn.datasets import load_iris
>>>>> from sklearn import tree
>>>>>
>>>>> iris = load_iris()
>>>>> # max_features="sqrt" is what "auto" meant for classifiers
>>>>> clf = tree.DecisionTreeClassifier(
>>>>>     class_weight={0: 0.3, 1: 0.3, 2: 0.4}, max_features="sqrt")
>>>>> clf.fit(iris.data, iris.target)
>>>>>
>>>>>
>>>>> # the total number of samples in all classes of each node
>>>>> clf.tree_.n_node_samples
>>>>>
>>>>> # the computed weight for each class of each node
>>>>> clf.tree_.value
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>>
>>>>> _______________________________________________
>>>>> Scikit-learn-general mailing list
>>>>> Scikit-learn-general@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general