(Also, this can be done in Python code, using the interface we provide
for the `tree_` object.)
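Here is a minimal sketch of that re-propagation in plain Python. It assumes a fitted single-output DecisionTreeClassifier and the training data it was fitted on. Each sample is walked down the arrays exposed on `clf.tree_` (`children_left`, `children_right`, `feature`, `threshold`), and every node it visits gets one added to the sample's class. The result is an unweighted per-class count for every node, unaffected by sample_weight and class_weight. The helper name `per_class_node_counts` and the random weights are illustrative only and are not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def per_class_node_counts(clf, X, y):
    # Hypothetical helper (not a scikit-learn API): returns an
    # (n_nodes, n_classes) array of unweighted per-class sample counts.
    t = clf.tree_
    # column index of each label within clf.classes_ (which is sorted)
    class_index = np.searchsorted(clf.classes_, y)
    counts = np.zeros((t.node_count, len(clf.classes_)), dtype=np.int64)
    # the tree is fitted on float32 features, so compare at that precision
    X = np.asarray(X, dtype=np.float32)
    for x, k in zip(X, class_index):
        node = 0  # start at the root
        while True:
            counts[node, k] += 1
            left = t.children_left[node]
            if left == -1:  # -1 marks a leaf
                break
            # same rule the tree uses: go left when feature <= threshold
            if x[t.feature[node]] <= t.threshold[node]:
                node = left
            else:
                node = t.children_right[node]
    return counts


iris = load_iris()
# arbitrary positive weights, only to show that they do not change the counts
weights = np.random.RandomState(0).uniform(0.5, 2.0, size=len(iris.target))
clf = DecisionTreeClassifier(class_weight={0: 0.3, 1: 0.3, 2: 0.4}, random_state=0)
clf.fit(iris.data, iris.target, sample_weight=weights)

counts = per_class_node_counts(clf, iris.data, iris.target)
print(counts[0])  # root node: [50 50 50]
```

On the training data, each row of `counts` should sum to the matching entry of `clf.tree_.n_node_samples`, which is a quick sanity check.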

On 30 August 2015 at 22:22, Gilles Louppe <g.lou...@gmail.com> wrote:
> Hi,
>
> The simplest way to get what you are looking for is to re-propagate the
> training samples through the tree and keep track of the nodes they
> traverse. You should have a look at the implementation of `apply` to
> get started.
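Releases of scikit-learn newer than the one discussed in this thread provide `decision_path`, which performs this re-propagation and returns a sparse sample-by-node indicator matrix. The sketch below assumes a single-output classifier and strictly positive sample weights; the random weights are made up for illustration. Multiplying the transposed indicator by a one-hot class matrix gives raw per-class counts for every node, internal nodes included. Multiplying it by weight-scaled one-hot rows gives the per-class sums of sample weights.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target
# arbitrary, strictly positive weights (illustration only)
weights = np.random.RandomState(0).uniform(0.5, 2.0, size=len(y))

clf = DecisionTreeClassifier(random_state=0).fit(X, y, sample_weight=weights)

# sparse (n_samples, n_nodes) matrix: entry (i, j) is nonzero iff
# sample i passes through node j on its way to a leaf
indicator = clf.decision_path(X)
# (n_samples, n_classes) one-hot encoding of the true labels
onehot = (y[:, None] == clf.classes_).astype(np.float64)

# (n_nodes, n_samples) @ (n_samples, n_classes) -> per-node class totals
counts = np.asarray(indicator.T @ onehot).astype(np.int64)
# same product with weighted rows -> per-node, per-class sums of sample_weight
weight_sums = np.asarray(indicator.T @ (onehot * weights[:, None]))
```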
>
> Hope this helps,
> Gilles
>
> On 30 August 2015 at 21:55, Rex X <dnsr...@gmail.com> wrote:
>> Hi Trevor,
>>
>> Yes, unfortunately I am using sample_weight, so clf.tree_.value is a
>> weighted sum and we cannot simply divide it to get the number of samples.
>> Because we can't track which samples belong to each node, there is no way
>> to recover their sample weights. If we could, we wouldn't need this step
>> in the first place.
>>
>> It is good to know about the apply method of DecisionTreeClassifier. But
>> does this approach still work when sample_weight is used?
>>
>>
>> Thank you.
>>
>> Rex
>>
>>
>> On Sun, Aug 30, 2015 at 12:41 PM, Trevor Stephens <trev.steph...@gmail.com>
>> wrote:
>>>
>>> As Jacob mentions, the tree object is written in Cython, and is pretty
>>> heavy going.
>>>
>>> However,
>>>
>>>     import numpy as np
>>>     clf.tree_.value / np.array([clf.class_weight[c] for c in clf.classes_])
>>>
>>> might work for you?
>>>
>>> If you are also using sample_weight, you would need to rescale along the
>>> other axis as well.
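Here is a sketch of that rescaling. It assumes class_weight is a dict and that no sample_weight was passed. Once sample weights are involved, a node's per-class weight total no longer factors into count times class weight, which is exactly the problem raised in this thread. Recent scikit-learn releases store weighted class fractions in `tree_.value` rather than weighted totals. The sketch therefore first normalises each row and multiplies by `tree_.weighted_n_node_samples`, which should give weighted totals under either convention. It then divides each class column by that class's weight.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
class_weight = {0: 0.3, 1: 0.3, 2: 0.4}
clf = DecisionTreeClassifier(class_weight=class_weight, random_state=0)
clf.fit(iris.data, iris.target)  # no sample_weight: the rescaling relies on this

t = clf.tree_
value = t.value[:, 0, :]  # single output -> (n_nodes, n_classes)
# Turn each row into weighted class totals, whether value holds totals
# (older releases) or fractions (newer releases).
weighted_totals = (value / value.sum(axis=1, keepdims=True)
                   * t.weighted_n_node_samples[:, None])
# One weight per column, in the order of clf.classes_.
w = np.array([class_weight[c] for c in clf.classes_])
# Undo the class weighting; rint absorbs floating-point noise.
counts = np.rint(weighted_totals / w).astype(np.int64)
print(counts[0])  # root node: [50 50 50]
```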
>>>
>>> Alternatively, if you are only interested in the leaf nodes,
>>> DecisionTreeClassifier has an apply() method which returns the leaf ID for
>>> any data passed to it. Pass it the original training data, and then some
>>> light Pandas pivoting should get you what you need.
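Here is a sketch of the apply-plus-pandas route. It uses made-up positive sample weights to show that the result does not depend on them. `apply` only routes samples to leaves, so a crosstab of leaf ID against true class gives raw per-class counts per leaf. A pivot table over the weights gives the per-class weight sums, if those are wanted too. Only leaves are covered here; internal nodes would need each sample's full path.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target
# arbitrary positive weights, only to show that apply() ignores them
weights = np.random.RandomState(0).uniform(0.5, 2.0, size=len(y))

clf = DecisionTreeClassifier(class_weight={0: 0.3, 1: 0.3, 2: 0.4}, random_state=0)
clf.fit(X, y, sample_weight=weights)

# one row per training sample: the leaf it lands in, its class, its weight
df = pd.DataFrame({"leaf": clf.apply(X), "cls": y, "w": weights})

# unweighted number of samples of each class in each leaf
counts = pd.crosstab(df["leaf"], df["cls"])
# summed sample weights per class per leaf (class_weight not included)
weight_sums = df.pivot_table(index="leaf", columns="cls", values="w",
                             aggfunc="sum", fill_value=0)
print(counts)
```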
>>>
>>>
>>>
>>> On Sun, Aug 30, 2015 at 11:54 AM, Jacob Schreiber
>>> <jmschreibe...@gmail.com> wrote:
>>>>
>>>> You would have to modify sklearn/tree/_tree.pyx. See the Tree class near
>>>> the bottom, and its list of properties. One issue may be that you would
>>>> have to modify the code extensively, as you would also need to change the
>>>> splitter and criterion objects. If you are doing this for your own
>>>> personal use, it may be easier to write a small script that successively
>>>> applies the rules of the tree to your data to see how many points from
>>>> each class are present.
>>>>
>>>> On Sun, Aug 30, 2015 at 10:50 AM, Rex X <dnsr...@gmail.com> wrote:
>>>>>
>>>>> Hi Jacob and Trevor,
>>>>>
>>>>> Which part of the source code could we modify to add a new attribute to
>>>>> DecisionTreeClassifier.tree_ that counts the number of samples of each
>>>>> class within each node?
>>>>>
>>>>> Could you point me in the right direction?
>>>>>
>>>>> Best,
>>>>> Rex
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Aug 30, 2015 at 8:12 AM, Jacob Schreiber
>>>>> <jmschreibe...@gmail.com> wrote:
>>>>>>
>>>>>> This value is computed while building the tree, but is not kept in the
>>>>>> tree.
>>>>>>
>>>>>> On Sun, Aug 30, 2015 at 7:02 AM, Rex X <dnsr...@gmail.com> wrote:
>>>>>>>
>>>>>>> DecisionTreeClassifier.tree_.n_node_samples is the total number of
>>>>>>> samples (over all classes) in each node, and
>>>>>>> DecisionTreeClassifier.tree_.value is the computed weight of each class
>>>>>>> in each node. Only when the sample_weight and class_weight of the
>>>>>>> DecisionTreeClassifier are all one does this attribute equal the number
>>>>>>> of samples of each class in a node.
>>>>>>>
>>>>>>> But in the general case, with a given sample_weight and class_weight,
>>>>>>> is there any attribute telling us the number of samples of each class
>>>>>>> within a node?
>>>>>>>
>>>>>>>
>>>>>>> from sklearn.datasets import load_iris
>>>>>>> from sklearn import tree
>>>>>>>
>>>>>>> iris = load_iris()
>>>>>>> # max_features="auto" (sqrt for classifiers) was removed in newer
>>>>>>> # scikit-learn releases; "sqrt" is the equivalent setting
>>>>>>> clf = tree.DecisionTreeClassifier(class_weight={0: 0.3, 1: 0.3, 2: 0.4},
>>>>>>>                                   max_features="sqrt")
>>>>>>> clf.fit(iris.data, iris.target)
>>>>>>>
>>>>>>> # the total number of samples (all classes) in each node
>>>>>>> clf.tree_.n_node_samples
>>>>>>>
>>>>>>> # the computed weight for each class of each node
>>>>>>> clf.tree_.value
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Scikit-learn-general mailing list
>>>>>>> Scikit-learn-general@lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>>>>
