Hi Gilles,

For all leaf nodes, I can see how this can be done, with tree_.apply() and
pandas pivot as Trevor mentioned.

But for internal nodes, could you explain how this can be done with the
tree_ object?


Best,
Rex
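
For the internal nodes, one possible sketch is shown here. It assumes a
scikit-learn release that provides DecisionTreeClassifier.decision_path,
which is not mentioned anywhere in this thread. The idea is to build the
sample-by-node indicator matrix and multiply it with one-hot labels. The
names paths, onehot and node_class_counts are illustrative, and
random_state=0 is only there to make the run repeatable.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# class_weight mirrors the values used in the original question
clf = DecisionTreeClassifier(class_weight={0: 0.3, 1: 0.3, 2: 0.4},
                             random_state=0)
clf.fit(X, y)

# Sparse (n_samples, n_nodes) indicator: entry (i, j) is 1 when sample i
# passes through node j, internal nodes and leaves alike
paths = clf.decision_path(X)

# One-hot encoding of the labels, shape (n_samples, n_classes)
onehot = (y[:, None] == clf.classes_[None, :]).astype(np.int64)

# (n_nodes, n_samples) @ (n_samples, n_classes): raw, unweighted counts
# of each class in every node
node_class_counts = np.asarray(paths.T @ onehot)
print(node_class_counts)
```

Because the counts come from the labels and not from tree_.value, they do
not depend on class_weight or sample_weight.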



On Sun, Aug 30, 2015 at 1:24 PM, Gilles Louppe <g.lou...@gmail.com> wrote:

> (Also, this can be done in Python code, by using the interface we
> provide for the tree_ object)
>
> On 30 August 2015 at 22:22, Gilles Louppe <g.lou...@gmail.com> wrote:
> > Hi,
> >
> > The simplest way to get what you are looking for is to re-propagate the
> > training samples into the tree and keep track of the nodes they
> > traverse. You should have a look at the implementation of `apply` to
> > get started.
> >
> > Hope this helps,
> > Gilles
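
A minimal sketch of the re-propagation described above, written against the
public tree_ arrays (children_left, children_right, feature, threshold).
The helper name per_node_class_counts is hypothetical, and the sketch
assumes a single-output classifier and data without missing values. It is
one way to follow the idea, not the library's own implementation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def per_node_class_counts(clf, X, y):
    """Count the raw (unweighted) samples of each class reaching every node."""
    tree = clf.tree_
    # Trees store features as float32 and compare them to float64 thresholds
    X = np.asarray(X, dtype=np.float32).astype(np.float64)
    class_idx = np.searchsorted(clf.classes_, y)  # labels -> column indices
    counts = np.zeros((tree.node_count, len(clf.classes_)), dtype=np.int64)

    # Each stack entry is (node id, indices of the samples that reach it)
    stack = [(0, np.arange(X.shape[0]))]
    while stack:
        node, idx = stack.pop()
        counts[node] = np.bincount(class_idx[idx], minlength=counts.shape[1])
        left, right = tree.children_left[node], tree.children_right[node]
        if left == right:  # both children are -1 for a leaf
            continue
        goes_left = X[idx, tree.feature[node]] <= tree.threshold[node]
        stack.append((left, idx[goes_left]))
        stack.append((right, idx[~goes_left]))
    return counts


iris = load_iris()
clf = DecisionTreeClassifier(class_weight={0: 0.3, 1: 0.3, 2: 0.4},
                             random_state=0)
clf.fit(iris.data, iris.target)

counts = per_node_class_counts(clf, iris.data, iris.target)
print(counts[0])  # root node: every training sample passes through it
```

Row i of the result holds the per-class counts for node i, so each row
should sum to the matching entry of tree_.n_node_samples when the training
data is passed in.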
> >
> > On 30 August 2015 at 21:55, Rex X <dnsr...@gmail.com> wrote:
> >> Hi Trevor,
> >>
> >> Yes, unfortunately I am using sample_weight, so clf.tree_.value is a
> >> weighted sum and we cannot recover the number of samples by such a
> >> division. Because we can't track which samples belong to a given node,
> >> there is no way to get their sample_weights. If we knew that, we would
> >> not need this step at all.
> >>
> >> It is good to know about the apply method of DecisionTreeClassifier.
> >> But does this approach still work when sample_weight is used?
> >>
> >>
> >> Thank you.
> >>
> >> Rex
> >>
> >>
> >> On Sun, Aug 30, 2015 at 12:41 PM, Trevor Stephens
> >> <trev.steph...@gmail.com> wrote:
> >>>
> >>> As Jacob mentions, the tree object is written in Cython, and is pretty
> >>> heavy going.
> >>>
> >>> However (with numpy imported as np),
> >>>
> >>>     clf.tree_.value / np.array([clf.class_weight[c] for c in clf.classes_])
> >>>
> >>> might work for you?
> >>>
> >>> If using the sample_weight as well, you would need to additionally scale
> >>> along the other axis too.
> >>>
> >>> Alternatively, if you are only interested in the leaf nodes,
> >>> DecisionTreeClassifier has an apply() method which returns the leaf ID
> >>> for any data passed to it. Use the original data, and then some light
> >>> pandas pivoting should get you to what you need.
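
The leaf-only route can be sketched roughly as follows. pd.crosstab stands
in for the "light pivoting", which is a choice made here and not something
the thread prescribes. The names leaf_ids and leaf_counts are illustrative,
and random_state=0 is only there to make the run repeatable.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# class_weight mirrors the values used in the original question
clf = DecisionTreeClassifier(class_weight={0: 0.3, 1: 0.3, 2: 0.4},
                             random_state=0)
clf.fit(X, y)

# apply() returns, for each sample, the index of the leaf it ends up in
leaf_ids = clf.apply(X)

# Rows are leaf node ids, columns are class labels, cells are raw counts
leaf_counts = pd.crosstab(pd.Series(leaf_ids, name="leaf"),
                          pd.Series(y, name="class"))
print(leaf_counts)
```

The counts are taken from the labels directly, so they are unaffected by
class_weight or sample_weight. Only leaves appear in the table.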
> >>>
> >>>
> >>>
> >>> On Sun, Aug 30, 2015 at 11:54 AM, Jacob Schreiber
> >>> <jmschreibe...@gmail.com> wrote:
> >>>>
> >>>> You would have to modify sklearn/tree/_tree.pyx. See the Tree class near
> >>>> the bottom, and its list of properties. An issue may be that you would
> >>>> have to extensively modify the code, as you would need to modify both
> >>>> splitter and criterion objects as well. If you are doing this for your
> >>>> own personal use, it may be easier to write a small script which
> >>>> successively applies the rules of the tree to your data to see how many
> >>>> points from each class are present.
> >>>>
> >>>> On Sun, Aug 30, 2015 at 10:50 AM, Rex X <dnsr...@gmail.com> wrote:
> >>>>>
> >>>>> Hi Jacob and Trevor,
> >>>>>
> >>>>> Which part of the source code can we modify to add a new attribute to
> >>>>> DecisionTreeClassifier.tree_ that counts the number of samples of each
> >>>>> class within each node?
> >>>>>
> >>>>> Could you point me in the right direction?
> >>>>>
> >>>>> Best,
> >>>>> Rex
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sun, Aug 30, 2015 at 8:12 AM, Jacob Schreiber
> >>>>> <jmschreibe...@gmail.com> wrote:
> >>>>>>
> >>>>>> This value is computed while building the tree, but is not kept in the
> >>>>>> tree.
> >>>>>>
> >>>>>> On Sun, Aug 30, 2015 at 7:02 AM, Rex X <dnsr...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> DecisionTreeClassifier.tree_.n_node_samples is the total number of
> >>>>>>> samples of all classes in one node, and
> >>>>>>> DecisionTreeClassifier.tree_.value is the computed weight for each
> >>>>>>> class in one node. Only if the sample_weight and class_weight of this
> >>>>>>> DecisionTreeClassifier are all one does this attribute equal the
> >>>>>>> number of samples of each class in one node.
> >>>>>>>
> >>>>>>> But for the general case with a given sample_weight and class_weight,
> >>>>>>> is there any attribute telling us the number of samples of each class
> >>>>>>> within one node?
> >>>>>>>
> >>>>>>>
> >>>>>>> import pandas as pd
> >>>>>>> from sklearn.datasets import load_iris
> >>>>>>> from sklearn import tree
> >>>>>>>
> >>>>>>> iris = load_iris()
> >>>>>>> clf = tree.DecisionTreeClassifier(
> >>>>>>>     class_weight={0: 0.3, 1: 0.3, 2: 0.4}, max_features="auto")
> >>>>>>> clf.fit(iris.data, iris.target)
> >>>>>>>
> >>>>>>>
> >>>>>>> # the total number of samples in all classes of each node
> >>>>>>> clf.tree_.n_node_samples
> >>>>>>>
> >>>>>>> # the computed weight for each class of each node
> >>>>>>> clf.tree_.value
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> ------------------------------------------------------------------------------
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> Scikit-learn-general mailing list
> >>>>>>> Scikit-learn-general@lists.sourceforge.net
> >>>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
