Many thanks to all. I have managed to get the leaf node probability
distributions.
For anyone having the same problem in future, here is the code to do this.

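from collections import defaultdict
from sklearn.neighbors import KernelDensity

forestClf.fit(trainingData, trainingLabels)

# indices[i, t] is the id of the leaf that training sample i reaches in tree t
indices = forestClf.apply(trainingData)

# group training-sample indices by (tree index, leaf id)
samples_by_node = defaultdict(list)
for est_ind, est_data in enumerate(indices.T):
    for sample_ind, leaf in enumerate(est_data):
        samples_by_node[est_ind, leaf].append(sample_ind)

# samples_by_node[treeIndex, leafIndex within that tree]
# (10 must be a leaf id that some training sample actually reaches in tree 0;
#  otherwise the defaultdict silently returns an empty list)
indexOfSamples = samples_by_node[0, 10]

# trainingAngles (not defined above) is assumed to be a NumPy array of the
# training targets, row-aligned with trainingData. KernelDensity.fit expects a
# 2-D array, so reshape 1-D targets with .reshape(-1, 1).
leafNodeSamples = trainingAngles[indexOfSamples]
kde = KernelDensity(kernel='gaussian', bandwidth=0.2).fit(leafNodeSamples)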
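A self-contained version of the recipe above, as a minimal sketch: the synthetic data, the variable names and the choice of tree 0 are assumptions for illustration, not the original setup. One caveat worth knowing: with the default bootstrap=True, apply() on the training data routes every training row through each tree, including rows left out of that tree's bootstrap draw, so a leaf's group can contain more samples than the tree was actually fitted on. Setting bootstrap=False avoids that mismatch.

```python
from collections import defaultdict

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KernelDensity

# Synthetic stand-in for the real training set (hypothetical data).
rng = np.random.RandomState(0)
X_train = rng.uniform(-3, 3, size=(500, 4))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.randn(500)

forest = RandomForestRegressor(n_estimators=10, min_samples_leaf=5, random_state=0)
forest.fit(X_train, y_train)

# leaf_ids[i, t] = id of the leaf that training sample i reaches in tree t
leaf_ids = forest.apply(X_train)

samples_by_node = defaultdict(list)
for tree_idx, tree_leaves in enumerate(leaf_ids.T):
    for sample_idx, leaf in enumerate(tree_leaves):
        samples_by_node[tree_idx, leaf].append(sample_idx)

# Pick a leaf that is certainly populated in tree 0: the one sample 0 reaches.
some_leaf = leaf_ids[0, 0]
members = samples_by_node[0, some_leaf]

# KernelDensity expects a 2-D array of shape (n_samples, n_features).
leaf_targets = y_train[members].reshape(-1, 1)
kde = KernelDensity(kernel="gaussian", bandwidth=0.2).fit(leaf_targets)

# score_samples returns log-densities; exponentiate to get densities.
grid = np.linspace(-2, 2, 9).reshape(-1, 1)
density = np.exp(kde.score_samples(grid))
```

Picking the leaf from leaf_ids rather than hard-coding an id guarantees the group is non-empty.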

Muhammad

Date: Wed, 15 Oct 2014 08:21:45 +0200
> From: Gilles Louppe <g.lou...@gmail.com>
> Subject: Re: [Scikit-learn-general] Access data arriving at leaf nodes
> To: scikit-learn-general <scikit-learn-general@lists.sourceforge.net>
> Message-ID: <cah3bukj4_-r3eqtxdkvzdnwvfpy3zkcysvxch61xgri+e8b...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Hi,
>
> I confirm what has been said before. Samples are not stored anywhere
> in the leaves -- only the final prediction along with some statistics.
> To do what you want, you have to recompute the distribution yourself,
> e.g. using apply and then grouping by leaf ids.
>
> Gilles
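Gilles' point can be checked directly. A minimal sketch on synthetic data that recomputes each leaf's stored prediction (tree_.value) and sample count (tree_.n_node_samples) from the training samples grouped by apply(). It assumes bootstrap=False so that every tree is fitted on the full training set; with the default bootstrap=True the stored statistics describe only that tree's bootstrap sample and would generally not match.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (hypothetical).
rng = np.random.RandomState(1)
X = rng.normal(size=(300, 3))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(300)

# bootstrap=False: each tree sees all 300 rows, so the stored leaf
# statistics can be recomputed exactly from (X, y).
forest = RandomForestRegressor(n_estimators=5, min_samples_leaf=5,
                               bootstrap=False, random_state=0).fit(X, y)

est = forest.estimators_[0]
tree = est.tree_
leaf_of_sample = est.apply(X)          # leaf id reached by each training row

# A node is a leaf when it has no children (children_left == -1).
leaf_ids = np.flatnonzero(tree.children_left == -1)

mismatches = 0
for leaf in leaf_ids:
    members = np.flatnonzero(leaf_of_sample == leaf)
    stored_mean = tree.value[leaf, 0, 0]      # the leaf's prediction
    stored_count = tree.n_node_samples[leaf]  # how many rows reached it
    # Only these summaries are kept; the rows themselves are not.
    if stored_count != len(members) or not np.isclose(stored_mean, y[members].mean()):
        mismatches += 1
```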
>
> On 15 October 2014 02:25, Joel Nothman <joel.noth...@gmail.com> wrote:
> > If what you need is "the samples that ended up at each leaf node during
> > training", is this not something like:
> >
> > from collections import defaultdict
> > # indices: the output of forest.apply(...) on the training data,
> > # shape (n_samples, n_estimators)
> > samples_by_node = defaultdict(list)
> > for est_ind, est_data in enumerate(indices.T):
> >     for sample_ind, leaf in enumerate(est_data):
> >         samples_by_node[est_ind, leaf].append(sample_ind)
> >
> > ?
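For large forests the inner Python loop in the snippet above can be slow. A sketch of an equivalent grouping that sorts each tree's leaf ids once with NumPy; the function name group_samples_by_leaf is hypothetical, and the grouped indices come back as NumPy arrays rather than lists.

```python
import numpy as np


def group_samples_by_leaf(indices):
    """Map (tree index, leaf id) -> array of sample indices.

    `indices` is the (n_samples, n_estimators) array returned by forest.apply(X).
    """
    groups = {}
    for est_ind, leaves in enumerate(indices.T):
        # A stable sort keeps the sample indices within each leaf ascending.
        order = np.argsort(leaves, kind="stable")
        unique_leaves, starts = np.unique(leaves[order], return_index=True)
        # Split the sorted sample indices wherever the leaf id changes.
        for leaf, members in zip(unique_leaves, np.split(order, starts[1:])):
            groups[est_ind, int(leaf)] = members
    return groups


# Tiny hand-made example: 5 samples, 2 trees (hypothetical leaf ids).
indices = np.array([[3, 1],
                    [4, 1],
                    [3, 2],
                    [4, 2],
                    [3, 1]])
groups = group_samples_by_leaf(indices)
# groups[0, 3] -> array([0, 2, 4]); groups[1, 2] -> array([2, 3])
```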
> >
> > On 15 October 2014 09:59, M Asad <masad....@gmail.com> wrote:
> >>
> >> I am not sure if there is already a method to get this, but I have read
> >> the docs and there doesn't seem to be one. Please correct me if I am wrong.
> >>
> >> Actually, I am trying to get the probability distribution at each leaf
> >> node, as done in the book "Decision Forests for Computer Vision and
> >> Medical Image Analysis", for which I need the samples that ended up at
> >> each leaf node during training. I will then use kernel density estimation
> >> to get a continuous probability distribution at each leaf node. I have
> >> done this in my own C++/OpenCV implementation; with scikit-learn, all I
> >> need are those particular samples at each leaf node.
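Once the per-leaf samples are available, the per-leaf densities still have to be combined into one density for a test input. A minimal sketch under stated assumptions: it averages the trees' leaf KDEs, which is the usual way a decision forest combines tree posteriors; the synthetic data, bandwidth 0.2, bootstrap=False and the helper name forest_density are all hypothetical choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KernelDensity

# Synthetic training data (hypothetical).
rng = np.random.RandomState(0)
X_train = rng.uniform(-3, 3, size=(400, 2))
y_train = np.sin(X_train[:, 0]) + 0.2 * rng.randn(400)

# bootstrap=False so each leaf's group is exactly the rows the tree was fitted on.
forest = RandomForestRegressor(n_estimators=20, min_samples_leaf=5,
                               bootstrap=False, max_features=1, random_state=0)
forest.fit(X_train, y_train)
train_leaves = forest.apply(X_train)          # (n_train, n_trees) leaf ids


def forest_density(x, grid, bandwidth=0.2):
    """Average, over trees, of a Gaussian KDE fitted to the training targets
    that share x's leaf in each tree; returns densities evaluated on `grid`."""
    test_leaves = forest.apply(x.reshape(1, -1))[0]   # one leaf id per tree
    total = np.zeros(len(grid))
    for t, leaf in enumerate(test_leaves):
        members = train_leaves[:, t] == leaf
        kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth)
        kde.fit(y_train[members].reshape(-1, 1))
        total += np.exp(kde.score_samples(grid.reshape(-1, 1)))
    return total / len(test_leaves)


grid = np.linspace(-4, 4, 801)
density = forest_density(np.array([1.0, 0.0]), grid)
```

Because each tree's KDE integrates to one, so does their average; a Riemann sum over a wide enough grid is a quick sanity check.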
> >>
> >> For prediction, I have used apply() to get the index of the predicted
> >> leaf. forestReg.estimators_[i].tree_.value[j] returns only one prediction
> >> value; however, if I call forestReg.estimators_[i].tree_.n_node_samples[j]
> >> I get a number of samples larger than min_samples_leaf (which I have set
> >> to 5 at the moment).
> >> Here j is the index of a leaf node within the tree with index i.
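A leaf holding more than min_samples_leaf samples is expected: min_samples_leaf is only a lower bound. A sketch on synthetic data (names are illustrative) that checks this, and also shows a bootstrap subtlety: as far as we know, in scikit-learn's implementation n_node_samples counts the distinct rows of the tree's bootstrap sample that reach a node, while apply() on the full training set routes every row, so the routed counts can only be equal or larger.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (hypothetical).
rng = np.random.RandomState(0)
X = rng.normal(size=(300, 4))
y = X[:, 0] + 0.1 * rng.randn(300)

# Default bootstrap=True: each tree is grown on a bootstrap sample of the rows.
forest = RandomForestRegressor(n_estimators=10, min_samples_leaf=5,
                               random_state=0).fit(X, y)

all_leaves = forest.apply(X)   # route *all* training rows through every tree
checks = []
for t, est in enumerate(forest.estimators_):
    tree = est.tree_
    leaves = np.flatnonzero(tree.children_left == -1)
    # routed[k]: how many training rows apply() sends to leaves[k]
    routed = np.array([(all_leaves[:, t] == leaf).sum() for leaf in leaves])
    in_bag = tree.n_node_samples[leaves]
    checks.append((
        in_bag.min() >= 5,            # min_samples_leaf is a lower bound
        np.all(routed >= in_bag),     # routing adds out-of-bag rows
        routed.sum() == len(X),       # every row lands in exactly one leaf
        in_bag.sum() < len(X),        # a bootstrap sample misses some rows
    ))
```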
> >>
> >> If it helps here is the code I am using:
> >>
> >> import numpy as np
> >> from sklearn.ensemble import RandomForestRegressor
> >>
> >> # read the training data
> >> # (readMatFromFile is a user-defined helper, not part of scikit-learn)
> >> trainingLabels = readMatFromFile('dataSet//trainingLabelsSim.dat').T
> >> trainingData = readMatFromFile('dataSet//trainingDataSim.dat').T
> >>
> >> # read the testing data
> >> testingLabels = readMatFromFile('dataSet//testingLabelsSim.dat').T
> >> testingData = readMatFromFile('dataSet//testingDataSim.dat').T
> >>
> >> forestClf = RandomForestRegressor(n_estimators=100, min_samples_leaf=5,
> >>                                   random_state=0, max_depth=20,
> >>                                   max_features=10, verbose=1)
> >>
> >> forestClf.fit(trainingData, trainingLabels)
> >>
> >> # index[j, i] is the leaf reached by test sample j in tree i
> >> index = forestClf.apply(testingData)
> >> leafVals = np.zeros(index.shape)
> >> for j in range(index.shape[0]):
> >>     for i in range(index.shape[1]):
> >>         # tree_.value has shape (n_nodes, n_outputs, 1);
> >>         # [0, 0] assumes a single-output regression
> >>         leafVals[j, i] = forestClf.estimators_[i].tree_.value[index[j, i], 0, 0]
> >>
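The double loop above can be replaced by one fancy-indexing lookup per tree. A minimal sketch on synthetic data (forest, X_test and the data are assumptions). As a sanity check, averaging each row over the trees should reproduce forest.predict, since a random-forest regressor predicts the mean of its trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (hypothetical).
rng = np.random.RandomState(0)
X_train = rng.normal(size=(200, 5))
y_train = X_train[:, 0] ** 2 + 0.1 * rng.randn(200)
X_test = rng.normal(size=(20, 5))

forest = RandomForestRegressor(n_estimators=50, min_samples_leaf=5,
                               random_state=0).fit(X_train, y_train)

index = forest.apply(X_test)   # (n_test, n_trees) leaf ids
# One vectorized lookup per tree instead of a double Python loop;
# [:, 0, 0] picks the single output of a single-output regressor.
leafVals = np.column_stack([
    est.tree_.value[index[:, i], 0, 0]
    for i, est in enumerate(forest.estimators_)
])
```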
> >> Many thanks in advance
> >> Muhammad
> >>
> >>> Date: Wed, 15 Oct 2014 07:59:09 +1100
> >>> From: Joel Nothman <joel.noth...@gmail.com>
> >>> Subject: Re: [Scikit-learn-general] Access data arriving at leaf nodes
> >>> To: scikit-learn-general <scikit-learn-general@lists.sourceforge.net>
> >>> Message-ID: <CAAkaFLUB_ApLWGosUovxfEoEi34bcw-ePke0TBCKF3NrQpF=u...@mail.gmail.com>
> >>> Content-Type: text/plain; charset="utf-8"
> >>>
> >>> What do you mean by all the values that make up a leaf node? If you mean
> >>> all the samples, isn't apply sufficient?
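Joel's suggestion rests on what apply() returns: an array of shape (n_samples, n_estimators) whose entry [i, t] is the id of the leaf that sample i reaches in tree t. A short check on synthetic data (names are illustrative) that every returned id is indeed a leaf:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (hypothetical).
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = X.sum(axis=1)

forest = RandomForestRegressor(n_estimators=7, min_samples_leaf=5,
                               random_state=0).fit(X, y)
leaf_ids = forest.apply(X)

# One row per sample, one column per tree.
shape = leaf_ids.shape
# Every id returned by apply() is a leaf, i.e. a node without children.
all_leaves = all(
    np.all(est.tree_.children_left[leaf_ids[:, t]] == -1)
    for t, est in enumerate(forest.estimators_)
)
```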
> >>>
> >>> On 15 October 2014 06:20, M Asad <masad....@gmail.com> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> > I am kind of new to scikit-learn, though I have learned a lot of
> >>> > things by now.
> >>> >
> >>> > I am using sklearn.ensemble.RandomForestRegressor to train on some
> >>> > data and later predict on some input samples.
> >>> > What I am trying to do now is to access the actual values that make up
> >>> > each leaf node.
> >>> >
> >>> > I have managed to get the index of each leaf node used for prediction
> >>> > by using the apply() function, and I can also access the prediction
> >>> > value by calling forestReg.estimators_[i].tree_.value[j], where i is
> >>> > the tree index and j is the index of the leaf node.
> >>> >
> >>> > Does anyone have any idea how I can get all the values that make up a
> >>> > leaf node? I have set min_samples_leaf = 5, so each leaf node comprises
> >>> > at least 5 samples.
> >>> >
> >>> > Many thanks!
> >>> >
> >>> > Best regards,
> >>> > Muhammad Asad
> >>> >
> >>> > ------------------------------------------------------------------------------
> >>> > Comprehensive Server Monitoring with Site24x7.
> >>> > Monitor 10 servers for $9/Month.
> >>> > Get alerted through email, SMS, voice calls or mobile push
> >>> > notifications.
> >>> > Take corrective actions from your mobile device.
> >>> > http://p.sf.net/sfu/Zoho
> >>> > _______________________________________________
> >>> > Scikit-learn-general mailing list
> >>> > Scikit-learn-general@lists.sourceforge.net
> >>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >>> >
> >>> >
> >>> ------------------------------
> >>>
>
> >>> Date: Tue, 14 Oct 2014 17:22:03 -0400
> >>> From: Olivier Grisel <olivier.gri...@ensta.org>
> >>> Subject: Re: [Scikit-learn-general] Access data arriving at leaf nodes
> >>> To: scikit-learn-general <scikit-learn-general@lists.sourceforge.net>
> >>> Message-ID: <CAFvE7K6A3UpC=nuMQiKKCmFZntp+pe6+4xqpnUq=_a15buk...@mail.gmail.com>
> >>> Content-Type: text/plain; charset=UTF-8
> >>>
> >>> I am not exactly sure what you are trying to do, but having a look at
> >>> the source code of the `predict` method of the trees may help:
> >>>
> >>> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L2417
> >>>
> >>> --
> >>> Olivier