I am not sure whether a method for this already exists. I have read the docs
and there does not seem to be one. Please correct me if I am wrong.

What I am actually trying to get is a probability distribution at each leaf
node, as done in the book "Decision Forests for Computer Vision and Medical
Image Analysis". For that I need the training samples that ended up at each
leaf node. I will then use kernel density estimation to get a continuous
probability distribution at each leaf. I have already done this in my own
C++/OpenCV implementation; with scikit-learn, all I am missing are the
samples at each leaf node.
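
As a rough sketch of the kernel density estimation step described above,
assuming the target values of one leaf have already been collected: the
leaf_targets values, the bandwidth and the grid below are made-up
illustrative choices, and KernelDensity from sklearn.neighbors is just one
possible estimator, not necessarily what the book uses.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Hypothetical target values of the training samples in a single leaf.
leaf_targets = np.array([0.9, 1.0, 1.1, 1.2, 1.3])

# The bandwidth is an arbitrary choice for illustration; in practice it
# would need tuning (for example by cross-validation).
kde = KernelDensity(kernel="gaussian", bandwidth=0.2)
kde.fit(leaf_targets.reshape(-1, 1))  # expects shape (n_samples, n_features)

# score_samples returns the log-density, so exponentiate to get the density.
grid = np.linspace(-1.0, 3.0, 401)
density = np.exp(kde.score_samples(grid.reshape(-1, 1)))
```

The density array can then be evaluated or plotted over grid; one such
estimate per leaf would give the per-leaf continuous distribution.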

For prediction, I have used apply() to get the index of the predicted leaf.
forestReg.estimators_[i].tree_.value[j] returns only one prediction value.
However, forestReg.estimators_[i].tree_.n_node_samples[j] reports more
samples than min_samples_leaf (which I have set to 5 at the moment).
Here j is the index of a leaf node within the tree with index i.
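
For what it is worth, min_samples_leaf is only a lower bound, so leaf counts
above 5 are to be expected. A minimal sketch on synthetic data (the data and
forest settings here are invented for illustration) that inspects the leaf
sizes of one tree; it relies on leaves being marked with -1 in
tree_.children_left:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, made up for this example.
rng = np.random.RandomState(0)
X = rng.rand(300, 3)
y = X[:, 0] + 0.1 * rng.randn(300)

forest = RandomForestRegressor(n_estimators=5, min_samples_leaf=5,
                               random_state=0).fit(X, y)

tree = forest.estimators_[0].tree_
is_leaf = tree.children_left == -1        # leaves have no children
leaf_sizes = tree.n_node_samples[is_leaf]  # samples counted at each leaf

# Smallest and largest leaf: the minimum should not fall below 5.
print(leaf_sizes.min(), leaf_sizes.max())
```

Note that these are only counts. As far as I can tell, with the default
bootstrap=True they refer to the resampled data each tree was fit on, not to
the full training set.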

If it helps, here is the code I am using:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# readMatFromFile is a helper defined elsewhere (not shown)

# read the training data
trainingLabels = readMatFromFile('dataSet//trainingLabelsSim.dat').T
trainingData = readMatFromFile('dataSet//trainingDataSim.dat').T

# read the testing data
testingLabels = readMatFromFile('dataSet//testingLabelsSim.dat').T
testingData = readMatFromFile('dataSet//testingDataSim.dat').T

# train the forest (a regressor, despite the variable name)
forestClf = RandomForestRegressor(n_estimators=100, min_samples_leaf=5,
                                  random_state=0, max_depth=20,
                                  max_features=10, verbose=1)

forestClf.fit(trainingData, trainingLabels)

# index[j, i] is the leaf that test sample j reaches in tree i
index = forestClf.apply(testingData)
leafVals = np.zeros(index.shape)
for j in range(index.shape[0]):
    for i in range(index.shape[1]):
        # [0, 0] picks the scalar prediction; assumes a single regression output
        leafVals[j, i] = forestClf.estimators_[i].tree_.value[index[j, i], 0, 0]
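
One possible way to recover the samples behind each leaf, sketched here on
synthetic data, is to call apply() on the training data itself and group the
training targets by the leaf index they land in. All names below (X_train,
y_train, leaf_targets, x_new) are invented for the example. This is not a
built-in scikit-learn feature, just bookkeeping on top of apply().

```python
from collections import defaultdict

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic training data, made up for this example.
rng = np.random.RandomState(0)
X_train = rng.rand(200, 4)
y_train = X_train[:, 0] + 0.1 * rng.randn(200)

forest = RandomForestRegressor(n_estimators=10, min_samples_leaf=5,
                               random_state=0)
forest.fit(X_train, y_train)

# train_leaves[s, t] is the leaf that training sample s reaches in tree t.
train_leaves = forest.apply(X_train)

# leaf_targets[t][leaf_id] -> array of training targets routed to that leaf.
leaf_targets = []
for t in range(train_leaves.shape[1]):
    groups = defaultdict(list)
    for sample_idx, leaf_id in enumerate(train_leaves[:, t]):
        groups[leaf_id].append(y_train[sample_idx])
    leaf_targets.append({leaf: np.asarray(v) for leaf, v in groups.items()})

# For a new sample, look up the training targets of the leaf it reaches.
x_new = rng.rand(1, 4)
new_leaves = forest.apply(x_new)[0]
targets_tree0 = leaf_targets[0][new_leaves[0]]
```

One caveat: this groups the whole training set. With the default
bootstrap=True each tree is fit on a resampled version of the data, so these
groups will generally not match n_node_samples or reproduce tree_.value
exactly. Setting bootstrap=False is one way to avoid that mismatch, at the
cost of changing how the forest is trained.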



Many thanks in advance
Muhammad

Date: Wed, 15 Oct 2014 07:59:09 +1100
> From: Joel Nothman <[email protected]>
> Subject: Re: [Scikit-learn-general] Access data arriving at leaf nodes
> To: scikit-learn-general <[email protected]>
> Message-ID:
>         <CAAkaFLUB_ApLWGosUovxfEoEi34bcw-ePke0TBCKF3NrQpF=
> [email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> What do you mean by all the values that make up a leaf node? If you mean
> all the samples, isn't apply sufficient?
>
> On 15 October 2014 06:20, M Asad <[email protected]> wrote:
>
> > Hi,
> >
> > I am kind of new to scikit, however I have learned a lot of things now.
> >
> > I am using sklearn.ensemble.RandomForestRegressor to train on some data
> > and predict using some input samples later.
> > What I am trying to do now is to access the actual values that make up
> > each leaf node.
> >
> > I have managed to get the index of each leaf node used for prediction by
> > using the apply() function.
> > And I can also access the prediction value by calling
> > forestReg.estimators_[i].tree_.value[j] where i is the tree index and j
> > is the index of the leaf node.
> >
> > Does anyone have any idea how I can get all the values that make up a
> > leaf node? I have set min_samples_leaf = 5 so each leaf node comprises
> > at least 5 samples.
> >
> > Many thanks!
> >
> > Best regards,
> > Muhammad Asad
> >
> >
> >
> ------------------------------------------------------------------------------
> > Comprehensive Server Monitoring with Site24x7.
> > Monitor 10 servers for $9/Month.
> > Get alerted through email, SMS, voice calls or mobile push notifications.
> > Take corrective actions from your mobile device.
> > http://p.sf.net/sfu/Zoho
> > _______________________________________________
> > Scikit-learn-general mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >
> >
>
> ------------------------------
>
> Message: 4
> Date: Wed, 15 Oct 2014 08:01:27 +1100
> From: Joel Nothman <[email protected]>
> Subject: Re: [Scikit-learn-general] Suggestion: break up the metrics
>         module
> To: scikit-learn-general <[email protected]>
> Message-ID:
>         <
> caakaflu0fyhnfagmu9dkhr8oppd_kerirux+ckbfm7vunrn...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> We had a plan to move out the model selection stuff. Presently that talked
> about moving scorers, but not necessarily the metrics underlying them....
>
> On 15 October 2014 07:16, Lars Buitinck <[email protected]> wrote:
>
> > 2014-10-14 21:53 GMT+02:00 Robert Layton <[email protected]>:
> > > Currently the word "metrics" is overloaded with at least two types of
> > > algorithms in that module. The first is evaluation metrics and the
> > > second is functions dealing with distance metrics.
> > >
> > > My suggestion is to:
> > > 1) Move the evaluation metrics to a new top level folder called
> > > "evaluation"
> > > 2) Move the distance metrics to a new top level folder called
> > > "distance"
> > > 3) Create pointers with deprecation warnings from the metrics folder
> > > to the above two folders.
> > >
> > > This would be a big job -- lots of documentation to fix etc. So I
> > > wanted to get suggestions before I start.
> > >
> > > Thoughts?
> >
> > Didn't we already have a plan to move out the evaluation stuff?
> >
> > Btw., there are also similarity functions in the module. Putting those
> > in a "distance" module seems a bit strange, so I suggest we just keep
> > the name for at least the distance stuff. (I know "metric" is the
> > mathematician's term for distance, but "similarity metric" is common
> > enough, I think.)
> >
> >
> >
>
> ------------------------------
>
> Message: 5
> Date: Tue, 14 Oct 2014 23:08:02 +0200
> From: Gael Varoquaux <[email protected]>
> Subject: Re: [Scikit-learn-general] Suggestion: break up the metrics
>         module
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=iso-8859-1
>
> On Wed, Oct 15, 2014 at 06:53:35AM +1100, Robert Layton wrote:
> > Currently the word "metrics" is overloaded with at least two types of
> > algorithms in that module. The first is evaluation metrics and the
> > second is functions dealing with distance metrics.
>
> Please, let's just try as much as possible to avoid such changes.
>
> The goal of such a change is to make things prettier, or more logical,
> according to a certain logic. The benefit is that, to some, it will
> make more sense. What's important to keep in mind is that most users
> don't understand the fine details of the meaning of the names, and
> that none of the module names make a huge amount of sense. Documentation
> and Google searches are what really sort users out.
>
> By changing module names, or any kind of API, we are making these Google
> searches unreliable, so we are actually making it harder for the users.
>
> In addition, we are breaking people's code. Yes we have a deprecation
> cycle, but it's costly for everybody to follow our changes.
>
> Thus, for an API change (and that's an API change), there needs to be
> clear benefits, IMHO.
>
> Gaël
>
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Tue, 14 Oct 2014 17:22:03 -0400
> From: Olivier Grisel <[email protected]>
> Subject: Re: [Scikit-learn-general] Access data arriving at leaf nodes
> To: scikit-learn-general <[email protected]>
> Message-ID:
>         <CAFvE7K6A3UpC=nuMQiKKCmFZntp+pe6+4xqpnUq=_
> [email protected]>
> Content-Type: text/plain; charset=UTF-8
>
> 2014-10-14 15:20 GMT-04:00 M Asad <[email protected]>:
> > Hi,
> >
> > I am kind of new to scikit, however I have learned a lot of things now.
> >
> > I am using sklearn.ensemble.RandomForestRegressor to train on some data
> > and predict using some input samples later.
> > What I am trying to do now is to access the actual values that make up
> > each leaf node.
> >
> > I have managed to get the index of each leaf node used for prediction by
> > using the apply() function.
> > And I can also access the prediction value by calling
> > forestReg.estimators_[i].tree_.value[j] where i is the tree index and j
> > is the index of the leaf node.
> >
> > Does anyone have any idea how I can get all the values that make up a
> > leaf node? I have set min_samples_leaf = 5 so each leaf node comprises
> > at least 5 samples.
>
> I am not exactly sure about what you are trying to do but maybe having
> a look at the source code of the `predict` method of the trees will
> help:
>
>
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L2417
>
> --
> Olivier
>
>
>
> ------------------------------
>
>
>
>
> End of Scikit-learn-general Digest, Vol 57, Issue 18
> ****************************************************
>