Just for fun...
the probability that a sample has no oob estimate, i.e. that it is drawn
into the bootstrap sample of every one of T trees, is roughly (1 - 1/e)^T:
5 trees: p ~ 0.10
20 trees: p ~ 1e-4
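
For reference, here is a rough sketch of the calculation. It assumes each tree is fit on a standard bootstrap sample (n draws with replacement from n samples) and that trees are drawn independently; the function names are made up for illustration and are not sklearn API. A sample lacks an oob estimate only when it is in-bag for every tree.

```python
import math


def p_in_bag(n_samples):
    """Probability that a given sample is drawn at least once in one bootstrap."""
    return 1.0 - (1.0 - 1.0 / n_samples) ** n_samples


def p_no_oob_estimate(n_trees, n_samples=None):
    """Probability that a sample is in-bag for every tree (no oob estimate).

    With n_samples=None the large-n limit 1 - 1/e is used for the
    per-tree in-bag probability.
    """
    p = 1.0 - math.exp(-1.0) if n_samples is None else p_in_bag(n_samples)
    return p ** n_trees


# Note: exp(-n_trees) would be the chance of being left out of *every* tree,
# which is the opposite event and much smaller.
for n_trees in (1, 5, 20, 30):
    print(n_trees, p_no_oob_estimate(n_trees))
```

Under these assumptions the value is about 0.10 for 5 trees and about 1e-4 for 20 trees, so the expected number of affected samples is that fraction times the training set size.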
I stand by my suggestion: let's ignore samples without oob estimates
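
To make the suggestion concrete, something along these lines is what I have in mind. This is only a sketch in plain Python, not the current sklearn code: it assumes the oob predictions come as a sequence with NaN marking samples that were never left out, and `oob_accuracy` is a hypothetical helper name.

```python
import math


def oob_accuracy(y_true, oob_pred):
    """Accuracy over the samples that have an oob prediction.

    Entries of oob_pred that are NaN (no tree left the sample out)
    are skipped. Returns (accuracy, number of samples used).
    """
    pairs = [(t, p) for t, p in zip(y_true, oob_pred) if not math.isnan(p)]
    if not pairs:
        raise ValueError("no sample has an oob estimate; use more trees")
    n_correct = sum(1 for t, p in pairs if t == p)
    return n_correct / len(pairs), len(pairs)


# Second sample was in-bag for every tree, so it is ignored.
score, n_used = oob_accuracy([0, 1, 1, 0], [0.0, float("nan"), 1.0, 1.0])
print(score, n_used)
```

Returning the number of samples actually used lets the caller see how much of the training set the estimate is based on.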
Paolo
On Wed, Jan 25, 2012 at 2:30 PM, Paolo Losi <[email protected]> wrote:
> Hi Andreas,
>
> IMHO the only reasonable thing to do is to ignore samples for which
> there is no oob estimation.
>
> building a forest with fewer than 5 trees makes no sense in the first place,
> so I would not worry if sklearn doesn't provide any warning for that
> specific problem (too "few" oob estimates).
>
> I'd rather document that the reasonable number of trees should be > 20/30.
>
> Paolo
>
> On Wed, Jan 25, 2012 at 2:20 PM, Andreas <[email protected]> wrote:
>
>> Hi everybody.
>> My pull request for oob estimates got merged a couple of days ago.
>> Now I noticed a behavior that I am not completely happy with.
>> If the number of estimators in the ensemble is small (say 1)
>> then there won't be a prediction for all of the samples.
>> The way it is currently implemented, there will be NaNs in the
>> prediction.
>> It is possible to compute the oob accuracies for each estimator
>> on its own but that is not really what one wants, I guess.
>>
>> Any ideas how to best handle this?
>>
>> I feel like this estimate only makes sense with n_estimators >> 5
>> but even then it is not impossible that one sample will never
>> get left out and random NaNs might appear.
>>
>> Cheers,
>> Andy
>>
>>
>> ------------------------------------------------------------------------------
>> Keep Your Developer Skills Current with LearnDevNow!
>> The most comprehensive online learning library for Microsoft developers
>> is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
>> Metro Style Apps, more. Free future releases when you subscribe now!
>> http://p.sf.net/sfu/learndevnow-d2d
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>
>
>
> --
> Paolo Losi
> e-mail: [email protected]
> mob: +39 348 7705261
>
> ENUAN Srl
> Via XX Settembre, 12 - 29100 Piacenza
>