2012/8/26 Gael Varoquaux <[email protected]>:
> On Sun, Aug 26, 2012 at 12:08:52PM +0200, Olivier Grisel wrote:
>> A sound, non-parametric, but computationally expensive way to get this
>> kind of information (confidence intervals on the estimated parameters
>> or predicted probability estimates) would be to bootstrap: resample
>> n_samples out of n_samples with replacement from your training dataset
>> n_bootstraps times, fit a model on each bootstrap sample, and store the
>> values of the fitted parameters or predicted probability estimates in
>> an array. Then compute 95% intervals by taking quantiles of those
>> collected estimates.
>
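
For concreteness, a minimal sketch of the procedure quoted above, using only
the Python standard library. The helper name bootstrap_interval, the toy
data, and the use of the sample mean as a stand-in for "fit a model" are
assumptions for illustration, not scikit-learn API.

```python
import random
import statistics


def bootstrap_interval(samples, estimator, n_bootstraps=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for estimator(samples) (hypothetical helper)."""
    rng = random.Random(seed)
    n_samples = len(samples)
    estimates = []
    for _ in range(n_bootstraps):
        # Draw n_samples out of n_samples with replacement.
        resample = [samples[rng.randrange(n_samples)] for _ in range(n_samples)]
        # "Fit" on the resample and store the estimate.
        estimates.append(estimator(resample))
    estimates.sort()
    # Read the interval off the empirical quantiles of the stored estimates.
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * (n_bootstraps - 1))]
    upper = estimates[int(round((1.0 - tail) * (n_bootstraps - 1)))]
    return lower, upper


# Toy data: 50 draws from a Gaussian; the estimator is the sample mean.
data_rng = random.Random(42)
data = [data_rng.gauss(5.0, 1.0) for _ in range(50)]
lower, upper = bootstrap_interval(data, statistics.mean)
print("95% bootstrap interval for the mean:", lower, upper)
```

With a real model, estimator would refit on each resample and return a
coefficient or a predicted probability, which is where the cost comes from.
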
> Bootstrap is not good: if I have a procedure that always returns 1, by
> bootstrap, I will think that I do very good detection, but I actually
> do not control my false positives.
>
> You want to do permutations: a related resampling strategy in which you
> sample your null hypothesis by randomly permuting labels between classes.
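
A rough sketch of such a permutation test, again with the standard library
only. The names permutation_p_value and mean_gap, the toy data, and the
choice of statistic (absolute gap between class means) are assumptions made
for illustration.

```python
import random


def permutation_p_value(features, labels, statistic, n_permutations=1000, seed=0):
    """Estimate a p-value by permuting labels (hypothetical helper)."""
    rng = random.Random(seed)
    observed = statistic(features, labels)
    shuffled = list(labels)
    n_at_least = 0
    for _ in range(n_permutations):
        # Permuting the labels breaks any link between features and classes,
        # so each shuffle is one draw from the null hypothesis.
        rng.shuffle(shuffled)
        if statistic(features, shuffled) >= observed:
            n_at_least += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (n_at_least + 1) / (n_permutations + 1)


def mean_gap(features, labels):
    """Absolute difference between the class means of a 1-D feature."""
    ones = [x for x, y in zip(features, labels) if y == 1]
    zeros = [x for x, y in zip(features, labels) if y == 0]
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))


labels = [0, 0, 0, 0, 1, 1, 1, 1]
separated = [0.1, 0.2, 0.3, 0.4, 2.1, 2.2, 2.3, 2.4]
constant = [1.0] * 8  # carries no information about the labels

p_separated = permutation_p_value(separated, labels, mean_gap)
p_constant = permutation_p_value(constant, labels, mean_gap)
print("p-value, separated classes:", p_separated)
print("p-value, constant feature:", p_constant)
```

The constant feature never beats its own permuted versions, so its p-value
is 1: a statistic that ignores the labels cannot look significant here.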

I don't get your point. Let's talk about it while queueing for lunch :)

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
