On Sun, Aug 26, 2012 at 12:08:52PM +0200, Olivier Grisel wrote:
> A sound, non-parametric, but computationally expensive way to get this
> kind of information (confidence intervals on the estimated parameters
> or predicted probability estimates) would be to bootstrap: resample
> n_samples out of n_samples with replacement from your training dataset,
> n_bootstraps times, fit a model on each bootstrap sample, store the
> values of the fitted parameters or predicted probability estimates in
> an array, and then compute 95% intervals by taking quantiles of those
> collected estimates.
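
The quoted bootstrap recipe could be sketched like this — a minimal illustration, where the dataset, the `LogisticRegression` model, and all sizes are assumptions made up for the example, not taken from the thread:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative data and model (assumptions, not from the original thread).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
n_samples = X.shape[0]
n_bootstraps = 200
rng = np.random.RandomState(0)

coefs = []
for _ in range(n_bootstraps):
    # Resample n_samples out of n_samples with replacement.
    idx = rng.randint(0, n_samples, n_samples)
    model = LogisticRegression().fit(X[idx], y[idx])
    # Store the fitted parameters for this bootstrap replicate.
    coefs.append(model.coef_.ravel())

coefs = np.array(coefs)
# 95% interval per coefficient from the 2.5 and 97.5 percentiles
# of the collected bootstrap estimates.
lower, upper = np.percentile(coefs, [2.5, 97.5], axis=0)
```

The same loop works for predicted probabilities: store `model.predict_proba(X_test)` instead of the coefficients and take the quantiles of that array.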
Bootstrap is not good here: if I have a procedure that always returns 1, by bootstrapping I will conclude that it does very good detection, but I actually do not control my false positives. You want to do permutations: a related resampling strategy in which you sample your null hypothesis by randomly permuting the labels between classes.

HTH,

Gaël

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
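
The permutation strategy described in the reply could be sketched as follows — a hedged illustration, where the cross-validated accuracy statistic, the dataset, the model, and the number of permutations are all assumptions chosen for the example:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Illustrative data and model (assumptions, not from the original thread).
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
rng = np.random.RandomState(0)

def score(X, y):
    # Statistic of interest: mean cross-validated accuracy.
    return cross_val_score(LogisticRegression(), X, y, cv=5).mean()

observed = score(X, y)

# Sample the null hypothesis "labels carry no information" by
# randomly permuting the labels between classes and re-scoring.
n_permutations = 50
null_scores = np.array([score(X, rng.permutation(y))
                        for _ in range(n_permutations)])

# p-value: fraction of null scores at least as good as the observed
# one (with the usual +1 correction so it is never exactly zero).
p_value = (np.sum(null_scores >= observed) + 1.0) / (n_permutations + 1.0)
```

A degenerate classifier that always predicts 1 scores no better on permuted labels than on the real ones, so its p-value stays large — which is exactly the false-positive control the bootstrap alone does not give. scikit-learn also ships this procedure ready-made as `permutation_test_score`.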
