2014-08-18 20:44 GMT+02:00 Sebastian Raschka <[email protected]>:
>
> On Aug 18, 2014, at 12:15 PM, Olivier Grisel <[email protected]>
> wrote:
>
>> since it would make the "estimate" and "error" calculation more
>> convenient, right?
>
> I don't understand what you mean by "estimate" and "error". The model
> parameters, their individual predictions, and the cross-validation scores or
> errors can all be called "estimates": anything that is derived from sampled data
> points is an estimate.
>
>
> For example, the calculation of the mean accuracy over all iterations, and
> the calculation of the standard deviation/error of the mean

Well, this is not what sklearn.cross_validation.Bootstrap is doing.
It computes some ad-hoc cross-validation splits that I made up a couple
of years ago (and that I now deeply regret) and that nobody uses in
the literature. Again, read its docstring and have a look at the source
code:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cross_validation.py#L718

Nowhere will you see an estimate of the standard deviation of the
validation score, nor of the standard error of the mean validation score
across folds.

> (just like in regular Kfold cross-validation).

The KFold cross-validation iterator in sklearn does not compute the
standard error of the mean score itself. The cross_val_score function
with cv=KFold(5) returns the score computed on each validation fold.
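For concreteness, here is a minimal sketch of that behaviour. It uses the modern sklearn.model_selection API (which later replaced the sklearn.cross_validation module discussed in this thread), so the spellings differ from the 2014 code:

```python
# Sketch: cross_val_score returns one score per validation fold and no
# summary statistic; any aggregation is left to the caller.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=KFold(n_splits=5))
print(scores)                        # one accuracy score per fold
print(scores.mean(), scores.std())   # summaries computed by the caller
```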
It would be interesting to estimate the standard deviation of the
validation score (or better a 95% confidence interval of it) but:

- this is not what sklearn.cross_validation.Bootstrap is doing: it
just computes CV folds like all the other iterators in the
sklearn.cross_validation module;
- estimating the standard error of the mean of 5 points (for 5-fold
CV, for instance) with a bootstrapping procedure is prone to yield
bad results.

Empirically I have found that bootstrapping works well for estimating
confidence intervals with *at least* 50 samples (and thousands of
bootstrap iterations).
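A quick illustration of why bootstrapping a handful of fold scores is fragile: re-sampling from only 5 points draws from a tiny support, so the resulting interval is itself very noisy. This is a synthetic sketch (Gaussian-distributed scores are an assumption), not a benchmark:

```python
# Sketch: compare naive 95% percentile bootstrap intervals of the mean
# built from n=5 vs n=50 draws of the same score distribution.
import numpy as np

rng = np.random.default_rng(0)

def boot_ci_width(sample, n_boot=2000):
    """Width of a naive 95% percentile bootstrap CI of the mean."""
    means = np.array([rng.choice(sample, size=sample.size).mean()
                      for _ in range(n_boot)])
    lo, hi = np.percentile(means, [2.5, 97.5])
    return hi - lo

results = {}
for n in (5, 50):
    # repeat the whole experiment 20 times to see how stable the CI is
    widths = np.array([boot_ci_width(rng.normal(0.9, 0.05, size=n))
                       for _ in range(20)])
    results[n] = widths
    print(n, widths.mean(), widths.std())  # n=5 intervals are much wider
```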

Therefore, to obtain good confidence intervals on CV scores, the right
approach (in my opinion) would be to:

1- have some kind of cross_val_predictions function that would return
the individual predictions for each sample in any of the validation
folds of a CV procedure, instead of the score on each fold as our
cross_val_score function does;

2- use a bootstrapping procedure that re-samples many times with
replacement from those predictions, so as to compute a bootstrapped
distribution of the validation score;

3- take a confidence interval on that bootstrapped distribution of the
validation score.
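The three steps above could be sketched as follows, with accuracy assumed as the score. The hypothetical cross_val_predictions function is emulated here with sklearn's cross_val_predict, which was added to scikit-learn after this thread:

```python
# Sketch of the 3-step procedure: out-of-fold predictions, then a
# bootstrap over per-sample correctness, then a percentile interval.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=300, random_state=0)

# 1- one out-of-fold prediction per sample
pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)
correct = (pred == y).astype(float)

# 2- re-sample the per-sample results with replacement many times to
#    get a bootstrapped distribution of the validation accuracy
rng = np.random.default_rng(0)
boot_scores = np.array([rng.choice(correct, size=correct.size).mean()
                        for _ in range(5000)])

# 3- naive 95% percentile interval on that distribution
lo, hi = np.percentile(boot_scores, [2.5, 97.5])
print(lo, hi)
```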

Furthermore as typical scoring functions are censored (for instance
the accuracy score is bounded by 0 and 1), it is very likely that the
bootstrapped distribution of the validation score is going to be
skewed (for instance a validation accuracy score distribution could
have a 95% confidence interval between 0.94 and 1.00 with a mean at
0.99). For skewed distributions a naive percentile interval is
typically wrong because of the bias introduced by the skewness. In
that case this bias can be corrected by using the bias-corrected and
accelerated (BCa) non-parametric bootstrap procedure as implemented in
scikits.bootstrap:

https://github.com/cgevans/scikits-bootstrap/blob/master/scikits/bootstrap/bootstrap.py#L70
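To make the censoring concrete, here is a sketch with per-sample correctness fixed at 97% accuracy: the bootstrapped accuracy distribution presses against the 1.0 ceiling, so the naive percentile interval is asymmetric around the point estimate. The scikits.bootstrap call at the end is left commented out (the package may not be installed), and its signature is an assumption based on the linked source:

```python
# Sketch: naive percentile interval on a score distribution censored at 1.0.
import numpy as np

rng = np.random.default_rng(0)
# deterministic per-sample correctness: 97 hits, 3 misses out of 100
correct = np.array([1.0] * 97 + [0.0] * 3)

boot_scores = np.array([rng.choice(correct, size=correct.size).mean()
                        for _ in range(10000)])
lo, hi = np.percentile(boot_scores, [2.5, 97.5])
print(lo, hi)  # upper bound typically pinned at the 1.0 ceiling,
               # lower bound well below 0.97: a skewed interval

# The BCa correction described above would be applied like this
# (hypothetical call, signature inferred from the scikits.bootstrap source):
# import scikits.bootstrap as boot
# lo_bca, hi_bca = boot.ci(correct, statfunction=np.mean, method='bca')
```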

Having BCa bootstrap confidence intervals in scipy.stats would
certainly make it simpler to implement this kind of feature in
scikit-learn. But again, what I just described here is completely
different from what we have in the sklearn.cross_validation.Bootstrap
class. That class cannot be changed to implement this, as it does not
even have the right API to do so. It would have to be an entirely new
function or class.

> I have to agree that there are probably better approaches and techniques as 
> you mentioned, but I wouldn't remove it
> just because very few people use it in practice.

We are not removing the sklearn.cross_validation.Bootstrap class because
few people are using it, but because too many people are using
something that is non-standard (I made it up) and very likely not
what they expect from its name. At best it causes confusion when our
users read the docstring and/or the source code. At worst it causes
silent modeling errors in our users' code bases.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general