>
> As far as I understand: Holding out a test set is recommended if you
> aren't entirely sure that the assumptions of the model hold (Gaussian
> error on a linear fit; independent and identically distributed samples).
> The model evaluation approach in predictive ML, using held-out data, relies
> only on the weaker assumption that the metric you have chosen, when applied
> to the test set you have held out, forms a reasonable measure of
> generalised / real-world performance. (Of course this too often does not
> hold in practice, but it is the primary assumption, in my opinion, that ML
> practitioners need to be careful of.)
>

Dear CW,
As Joel has said, holding out a test set will help you evaluate the validity
of model assumptions, and his last point (reasonable measure of generalised
performance) is absolutely essential for understanding the capabilities and
limitations of ML.
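
For concreteness, here is a minimal sketch of that held-out evaluation in
scikit-learn; the synthetic dataset, Ridge model, and R^2 metric are only
placeholders for whatever your own problem calls for:

# Minimal held-out evaluation sketch; dataset, model and metric are
# illustrative choices, not recommendations.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0,
                       random_state=0)

# Hold out a test set that the model never sees during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = Ridge().fit(X_train, y_train)

# The held-out score is only meaningful if the test set resembles the
# real-world data you actually intend to predict on.
print("held-out R^2:", r2_score(y_test, model.predict(X_test)))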

To add to your checklist for interpreting ML papers properly, be cautious
when interpreting reports of high performance obtained with 5/10-fold or
Leave-One-Out cross-validation on large datasets, where "large" depends on
the nature of the problem setting.
Results are also highly dependent on the distributions of the underlying
independent variables (e.g., 60,000 datapoints all with near-identical
distributions may yield phenomenal performance in cross-validation and be
almost non-predictive in truly unknown/prospective situations).
Even at 500 datapoints, if independent variable distributions look similar
(with similar endpoints), then when each model is trained on 80% of that
data, the remaining 20% will certainly be predictable, and repeating that
five times will yield statistics that seem impressive.
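
A rough sketch of that effect, using an invented dataset in which each
point is replicated with tiny perturbations (so the rows form near-identical
clusters); plain KFold then looks much better than a grouped split that
keeps whole clusters out of the training data:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.RandomState(0)
X_base = rng.normal(size=(100, 10))
y_base = X_base[:, 0] + rng.normal(scale=1.0, size=100)

# Replicate each datapoint five times with small perturbations, so the
# "500 datapoints" are really 100 near-identical clusters.
X = np.repeat(X_base, 5, axis=0) + rng.normal(scale=0.01, size=(500, 10))
y = np.repeat(y_base, 5)
groups = np.repeat(np.arange(100), 5)

model = RandomForestRegressor(random_state=0)

# Plain 5-fold CV lets near-duplicates of a training point leak into the
# test fold, so the score looks far better than prospective performance.
print("KFold R^2:     ",
      cross_val_score(model, X, y,
                      cv=KFold(5, shuffle=True, random_state=0)).mean())

# Grouped CV keeps whole clusters out of the training data and gives a
# more honest estimate for genuinely unseen samples.
print("GroupKFold R^2:",
      cross_val_score(model, X, y, groups=groups, cv=GroupKFold(5)).mean())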

So, again, while problem context completely dictates ML experiment design,
metric selection, and interpretation of outcome, my personal rule of thumb
is to do no more than 2-fold cross-validation (50% train, 50% predict) once
you have 100+ datapoints.
Even more extreme, try using 33% for training and 67% for validation (or
even 20/80).
If your model still reports good statistics, then you can believe that the
patterns in the training data extrapolate well to the external validation
data.
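
In scikit-learn terms, one way to try such aggressive splits is
ShuffleSplit; again, the synthetic dataset and Ridge model below are only
stand-ins for your own problem:

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = make_regression(n_samples=500, n_features=20, noise=20.0,
                       random_state=0)
model = Ridge()

# Train on 50%, 33%, and 20% of the data, predicting the rest each time.
for train_frac in (0.5, 0.33, 0.2):
    cv = ShuffleSplit(n_splits=5, train_size=train_frac,
                      test_size=1 - train_frac, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv)
    print(f"train {train_frac:.0%}: mean R^2 = {scores.mean():.3f}")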

Hope this helps,
J.B.