Hi, Satrajit,
> In general, what would speak against an approach to just split the initial
> dataset into train/test (70/30), perform grid search (via k-fold CV) on the
> training set, and evaluate the model performance on the test dataset?
>
> Isn't this what cross_val_score really does? It just keeps repeating this for
> several different outer splits. The reason outer splits are important is,
> again, to account for distributional characteristics in smallish samples.
Sorry, maybe I was a little bit unclear; what I meant was scenario 2) in
contrast to 1) below:
1) perform k-fold cross-validation on the complete dataset for model selection
and then report the cross-validation score as an estimate of the model's
performance (not a good idea!)
2) split the dataset, do cross-validation only on the training set (which is
then further subdivided into training and test folds), select the model based
on the results, and then use the separate test set that the model hasn't seen
before to estimate how well it generalizes (a minimal sketch is below)
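To make sure we mean the same thing, here is a minimal sketch of scenario 2);
the iris data and the SVC parameter grid are just placeholders, and depending
on your scikit-learn version the imports may live in sklearn.cross_validation
and sklearn.grid_search instead of sklearn.model_selection:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    iris = load_iris()
    X, y = iris.data, iris.target

    # hold out a test set that the model selection never sees
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # grid search via k-fold CV on the training set only
    param_grid = {'C': [0.1, 1.0, 10.0], 'gamma': [0.01, 0.1, 1.0]}
    grid = GridSearchCV(SVC(), param_grid, cv=5)
    grid.fit(X_train, y_train)

    # evaluate the selected model on the untouched test set
    print(grid.best_params_)
    print(grid.score(X_test, y_test))

If I understand correctly, wrapping this selection step in cross_val_score
(i.e., nested cross-validation) would essentially repeat the same procedure
for several different outer splits, as you described.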
Best,
Sebastian