Hi Satrajit,

> In general, what would speak against an approach to just split the initial 
> dataset into train/test (70/30), perform grid search (via k-fold CV) on the 
> training set, and evaluate the model performance on the test dataset?
> 
> isn't this what the cross-val score really does?  just keeps repeating for 
> several different outer splits? the reason outer splits are important is 
> again to account for distributional characteristics in smallish-samples. 



Sorry, maybe I was a bit unclear. What I meant was scenario 2) in contrast 
to 1) below:

1) perform k-fold cross-validation on the complete dataset for model selection 
and then report that score as an estimate of the model's performance (not a 
good idea!)

2) split the dataset, do cross-validation only on the training set (which is 
then further subdivided into training and test folds), select the model based 
on those results, and then use the separate test set that the model has never 
seen to estimate its ability to generalize (rough sketch in code below)
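Something like this, as a minimal sketch with scikit-learn (assuming a version 
where these utilities live in sklearn.model_selection); the SVC estimator, 
parameter grid, and iris data are just placeholders for illustration:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.svm import SVC

    iris = load_iris()
    X, y = iris.data, iris.target

    # hold out 30% as the final test set; it is never touched during
    # model selection
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # grid search via k-fold CV on the training set only
    param_grid = {'C': [0.1, 1.0, 10.0], 'gamma': [0.01, 0.1, 1.0]}
    grid = GridSearchCV(SVC(), param_grid, cv=5)
    grid.fit(X_train, y_train)

    # performance of the selected model on the untouched test set
    print(grid.best_params_)
    print(grid.score(X_test, y_test))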


Best,
Sebastian