1. You should not evaluate an estimator on the data that were used to train it. During training you usually minimize the classification error or a loss on those data, fitting them as well as possible. Evaluating on an unseen test set gives you an idea of how well your estimator generalizes to your problem. Furthermore, when tuning parameters you should use separate training, validation, and test sets: the validation set is used to select the parameters, and the test set is used to evaluate your best estimator.
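
As a rough illustration of the training / validation / test separation described above, here is a minimal sketch. The dataset (iris), the estimator (SVC), the parameter grid, and the split sizes are arbitrary choices made for the example and are not taken from the thread. With return_train_score=True, cross_validate also reports scores on the training folds, so they can be compared with the held-out fold scores.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_validate, train_test_split
from sklearn.svm import SVC

# Illustrative data only: any labelled dataset would do.
X, y = load_iris(return_X_y=True)

# Hold out a test set that is never seen during fitting or parameter selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# cross_validate reports scores on the held-out folds as "test_score";
# scores on the training folds are included only on request.
cv_results = cross_validate(
    SVC(C=1.0), X_train, y_train, cv=5, return_train_score=True
)
print("mean training-fold accuracy:", cv_results["train_score"].mean())
print("mean held-out-fold accuracy:", cv_results["test_score"].mean())

# GridSearchCV.fit selects C using only the held-out folds of the training set.
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# The untouched test set gives the final estimate of generalization.
test_accuracy = search.score(X_test, y_test)
print("selected parameters:", search.best_params_)
print("test accuracy:", test_accuracy)
```

The training-fold score is typically optimistic, which is why it is the held-out scores that are used for parameter selection and reporting.
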
That is why, when using GridSearchCV, fit trains the estimator using training and validation sets (split according to the given CV strategy). Finally, predict is performed on another, unseen test set. The bottom line is that using training data to select parameters does not ensure that you are selecting the best parameters for your problem.

2. The function is called in _fit_and_score, l. 260 and 263 for instance.

On 26 January 2017 at 17:02, Raga Markely <raga.mark...@gmail.com> wrote:

> Hello,
>
> I have 2 questions regarding cross_val_score.
> 1. Do the scores returned by cross_val_score correspond to only the test
> set or the whole data set (training and test sets)?
> I tried to look at the source code, and it looks like it returns the score
> of only the test set (line 145: "return_train_score=False") - I am not sure
> if I am reading the codes properly, though..
> https://github.com/scikit-learn/scikit-learn/blob/14031f6/sklearn/model_selection/_validation.py#L36
> I came across the paper below and the authors use the score of the whole
> dataset when the author performs repeated nested loop, grid search cv,
> etc.. e.g. see algorithm 1 (line 1c) and 2 (line 2d) on page 3.
> https://jcheminf.springeropen.com/articles/10.1186/1758-2946-6-10
> I wonder what's the pros and cons of using the accuracy score of the whole
> dataset vs just the test set.. any thoughts?
>
> 2. On line 283 of the cross_val_score source code, there is a function
> _score. However, I can't find where this function is called. Could you let
> me know where this function is called?
>
> Thank you very much!
> Raga
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

--
Guillaume Lemaitre
INRIA Saclay - Ile-de-France
Equipe PARIETAL
guillaume.lemaitre@inria.fr --- https://glemaitre.github.io/