I didn't express myself well, but I was meaning:

> model selection via k-fold on the training set

for the training/validation set :D
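For concreteness, a minimal sketch of that workflow (the SVC, the parameter grid, and the synthetic data are arbitrary choices for illustration): k-fold model selection runs only on the training/validation portion, and the held-out test set is scored once at the end.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, random_state=0)

    # Held-out test set: not touched during model selection.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    # Model selection via k-fold CV on the training/validation set only.
    param_grid = {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1]}
    grid = GridSearchCV(SVC(), param_grid, cv=5)
    grid.fit(X_train, y_train)

    # Final estimate of generalization on the untouched test set.
    print(grid.best_params_)
    print(grid.score(X_test, y_test))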
On 27 January 2017 at 00:37, Sebastian Raschka <se.rasc...@gmail.com> wrote:

> > Furthermore, a training, validation, and testing set should be used
> > when setting up parameters.
>
> Usually, it's better to use a train set and a separate test set, and do
> model selection via k-fold on the training set. Then, you do the final
> model estimation on the test set that you haven't touched before. I often
> use the "training, validation, and testing" approach as well, though,
> especially when working with large datasets and for early stopping on
> neural nets.
>
> Best,
> Sebastian
>
> > On Jan 26, 2017, at 1:19 PM, Raga Markely <raga.mark...@gmail.com> wrote:
> >
> > Thank you, Guillaume.
> >
> > 1. I agree with you - that's what I have been learning and it makes
> > sense.. I was a bit surprised when I read the paper today..
> >
> > 2. Ah.. thank you.. I've got to change my glasses :P
> >
> > Best,
> > Raga
> >
> > Guillaume Lemaître <g.lemaitre58 at gmail.com>
> > Thu Jan 26 12:05:12 EST 2017
> >
> > 1. You should not evaluate an estimator on the data which have been
> > used to train it. Usually, you try to minimize the classification error
> > or loss on those data and fit them as well as possible. Evaluating on an
> > unseen testing set will give you an idea of how well your estimator was
> > able to generalize to your problem during the training. Furthermore, a
> > training, validation, and testing set should be used when setting up
> > parameters. The validation set will be used to set the parameters and
> > the testing set will be used to evaluate your best estimator.
> >
> > That is why, when using GridSearchCV, fit will train the estimator
> > using a training and validation set (using a given CV strategy).
> > Finally, predict will be performed on another unseen testing set.
> >
> > The bottom line is that using training data to select parameters will
> > not ensure that you are selecting the best parameters for your problem.
> >
> > 2. The function is called in _fit_and_score, l. 260 and 263 for instance.
> >
> > On 26 January 2017 at 17:02, Raga Markely <raga.markely at gmail.com> wrote:
> >
> > > Hello,
> > >
> > > I have 2 questions regarding cross_val_score.
> > >
> > > 1. Do the scores returned by cross_val_score correspond to only the
> > > test set or the whole data set (training and test sets)? I tried to
> > > look at the source code, and it looks like it returns the score of
> > > only the test set (line 145: "return_train_score=False") - I am not
> > > sure if I am reading the code properly, though..
> > > https://github.com/scikit-learn/scikit-learn/blob/14031f6/sklearn/model_selection/_validation.py#L36
> > >
> > > I came across the paper below, and the authors use the score of the
> > > whole dataset when they perform the repeated nested loop, grid search
> > > CV, etc. - e.g. see algorithm 1 (line 1c) and 2 (line 2d) on page 3.
> > > https://jcheminf.springeropen.com/articles/10.1186/1758-2946-6-10
> > > I wonder what the pros and cons are of using the accuracy score of
> > > the whole dataset vs just the test set.. any thoughts?
> > >
> > > 2. On line 283 of the cross_val_score source code, there is a
> > > function _score. However, I can't find where this function is called.
> > > Could you let me know where this function is called?
> > >
> > > Thank you very much!
> > > Raga

--
Guillaume Lemaitre
INRIA Saclay - Ile-de-France
Equipe PARIETAL
guillaume.lemaitre@inria.fr
https://glemaitre.github.io/
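On question 1 above: a quick way to confirm that cross_val_score reports only the held-out test-fold scores is to run it and inspect the result, one score per fold; the iris data and logistic regression here are placeholder choices, purely for illustration.

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)

    # Five scores, one per fold; each is computed on that fold's held-out
    # test split, never on the samples the estimator was fitted on.
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
    print(scores)
    print(scores.mean())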
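And on the nested-CV discussion: the usual scikit-learn pattern nests a GridSearchCV inside cross_val_score, so that parameter selection (the inner loop) never sees the outer test folds. This is a sketch with a placeholder estimator and grid, not a reproduction of the paper's exact algorithm, which scores the whole dataset instead.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Inner loop: k-fold grid search selects the parameters.
    inner_cv = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=3)

    # Outer loop: each outer test fold scores a model whose parameters
    # were chosen without ever seeing that fold.
    outer_scores = cross_val_score(inner_cv, X, y, cv=5)
    print(outer_scores.mean())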
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn