Got it.. thank you for the clarification, Sebastian & Guillaume.. appreciate it!
Best,
Raga

On Thu, Jan 26, 2017 at 6:41 PM, Guillaume Lemaître <[email protected]> wrote:

> I didn't express myself well, but I meant:
>
> > model selection via k-fold on the training set
>
> for the training/validation set :D
>
> On 27 January 2017 at 00:37, Sebastian Raschka <[email protected]> wrote:
>
>> > Furthermore, a training, validation, and testing set should be used
>> > when setting up parameters.
>>
>> Usually, it’s better to use a train set and separate test set, and do
>> model selection via k-fold on the training set. Then, you do the final
>> model estimation on the test set that you haven’t touched before. I often
>> use the “training, validation, and testing” approach as well, though,
>> especially when working with large datasets and for early stopping on
>> neural nets.
>>
>> Best,
>> Sebastian
>>
>> > On Jan 26, 2017, at 1:19 PM, Raga Markely <[email protected]> wrote:
>> >
>> > Thank you, Guillaume.
>> >
>> > 1. I agree with you - that's what I have been learning and it makes
>> > sense.. I was a bit surprised when I read the paper today..
>> >
>> > 2. Ah.. thank you.. I got to change my glasses :P
>> >
>> > Best,
>> > Raga
>> >
>> > Guillaume Lemaître g.lemaitre58 at gmail.com
>> > Thu Jan 26 12:05:12 EST 2017
>> >
>> > 1. You should not evaluate an estimator on the data which have been
>> > used to train it. Usually, you try to minimize the classification error
>> > or loss using those data and fit them as well as possible. Evaluating on
>> > an unseen testing set will give you an idea of how well your estimator
>> > was able to generalize to your problem during the training.
>> > Furthermore, a training, validation, and testing set should be used
>> > when setting up parameters. Validation will be used to set the
>> > parameters and the testing set will be used to evaluate your best
>> > estimator.
>> >
>> > That is why, when using GridSearchCV, fit will train the estimator
>> > using a training and validation set (using a given CV strategy).
>> > Finally, predict will be performed on another unseen testing set.
>> >
>> > The bottom line is that using training data to select parameters will
>> > not ensure that you are selecting the best parameters for your problem.
>> >
>> > 2. The function is called in _fit_and_score, l. 260 and 263 for
>> > instance.
>> >
>> > On 26 January 2017 at 17:02, Raga Markely <raga.markely at gmail.com>
>> > wrote:
>> >
>> > > Hello,
>> > >
>> > > I have 2 questions regarding cross_val_score.
>> > >
>> > > 1. Do the scores returned by cross_val_score correspond to only the
>> > > test set or the whole data set (training and test sets)?
>> > > I tried to look at the source code, and it looks like it returns the
>> > > score of only the test set (line 145: "return_train_score=False") - I
>> > > am not sure if I am reading the code properly, though..
>> > > https://github.com/scikit-learn/scikit-learn/blob/14031f6/sklearn/model_selection/_validation.py#L36
>> > >
>> > > I came across the paper below and the authors use the score of the
>> > > whole dataset when they perform repeated nested loops, grid search
>> > > cv, etc.. e.g. see algorithm 1 (line 1c) and 2 (line 2d) on page 3.
>> > > https://jcheminf.springeropen.com/articles/10.1186/1758-2946-6-10
>> > >
>> > > I wonder what the pros and cons are of using the accuracy score of
>> > > the whole dataset vs just the test set.. any thoughts?
>> > >
>> > > 2. On line 283 of the cross_val_score source code, there is a
>> > > function _score. However, I can't find where this function is called.
>> > > Could you let me know where this function is called?
>> > >
>> > > Thank you very much!
>> > > Raga
>> > >
>> > > _______________________________________________
>> > > scikit-learn mailing list
>> > > scikit-learn at python.org
>> > > https://mail.python.org/mailman/listinfo/scikit-learn
>> >
>> > --
>> > Guillaume Lemaitre
>> > INRIA Saclay - Ile-de-France
>> > Equipe PARIETAL
>> > guillaume.lemaitre at inria.fr
>> > https://glemaitre.github.io/
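
For reference, the workflow discussed in this thread can be sketched roughly as follows. This is only a minimal sketch and not code from the thread: the iris dataset, the SVC estimator, the grid of C values, and the 70/30 split are arbitrary choices for illustration. It also assumes a scikit-learn release that provides cross_validate (0.19 or later, as far as I know), which is newer than the version discussed above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import (
    GridSearchCV,
    cross_val_score,
    cross_validate,
    train_test_split,
)
from sklearn.svm import SVC

# Toy data, chosen only for illustration.
X, y = load_iris(return_X_y=True)

# Hold out a test set that is never touched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# cross_val_score returns one score per held-out fold (test folds only).
test_fold_scores = cross_val_score(SVC(), X_train, y_train, cv=5)

# cross_validate can additionally report the scores on the training folds.
both_scores = cross_validate(
    SVC(), X_train, y_train, cv=5, return_train_score=True
)

# Model selection via k-fold on the training set only (hypothetical grid).
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

# Final evaluation of the refitted best estimator on the untouched test set.
final_test_accuracy = search.score(X_test, y_test)

print("held-out fold scores:", test_fold_scores)
print("training fold scores:", both_scores["train_score"])
print("best parameters:", search.best_params_)
print("test-set accuracy:", final_test_accuracy)
```

Comparing both_scores["train_score"] with both_scores["test_score"] shows the point made above: scores on data the estimator was fitted on tend to be optimistic, so only the held-out scores say something about generalization.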
